The project leverages Large Language Models (LLMs) and prompt engineering to create a research assistant that answers your questions based on what is available in the vector database stored on your DeepLake account.
- All files can only work after installing all dependencies in the `environment.yml` file
- The `notebook` folder contains the Jupyter notebook file for testing the project as a whole and for experimenting
- The `vector_database_creation.py` file is for creating the vector database resource for the LLM
- The `rag_research_assistant_main.py` file is the main driver code for the research assistant
- The initial data resources for the database creation can be found in the `research_articles.zip` file
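As a rough illustration (not the project's actual code), scripts like `vector_database_creation.py` typically split the source articles into overlapping chunks before embedding them into the vector store. A minimal, stdlib-only sketch of that chunking step:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks, a common
    pre-processing step before embedding documents into a vector store.

    The chunk_size/overlap values here are illustrative defaults,
    not the values used by this project.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than a full chunk so consecutive
        # chunks share `overlap` characters of context.
        start += chunk_size - overlap
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.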
Before installing the dependencies in the `environment.yml` file, kindly do the following first:
- Download and install Anaconda
- Once Conda is installed, open your CMD and run the following command: `C:/Users/your_system_name/anaconda3/Scripts/activate`
- You should see something like `(anaconda3) C:\Users\your_system_name\Desktop\>` as output in your CMD. **NB:** Do not close the CMD terminal; it will be needed later on
- Sign up for Cohere: Cohere
- Once your account is created, navigate to API keys in your profile and create a Trial Cohere API key. BE SURE TO COPY IT
- Sign up for Active Loop (your vector database): Active Loop
- Once your account is created, navigate to API tokens in your profile and create your API token. BE SURE TO COPY IT
- Sign up for Hugging Face (access to models): Huggingface
- Once your account is created, navigate to access tokens and create a `read`-only access token. BE SURE TO COPY YOUR ACCESS TOKEN
- Navigate to your desktop and create a new folder called `research_assistant`, then paste the `environment.yml` file into the folder
- On your CMD, navigate into the `research_assistant` folder using `cd research_assistant`
- Run `conda env create -f environment.yml -p ../research_assistant/rag` on your CMD
- Run `conda env list` on your CMD to list all environments created using Anaconda
- Run `conda activate C:\Users\your_system_name\Desktop\research_assistant\rag` on your CMD to activate the environment
- You should see something like `(rag) C:\Users\your_system_name\Desktop\research_assistant>` as output in your CMD
- Run `conda list` on your CMD to check whether all dependencies have been installed
- Paste all your tokens into the `.env` file
- Activate your conda environment as previously shown: `(anaconda3) C:\Users\your_system_name\Desktop\> conda activate C:\Users\your_system_name\Desktop\research_assistant\rag`
- Navigate to the folder of your project, `research_assistant`, using `cd research_assistant`
- Navigate to the `vector_base_creation` folder using `cd vector_base_creation`
- Run `python vector_database_creation.py` to create your vector database
- Navigate to the `rag_research_assistant` folder using `cd rag_research_assistant`
- Run `python rag_research_assistant_main.py` to run your research assistant.
For example prompts, refer to prompts.md
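Under the hood, a RAG assistant like this one first retrieves the stored chunks most similar to your question and then passes them to the LLM as context. A toy, illustrative retrieval sketch using cosine similarity (the real script queries your DeepLake store with embeddings from the model provider, not this code):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k stored chunks most similar to the
    query embedding -- the retrieval step of RAG."""
    scored = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return scored[:k]
```

The retrieved chunks (and their source metadata, such as title and page) are what allow the assistant to cite its sources, as in the example output below.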
When `vector_database_creation.py` is run, the following output is given when all criteria are met.
When `rag_research_assistant_main.py` is run, the following example output is given:
Question
Why did Mehedi Tajrian analyse child development and what was the best classifier?
Answer
Mehedi Tajrian analyzed child development due to:
- The rapid spread of misinformation online complicating accurate decision-making, especially for parents.
- The lack of research into distinguishing myths and facts about child development using text mining and classification models.
- The potential risks of inaccurate information on child treatment and development.
- To provide valuable insights for making informed decisions, thus aiding parents in handling misinformation.
- To shed light on myths around child development and aid in making informed decisions. The analysis involved several stages, including data pre-processing through text mining techniques and analysis with six traditional machine learning classifiers and one deep learning model using two feature extraction techniques.
- The best-performing classifier is the Logistic Regression (LR) model, with a 90% accuracy rate. The model also stands out for its speed and efficiency, with very low testing times per statement, and demonstrated robust performance on both k-fold and leave-one-out cross-validation.
Source(s):
- Title: Analysis of child development facts and myths using text mining techniques and classification models, Page: 1
- Title: Analysis of child development facts and myths using text mining techniques and classification models, Page: 15
- Title: Analysis of child development facts and myths using text mining techniques and classification models, Page: 2
Here is the publication:
I did experience an issue with the Hugging Face platform, but it was solved thanks to the open-source community. Highly grateful to you all!
Happy prompting, and may the RAG be with you, young JEDI!

