The "Agentic RAG using CrewAI" project is a Retrieval-Augmented Generation (RAG) system that leverages the CrewAI framework to orchestrate autonomous AI agents for answering user queries. It combines local document retrieval with online search capabilities, using a modular agent-based approach. The system processes uploaded documents (PDF or TXT), retrieves relevant information from them, and falls back to an online search if the information isn’t found locally. Responses are generated using a local language model (via Ollama), and users are informed whether the answer came from the document or the web.
- Document Processing: Accepts `.txt` and `.pdf` files, splits them into chunks, and indexes them using FAISS for efficient retrieval.
- Agentic Workflow: Uses two CrewAI agents:
  - Document Retriever: Searches the uploaded document for relevant information.
  - Online Searcher: Performs a web search via the Firecrawl API if the document lacks relevant data.
- Conditional Search: Online searches are triggered only if the document doesn’t contain relevant information.
- Source Attribution: Informs users whether the response came from the document or an online search.
- Local LLM: Uses Ollama to run a local language model (e.g., Llama3) for response generation.
- Streamlit UI: Provides a user-friendly interface for uploading documents and interacting via a chat system.
The project aims to demonstrate an agentic RAG system where AI agents collaborate to provide accurate, context-aware responses. It’s designed for users who want to query private documents locally while supplementing with online data when needed, all without relying on external API-based LLMs.
- Operating System: Linux or macOS (Ollama isn’t natively supported on Windows without WSL).
- Python: Version 3.8 or higher.
- Hardware: Sufficient RAM (8GB+ recommended) for running local LLMs via Ollama.
1. Clone the Repository:

   ```bash
   git clone https://github.com/Mr-PU/agentic-RAG-using-crewAI.git
   cd agentic-RAG-using-crewAI
   ```

2. Install Python Dependencies:
   - Create a `requirements.txt` file with the following content (based on the code):

     ```
     faiss-cpu
     numpy
     streamlit
     requests
     langchain
     sentence-transformers
     crewai
     ollama
     pypdf
     python-dotenv
     ```

   - Install the dependencies:

     ```bash
     pip install -r requirements.txt
     ```
3. Set Up Environment Variables:
   - Create a `.env` file in the project root:

     ```bash
     touch .env
     ```

   - Add the following variables:

     ```
     FIRECRAWL_API_KEY=your-firecrawl-api-key
     MODEL_NAME=ollama/llama3
     ```

   - Replace `your-firecrawl-api-key` with your actual Firecrawl API key (get it from firecrawl.dev).
4. Install and Configure Ollama:
   - Install Ollama. On Linux/macOS, run:

     ```bash
     curl -fsSL https://ollama.com/install.sh | sh
     ```

     This installs Ollama as a service.
   - Pull the Llama3 model (or another model specified in `.env`):

     ```bash
     ollama pull llama3
     ```

     Verify it's installed:

     ```bash
     ollama list
     ```

   - Start the Ollama server in a separate terminal:

     ```bash
     ollama serve
     ```

     It runs on `localhost:11434` by default.
5. Run the Application:
   - Start the Streamlit app:

     ```bash
     streamlit run app.py
     ```

   - Open your browser to `http://localhost:8501` to interact with the UI.
Ollama is a platform for running large language models (LLMs) locally on your machine. It provides a simple interface to download, manage, and query models like Llama3, Mistral, etc., without requiring cloud-based APIs.
- Model Hosting: Ollama hosts the `llama3` model (or another specified in `.env`) locally. The `ollama pull llama3` command downloads it to your system.
- Server Operation: The `ollama serve` command starts a local server at `localhost:11434`, exposing an API for model inference.
- Python Integration: The `ollama` Python package (`import ollama`) acts as a client to communicate with this server. The `ollama.generate` function sends prompts to the model and retrieves responses.
- Role in the Project: Ollama powers the `generate_response_with_llama3` function, generating natural language answers based on retrieved context (from documents or online search).
- Privacy: Keeps data local, avoiding external API calls.
- Cost: Free to use with open-source models.
- Customization: Allows tweaking model parameters or using custom models.
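As a hedged sketch of what the `ollama` package does under the hood: with `ollama serve` running, the same `localhost:11434` API can be reached with nothing but the standard library. The endpoint and payload shape below follow Ollama's REST API (`POST /api/generate`); only the payload builder is exercised without a server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default `ollama serve` endpoint

def build_generate_payload(model: str, prompt: str) -> dict:
    # stream=False asks the server for one complete JSON reply instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}

def query_ollama(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_generate_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["response"]  # Ollama returns the generated text under "response"

# With the server running, a call would look like:
#   print(query_ollama("Why is the sky blue?"))
```

The `ollama` Python client wraps essentially this request, which is why the project works entirely against `localhost` with no external API keys for generation.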
The script (app.py) is organized into several functions, each handling a specific part of the RAG pipeline. Here’s a detailed breakdown:
1. Imports and Setup:
   - Libraries like `faiss`, `numpy`, `streamlit`, `requests`, `langchain`, `sentence_transformers`, `crewai`, `ollama`, and `dotenv` are imported.
   - Environment variables are loaded with `load_dotenv()`.
   - The Sentence Transformer model (`all-MiniLM-L6-v2`) is initialized for embeddings.
   - The Streamlit UI is set up with `st.title`.

   Key Lines:

   ```python
   from dotenv import load_dotenv
   load_dotenv()
   embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
   MODEL_NAME = os.getenv("MODEL_NAME", "ollama/llama3")
   ```
2. Document Processing:
   - `load_and_process_documents(file_path)`:
     - Determines the file type (`.txt` or `.pdf`) and uses `TextLoader` or `PyPDFLoader`.
     - Splits text into 500-character chunks with 100-character overlap using `RecursiveCharacterTextSplitter`.
     - Returns a list of document chunks.
   - `create_faiss_index(texts)`:
     - Encodes document chunks into embeddings using the Sentence Transformer.
     - Creates a FAISS `IndexIVFFlat` index with up to 100 clusters for fast similarity search.
     - Trains and adds embeddings to the index.

   Key Logic:

   ```python
   embeddings = embedding_model.encode([text.page_content for text in texts])
   index = faiss.IndexIVFFlat(quantizer, dimension, min(100, len(embeddings)))
   index.train(np.array(embeddings))
   index.add(np.array(embeddings))
   ```
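To make the 500/100 chunking concrete, here is a minimal pure-Python approximation. It is only a sketch: the real code uses LangChain's `RecursiveCharacterTextSplitter`, which additionally tries to break on natural separators (paragraphs, sentences) rather than at fixed offsets.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Sliding-window splitter: each chunk starts (chunk_size - overlap) after the last."""
    step = chunk_size - overlap  # 400 with the project's defaults
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "x" * 1200
chunks = split_into_chunks(doc)
# Chunks start at offsets 0, 400, and 800; adjacent chunks share 100 characters.
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing boundary-straddling facts.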
3. Retrieval Functions:
   - `search_faiss_index(query, index, texts, k=3, threshold=0.7)`:
     - Encodes the query into an embedding.
     - Searches the FAISS index for the top 3 similar chunks.
     - Filters results with a distance threshold of 0.7 (lower is more relevant).
     - Returns relevant text chunks.
   - `search_online(query)`:
     - Makes a POST request to the Firecrawl API with the query.
     - Returns the content of the first result or an error message.

   Key Logic:

   ```python
   query_embedding = embedding_model.encode([query])[0]
   distances, indices = index.search(np.array([query_embedding]), k)
   if distance < threshold:
       relevant_results.append(texts[i].page_content)
   ```
4. CrewAI Agents:
   - `create_crewai_agents()` defines two agents:
     - Retriever Agent: Searches the document database.
     - Online Searcher Agent: Searches the web if needed.
   - Both use the model specified in `MODEL_NAME` (from `.env`).

   Key Lines:

   ```python
   retriever_agent = Agent(role="Document Retriever", ..., llm=MODEL_NAME)
   online_searcher_agent = Agent(role="Online Searcher", ..., llm=MODEL_NAME)
   ```
5. Response Generation:
   - `generate_response_with_llama3(query, context)`:
     - Combines the query and context into a prompt.
     - Sends it to Ollama's `llama3` model (or the specified model) for generation.
     - Returns the response.

   Key Logic:

   ```python
   prompt = f"Query: {query}\n\nContext: {context}\n\nAnswer:"
   response = ollama.generate(model="llama3", prompt=prompt)
   ```
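Only the string assembly runs without a server, so this sketch isolates it as a function; the commented call shows how the generated text would then be pulled out of the `ollama.generate` result, which (per the `ollama` package's API) comes back as a mapping with the text under `"response"`.

```python
def build_prompt(query: str, context: str) -> str:
    """Mirror of the prompt format used by generate_response_with_llama3."""
    return f"Query: {query}\n\nContext: {context}\n\nAnswer:"

prompt = build_prompt("What is FAISS?", "FAISS is a library for similarity search.")
print(prompt)

# With a running Ollama server, generation would then be:
#   answer = ollama.generate(model="llama3", prompt=prompt)["response"]
```

Keeping the prompt format in one place makes it easy to experiment with instructions (e.g. "answer only from the context") without touching the retrieval code.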
6. Query Handling:
   - `handle_query(query, index, texts)`:
     - Creates agents and defines tasks for document retrieval and online search.
     - Executes the retrieval task first using `retriever_crew`.
     - Checks whether `search_faiss_index` finds relevant chunks:
       - If yes, uses the document chunks as context and sets the source to "Retrieved from document".
       - If no, runs `online_crew` and sets the source to "Retrieved from online search".
     - Generates a response with Llama3 and appends the source.

   Key Logic:

   ```python
   relevant_chunks = search_faiss_index(query, index, texts)
   if relevant_chunks:
       context = "\n".join(relevant_chunks)
       source = "Retrieved from document"
   else:
       online_result = online_crew.kickoff()
       context = online_result
       source = "Retrieved from online search"
   full_response = f"{response}\n\n**Source:** {source}"
   ```
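The document-first fallback can be exercised end to end with stubs standing in for FAISS and the Firecrawl crew. This is only a sketch: `fake_faiss_search`, `fake_online_search`, and `fake_generate` are hypothetical stand-ins, not functions from `app.py`.

```python
def answer_with_fallback(query, faiss_search, online_search, generate):
    """Try local retrieval first; fall back to online search; attribute the source."""
    chunks = faiss_search(query)
    if chunks:
        context, source = "\n".join(chunks), "Retrieved from document"
    else:
        context, source = online_search(query), "Retrieved from online search"
    return f"{generate(query, context)}\n\n**Source:** {source}"

# Hypothetical stand-ins for the real components:
fake_faiss_search = lambda q: ["FAISS indexes embeddings."] if "faiss" in q.lower() else []
fake_online_search = lambda q: "Web result for: " + q
fake_generate = lambda q, ctx: f"Answer based on: {ctx}"

print(answer_with_fallback("What is FAISS?", fake_faiss_search, fake_online_search, fake_generate))
print(answer_with_fallback("Weather today?", fake_faiss_search, fake_online_search, fake_generate))
```

Passing the three components in as arguments also makes this branch easy to unit-test, since the real FAISS index, Firecrawl call, and LLM can each be replaced independently.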
7. Streamlit UI:
   - `main()`:
     - Sidebar: Uploads a `.txt` or `.pdf` file, processes it, and shows a preview.
     - Chat Interface: Displays chat history and accepts user queries.
     - On query submission, calls `handle_query` and updates the chat with the response.

   Key Logic:

   ```python
   if uploaded_file:
       texts = load_and_process_documents(file_path)
       index, texts = create_faiss_index(texts)
   query = st.chat_input("Enter your query:")
   if query:
       response = handle_query(query, index, texts)
       st.session_state.chat_history.append({"role": "assistant", "content": response})
   ```
- User Uploads Document: The uploaded file is saved to `/tmp`, processed into chunks, and indexed with FAISS.
- User Enters Query: The query is sent to `handle_query`.
handle_query. - Document Retrieval: Retriever agent searches the FAISS index.
- Conditional Online Search: If no relevant info is found, the online searcher queries Firecrawl.
- Response Generation: Context (document or online) is fed to Llama3 via Ollama, and the response is returned with source info.
- UI Update: Response is displayed in the chat with source attribution.
This project showcases a robust agentic RAG system using CrewAI, FAISS, and Ollama. It balances local document retrieval with online search, powered by a local LLM for privacy and cost-efficiency. The Streamlit UI makes it accessible, while the source attribution enhances transparency. To extend it, you could add support for more file types, tune the FAISS threshold, or integrate additional tools for the agents.
