A multi-PDF conversational chatbot powered by Retrieval-Augmented Generation (RAG), running entirely on your local machine with Ollama and LangChain.
Upload one or more PDF documents, and the chatbot will answer questions grounded in their content — with source citations, semantic answer caching, and real-time streaming.
- Multi-PDF ingestion — upload and query several documents at once.
- Local LLM inference — no API keys, no cloud services; everything runs on-device via Ollama.
- Streaming responses — tokens are rendered in real time for a responsive chat experience.
- Semantic answer cache — repeated or similar questions are served instantly from an embedding-based cache with configurable similarity threshold.
- Multi-question detection — compound questions are automatically split and answered individually.
- Source citations — every answer references the originating document and page number.
- Debug mode — toggle an on-screen diagnostic panel for retrieval timing, similarity scores, and processing details.
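The semantic cache is worth a closer look: rather than keying answers by exact question strings, it embeds each question and serves a stored answer whenever a new question's embedding is similar enough. A minimal, framework-free sketch of the idea — the class and parameter names here are illustrative, not the actual `cache_manager.py` API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SemanticCache:
    """Answer cache keyed by question embeddings rather than exact strings."""

    def __init__(self, embed_fn, threshold=0.92):
        self.embed_fn = embed_fn    # maps question text -> embedding vector
        self.threshold = threshold  # minimum similarity for a cache hit
        self.entries = []           # list of (embedding, answer) pairs

    def get(self, question):
        """Return a cached answer if a similar-enough question was seen before."""
        query = self.embed_fn(question)
        best_score, best_answer = 0.0, None
        for emb, answer in self.entries:
            score = cosine_similarity(query, emb)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, question, answer):
        """Store a newly generated answer under the question's embedding."""
        self.entries.append((self.embed_fn(question), answer))
```

In the real application the embeddings come from the Ollama embedding model, and the threshold is read from `config.py`, so near-duplicate phrasings ("what is RAG?" vs. "what's RAG?") hit the cache instead of re-running retrieval and generation.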
| Requirement | Details |
|---|---|
| Python | 3.10 or later |
| Ollama | Installed and running locally — see ollama.ai |
| LLM model | Pulled into Ollama (default: phi4-mini) |
| Embedding model | Pulled into Ollama (default: all-minilm:l6-v2) |
Pull the required models before first use:
```bash
ollama pull phi4-mini
ollama pull all-minilm:l6-v2
```
- Clone the repository:

  ```bash
  git clone https://github.com/PenSul/RAG-Chatbot.git
  cd rag-chatbot
  ```

- Create and activate a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate   # Linux / macOS
  venv\Scripts\activate      # Windows
  ```

- Install the package in editable mode:

  ```bash
  pip install -e .
  ```

  This installs all dependencies listed in `pyproject.toml` and makes the `rag_chatbot` package importable.

- Make sure Ollama is running (e.g. `ollama serve` in a separate terminal).

- Start the application:

  ```bash
  streamlit run src/rag_chatbot/app.py
  ```

- Open the URL printed to the terminal (typically `http://localhost:8501`).

- Upload one or more PDFs via the sidebar, click Process PDFs, and start chatting.
```
src/rag_chatbot/
├── __init__.py            # Package metadata
├── app.py                 # Streamlit UI entry point
├── cache_manager.py       # Semantic question-answer cache (disk + in-memory)
├── config.py              # Centralised constants and prompt templates
├── conversation.py        # LangChain conversational chain setup
├── document_processor.py  # PDF loading, chunking, and vector-store creation
├── models.py              # Cached Ollama LLM and embedding resources
├── question_parser.py     # Multi-question detection and text cleaning
├── response_processor.py  # Post-processing and citation formatting
├── session_state.py       # Streamlit session-state initialisation
└── stream_handler.py      # Token-by-token streaming callback
```
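The streaming callback in `stream_handler.py` follows a common pattern: accumulate tokens as the LLM emits them and re-render the chat message after each one. A framework-free sketch of that pattern — in the real module the class hooks into LangChain's callback interface (which delivers tokens via `on_llm_new_token`) and the display function is a Streamlit placeholder:

```python
class StreamHandler:
    """Collects tokens as they arrive and forwards the growing text to a display callback."""

    def __init__(self, display_fn):
        self.display_fn = display_fn  # e.g. a Streamlit placeholder's render method
        self.text = ""

    def on_llm_new_token(self, token, **kwargs):
        """Called once per generated token; re-renders the partial answer."""
        self.text += token
        self.display_fn(self.text)
```

Because the full partial answer is re-sent on every token, the UI always shows a consistent snapshot even if rendering lags behind generation.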
All tunable parameters live in `src/rag_chatbot/config.py`, including model names, chunk sizes, cache paths, similarity thresholds, and prompt templates. Modify that single file to adapt the chatbot to different models or retrieval strategies.
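As an illustration, a config module of this kind typically looks like the following — note that apart from the two model names given above, the constant names and values here are hypothetical, not the actual contents of `config.py`:

```python
# Hypothetical excerpt — constant names and tuning values are illustrative.
LLM_MODEL = "phi4-mini"               # Ollama chat model (default from the README)
EMBEDDING_MODEL = "all-minilm:l6-v2"  # Ollama embedding model (default from the README)

CHUNK_SIZE = 1000                     # characters per document chunk
CHUNK_OVERLAP = 200                   # overlap between adjacent chunks
CACHE_SIMILARITY_THRESHOLD = 0.92     # minimum cosine similarity for a cache hit
CACHE_PATH = "cache/answers.json"     # on-disk location of the semantic cache
```

Keeping every knob in one module means swapping models or retrieval parameters never requires touching the UI or chain-construction code.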
This project is licensed under the MIT License.