A simple Retrieval-Augmented Generation (RAG) chatbot that lets you upload PDF documents and ask questions about their contents. Built with Flask, OpenAI, and an in-memory vector store.
- Upload a PDF — the app extracts text, splits it into overlapping chunks, and generates embeddings via OpenAI.
- Ask a question — the app embeds your query, finds the most relevant chunks by cosine similarity, and sends them as context to GPT-4o-mini.
- Get a grounded answer — the LLM responds using only the retrieved context.
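The three steps above can be sketched in plain Python. This is a minimal illustration, not the code in app.py: the function names (`chunk_text`, `cosine_similarity`, `top_k_chunks`, `build_prompt`), the character-based chunk sizes, and the prompt wording are all assumptions, and the embeddings are taken as plain lists of floats that were already returned by the embedding model.

```python
import math


def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into character chunks; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, store, k=3):
    """Return the k chunk texts whose embeddings are most similar to the query.

    `store` is a hypothetical in-memory vector store: a list of
    (chunk_text, embedding) pairs.
    """
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble a prompt that asks the model to stay within the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In this sketch, a larger `overlap` makes it less likely that a sentence is cut in half at a chunk boundary, at the cost of storing and embedding more text.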
- An OpenAI API key with access to gpt-4o-mini and text-embedding-3-small.
- Python 3.10+ (for local setup) or Docker (for containerised setup).
Export the key in your terminal before running the app:
export OPENAI_API_KEY=sk-your-key-here
On Windows (PowerShell):
$env:OPENAI_API_KEY = "sk-your-key-here"
# 1. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # macOS / Linux
# venv\Scripts\activate # Windows
# 2. Install dependencies
pip install -r requirements.txt
# 3. Set your API key (see above)
# 4. Start the app
python app.py
Open http://localhost:5000 in your browser.
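Before starting the app, it can help to confirm that the key is actually visible to Python. The check below is a small sketch, assuming the app reads the key from the OPENAI_API_KEY environment variable as the export commands suggest; the helper name `api_key_problem` is hypothetical and not part of app.py.

```python
import os


def api_key_problem(env=os.environ):
    """Return a short description of what is wrong with the key, or None if it looks set."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    if key == "sk-your-key-here":
        return "OPENAI_API_KEY still holds the placeholder value"
    return None


if __name__ == "__main__":
    # Prints the problem if there is one, otherwise a confirmation.
    print(api_key_problem() or "OPENAI_API_KEY looks set")
```

This only checks that a value is present; it does not verify that the key is valid or has access to the required models.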
# Build the image
docker build -t rag-chatbot .
# Run the container (pass your API key)
docker run -p 5000:5000 -e OPENAI_API_KEY=sk-your-key-here rag-chatbot
Open http://localhost:5000 in your browser.
- Click Choose PDF and select a PDF file.
- Click Upload — wait for the "Successfully processed" message.
- Type a question in the input bar and press Enter or click Send.
- Upload additional PDFs at any time to expand the knowledge base.
- Click Clear docs to remove all indexed documents and start fresh.
rag-chatbot/
├── app.py # Flask application and RAG logic
├── templates/
│ └── index.html # Chat web interface
├── requirements.txt # Python dependencies
├── Dockerfile # Container definition
└── README.md # This file