# RAG Chatbot

A complete end-to-end web application that lets users chat with an AI assistant restricted to the documents they upload. Built with an Angular frontend, a Spring Boot backend, and the Gemini Pro API.
## Features

- Document Upload: Support for PDF, DOCX, and TXT files
- Text Extraction: Automatic text extraction from uploaded documents
- Vector Search: Document chunks are embedded and stored for semantic search
- AI Chat: Chat interface that answers only from uploaded documents
- Smart Responses:
  - Answers from documents when relevant
  - "Out of scope" for unrelated questions
  - "No documents available" when no documents are uploaded
  - Highlights conflicts when documents contain conflicting information
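The three response modes above amount to a simple guard chain. A minimal sketch of that decision, where the class name, method name, and the 0.5 similarity threshold are all invented for illustration and are not the project's actual code:

```java
// ResponseMode.java - illustrative decision logic for the "Smart Responses"
// behaviors; the 0.5 threshold and all names here are assumptions.
public class ResponseMode {
    public static String decide(int documentCount, double bestSimilarity) {
        if (documentCount == 0) {
            return "No documents available";  // nothing uploaded yet
        }
        if (bestSimilarity < 0.5) {           // hypothetical relevance threshold
            return "Out of scope";            // question unrelated to the docs
        }
        return "Answer from documents";       // ground the reply in matched chunks
    }

    public static void main(String[] args) {
        System.out.println(decide(0, 0.0));   // No documents available
        System.out.println(decide(3, 0.2));   // Out of scope
        System.out.println(decide(3, 0.8));   // Answer from documents
    }
}
```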
## Tech Stack

- Frontend: Angular 17
- Backend: Spring Boot 3.2.0
- Database: H2 (in-memory)
- AI: Google Gemini Pro API
- Document Processing: Apache PDFBox, Apache POI
- Vector Storage: Custom implementation with cosine similarity
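The custom vector store ranks chunks by cosine similarity. A minimal sketch of that metric, assuming plain `double[]` embeddings (the class and method names are illustrative, not the project's actual code):

```java
// CosineSimilarity.java - illustrative sketch of the metric used for
// vector search; names are hypothetical, not the project's code.
public class CosineSimilarity {
    /** Returns cos(theta) between two equal-length vectors, in [-1, 1]. */
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention: a zero vector matches nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] v1 = {1.0, 2.0, 3.0};
        double[] v2 = {2.0, 4.0, 6.0}; // same direction as v1
        System.out.println(cosine(v1, v2)); // ~1.0
    }
}
```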
## Prerequisites

- Java 17 or higher
- Node.js 18 or higher
- npm or yarn
- Google Gemini API key (optional - the app works without it using dummy responses)
## Quick Start

### Backend

```bash
cd backend
./mvnw spring-boot:run
```

The backend will start on http://localhost:8080.

### Frontend

```bash
cd frontend
npm install
npm start
```

The frontend will start on http://localhost:4200.
## Gemini API Key Setup

Run the helper script to check your setup:

```bash
./setup-api-key.sh
```

1. Get your API key:
   - Go to: https://makersuite.google.com/app/apikey
   - Create a new API key
   - Copy the API key
2. Set the API key (choose one method):

   Option A - Environment variable (recommended):

   ```bash
   export GEMINI_API_KEY=your_api_key_here
   ```

   Option B - Add to application.properties:

   ```properties
   gemini.api.key=your_api_key_here
   ```

   Option C - Set temporarily for testing:

   ```bash
   GEMINI_API_KEY=your_api_key_here ./start-backend.sh
   ```

3. Restart the backend after setting the API key:

   ```bash
   cd backend && mvn spring-boot:run
   ```
## How It Works

- Document Upload: Users upload PDF, DOCX, or TXT files through the web interface
- Text Extraction: The backend extracts text from documents using Apache PDFBox and POI
- Chunking: Text is split into manageable chunks (1000 characters with 200-character overlap)
- Embedding: Each chunk is converted to a vector embedding using Gemini's embedding API
- Storage: Embeddings are stored in the database with the original text
- Query Processing: When users ask questions:
  - The query is converted to an embedding
  - Similar chunks are found using cosine similarity
  - Relevant context is sent to Gemini Pro for response generation
  - The response is returned with source information
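The chunking step above (1000-character windows with a 200-character overlap) can be sketched as a sliding window. This is a hedged illustration of the approach, not the project's actual implementation; the class and parameter names are invented:

```java
// Chunker.java - illustrative fixed-size chunking with overlap, matching
// the 1000/200 figures described above; names here are hypothetical.
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    public static List<String> chunk(String text, int size, int overlap) {
        if (size <= overlap) {
            throw new IllegalArgumentException("size must exceed overlap");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // each window starts 800 chars after the last
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // reached the end of the document
        }
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "x".repeat(2200);
        // windows: [0,1000), [800,1800), [1600,2200)
        System.out.println(chunk(doc, 1000, 200).size()); // 3
    }
}
```

The overlap keeps a sentence that straddles a chunk boundary fully visible in at least one chunk, which improves recall at query time.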
## API Endpoints

### Documents

- `POST /api/documents/upload` - Upload a document
- `GET /api/documents` - Get all documents
- `DELETE /api/documents/{id}` - Delete a document

### Chat

- `POST /api/chat/message` - Send a chat message
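As an example of calling the chat endpoint from Java, here is a sketch using the standard `java.net.http.HttpClient`. The JSON field name `message` is an assumption about the request shape; check the backend DTOs before relying on it:

```java
// ChatClient.java - hypothetical client for POST /api/chat/message; the
// "message" field name is an assumption, not taken from the backend code.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatClient {

    /** Builds the POST request; real code should JSON-escape the message. */
    static HttpRequest buildRequest(String baseUrl, String message) {
        String body = "{\"message\": \"" + message + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/chat/message"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = buildRequest("http://localhost:8080",
                "What does the uploaded document say about refunds?");
        // Requires the backend to be running on localhost:8080.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```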
## Configuration

```properties
# Server
server.port=8080

# Database
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.username=sa
spring.datasource.password=password

# File upload
spring.servlet.multipart.max-file-size=10MB
spring.servlet.multipart.max-request-size=10MB

# Gemini API
gemini.api.key=${GEMINI_API_KEY:}
```

## Development

Backend:

```bash
cd backend
./mvnw spring-boot:run
```

Frontend:

```bash
cd frontend
npm start
```

## Building for Production

Backend:

```bash
cd backend
./mvnw clean package
java -jar target/document-chat-backend-0.0.1-SNAPSHOT.jar
```

Frontend:

```bash
cd frontend
npm run build
```

## Troubleshooting

- Port already in use: Change the port in `application.properties`
- CORS errors: Ensure the frontend URL is correct in the CORS configuration
- File upload fails: Check file size limits and supported formats
- No AI responses or "Out of scope" for valid questions:
  - Most common cause: a missing or invalid Gemini API key
  - Run `./setup-api-key.sh` to check your API key setup
  - Verify the API key is set: `echo $GEMINI_API_KEY`
  - Restart the backend after setting the API key
- Poor response quality: The dummy embedding system has limitations; use a real Gemini API key for better results
## Logging

Backend logs are available in the console. For more detailed logging, modify `logback-spring.xml`.
## License

This project is for educational purposes. Please ensure you comply with Google's Gemini API terms of service when using the AI features.