A production-grade, fully asynchronous, multi-tenant Retrieval-Augmented Generation (RAG) API. Built with FastAPI, Neon Serverless Postgres (pgvector), and Ollama.
- OAuth2 Password Bearer with JWT access tokens and bcrypt password hashing.
- OTP-based email verification via async SMTP (`aiosmtplib`) for registration and password reset.
- Every document and vector is scoped to a `user_id`: queries are isolated at the database level.
- Deduplication: SHA-256 file fingerprinting prevents redundant embedding computations.
- Non-blocking parsing: CPU-bound PyMuPDF work is offloaded via `asyncio.to_thread`.
- Chunking: Zero-dependency, O(N) sliding-window algorithm that snaps to natural punctuation boundaries.
- Bulk insert: `asyncpg.executemany` over binary protocol for optimized vector storage.
- HNSW vector search: Cosine similarity via `pgvector` (`<=>` operator), filtered by authenticated user.
- Grounded generation: Retrieved chunks are compiled into a system prompt that constrains the LLM to provided context only.
- Streaming: Token-by-token LLM output piped directly into `StreamingResponse`.
- 468 tests across unit, integration, and API route layers, with 100% line coverage.
- Real database tests: GitHub Actions spins up ephemeral `pgvector/pgvector:pg16` containers. No mocking SQL.
- Lint gate: Ruff linter + formatter enforced before tests run.

`structlog` JSON logging with rotating file handler and optional Better Stack cloud aggregation.

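
The deduplication and chunking features listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the `BOUNDARIES` set, and the default `size` and `overlap` values are all hypothetical.

```python
import hashlib
from typing import Iterator

BOUNDARIES = ".!?\n"  # assumed set of "natural" break characters


def file_fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes (a deduplication key)."""
    return hashlib.sha256(data).hexdigest()


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> Iterator[str]:
    """Yield overlapping windows, preferring to end each one just after punctuation."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            # Scan backwards through the second half of the window for a boundary.
            floor = start + size // 2
            for i in range(end - 1, floor - 1, -1):
                if text[i] in BOUNDARIES:
                    end = i + 1
                    break
        chunk = text[start:end].strip()
        if chunk:
            yield chunk
        if end >= len(text):
            break
        # Step back by `overlap`, but always advance so the loop terminates.
        start = max(end - overlap, start + 1)
```

Checking an upload's fingerprint against fingerprints already stored for the same user, before any embedding call, is what would let a repeated file skip the embedding step.
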
| Layer | Technology |
|---|---|
| Framework | FastAPI, Uvicorn, Python 3.12+ |
| Database | Neon PostgreSQL 16, pgvector, asyncpg (raw SQL) |
| AI / LLM | Ollama (mxbai-embed-large, deepseek-r1:8b), httpx |
| Auth | PyJWT, bcrypt, aiosmtplib |
| Parsing | PyMuPDF, python-multipart |
| Validation | Pydantic v2, pydantic-settings |
| Testing | pytest, pytest-asyncio, pytest-cov, pytest-mock |
| CI/CD | GitHub Actions (lint → test with ephemeral pgvector DB) |
| Packaging | uv, Docker (multi-stage, non-root) |
| Observability | structlog, Better Stack |
All `/documents` and `/chat` endpoints require a valid Bearer token. Interactive docs are available at `/docs`.
| Category | Method | Endpoint | Description |
|---|---|---|---|
| Auth | POST | `/api/v1/auth/request-otp` | Email a 6-digit registration OTP |
| Auth | POST | `/api/v1/auth/register` | Verify OTP and create account |
| Auth | POST | `/api/v1/auth/login` | Authenticate and issue JWT |
| Auth | POST | `/api/v1/auth/request-reset-otp` | Email a password reset OTP |
| Auth | POST | `/api/v1/auth/reset-password` | Verify OTP and update password |
| Docs | POST | `/api/v1/documents/ingest` | Upload, chunk, embed, and store a PDF |
| Docs | GET | `/api/v1/documents/` | List user's uploaded files |
| Docs | DELETE | `/api/v1/documents/{file_id}` | Delete file and all associated chunks |
| Chat | POST | `/api/v1/chat/search` | Semantic vector search against user's documents |
| Chat | POST | `/api/v1/chat/completions` | Full RAG: retrieve context and stream LLM response |
| Health | GET | `/health` | Database connectivity check |
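
As a rough illustration of the Bearer requirement on the `/documents` and `/chat` routes, the sketch below builds (without sending) an authenticated JSON request using only the standard library. The helper name, the placeholder token, and the `{"query": ...}` body are assumptions; the real request schema is defined by the API's Pydantic models and shown in the interactive docs.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # local address the API is served on


def authorized_post(path: str, token: str, payload: dict) -> Request:
    """Build (but do not send) a JSON POST carrying the JWT as a Bearer token."""
    return Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical body: field names are illustrative only.
request = authorized_post("/api/v1/chat/search", "<access-token>", {"query": "refund policy"})
# urllib.request.urlopen(request) would send it once the API is running.
```

The token itself would come from the login endpoint; a request without the `Authorization` header should be rejected by the protected routes.
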
src/fastapi_ollama_rag/
├── main.py # Lifespan hooks, health check
├── api/
│ ├── dependencies.py # JWT token validation
│ └── routes/ # Auth, Chat, Documents
├── core/
│ ├── config.py # Pydantic settings (env-driven)
│ ├── database.py # asyncpg connection pool
│ ├── migrations.py # Idempotent SQL schema runner
│ ├── security.py # bcrypt + JWT helpers
│ ├── logger.py # structlog configuration
│ └── sql/ # Raw SQL (schema, queries)
├── models/ # Pydantic request/response models
├── repository/ # Database access layer
└── services/ # Business logic (auth, chunker, embeddings, etc.)
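
To make the "idempotent SQL schema runner" idea concrete, here is a minimal sketch of what such a runner could look like. The table, column, and index names are hypothetical, and the `vector(1024)` width assumes the output size of `mxbai-embed-large`; the project's real schema lives under `core/sql/`.

```python
# Hypothetical schema: every statement is safe to re-run because of IF NOT EXISTS.
MIGRATIONS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS document_chunks (
        id BIGSERIAL PRIMARY KEY,
        user_id BIGINT NOT NULL,
        content TEXT NOT NULL,
        embedding vector(1024) NOT NULL  -- assumed embedding width
    )
    """,
    """
    CREATE INDEX IF NOT EXISTS document_chunks_embedding_idx
    ON document_chunks USING hnsw (embedding vector_cosine_ops)
    """,
]


async def run_migrations(conn) -> int:
    """Execute each statement in order on an asyncpg-style connection."""
    for statement in MIGRATIONS:
        await conn.execute(statement)
    return len(MIGRATIONS)
```

Because each statement is guarded, calling `run_migrations` on every startup leaves an already-migrated database unchanged.
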
| Decision | Why |
|---|---|
| `asyncpg` raw SQL over SQLAlchemy | Fastest Python Postgres driver. Direct control over pgvector operators and bulk binary insertions. No ORM overhead. |
| Custom chunker over LangChain | Zero dependencies. Deterministic. Testable. No framework bloat for a simple sliding-window algorithm. |
| Real DB integration tests over mocks | Mocking raw SQL hides syntax errors. Ephemeral pgvector containers catch real failures. |
| Custom JWT/OTP over Auth0 | Full ownership of user data and auth flows. No vendor lock-in. Precise control over multi-tenant schema. |
| Single Postgres over separate vector DB | pgvector unifies relational data and vectors in one ACID-compliant store. No split-brain infrastructure. |
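
The "direct control over pgvector operators" point can be illustrated with a tenant-scoped similarity query. This is a sketch under assumptions: the table and column names are hypothetical, and the embedding is passed as pgvector's text literal rather than through a registered codec. Note that `<=>` returns cosine distance, so similarity is computed as one minus that value.

```python
from typing import Sequence

# Hypothetical table and column names; the project's real SQL lives in core/sql/.
SEARCH_SQL = """
SELECT content, 1 - (embedding <=> $2::vector) AS similarity
FROM document_chunks
WHERE user_id = $1
ORDER BY embedding <=> $2::vector
LIMIT $3
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


async def search(conn, user_id, query_embedding: Sequence[float], limit: int = 5):
    """Run the query on an asyncpg-style connection (anything with a matching fetch)."""
    return await conn.fetch(SEARCH_SQL, user_id, to_vector_literal(query_embedding), limit)
```

Keeping the `user_id` filter in the same statement as the vector ordering is what lets a single Postgres query enforce tenant isolation and ranking together.
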
- Docker & Docker Compose
- Ollama
- A Neon Serverless Postgres database (or any Postgres 16+ with `pgvector`)
- SMTP credentials for OTP emails (e.g., Gmail App Password)

    cp .env.example .env

    # Database
    DATABASE_URL="postgres://user:password@ep-your-db.region.aws.neon.tech/neondb?sslmode=require"
    # AI / LLM
    OLLAMA_BASE_URL="http://host.docker.internal:11434" # Docker → host Ollama
    # Security
    SECRET_KEY="generate-a-strong-random-key"
    # SMTP (any provider: Gmail, SES, SendGrid, etc.)
    SMTP_SERVER="smtp.gmail.com"
    SMTP_PORT=587
    SMTP_USERNAME="you@gmail.com"
    SMTP_PASSWORD="your-app-password"

Note: `host.docker.internal` routes from the Docker container to the host machine's Ollama instance.

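
The `SECRET_KEY` placeholder should be replaced with a real random value. One way to produce one, assuming any sufficiently long random string is acceptable to the application, is Python's standard `secrets` module; the helper name below is illustrative.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from `num_bytes` of entropy."""
    return secrets.token_urlsafe(num_bytes)


# Print a line that can be pasted into .env.
print(f'SECRET_KEY="{generate_secret_key()}"')
```
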
    ollama pull mxbai-embed-large
    ollama pull deepseek-r1:8b

    docker compose up --build -d

The API is available at http://localhost:8000. Migrations run automatically on startup.

    uv run pytest tests/ -v --cov=src --cov-report=term-missing