FastAPI RAG System

A production-grade, fully asynchronous, multi-tenant Retrieval-Augmented Generation (RAG) API. Built with FastAPI, Neon Serverless Postgres (pgvector), and Ollama.

System Architecture

Authentication & Multi-Tenancy

OAuth2 Password Bearer with JWT access tokens and bcrypt password hashing.
OTP-based email verification via async SMTP (aiosmtplib) for registration and password reset.
Every document and vector is scoped to a user_id — queries are isolated at the database level.
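The OTP step above can be illustrated with a minimal, standard-library sketch. It is not the project's code: the 10-minute lifetime, the `OtpRecord` type, and the function names are all assumptions made for illustration. The idea is that the server stores only a digest and an expiry, and compares in constant time.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional, Tuple

OTP_TTL_SECONDS = 600  # assumed 10-minute validity; the README does not state one


@dataclass(frozen=True)
class OtpRecord:
    """What a server might persist: a digest and an expiry, never the raw code."""

    digest: str
    expires_at: float


def generate_otp() -> str:
    """Return a zero-padded 6-digit code from a cryptographically secure source."""
    return f"{secrets.randbelow(1_000_000):06d}"


def _digest(email: str, otp: str) -> str:
    # Binding the digest to the email stops a code issued for one address
    # from verifying a different address.
    return hashlib.sha256(f"{email.lower()}:{otp}".encode()).hexdigest()


def issue_otp(email: str, now: Optional[float] = None) -> Tuple[str, OtpRecord]:
    """Create a code to email to the user and the record to store server-side."""
    now = time.time() if now is None else now
    otp = generate_otp()
    return otp, OtpRecord(_digest(email, otp), now + OTP_TTL_SECONDS)


def verify_otp(email: str, otp: str, record: OtpRecord, now: Optional[float] = None) -> bool:
    """Accept the code only if it is unexpired and matches in constant time."""
    now = time.time() if now is None else now
    if now > record.expires_at:
        return False
    return hmac.compare_digest(_digest(email, otp), record.digest)
```

In a real deployment the record would live in the database next to the pending registration, and a limit on verification attempts would normally be added.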

Document Ingestion

Deduplication — SHA-256 file fingerprinting prevents redundant embedding computations.
Non-blocking parsing — CPU-bound PyMuPDF work is offloaded via asyncio.to_thread.
Chunking — Zero-dependency, O(N) sliding-window algorithm that snaps to natural punctuation boundaries.
Bulk insert — asyncpg.executemany over binary protocol for optimized vector storage.
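As a rough sketch of how the ingestion steps above might fit together (not the repository's actual implementation), the example below fingerprints an upload, skips duplicates, offloads a stand-in parsing step with `asyncio.to_thread`, and chunks text with a sliding window that snaps back to punctuation. The boundary set, the default sizes, the `extract_text` stand-in, and the in-memory `seen` set are assumptions.

```python
import asyncio
import hashlib
from typing import List, Set

BOUNDARIES = ".!?\n"  # assumed set of "natural" break characters


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest used as the deduplication key."""
    return hashlib.sha256(data).hexdigest()


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of at most `size` characters, preferring to end
    each window just after a boundary character found in its second half."""
    if size < 2 or not 0 <= overlap < size // 2:
        raise ValueError("need size >= 2 and 0 <= overlap < size // 2")
    chunks: List[str] = []
    start, n = 0, len(text)
    while start < n:
        end = min(start + size, n)
        if end < n:
            # Search only the second half of the window so chunks stay
            # reasonably full; rfind returns -1 when nothing is found.
            floor = start + size // 2
            cut = max(text.rfind(ch, floor, end) for ch in BOUNDARIES)
            if cut != -1:
                end = cut + 1
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end >= n:
            break
        start = end - overlap  # always moves forward because overlap < size // 2
    return chunks


def extract_text(data: bytes) -> str:
    # Stand-in for the CPU-bound PyMuPDF step; a real PDF parser would go here.
    return data.decode("utf-8", errors="replace")


async def ingest(data: bytes, seen: Set[str]) -> List[str]:
    """Return chunks for a new file, or an empty list for a duplicate upload."""
    digest = fingerprint(data)
    if digest in seen:
        return []  # duplicate: skip parsing and embedding entirely
    seen.add(digest)
    text = await asyncio.to_thread(extract_text, data)  # keeps the event loop free
    return chunk_text(text)
```

For example, `chunk_text("Alpha beta. Gamma delta! Epsilon zeta? Eta theta.", size=20, overlap=0)` yields one chunk per sentence, because each window finds a sentence-ending character in its second half.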

Retrieval & Generation

HNSW vector search: cosine distance via pgvector's <=> operator (lower distance means higher similarity), filtered by authenticated user.
Grounded generation — Retrieved chunks are compiled into a system prompt that constrains the LLM to provided context only.
Streaming — Token-by-token LLM output piped directly into StreamingResponse.
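To make the retrieval and grounding steps concrete, here is a small sketch under stated assumptions. The table name `chunks`, the columns `content`, `embedding`, and `user_id`, and the exact prompt wording are hypothetical; only the `<=>` operator and the per-user filter come from the description above. How the query vector is bound to `$1` depends on how the driver's codecs are configured, so that part is left out.

```python
from typing import List

# Hypothetical schema: chunks(content text, embedding vector, user_id ...).
# `<=>` is pgvector's cosine distance operator, so ascending order returns
# the most similar rows first. The user_id predicate enforces tenant isolation.
SEARCH_SQL = """
SELECT content, embedding <=> $1 AS distance
FROM chunks
WHERE user_id = $2
ORDER BY embedding <=> $1
LIMIT $3
"""


def build_system_prompt(chunks: List[str]) -> str:
    """Compile retrieved chunks into a prompt that restricts the model to them."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}"
    )
```

Numbering the chunks is a design choice for this sketch: it lets the model (or a later post-processing step) refer back to the passage an answer came from.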

Testing & CI/CD

468 tests across unit, integration, and API route layers — 100% line coverage.
Real database tests — GitHub Actions spins up ephemeral pgvector/pgvector:pg16 containers. No mocking SQL.
Lint gate — Ruff linter + formatter enforced before tests run.

Observability

structlog JSON logging with rotating file handler and optional Better Stack cloud aggregation.

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | FastAPI, Uvicorn, Python 3.12+ |
| Database | Neon PostgreSQL 16, `pgvector`, `asyncpg` (raw SQL) |
| AI / LLM | Ollama (`mxbai-embed-large`, `deepseek-r1:8b`), `httpx` |
| Auth | PyJWT, bcrypt, `aiosmtplib` |
| Parsing | PyMuPDF, `python-multipart` |
| Validation | Pydantic v2, `pydantic-settings` |
| Testing | pytest, `pytest-asyncio`, `pytest-cov`, `pytest-mock` |
| CI/CD | GitHub Actions (lint → test with ephemeral pgvector DB) |
| Packaging | `uv`, Docker (multi-stage, non-root) |
| Observability | `structlog`, Better Stack |

API Endpoints

All /documents and /chat endpoints require a valid Bearer token. Interactive docs are available at /docs.

| Category | Method | Endpoint | Description |
| --- | --- | --- | --- |
| Auth | `POST` | `/api/v1/auth/request-otp` | Email a 6-digit registration OTP |
| Auth | `POST` | `/api/v1/auth/register` | Verify OTP and create account |
| Auth | `POST` | `/api/v1/auth/login` | Authenticate and issue JWT |
| Auth | `POST` | `/api/v1/auth/request-reset-otp` | Email a password reset OTP |
| Auth | `POST` | `/api/v1/auth/reset-password` | Verify OTP and update password |
| Docs | `POST` | `/api/v1/documents/ingest` | Upload, chunk, embed, and store a PDF |
| Docs | `GET` | `/api/v1/documents/` | List user's uploaded files |
| Docs | `DELETE` | `/api/v1/documents/{file_id}` | Delete file and all associated chunks |
| Chat | `POST` | `/api/v1/chat/search` | Semantic vector search against user's documents |
| Chat | `POST` | `/api/v1/chat/completions` | Full RAG: retrieve context and stream LLM response |
| Health | `GET` | `/health` | Database connectivity check |
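For a quick illustration of the Bearer requirement, the sketch below builds an authenticated request with the standard library only. The helper name `authorized_request` is hypothetical, the base URL assumes the default local address from Getting Started, and the token is assumed to have been obtained from `/api/v1/auth/login` beforehand (the login request and response shapes are not documented here, so they are not shown).

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local address of the running API


def authorized_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request carrying the JWT in the Authorization header."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


# Usage (needs a running server and a real token):
# with urllib.request.urlopen(authorized_request("/api/v1/documents/", token)) as resp:
#     print(resp.read().decode())
```

The same helper would apply to the `DELETE /api/v1/documents/{file_id}` route by passing `method="DELETE"`.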

Project Structure

src/fastapi_ollama_rag/
├── main.py                  # Lifespan hooks, health check
├── api/
│   ├── dependencies.py      # JWT token validation
│   └── routes/              # Auth, Chat, Documents
├── core/
│   ├── config.py            # Pydantic settings (env-driven)
│   ├── database.py          # asyncpg connection pool
│   ├── migrations.py        # Idempotent SQL schema runner
│   ├── security.py          # bcrypt + JWT helpers
│   ├── logger.py            # structlog configuration
│   └── sql/                 # Raw SQL (schema, queries)
├── models/                  # Pydantic request/response models
├── repository/              # Database access layer
└── services/                # Business logic (auth, chunker, embeddings, etc.)

Design Decisions

| Decision | Why |
| --- | --- |
| `asyncpg` raw SQL over SQLAlchemy | Fastest Python Postgres driver. Direct control over `pgvector` operators and bulk binary insertions. No ORM overhead. |
| Custom chunker over LangChain | Zero dependencies. Deterministic. Testable. No framework bloat for a simple sliding-window algorithm. |
| Real DB integration tests over mocks | Mocking raw SQL hides syntax errors. Ephemeral `pgvector` containers catch real failures. |
| Custom JWT/OTP over Auth0 | Full ownership of user data and auth flows. No vendor lock-in. Precise control over multi-tenant schema. |
| Single Postgres over separate vector DB | `pgvector` unifies relational data and vectors in one ACID-compliant store. No split-brain infrastructure. |

Getting Started

Prerequisites

Docker & Docker Compose
Ollama
A Neon Serverless Postgres database (or any Postgres 16+ with pgvector)
SMTP credentials for OTP emails (e.g., Gmail App Password)

1. Configure Environment

cp .env.example .env

Then set the following values in .env:

# Database
DATABASE_URL="postgres://user:password@ep-your-db.region.aws.neon.tech/neondb?sslmode=require"

# AI / LLM
OLLAMA_BASE_URL="http://host.docker.internal:11434"  # Docker → host Ollama

# Security
SECRET_KEY="generate-a-strong-random-key"

# SMTP (any provider — Gmail, SES, SendGrid, etc.)
SMTP_SERVER="smtp.gmail.com"
SMTP_PORT=587
SMTP_USERNAME="you@gmail.com"
SMTP_PASSWORD="your-app-password"

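Note: host.docker.internal resolves to the host machine from inside the Docker container, so the API can reach an Ollama instance running on the host. On Linux, this name may need to be mapped explicitly (for example, an extra_hosts entry of host.docker.internal:host-gateway).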

2. Pull AI Models

ollama pull mxbai-embed-large
ollama pull deepseek-r1:8b

3. Run

docker compose up --build -d

The API is available at http://localhost:8000. Migrations run automatically on startup.

4. Run Tests

uv run pytest tests/ -v --cov=src --cov-report=term-missing

License

MIT
