langchain-doc-summarizer

FastAPI pipeline for LangChain doc ingestion & summarization

1. Overview

langchain-doc-summarizer is a lightweight, production‑ready API that ingests documents (PDF, DOCX, TXT, etc.), processes them with LangChain’s powerful retrieval and embedding tools, and returns concise summaries. The service can be used as a microservice inside larger applications or run locally for quick experimentation.

2. Features

FastAPI backend – Fast, async request handling with automatic OpenAPI docs.
LangChain integration – Uses LangChain’s loaders, embeddings, and chain logic.
File ingestion pipeline – Supports PDF, DOCX, TXT and CSV via app/utils/file_handler.py.
Summarization service – Generates summaries using a pre‑trained LLM (OpenAI/Anthropic) through app/services/summarizer.py.
Asynchronous task queue – Celery worker (tasks/celery_worker.py) for long‑running summarizations.
Auth middleware – Simple token‑based authentication in middlewares/auth_middleware.py.
CLI helper – cli/cli.py to seed data and run the server.
Unit tests – Tests for document processing and summarization logic.

3. Tech Stack

Layer	Technology
API	FastAPI, Uvicorn
Background Jobs	Celery + Redis
LLM & Retrieval	LangChain (OpenAI / Anthropic)
Data Storage	Chroma DB (in‑memory by default)
Testing	Pytest
Packaging	Poetry

4. Installation

# Clone the repository
git clone https://github.com/yourorg/langchain-doc-summarizer.git
cd langchain-doc-summarizer

# Create virtual environment & install dependencies
poetry install

# Set required environment variables (example)
export OPENAI_API_KEY="sk-..."
export REDIS_URL="redis://localhost:6379"

5. Running the Service

# Start Redis locally or point to an existing instance
docker run -p 6379:6379 redis

# Launch Celery worker (in a separate terminal)
poetry run celery -A tasks.celery_worker worker --loglevel=info

# Run FastAPI server
poetry run uvicorn app.main:app --reload

Open the Swagger UI at http://localhost:8000/docs to try out endpoints.

6. Usage Examples

Upload a Document & Get Summary (synchronous)

curl -X POST "http://localhost:8000/api/v1/summarize" \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -F "file=@/path/to/report.pdf"

Response:

{
  "summary": "The report highlights key findings about market trends..."
}

Upload a Document & Get Summary (asynchronous)

curl -X POST "http://localhost:8000/api/v1/summarize_async" \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -F "file=@/path/to/report.pdf"

Response:

{
  "task_id": "123e4567-e89b-12d3-a456-426614174000",
  "status_url": "/api/v1/tasks/123e4567-e89b-12d3-a456-426614174000/status"
}

Poll the status endpoint until completed:

curl -X GET "http://localhost:8000/api/v1/tasks/123e4567-e89b-12d3-a456-426614174000/status" \
     -H "Authorization: Bearer YOUR_TOKEN"

7. API Endpoints

Method	Path	Description
POST	`/api/v1/summarize`	Upload file and receive immediate summary
POST	`/api/v1/summarize_async`	Upload file, returns task ID for background processing
GET	`/api/v1/tasks/{task_id}/status`	Get status of async summarization
DELETE	`/api/v1/files/{file_id}`	Remove stored file and embeddings

All endpoints require an Authorization: Bearer <token> header.

8. CLI Commands

# Seed sample documents into the vector store
poetry run python cli/cli.py seed

# Run the API server (alias for uvicorn)
poetry run python cli/cli.py serve

See cli/cli.py for additional options.

9. References and Resources

10. Contributing

Fork the repository and create a feature branch (git checkout -b feature/<name>).
Run tests locally: poetry run pytest.
Submit a pull request with clear description of changes.

All contributions must follow PEP‑8 coding style and include unit tests where applicable.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
app		app
cli		cli
scripts		scripts
tasks		tasks
tests		tests
workers		workers
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

langchain-doc-summarizer

1. Overview

2. Features

3. Tech Stack

4. Installation

5. Running the Service

6. Usage Examples

Upload a Document & Get Summary (synchronous)

Upload a Document & Get Summary (asynchronous)

7. API Endpoints

8. CLI Commands

9. References and Resources

10. Contributing

11. License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

langchain-doc-summarizer

1. Overview

2. Features

3. Tech Stack

4. Installation

5. Running the Service

6. Usage Examples

Upload a Document & Get Summary (synchronous)

Upload a Document & Get Summary (asynchronous)

7. API Endpoints

8. CLI Commands

9. References and Resources

10. Contributing

11. License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages