Resume Normalizer Service

A small FastAPI service that converts resumes (PDF, DOCX, or raw text) into a canonical JSON format using a Google Gemini (Generative AI) model. The service extracts text from uploaded files, prompts an LLM to return JSON that follows a strict resume schema, then normalizes and validates the result with Pydantic models.

Features

Accepts resume uploads (PDF, DOCX, TXT) and raw text input
Uses pypdf (PDF) and python-docx (DOCX) for text extraction
Calls Google Gemini (google-generativeai) to generate structured JSON
Normalizes and validates LLM output with Pydantic models
Dockerfile included for containerized deployments

Repository Layout

Dockerfile - container image build instructions
requirements.txt - Python dependencies
app/
- main.py - FastAPI app and route handlers
- parser.py - file type detection and text extraction helpers
- llm_client.py - Gemini prompt builder and call wrapper
- normalizer.py - normalize LLM output and Pydantic validation
- models.py - Pydantic schema for canonical resume JSON
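For a sense of what app/parser.py does, here is a minimal sketch of file type detection plus text extraction. The function name extract_text and the choice to detect the type from the file extension are assumptions for illustration, not the module's actual interface. pypdf and python-docx are imported lazily so the .txt path runs without them.

```python
import io
from pathlib import Path


def extract_text(filename: str, data: bytes) -> str:
    """Detect the file type from its extension and return the plain text."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        from pypdf import PdfReader  # imported lazily: only needed for PDFs

        reader = PdfReader(io.BytesIO(data))
        # extract_text() may yield an empty string for image-only (scanned) pages
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        import docx  # module name of the python-docx package

        document = docx.Document(io.BytesIO(data))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    if suffix == ".txt":
        # Undecodable bytes become U+FFFD instead of raising
        return data.decode("utf-8", errors="replace")
    raise ValueError(f"unsupported file type: {suffix or '(no extension)'}")
```

Note that python-docx's Document.paragraphs covers body paragraphs only, so text inside DOCX tables would need a separate pass over document.tables.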

Quickstart (Development)

Prerequisites

Python 3.9+ (the Dockerfile uses Python 3.12)
pip
(Optional) Docker

Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate

Install dependencies

pip install -r requirements.txt

Set the Gemini API key (required for LLM calls)

export GEMINI_API_KEY="your_gemini_api_key"

If you don't have a Gemini key yet and want to test the service locally, consider adding a lightweight mock to app/llm_client.py that returns sample JSON when GEMINI_API_KEY is unset. No such mock is included by default.
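One way to build that mock, sketched under assumptions: the wrapper name call_llm, the canned SAMPLE_RESPONSE, and the model name gemini-1.5-flash are all hypothetical and may not match app/llm_client.py. When the key is present, the sketch uses the google-generativeai calls configure, GenerativeModel, and generate_content.

```python
import json
import os

# Hypothetical canned response, shaped like the canonical schema, used only
# when no API key is configured.
SAMPLE_RESPONSE = {
    "resume": {
        "header": {
            "name": "Sample Candidate", "phone": "", "email": "", "location": "",
            "links": {"portfolio": "", "linkedin": "", "github": ""},
        },
        "summary": "Placeholder resume returned by the local mock.",
        "education": [],
        "experience": [],
        "projects": [],
        "skills": {"languages": [], "frameworks": [], "databases": [], "cloud": [], "concepts": []},
    }
}


def call_llm(prompt: str) -> str:
    """Return raw model text, or canned JSON when GEMINI_API_KEY is unset."""
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        # Local-testing fallback: skip the network call entirely
        return json.dumps(SAMPLE_RESPONSE)
    # Imported lazily so the mock path works without the SDK installed
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
    return model.generate_content(prompt).text
```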

Run the app with Uvicorn

uvicorn app.main:app --reload --port 8000

Open http://localhost:8000/docs for the interactive Swagger UI.

Docker

Build the image:

docker build -t resume-normalizer:latest .

Run the container (forward port 8000 and set the environment variable):

docker run --rm -p 8000:8000 -e GEMINI_API_KEY="$GEMINI_API_KEY" resume-normalizer:latest

Environment variables

GEMINI_API_KEY (required) — API key for Google Generative AI (Gemini).

API Reference

The resume-parsing endpoints return a ResumeWrapper object (see the Canonical Resume Schema below) on success.

POST /parse-resume-file
- Content type: multipart/form-data
- Form fields: file (required) — upload a resume file (.pdf, .docx, .txt)
- Returns: canonical ResumeWrapper JSON
POST /parse-resume-text
- Content type: application/json
- Body: { "text": "<resume text>", "user_id": "<user id>" } (user_id is optional)
- Returns: canonical ResumeWrapper JSON
GET /health
- Returns: { "status": "ok" }
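A minimal client sketch for POST /parse-resume-text using only the standard library. BASE_URL assumes the local Uvicorn server from the Quickstart; the helper names and the 120-second timeout (LLM calls can be slow) are illustrative choices, not part of the service.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumes the Quickstart server is running locally


def build_parse_text_request(text: str, user_id: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /parse-resume-text request with a JSON body."""
    body = {"text": text}
    if user_id is not None:
        body["user_id"] = user_id  # optional per the API reference
    return urllib.request.Request(
        f"{BASE_URL}/parse-resume-text",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_resume_text(text: str, user_id: Optional[str] = None) -> dict:
    """Send the request and decode the ResumeWrapper JSON response."""
    request = build_parse_text_request(text, user_id)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, parse_resume_text("Jane Doe\nSoftware Engineer") returns the decoded ResumeWrapper as a dict.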

Canonical Resume Schema

The LLM is instructed to return JSON in this exact shape. The service will fill missing fields with empty strings or empty lists when necessary.

Top-level shape:

{
  "resume": {
    "header": {
      "name": "string",
      "phone": "string",
      "email": "string",
      "location": "string",
      "links": {
        "portfolio": "string",
        "linkedin": "string",
        "github": "string"
      }
    },
    "summary": "string",
    "education": [
      {
        "school": "string",
        "degree": "string",
        "startDate": "string",
        "endDate": "string",
        "gpa": "string",
        "location": "string"
      }
    ],
    "experience": [
      {
        "title": "string",
        "company": "string",
        "location": "string",
        "startDate": "string",
        "endDate": "string",
        "bullets": ["string", "string"]
      }
    ],
    "projects": [
      {
        "name": "string",
        "stack": "string",
        "year": "string",
        "bullets": ["string", "string"]
      }
    ],
    "skills": {
      "languages": ["string"],
      "frameworks": ["string"],
      "databases": ["string"],
      "cloud": ["string"],
      "concepts": ["string"]
    }
  }
}
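The Pydantic models in app/models.py are not reproduced here. The following is a hypothetical reconstruction of the shape above, assuming every field defaults to an empty string or empty list, as the rules below describe. Field names keep the camelCase keys (startDate, endDate) so the JSON validates without aliases, and the sketch avoids version-specific configuration so it should run on recent Pydantic v1 or v2.

```python
from typing import List

from pydantic import BaseModel, Field


class Links(BaseModel):
    portfolio: str = ""
    linkedin: str = ""
    github: str = ""


class Header(BaseModel):
    name: str = ""
    phone: str = ""
    email: str = ""
    location: str = ""
    links: Links = Field(default_factory=Links)


class Education(BaseModel):
    school: str = ""
    degree: str = ""
    startDate: str = ""
    endDate: str = ""
    gpa: str = ""
    location: str = ""


class Experience(BaseModel):
    title: str = ""
    company: str = ""
    location: str = ""
    startDate: str = ""
    endDate: str = ""
    bullets: List[str] = Field(default_factory=list)


class Project(BaseModel):
    name: str = ""
    stack: str = ""
    year: str = ""
    bullets: List[str] = Field(default_factory=list)


class Skills(BaseModel):
    languages: List[str] = Field(default_factory=list)
    frameworks: List[str] = Field(default_factory=list)
    databases: List[str] = Field(default_factory=list)
    cloud: List[str] = Field(default_factory=list)
    concepts: List[str] = Field(default_factory=list)


class Resume(BaseModel):
    header: Header = Field(default_factory=Header)
    summary: str = ""
    education: List[Education] = Field(default_factory=list)
    experience: List[Experience] = Field(default_factory=list)
    projects: List[Project] = Field(default_factory=list)
    skills: Skills = Field(default_factory=Skills)


class ResumeWrapper(BaseModel):
    # Top-level object: {"resume": {...}}
    resume: Resume
```

Validate with ResumeWrapper(**data) and serialize with .model_dump() on Pydantic v2 or .dict() on v1.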

Rules used by the prompt:

Use exactly these keys.
If information is missing, include the field with an empty string or empty list.
Do not add extra keys.

Independently of the prompt, the service adds any keys the model omitted before validation.
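A sketch of that fill-missing-keys step, assuming a template-driven approach. RESUME_TEMPLATE and fill_missing are hypothetical names, and app/normalizer.py may work differently. In the template, "" marks a string field, [] a list of strings, and a one-element list a list of objects shaped like that element. Like the normalizer described under Troubleshooting, it keeps unexpected keys rather than removing them.

```python
from typing import Any

# Hypothetical template mirroring the canonical schema.
RESUME_TEMPLATE = {
    "resume": {
        "header": {
            "name": "", "phone": "", "email": "", "location": "",
            "links": {"portfolio": "", "linkedin": "", "github": ""},
        },
        "summary": "",
        "education": [{"school": "", "degree": "", "startDate": "", "endDate": "", "gpa": "", "location": ""}],
        "experience": [{"title": "", "company": "", "location": "", "startDate": "", "endDate": "", "bullets": []}],
        "projects": [{"name": "", "stack": "", "year": "", "bullets": []}],
        "skills": {"languages": [], "frameworks": [], "databases": [], "cloud": [], "concepts": []},
    }
}


def fill_missing(data: Any, template: Any) -> Any:
    """Return a copy of data in which every key from template is present."""
    if isinstance(template, dict):
        source = data if isinstance(data, dict) else {}
        result = dict(source)  # unexpected keys are kept, not removed
        for key, sub_template in template.items():
            result[key] = fill_missing(source.get(key), sub_template)
        return result
    if isinstance(template, list):
        if not isinstance(data, list):
            return []  # missing or malformed list becomes an empty list
        if template and isinstance(template[0], dict):
            # List of objects: fill each item against the item template
            return [fill_missing(item, template[0]) for item in data]
        return list(data)  # list of strings: keep as-is
    # String leaf: a missing value becomes an empty string
    return "" if data is None else data
```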

Troubleshooting

RuntimeError: GEMINI_API_KEY environment variable is not set.
- Set GEMINI_API_KEY or implement a local mock for development.
LLM returns non-JSON or invalid JSON
- The prompt asks for strict JSON, but models sometimes wrap it in prose or code fences. If parsing fails, check the logs or app/llm_client.py for the raw output, and consider a safe-extraction heuristic that strips leading/trailing text.
Pydantic validation errors
- The normalizer adds missing keys but does not remove unexpected ones. If the LLM includes extra fields, either sanitize the object before validation or configure the Pydantic models to allow or forbid extra fields as needed.
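For the non-JSON case above, one possible safe-extraction heuristic: try a plain json.loads first, then fall back to decoding the first JSON object in the text with json.JSONDecoder.raw_decode, which ignores anything after the object. The name extract_json_object is hypothetical, and the heuristic assumes the resume object starts at the first "{" in the output.

```python
import json
from typing import Any


def extract_json_object(raw: str) -> Any:
    """Parse LLM output, tolerating prose or code fences around one JSON object."""
    try:
        return json.loads(raw)  # fast path: the output is already strict JSON
    except json.JSONDecodeError:
        pass
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in LLM output")
    # raw_decode parses one value starting at `start` and ignores trailing text;
    # it raises json.JSONDecodeError (a ValueError) if the object is malformed
    obj, _end = json.JSONDecoder().raw_decode(raw, start)
    return obj
```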
