Aryan1718/resume-normalizer

Resume Normalizer Service

A small FastAPI service that converts resumes (PDF, DOCX, or raw text) into a canonical JSON format using a Google Gemini (Generative AI) model. The service extracts text from uploaded files, prompts an LLM to produce a strict JSON resume schema, then normalizes and validates the result with Pydantic models.

Features

  • Accepts resume uploads (PDF, DOCX, TXT) and raw text input
  • Uses pypdf and python-docx for robust text extraction
  • Calls Google Gemini (google-generativeai) to generate structured JSON
  • Normalizes and validates LLM output with Pydantic models
  • Dockerfile included for containerized deployments

Repository Layout

  • Dockerfile - container image build instructions
  • requirements.txt - Python dependencies
  • app/
    • main.py - FastAPI app and route handlers
    • parser.py - file type detection and text extraction helpers
    • llm_client.py - Gemini prompt builder and call wrapper
    • normalizer.py - normalize LLM output and Pydantic validation
    • models.py - Pydantic schema for canonical resume JSON
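
The file type detection attributed to app/parser.py could be approximated with a magic-byte check. This is a minimal sketch under assumptions, not the repository's code: `detect_file_type` is a hypothetical name, and the real helper may rely on the filename or content type alone.

```python
from pathlib import Path


def detect_file_type(filename: str, data: bytes) -> str:
    """Return 'pdf', 'docx', or 'txt' for a supported upload; raise ValueError otherwise.

    Hypothetical helper: combines the file extension with leading magic bytes.
    """
    suffix = Path(filename).suffix.lower()
    # PDF files begin with the ASCII marker "%PDF-".
    if data.startswith(b"%PDF-"):
        return "pdf"
    # DOCX files are ZIP archives, which begin with the bytes PK\x03\x04.
    # Other ZIP-based formats share this prefix, so the extension is checked too.
    if data.startswith(b"PK\x03\x04") and suffix == ".docx":
        return "docx"
    # Plain text has no signature, so the extension is the only hint.
    if suffix == ".txt":
        return "txt"
    raise ValueError(f"Unsupported file type: {filename!r}")
```

Checking the bytes as well as the extension guards against a renamed file reaching the wrong extractor.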

Quickstart (Development)

Prerequisites

  • Python 3.9+ (the Dockerfile uses 3.12)
  • pip
  • (Optional) Docker

  1. Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate

  2. Install dependencies

pip install -r requirements.txt

  3. Set the Gemini API key (required for LLM calls)

export GEMINI_API_KEY="your_gemini_api_key"

If you don't have a Gemini key yet and want to run the service locally for testing, consider adding a lightweight mock in app/llm_client.py that returns sample JSON when GEMINI_API_KEY is unset (no mock is included by default).

  4. Run the app with Uvicorn

uvicorn app.main:app --reload --port 8000

Open http://localhost:8000/docs for the interactive Swagger UI.
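
The mock suggested in the quickstart for running without a Gemini key might look like the sketch below. Everything in it is an assumption: `SAMPLE_RESUME`, `call_llm_or_mock`, and the `real_call` parameter are hypothetical names, and the actual structure of app/llm_client.py may differ.

```python
import copy
import os

# Hypothetical placeholder payload in the canonical shape; values are made up.
SAMPLE_RESUME = {
    "resume": {
        "header": {
            "name": "Jane Doe", "phone": "", "email": "jane@example.com",
            "location": "",
            "links": {"portfolio": "", "linkedin": "", "github": ""},
        },
        "summary": "Sample resume returned by the local mock.",
        "education": [],
        "experience": [],
        "projects": [],
        "skills": {"languages": ["Python"], "frameworks": [],
                   "databases": [], "cloud": [], "concepts": []},
    }
}


def call_llm_or_mock(resume_text: str, real_call=None) -> dict:
    """Delegate to real_call when a key is configured, else return sample data.

    real_call stands in for whatever function actually calls Gemini.
    """
    if os.environ.get("GEMINI_API_KEY") and real_call is not None:
        return real_call(resume_text)
    # Return a deep copy so callers cannot mutate the shared sample.
    return copy.deepcopy(SAMPLE_RESUME)
```

With this in place the endpoints can be exercised through the Swagger UI without network access, at the cost of always getting the same resume back.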

Docker

Build the image:

docker build -t resume-normalizer:latest .

Run the container (forward port 8000 and set the environment variable):

docker run --rm -p 8000:8000 -e GEMINI_API_KEY="$GEMINI_API_KEY" resume-normalizer:latest

Environment variables

  • GEMINI_API_KEY (required) — API key for Google Generative AI (Gemini).

API Reference

Both parse endpoints return a ResumeWrapper object (see the Canonical Resume Schema section) on success; GET /health returns a simple status object.

  • POST /parse-resume-file

    • Content type: multipart/form-data
    • Form fields: file (required) — upload a resume file (.pdf, .docx, .txt)
    • Returns: canonical ResumeWrapper JSON
  • POST /parse-resume-text

    • Content type: application/json
    • Body: { "text": "<resume text>", "user_id": "optional" }
    • Returns: canonical ResumeWrapper JSON
  • GET /health

    • Returns: { "status": "ok" }
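
As an illustration of the /parse-resume-text contract above, here is a small standard-library client sketch. The base URL assumes the service is running locally on port 8000 as in the quickstart; `build_parse_text_request` and `parse_resume_text` are hypothetical helper names, not part of the repository.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumed local development address


def build_parse_text_request(text: str, user_id: Optional[str] = None) -> urllib.request.Request:
    """Build the POST request for /parse-resume-text without sending it."""
    body = {"text": text}
    if user_id is not None:
        body["user_id"] = user_id  # optional field per the API reference
    return urllib.request.Request(
        f"{BASE_URL}/parse-resume-text",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_resume_text(text: str, user_id: Optional[str] = None) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_parse_text_request(text, user_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `parse_resume_text("Jane Doe, Software Engineer ...")` against a running instance should yield the wrapper object with a top-level "resume" key.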

Canonical Resume Schema

The LLM is instructed to return JSON in this exact shape. The service will fill missing fields with empty strings or empty lists when necessary.

Top-level shape:

{
  "resume": {
    "header": {
      "name": "string",
      "phone": "string",
      "email": "string",
      "location": "string",
      "links": {
        "portfolio": "string",
        "linkedin": "string",
        "github": "string"
      }
    },
    "summary": "string",
    "education": [
      {
        "school": "string",
        "degree": "string",
        "startDate": "string",
        "endDate": "string",
        "gpa": "string",
        "location": "string"
      }
    ],
    "experience": [
      {
        "title": "string",
        "company": "string",
        "location": "string",
        "startDate": "string",
        "endDate": "string",
        "bullets": ["string", "string"]
      }
    ],
    "projects": [
      {
        "name": "string",
        "stack": "string",
        "year": "string",
        "bullets": ["string", "string"]
      }
    ],
    "skills": {
      "languages": ["string"],
      "frameworks": ["string"],
      "databases": ["string"],
      "cloud": ["string"],
      "concepts": ["string"]
    }
  }
}

Rules used by the prompt:

  • Use exactly these keys.
  • If information is missing, include the field with an empty string or empty list.
  • Do not add extra keys.

Independently of the prompt, the service attempts to add any missing keys before validating.
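
The key-filling step can be sketched as a recursive merge against a table of defaults. This is an illustrative approximation rather than the code in app/normalizer.py; `CANONICAL_DEFAULTS` and `fill_missing` are hypothetical names, and the sketch does not fill keys inside list items such as individual education entries.

```python
import copy

# Defaults mirroring the canonical shape: empty strings and empty lists.
CANONICAL_DEFAULTS = {
    "resume": {
        "header": {
            "name": "", "phone": "", "email": "", "location": "",
            "links": {"portfolio": "", "linkedin": "", "github": ""},
        },
        "summary": "",
        "education": [],
        "experience": [],
        "projects": [],
        "skills": {"languages": [], "frameworks": [], "databases": [],
                   "cloud": [], "concepts": []},
    }
}


def fill_missing(data, defaults: dict) -> dict:
    """Return a copy of data with absent (or None) keys taken from defaults.

    Existing values win, and unexpected keys are left untouched.
    """
    if not isinstance(data, dict):
        # Missing or malformed section: fall back to the defaults wholesale.
        return copy.deepcopy(defaults)
    result = dict(data)
    for key, default in defaults.items():
        if isinstance(default, dict):
            result[key] = fill_missing(result.get(key), default)
        elif key not in result or result[key] is None:
            result[key] = copy.deepcopy(default)
    return result
```

For example, `fill_missing({"resume": {"header": {"name": "Jane"}}}, CANONICAL_DEFAULTS)` keeps the name and supplies every other field with its empty default.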

Troubleshooting

  • RuntimeError: GEMINI_API_KEY environment variable is not set.

    • Set GEMINI_API_KEY or implement a local mock for development.
  • LLM returns non-JSON or invalid JSON

    • The LLM is instructed to return strict JSON. If parsing fails, inspect the logs or app/llm_client.py to see the raw output. Implement a safe-extraction heuristic if the model returns leading/trailing text.
  • Pydantic validation errors

    • The normalizer will add missing keys but will not remove unexpected keys. If the LLM includes extra fields, you may need to sanitize the object before validation or configure the Pydantic models to allow/forbid extras as desired.
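
Two of the remedies above, safe extraction of JSON from noisy LLM output and sanitizing unexpected keys, can be sketched with the standard library alone. Both function names are hypothetical and neither is part of the repository; the `template` argument is assumed to be a dict in the canonical shape, with one example element inside each list.

```python
import json


def extract_json_object(raw: str) -> dict:
    """Parse the first JSON object found in raw, ignoring surrounding text."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores the rest.
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open valid JSON; try the next one.
            start = raw.find("{", start + 1)
    raise ValueError("No JSON object found in LLM output")


def prune_extra_keys(data, template):
    """Drop dict keys that do not appear in template, recursing into lists."""
    if isinstance(template, dict) and isinstance(data, dict):
        return {k: prune_extra_keys(v, template[k])
                for k, v in data.items() if k in template}
    if isinstance(template, list) and template and isinstance(data, list):
        # The first template element describes the shape of every list item.
        return [prune_extra_keys(item, template[0]) for item in data]
    return data
```

Pruning before validation is one option; configuring the Pydantic models to ignore or forbid extras is the alternative the troubleshooting note mentions.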
