A small FastAPI service that converts resumes (PDF, DOCX, or raw text) into a canonical JSON format using a Google Gemini (Generative AI) model. The service extracts text from uploaded files, prompts an LLM to produce a strict JSON resume schema, then normalizes and validates the result with Pydantic models.
- Accepts resume uploads (PDF, DOCX, TXT) and raw text input
- Uses `pypdf` and `python-docx` for robust text extraction
- Calls Google Gemini (`google-generativeai`) to generate structured JSON
- Normalizes and validates LLM output with Pydantic models
- Dockerfile included for containerized deployments
- `Dockerfile` - container image build instructions
- `requirements.txt` - Python dependencies
- `app/`
  - `main.py` - FastAPI app and route handlers
  - `parser.py` - file type detection and text extraction helpers
  - `llm_client.py` - Gemini prompt builder and call wrapper
  - `normalizer.py` - normalize LLM output and Pydantic validation
  - `models.py` - Pydantic schema for canonical resume JSON
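To illustrate what "file type detection" might involve, here is a minimal sketch that combines the file extension with leading magic bytes. The function name `detect_file_type` and its return values are hypothetical and are not taken from `parser.py`; the only facts relied on are that PDF files begin with `%PDF-` and that DOCX files are ZIP archives beginning with `PK\x03\x04`.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"        # every PDF starts with this marker
ZIP_MAGIC = b"PK\x03\x04"   # DOCX files are ZIP archives


def detect_file_type(filename: str, data: bytes) -> str:
    """Return "pdf", "docx", or "txt"; raise ValueError for anything else.

    Hypothetical helper: the real parser.py may use a different strategy.
    """
    suffix = Path(filename).suffix.lower()
    if data.startswith(PDF_MAGIC):
        return "pdf"
    # A ZIP header alone is ambiguous, so also require the .docx extension.
    if data.startswith(ZIP_MAGIC) and suffix == ".docx":
        return "docx"
    if suffix == ".txt":
        return "txt"
    raise ValueError(f"Unsupported file type: {filename!r}")
```

Checking content as well as the extension guards against a mislabeled upload reaching the wrong extractor.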
Prerequisites
- Python 3.9+ (this repo uses 3.12 in the Dockerfile)
- pip
- (Optional) Docker
- Create and activate a virtual environment

      python -m venv .venv
      source .venv/bin/activate

- Install dependencies

      pip install -r requirements.txt

- Set the Gemini API key (required for LLM calls)

      export GEMINI_API_KEY="your_gemini_api_key"

  If you don't have a Gemini key yet and want to run the service locally for testing, consider adding a lightweight mock in `app/llm_client.py` that returns a sample JSON when the env var is missing (not included by default).

- Run the app with Uvicorn

      uvicorn app.main:app --reload --port 8000

  Open http://localhost:8000/docs for the interactive Swagger UI.
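The development mock mentioned in the setup steps could look roughly like the sketch below. It is not part of the repository: `call_llm`, `SAMPLE_RESUME`, and the injected `real_call` parameter are all hypothetical names, and the real Gemini call is passed in as a callable so that no `google-generativeai` API details are assumed here.

```python
import json
import os
from typing import Callable, Optional

# Hypothetical canned response, trimmed to a few fields for brevity.
SAMPLE_RESUME = {
    "resume": {
        "header": {"name": "Jane Doe", "email": "jane@example.com"},
        "summary": "Sample resume returned by the local mock.",
    }
}


def call_llm(resume_text: str,
             real_call: Optional[Callable[[str], str]] = None) -> str:
    """Return raw LLM output, or canned JSON when no API key is set."""
    if not os.environ.get("GEMINI_API_KEY"):
        # Development-only fallback: skip the network call entirely.
        return json.dumps(SAMPLE_RESUME)
    if real_call is None:
        raise RuntimeError("No real LLM call configured.")
    return real_call(resume_text)
```

Note that this changes the default behavior (a missing key normally raises `RuntimeError`), so a fallback like this is best kept behind an explicit development flag.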
Build the image:

    docker build -t resume-normalizer:latest .

Run the container (forward port 8000 and set the environment variable):

    docker run --rm -p 8000:8000 -e GEMINI_API_KEY="$GEMINI_API_KEY" resume-normalizer:latest

`GEMINI_API_KEY` (required) - API key for Google Generative AI (Gemini).
The two parsing endpoints return a `ResumeWrapper` object (see schema below) on success; `/health` returns a simple status object.
- POST `/parse-resume-file`
  - Content type: `multipart/form-data`
  - Form fields: `file` (required): upload a resume file (`.pdf`, `.docx`, `.txt`)
  - Returns: canonical `ResumeWrapper` JSON
- POST `/parse-resume-text`
  - Content type: `application/json`
  - Body: `{ "text": "<resume text>", "user_id": "optional" }`
  - Returns: canonical `ResumeWrapper` JSON
- GET `/health`
  - Returns: `{ "status": "ok" }`
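As an illustration of calling `/parse-resume-text`, the sketch below builds the request with the standard library only. The helper names `build_parse_text_request` and `send` are hypothetical, and the base URL assumes the service is running locally on port 8000 as in the Uvicorn command.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: local dev server


def build_parse_text_request(text: str,
                             user_id: Optional[str] = None,
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the documented shape."""
    payload = {"text": text}
    if user_id is not None:
        payload["user_id"] = user_id  # optional field
    return urllib.request.Request(
        url=f"{base_url}/parse-resume-text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the service running, `send(build_parse_text_request("Jane Doe, Software Engineer"))` should return the `ResumeWrapper` object as a dictionary with a top-level `resume` key.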
The LLM is instructed to return JSON in this exact shape. The service will fill missing fields with empty strings or empty lists when necessary.
Top-level shape:

    {
      "resume": {
        "header": {
          "name": "string",
          "phone": "string",
          "email": "string",
          "location": "string",
          "links": {
            "portfolio": "string",
            "linkedin": "string",
            "github": "string"
          }
        },
        "summary": "string",
        "education": [
          {
            "school": "string",
            "degree": "string",
            "startDate": "string",
            "endDate": "string",
            "gpa": "string",
            "location": "string"
          }
        ],
        "experience": [
          {
            "title": "string",
            "company": "string",
            "location": "string",
            "startDate": "string",
            "endDate": "string",
            "bullets": ["string", "string"]
          }
        ],
        "projects": [
          {
            "name": "string",
            "stack": "string",
            "year": "string",
            "bullets": ["string", "string"]
          }
        ],
        "skills": {
          "languages": ["string"],
          "frameworks": ["string"],
          "databases": ["string"],
          "cloud": ["string"],
          "concepts": ["string"]
        }
      }
    }

Rules used by the prompt:
- Use exactly these keys.
- If information is missing, include the field with an empty string or empty list.
- Do not add extra keys.
- The service attempts to add missing keys before validating.
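The "add missing keys before validating" step might be sketched as a recursive merge against a template of defaults. This is an assumption about the approach, not the code in `normalizer.py`; `TEMPLATE` and `fill_missing` are hypothetical names, and the sketch only fills keys on nested objects, not on items inside lists.

```python
import copy

# Defaults mirroring the documented shape: empty strings and empty lists.
TEMPLATE = {
    "resume": {
        "header": {
            "name": "", "phone": "", "email": "", "location": "",
            "links": {"portfolio": "", "linkedin": "", "github": ""},
        },
        "summary": "",
        "education": [],
        "experience": [],
        "projects": [],
        "skills": {
            "languages": [], "frameworks": [], "databases": [],
            "cloud": [], "concepts": [],
        },
    }
}


def fill_missing(data, template: dict) -> dict:
    """Return a copy of data with absent or null keys set to defaults."""
    if not isinstance(data, dict):
        # Missing or malformed section: fall back to the full default.
        return copy.deepcopy(template)
    result = dict(data)
    for key, default in template.items():
        if isinstance(default, dict):
            result[key] = fill_missing(result.get(key), default)
        elif key not in result or result[key] is None:
            result[key] = copy.deepcopy(default)
    return result
```

For example, `fill_missing({"resume": {"header": {"name": "Jane"}}}, TEMPLATE)` keeps the name and supplies empty values for everything else. Keys that are not in the template are left untouched.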
- `RuntimeError: GEMINI_API_KEY environment variable is not set.`
  - Set `GEMINI_API_KEY` or implement a local mock for development.
- LLM returns non-JSON or invalid JSON
  - The LLM is instructed to return strict JSON. If parsing fails, inspect logs or `llm_client` to see the raw output. Implement a safe-extraction heuristic if the model returns leading/trailing text.
- Pydantic validation errors
  - The normalizer will add missing keys but will not remove unexpected keys. If the LLM includes extra fields, you may need to sanitize the object before validation or configure the Pydantic models to allow/forbid extras as desired.
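Two of the fixes suggested above can be sketched with the standard library: pulling a JSON object out of output that has leading or trailing text, and dropping keys that the schema does not define. Both `extract_json_object` and `drop_unknown_keys` are hypothetical helpers, not functions from this repository, and the pruning only recurses into nested objects, not into list items.

```python
import json


def extract_json_object(raw: str) -> dict:
    """Parse the first JSON object in raw, ignoring surrounding text."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one value at `start` and ignores the rest.
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            start = raw.find("{", start + 1)
    raise ValueError("No JSON object found in LLM output")


def drop_unknown_keys(data: dict, allowed: dict) -> dict:
    """Keep only keys present in `allowed`, recursing into nested dicts."""
    pruned = {}
    for key, value in data.items():
        if key not in allowed:
            continue  # unexpected key added by the model
        if isinstance(value, dict) and isinstance(allowed[key], dict):
            pruned[key] = drop_unknown_keys(value, allowed[key])
        else:
            pruned[key] = value
    return pruned
```

Pruning before validation is one option; configuring the Pydantic models to ignore or forbid extra fields is the alternative the last item describes.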