Extract individual photos from album page scans and enhance them using AI.
Got a stack of old photo albums? Snap a picture of each page, point this tool at the folder, and get back individual photos — cropped, rotated, dated, geotagged, colorized, and restored.
Album page scan ──▶ ai_parse.py ──▶ autocrop_meta.json ──▶ editor.py ──▶ crop_exif.py ──▶ ai_enhance.py
(Vision AI) (bbox + metadata) (web editor) (crop+EXIF) (restore)
ai_parse.pysends each page to a Vision AI model (Gemini or GPT-4o) which returns bounding boxes, orientation, dates, locations, and captions — saved toautocrop_meta.jsoneditor.pyserves a web-based editor to visually review and adjust bounding boxes, rotation, dates, locations (with map), and captionscrop_exif.pyreads the metadata file and produces cropped photos with EXIF metadataai_enhance.pysends each cropped photo to Gemini which restores it: removes album corners, fixes scratches, improves contrast, and colorizes B&W photos
pip install -r requirements.txt
export GEMINI_API_KEY="your-key"
# Step 1: Analyze pages and create metadata
python ai_parse.py ./album_pages/ -p gemini --default-location "New York"
# Step 2: Review and adjust in web editor
python editor.py ./album_pages/
# Step 3: Apply metadata to produce cropped photos (editor does this via Apply button, or run manually)
python crop_exif.py ./album_pages/
# Step 4: Enhance extracted photos
python ai_enhance.py ./album_pages/cropped/ -o ./enhanced/Or use --auto-apply for the legacy one-shot workflow:
python ai_parse.py ./album_pages/ -p gemini --auto-applyIf your photos are already scanned or photographed one by one, skip the AI detection step and go straight to the editor:
# Step 1: Review and adjust in web editor
python editor.py ./my_photos/
# Step 2: Enhance
python ai_enhance.py ./my_photos/cropped/ -o ./enhanced/The editor will treat each image as a single full-page photo and pre-fill date, location, and caption from EXIF. You can fine-tune the crop, fix rotation, and edit metadata — then hit Apply. Originals stay untouched; results go to cropped/.
Default: create-metadata — analyze pages and save results to autocrop_meta.json (no cropping):
python ai_parse.py ./album_pages/ -p gemini --default-location "Berlin"Auto-apply — legacy one-shot mode (analyze + crop + save in one step):
python ai_parse.py ./album_pages/ -p gemini --auto-apply# Single page
python ai_parse.py page.jpg -p gemini
# Directory of pages, 4 in parallel
python ai_parse.py ./album_pages/ -p gemini -j 4
# With fallback location for geocoding
python ai_parse.py ./album_pages/ -p gemini --default-location "Berlin"
# Restrict geocoding to specific countries
python ai_parse.py ./album_pages/ -p gemini --default-location "Karaganda" --countries "kz,ru"Features:
- Photo detection via Gemini or GPT-4o Vision API
- Reads handwritten dates, locations, and captions
- GPS geocoding via OpenStreetMap Nominatim
- Date inheritance: undated photos get the date from the nearest dated page
- Parallel page processing (
-j N)
Options:
| Option | Description |
|---|---|
--auto-apply |
Analyze + crop + save in one shot (legacy behavior) |
-o, --output DIR |
Output directory (default: <input>/cropped) |
-p, --provider |
openai or gemini (default: openai) |
-m, --model |
Vision model name |
--api-key |
API key (default: from env var) |
--default-location |
Fallback city/region for geocoding |
--countries |
Restrict geocoding to these countries (comma-separated ISO codes, e.g. kz,ru,ua,us) |
--no-location-spread |
Don't apply a recognized location from one photo to other photos on the same page |
-j, --jobs N |
Parallel pages (default: 4) |
Reads autocrop_meta.json and produces cropped, rotated JPEG files with EXIF metadata (date, GPS, caption).
python crop_exif.py ./album_pages/
python crop_exif.py ./album_pages/ -o ./output/No AI calls, no API key needed. Works with the metadata file created by ai_parse.py or editor.py.
Options:
| Option | Description |
|---|---|
-o, --output DIR |
Output directory (default: <input>/cropped) |
Visual editor for reviewing and adjusting photo detection results before cropping.
python editor.py ./album_pages/
python editor.py ./album_pages/ -o ./output/ --port 8080Works with or without autocrop_meta.json — if no metadata exists, each image is treated as a single full-page photo with date, location, and caption pre-filled from the source file's EXIF. GPS coordinates are automatically resolved to place names in the background via reverse geocoding.
When autocrop_meta.json exists but photos have empty fields, the editor also backfills date, location, and caption from the source image EXIF.
Opens a web app at http://localhost:8080 with two modes accessible via header toggle:
Cut photos — adjust bounding boxes:
- Drag corners and edges to resize photo boundaries
- Drag inside a box to move it
- Double-click to add a new photo
- Delete key to remove selected photo (at least 1 photo per page required)
- First photo auto-selected on page load
Edit photos — edit photo details:
- Preview cropped and rotated photos
- Rotate photos left/right
- Edit date and caption (changing a value suggests applying it to other photos with the same old value)
- Select location on an interactive map (OpenStreetMap)
- Save Draft or Apply to produce final cropped photos
- Warns about unsaved changes on page close
Navigation: Google-style page selector in header, arrow keys for prev/next page.
Options:
| Option | Description |
|---|---|
-o, --output DIR |
Output directory for cropped photos (default: <input>/cropped) |
--port N |
HTTP server port (default: 8080) |
--no-browser |
Don't open browser automatically |
Sends each photo to Gemini which restores it in one API call: removes album corner holders, fixes scratches and defects, improves contrast, and colorizes B&W photos.
# Single photo
python ai_enhance.py photo.jpg
# Directory, 3 parallel workers
python ai_enhance.py ./cropped/ -j 3 --rpm 10
# Batch mode (50% cheaper, slower)
python ai_enhance.py ./cropped/ --batch
# Pro model for higher quality
python ai_enhance.py ./cropped/ -m gemini-3-pro-image-previewTwo modes:
- Real-time (default): parallel workers with rate limiting, results immediately
- Batch (
--batch): submits all photos as one job, 50% cheaper, waits for completion. Failed photos are automatically retried.
Options:
| Option | Description |
|---|---|
-o, --output DIR |
Output directory (default: <input>/enhanced) |
-m, --model |
Gemini image model (default: gemini-3.1-flash-image-preview) |
--api-key |
Gemini API key (default: GEMINI_API_KEY env var) |
-j, --jobs N |
Parallel workers, real-time mode only (default: 3) |
--rpm N |
Max requests per minute, real-time mode only (default: 10) |
--batch |
Use Batch API (50% cheaper). -j and --rpm are ignored |
--poll N |
Batch poll interval in seconds (default: 30) |
Terminal-based metadata editor for already-cropped photos.
python edit_meta.py ./cropped/
python edit_meta.py ./enhanced/ --countries "kz,ru"The script displays a numbered table of all photos with their metadata, then accepts commands:
# File Date Loc Caption
1 198905_page1_photo_01.jpg 1989:05 ✓ Walking with mom
2 198905_page1_photo_02.jpg 1989:05 ✓ —
3 page2_photo_01.jpg — — Think...
Enter commands: <number><D|L|C> <value> (empty line to apply)
> 3D 1989:06
> 3L Berlin, Germany
> 3L 49.5186, 72.8238
> 1C Walking in the park
>
| Command | Description |
|---|---|
3D 1989:06 |
Set date for photo #3 (also renames the file) |
3L Berlin, Germany |
Set location by name (geocodes to GPS coordinates) |
3L 49.5186, 72.8238 |
Set location by coordinates (writes GPS directly, no geocoding) |
1C Walking in the park |
Set caption |
Options:
| Option | Description |
|---|---|
--countries |
Restrict geocoding to these countries (comma-separated ISO codes) |
All processing happens via API calls — no local GPU needed.
ai_parse.py (Gemini Flash):
- ~$0.0003 per page (~3K input tokens)
- 100-page album: ~$0.03
ai_enhance.py (Gemini Flash Image):
- ~$0.04 per photo (image generation)
- With
--batch: ~$0.02 per photo (50% off) - 200 photos: ~$4-8
| Tier | RPM | RPD |
|---|---|---|
| Free | 2 | 50 |
| Tier 1 (paid) | 10 | 500 |
| Tier 2 | 20 | 2,000 |
| Provider | Default Model | Env Variable |
|---|---|---|
| OpenAI | gpt-4o |
OPENAI_API_KEY |
| Gemini | gemini-3-flash-preview |
GEMINI_API_KEY |
Any model name can be passed via -m.
Google Photos captions: There is no way to programmatically set a photo description that Google Photos will display in its UI. We write captions as EXIF ImageDescription and XMP dc:description, but Google Photos shows them under "Other" instead of the description field. The Google Photos Library API accepts a description field, but it is only visible via the API — not in the web or mobile UI. Apple Photos displays XMP descriptions correctly.
pip install -r requirements.txtRequires Python 3.9+. No GPU needed.
MIT