Search Gmail, read email and attachment content, extract trip details, and export only matching results as PDFs.
This tool is built for workflows like travel bills, invoices, and receipts where you need a clear filter rule and auditable output.
⚠️ Vibe Coding Disclaimer: This project was built using AI-assisted development ("vibe coding"). While functional, it may contain rough edges, unconventional patterns, or areas that could benefit from refinement. Use at your own discretion and feel free to contribute improvements.
git clone https://github.com/rokernel/gmail-ai-processor.git
cd gmail-ai-processor
uv sync
uv run playwright install chromium
cp .env.example .env
Then add your keys to .env, place credentials.json in the project root, and run:
uv run python -m gmail_processor \
-s 2026-01-01 -e 2026-01-31 \
-t "receipts" \
--has-attachments --dry-run --verbose
- Connects to Gmail with OAuth2 and searches by date range, subject, and attachment presence
- Downloads full message content and full attachment payloads
- Parses attachment text (PDF, DOCX, XLSX, images via OCR, nested `.eml`)
- Extracts structured fields like `Ticket-ID`, `Valid:` ranges, amount, and order/reference numbers
- Uses MiniMax AI to summarize content and classify matches against your command
- Exports only matched artifacts to PDF and writes an analysis report
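The search step can be pictured as building a Gmail search string from the CLI filters. The sketch below is illustrative only: `build_gmail_query` is a hypothetical helper, not a function from this project, and it assumes Gmail's `after:` operator includes the given day while `before:` excludes it (worth verifying against your account's time zone behaviour).

```python
from datetime import date, timedelta
from typing import Optional


def build_gmail_query(
    start: date,
    end: date,
    subject: Optional[str] = None,
    has_attachments: bool = False,
) -> str:
    """Build a Gmail search string for an inclusive date range (hypothetical helper)."""
    # Assumption: `before:` excludes the given day, so the inclusive end
    # date is shifted forward by one day.
    parts = [
        f"after:{start:%Y/%m/%d}",
        f"before:{end + timedelta(days=1):%Y/%m/%d}",
    ]
    if subject:
        parts.append(f'subject:"{subject}"')
    if has_attachments:
        parts.append("has:attachment")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_gmail_query(date(2026, 1, 1), date(2026, 1, 31), "billing", True))
```

Running with `--verbose` shows the query the tool actually sends, which may differ from this sketch.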
The pipeline is hybrid by default for reliability and speed:
- Deterministic extraction and rule checks (weekday/time windows, ticket ranges)
- AI summary and classification for attachments that need semantic understanding
- Final export only for matched items
If you want AI on every unique attachment, use --ai-all-attachments.
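As a rough illustration of the deterministic rule-check step, a weekday and time-window test could look like the following. This is a minimal sketch, not the project's code: `in_weekday_window` is a hypothetical name, and the timestamp format is assumed from the `valid_from` value in the sample `analysis.json` (`15.01.2026 08:30`).

```python
from datetime import datetime, time


def in_weekday_window(valid_from: str, start: time, end: time) -> bool:
    """Return True if the timestamp is Monday-Friday and within [start, end]."""
    # Assumed format, taken from the sample analysis.json: "15.01.2026 08:30".
    ts = datetime.strptime(valid_from, "%d.%m.%Y %H:%M")
    is_weekday = ts.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start <= ts.time() <= end


if __name__ == "__main__":
    # 15 January 2026 is a Thursday, so this falls inside a morning window.
    print(in_weekday_window("15.01.2026 08:30", time(8, 0), time(12, 0)))
```

A check like this is cheap and reproducible, which is presumably why the default mode reserves the AI call for attachments that rules alone cannot decide.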
- Python 3.10+
- Google Cloud project with Gmail API enabled
- MiniMax API key
- Playwright Chromium runtime
Optional:
- Tesseract OCR for better image/scanned PDF extraction
git clone <repository-url>
cd gmail-ai-processor
uv sync
uv run playwright install chromium
- Go to Google Cloud Console
- Create or select a project
- Enable the Gmail API
- Create OAuth client credentials for a Desktop app
- Download the credentials file
- Rename it to `credentials.json` and place it in the project root
Security Note: The `credentials.json` file is in `.gitignore` and will NOT be committed to GitHub.
Copy the example environment file and add your API keys:
cp .env.example .env
Edit .env with your actual values:
MINIMAX_API_KEY=your_minimax_api_key_here
MINIMAX_BASE_URL=https://api.minimax.io/v1
MINIMAX_MODEL=MiniMax-M2.5
# Optional: OpenAI fallback
OPENAI_API_KEY=your_openai_key_here
OPENAI_MODEL=gpt-4o-mini
# Optional: Override default paths
GMAIL_CREDENTIALS_PATH=./credentials.json
GMAIL_TOKEN_PATH=./token.json
DEFAULT_OUTPUT_DIR=./exports
AI_TIMEOUT=120
Security Note: The `.env` file is in `.gitignore` and will NOT be committed to GitHub.
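To show how these variables might be consumed, here is a minimal sketch that reads them from the process environment. The `Settings` class and `load_settings` function are hypothetical, and using the example values as fallbacks is an assumption; the project may load and default them differently. The sketch also assumes the variables are already exported (for example by a dotenv loader), since it does not parse `.env` itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    minimax_api_key: str
    minimax_base_url: str
    minimax_model: str
    output_dir: str
    ai_timeout: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ); hypothetical helper."""
    env = os.environ if env is None else env
    api_key = env.get("MINIMAX_API_KEY", "")
    if not api_key:
        raise ValueError("MINIMAX_API_KEY is not set")
    return Settings(
        minimax_api_key=api_key,
        # Fallbacks mirror the example values; treating them as defaults is an assumption.
        minimax_base_url=env.get("MINIMAX_BASE_URL", "https://api.minimax.io/v1"),
        minimax_model=env.get("MINIMAX_MODEL", "MiniMax-M2.5"),
        output_dir=env.get("DEFAULT_OUTPUT_DIR", "./exports"),
        ai_timeout=int(env.get("AI_TIMEOUT", "120")),
    )


if __name__ == "__main__":
    print(load_settings({"MINIMAX_API_KEY": "example-key"}))
```

Failing early on a missing key gives a clearer error than a rejected API call later in the run.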
Run once to trigger the OAuth flow and create your token:
uv run python -m gmail_processor --help
Or start processing:
uv run python -m gmail_processor -s 2026-01-01 -e 2026-01-31 -t "test"
A browser window will open asking you to authenticate with Google. After authorization, a token.json file will be created for future runs.
Security Note: The `token.json` file is in `.gitignore` and will NOT be committed to GitHub.
| Option | Short | Description |
|---|---|---|
| `--start-date` | `-s` | Start date (YYYY-MM-DD). Required. |
| `--end-date` | `-e` | End date (YYYY-MM-DD). Required. |
| `--topic` | `-t` | Topic string for AI matching. Required. |
| `--subject` | | Filter by Gmail subject line |
| `--ai-command` | | Custom natural-language extraction instruction |
| `--has-attachments` | | Only process emails with attachments |
| `--ai-all-attachments` | | Run AI on every attachment (slower, more thorough) |
| `--max-results` | | Maximum emails to process (default: 100) |
| `--output-dir` | `-o` | Output directory for exports |
| `--dry-run` | | Analyze only, do not save files |
| `--verbose` | `-v` | Enable verbose logging |
uv run python -m gmail_processor \
-s 2026-01-01 -e 2026-01-31 \
-t "invoice" \
--subject "billing"

uv run python -m gmail_processor \
-s 2026-01-01 -e 2026-02-26 \
-t "receipts" \
--has-attachments

uv run python -m gmail_processor \
-s 2026-01-01 -e 2026-02-26 \
-t "bills" \
--subject "emails" \
--has-attachments \
--ai-command "open for SBB CFF bills and export the ones that are from Monday to Friday from 8AM to 12AM" \
--max-results 300

uv run python -m gmail_processor \
-s 2026-01-01 -e 2026-02-26 \
-t "bills" \
--subject "emails" \
--has-attachments \
--ai-all-attachments \
--ai-command "find all SBB CFF trips and summarize each ticket"

uv run python -m gmail_processor \
-s 2026-01-01 -e 2026-02-26 \
-t "bills" \
--has-attachments \
--ai-command "extract trip details and list ticket ids" \
--dry-run --verbose

uv run python -m gmail_processor \
-s 2025-01-01 -e 2026-02-26 \
-t "travel" \
--subject "SBB" \
--has-attachments \
--max-results 1000

See a full sample run output in docs/demo/cli-dry-run.txt.
uv run python -m gmail_processor \
-s 2026-01-01 -e 2026-01-31 \
-t "receipts" \
--has-attachments \
--dry-run --verbose

See a sample export tree in docs/demo/output-tree.txt.
When matches are found, the following structure is created:
exports/
<topic>/
<date>_<subject>_<id>/
email.eml # Original email
email.pdf # Email rendered as PDF
analysis.json # AI analysis results
<attachment-1>.pdf # Converted/saved attachment
<attachment-2>.pdf
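The `<date>_<subject>_<id>` folder name implies that the email subject is turned into a filesystem-safe slug. One possible way to do that is sketched below; `export_dir_name`, the character whitelist, and the 40-character cap are all assumptions for illustration, and the tool's real naming rules may differ.

```python
import re
from datetime import date


def export_dir_name(sent: date, subject: str, message_id: str, max_len: int = 40) -> str:
    """Build a '<date>_<subject>_<id>' folder name (hypothetical helper)."""
    # Collapse every run of characters outside [A-Za-z0-9.-] into one underscore,
    # cap the length, then trim underscores left at either end.
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", subject)[:max_len].strip("_")
    return f"{sent.isoformat()}_{slug or 'no_subject'}_{message_id}"


if __name__ == "__main__":
    print(export_dir_name(date(2026, 1, 15), "Invoice #42 / January", "18c2f"))
```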
The analysis.json file contains:
{
"command": "the AI command used",
"email_subject": "Email subject",
"email_sender": "sender@example.com",
"email_date": "2026-01-15",
"attachments_analyzed": [
{
"index": 1,
"filename": "ticket.pdf",
"matched": true,
"confidence": 0.95,
"summary": "Trip from Zurich to Geneva",
"trip_hours": ["08:30", "09:15"],
"trip_details": {
"ticket_id": "12345",
"valid_from": "15.01.2026 08:30",
"valid_to": "15.01.2026 09:15",
"amount": "25.00 CHF"
},
"reasoning": "Matches weekday morning criteria",
"needs_review": false
}
]
}
The AI understands natural language time constraints:
- `from 8AM to 12AM` → Morning window (08:00 to 12:00)
- `from 8AM to 6PM` → Full workday (08:00 to 18:00)
- `weekdays` → Monday through Friday
- `weekends` → Saturday and Sunday
Note: If you mean midnight, explicitly say `to midnight` or `to 11:59PM`.
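The mapping above can be mimicked deterministically for simple phrases. The sketch below is a stand-in written for illustration, not the tool's parser (the README attributes this understanding to the AI): `parse_window` and its regular expression are hypothetical, and it only handles the `from <time> to <time>` form, reading `12AM` as noon when it is the end of the window.

```python
import re
from datetime import time
from typing import Optional, Tuple

_WINDOW = re.compile(
    r"from\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm)\s+to\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm)",
    re.IGNORECASE,
)


def _to_time(hour: int, minute: int, meridiem: str, is_end: bool) -> time:
    if not 1 <= hour <= 12:
        raise ValueError(f"hour out of range for a 12-hour clock: {hour}")
    meridiem = meridiem.lower()
    if hour == 12:
        # "12AM" closing a window is read as noon, as in the mapping above;
        # "12AM" opening a window is midnight, and "12PM" stays at 12:00.
        if meridiem == "am":
            hour = 12 if is_end else 0
    elif meridiem == "pm":
        hour += 12
    return time(hour, minute)


def parse_window(command: str) -> Optional[Tuple[time, time]]:
    """Return (start, end) for the first 'from X to Y' time phrase, or None."""
    m = _WINDOW.search(command)
    if m is None:
        return None
    start = _to_time(int(m.group(1)), int(m.group(2) or 0), m.group(3), is_end=False)
    end = _to_time(int(m.group(4)), int(m.group(5) or 0), m.group(6), is_end=True)
    return start, end


if __name__ == "__main__":
    print(parse_window("bills from Monday to Friday from 8AM to 12AM"))
```

Phrases such as `to midnight`, `weekdays`, or `weekends` are outside this sketch and return `None`.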
- Run with `--verbose` to see the Gmail query being used
- Try adding or removing the `--subject` filter
- Verify your date range is correct
If you authenticated with the wrong account:
rm token.json
# Re-run and authenticate with the correct account
uv run playwright install chromium
- Verify `MINIMAX_API_KEY` is set correctly in `.env`
- Check `MINIMAX_BASE_URL` matches your provider
- Try setting a different `MINIMAX_MODEL` if the default isn't available
- Use `--subject` to filter more specifically
- Lower `--max-results` for smaller batches
- Avoid `--ai-all-attachments` unless necessary (default mode is faster)
uv run pytest
uv run ruff check .
uv run black .
uv run mypy src/
uv build
- Never commit credential files: `credentials.json`, `token.json`, and `.env` are in `.gitignore` by default
- Rotate exposed credentials: If you accidentally committed credentials, revoke them immediately in Google Cloud Console
- Token security: The `token.json` file contains OAuth tokens; treat it like a password
- API keys: Keep your MiniMax/OpenAI API keys in `.env` only
- Scope limitation: This app only requests the `gmail.readonly` scope; it cannot send emails or modify your inbox
| File | Purpose | Should Commit? |
|---|---|---|
| `credentials.json` | Google OAuth client credentials | ❌ NO |
| `token.json` | OAuth access/refresh tokens | ❌ NO |
| `.env` | API keys and configuration | ❌ NO |
| `.env.example` | Template for `.env` | ✅ YES |
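Before pushing a fork, it can be worth confirming that the three sensitive files are still listed in `.gitignore`. The helper below is a hypothetical sketch, not part of this project: it only looks for exact entries and does not expand glob patterns or negations.

```python
from typing import Iterable, List

SECRET_FILES = ("credentials.json", "token.json", ".env")


def missing_from_gitignore(
    gitignore_text: str, names: Iterable[str] = SECRET_FILES
) -> List[str]:
    """Return the names that have no exact-match entry in the .gitignore text."""
    entries = set()
    for line in gitignore_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # Treat "/token.json" and "token.json" alike; globs are not expanded.
            entries.add(line.lstrip("/"))
    return [name for name in names if name not in entries]


if __name__ == "__main__":
    sample = "# secrets\ncredentials.json\n/token.json\n"
    print(missing_from_gitignore(sample))
```

In practice the text would come from reading the repository's `.gitignore`, for example with `pathlib.Path(".gitignore").read_text()`.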
┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│  CLI Input  │────▶│ Gmail Client │────▶│  Search Emails  │
└─────────────┘     └──────────────┘     └─────────────────┘
                                                  │
                    ┌──────────────┐              ▼
                    │ Export PDFs  │◀────┌──────────────────┐
                    └──────────────┘     │ Process Attachm. │
                           ▲             └──────────────────┘
                           │                      │
                    ┌──────┴──────┐               ▼
                    │   Storage   │     ┌──────────────────┐
                    └─────────────┘◀────│   AI Analyzer    │
                                        └──────────────────┘
MIT License - See LICENSE file for details
This is a vibe-coded project - contributions are welcome! Feel free to:
- Open issues for bugs or feature requests
- Submit pull requests with improvements
- Suggest better patterns or refactoring
Built with:
- Typer for CLI
- Rich for terminal output
- Playwright for PDF generation
- MiniMax for AI analysis