Note: This is a Google-owned project.
Generates a unified Study Pack Markdown note from multiple course tabs using Google ADK + Gemini 2.5 Flash + Playwright.
Opens your browser tabs, captures screenshots, runs visual reasoning, and synthesises everything into a structured Obsidian-compatible note.
- Visual Reasoning Pipeline — Gemini analyses each tab (classify, prioritise, detect gaps, compare sources, navigate by vision)
- Multi-tab browser capture — Playwright opens all tabs and captures full-page screenshots + text
- ADK Agent — SequentialAgent: TabClassifier → ContentExtractor → NoteSynthesizer
- Obsidian Integration — Saves notes to your vault with YAML frontmatter, tags, and objectives
- Telegram Bot — Trigger the pipeline via Telegram message
- Web UI — React chat interface at /chat
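As an illustration of the Obsidian integration listed above, here is a minimal sketch of rendering a note with YAML frontmatter, tags, and objectives. The function name `render_note`, the `title`/`created` fields, and the checkbox layout are assumptions made for this example; the project's actual `markdown_writer.py` may structure notes differently.

```python
import json
from datetime import date
from typing import List, Optional


def render_note(title: str, tags: List[str], objectives: List[str],
                body: str, created: Optional[date] = None) -> str:
    """Return Markdown text with a YAML frontmatter block (hypothetical layout)."""
    created = created or date.today()
    lines = [
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"title: {json.dumps(title)}",
        f"created: {created.isoformat()}",
        "tags:",
    ]
    lines += [f"  - {tag}" for tag in tags]
    lines += ["---", "", f"# {title}", "", "## Objectives"]
    # Objectives are rendered as unchecked task-list items.
    lines += [f"- [ ] {objective}" for objective in objectives]
    lines += ["", body.rstrip(), ""]
    return "\n".join(lines)
```

Writing the returned string to a `.md` file inside the folder named by `OBSIDIAN_VAULT_PATH` would make the note show up in that vault.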
# Install dependencies
uv sync
# Install Playwright browsers
uv run playwright install chromium
# Configure environment
cp .env.example .env # add your keys
.env required keys:
GOOGLE_API_KEY=...
OBSIDIAN_VAULT_PATH=/path/to/your/vault
TELEGRAM_BOT_TOKEN=... # optional
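A quick way to confirm the keys above are set before starting the server is a small check like the following. This is a sketch only: the helper name `missing_keys` is hypothetical, and it assumes the variables have already been exported or loaded from `.env` by whatever mechanism the project uses.

```python
import os
from typing import List, Mapping, Optional

# Keys taken from the list above; TELEGRAM_BOT_TOKEN is marked optional there.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "OBSIDIAN_VAULT_PATH")
OPTIONAL_KEYS = ("TELEGRAM_BOT_TOKEN",)


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Missing required keys: {', '.join(missing)}")
    print("All required keys are set.")
```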
# Web server (FastAPI + chat UI at http://localhost:8080/chat)
uv run python main.py
# Telegram bot (standalone)
uv run python scripts/telegram_bot.py
# Stages 1-4: search + browser capture (no LLM)
python scripts/test_e2e.py
# Full pipeline with LLM
python scripts/test_e2e.py --full
# Visual content tests (images, GIF, video thumbnail)
python scripts/test_e2e.py --visual
# All 5 Gemini visual reasoning features
python scripts/test_e2e.py --vision-reasoning
# FastAPI /run_sse round-trip
python scripts/test_e2e.py --http
main.py                      FastAPI server (:8080)
scripts/telegram_bot.py      Telegram bot
src/agent/
  __init__.py                ADK root_agent
  agent.py                   generate_study_pack tool
  pipeline.py                SequentialAgent pipeline
src/tools/
  browser_capture.py         Playwright tab capture + visual navigation
  gemini_vision.py           5 visual reasoning features
  web_search.py              DuckDuckGo search (ddgs)
  markdown_writer.py         Obsidian note writer
frontend/index.html          React chat UI
fixtures/                    HTML fixture pages for testing
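The tab-capture step handled by `browser_capture.py` (full-page screenshots plus text) could look roughly like the sketch below, which uses Playwright's synchronous Python API. The function names `capture_tabs` and `screenshot_path`, the `captures` output directory, and the returned dictionary shape are assumptions for illustration, not the project's actual interface.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def screenshot_path(out_dir: Path, index: int) -> Path:
    """Build a predictable file name for the screenshot of the Nth tab."""
    return out_dir / f"tab_{index:02d}.png"


def capture_tabs(urls: Iterable[str], out_dir: Path = Path("captures")) -> List[Dict]:
    """Open each URL in Chromium and save a full-page screenshot plus body text."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            for index, url in enumerate(urls, start=1):
                page = browser.new_page()
                page.goto(url)
                shot = screenshot_path(out_dir, index)
                page.screenshot(path=str(shot), full_page=True)
                results.append({
                    "url": url,
                    "screenshot": shot,
                    "text": page.inner_text("body"),
                })
                page.close()
        finally:
            browser.close()
    return results
```

Each result pairs a screenshot file with the page's visible text, which is the kind of input a vision model would need for the reasoning features described earlier.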
- Google ADK — agent framework
- Gemini 2.5 Flash — LLM + vision
- Playwright — browser automation
- FastAPI — web server
- python-telegram-bot — Telegram integration