Common questions about Bite-Size Reader.
- General
- Installation
- Configuration
- Features
- YouTube Support
- Web Search
- Performance
- Security
- Integration
- Cost Optimization
Bite-Size Reader is an AI-powered Telegram bot that transforms long web articles, YouTube videos, forwarded Telegram posts, and mixed-source bundles into structured, searchable outputs. It uses:
- Firecrawl for clean content extraction
- OpenRouter (or OpenAI/Anthropic) for LLM-powered summarization
- yt-dlp for YouTube video downloads and transcript extraction
Key Features:
- Strict JSON summary contract (35+ fields)
- Multi-language support (English, Russian)
- Semantic search (ChromaDB, vector embeddings)
- Mobile API (JWT auth, multi-device sync)
- YouTube video + transcript support
- Mixed-source aggregation across X, Threads, Instagram, YouTube, web, and Telegram-native sources
- Optional web search enrichment
- Self-hosted, privacy-focused
- Information Workers: Researchers, analysts, students who read many articles daily
- Content Curators: People who save and organize knowledge
- Privacy-Conscious Users: Self-hosting ensures your reading history stays private
- Developers: Extensible architecture (hexagonal, multi-agent), MCP server for AI agents
The software is free and open-source (BSD 3-Clause license), but you'll need API keys:
- Content extraction: Scrapling (default, free, in-process) or self-hosted Firecrawl (free). Cloud Firecrawl is optional: 500 free credits/month (~500 articles), then $25/month for 5,000 credits
- OpenRouter: Pay-per-use ($0.01-0.05 per summary depending on model)
- Alternative: Use free models (Google Gemini 2.0, some DeepSeek R1 providers offer free tier)
- YouTube: Free (uses yt-dlp, no API costs)
Estimated Monthly Cost: $10-30 for moderate use (50-100 summaries/month).
See Cost Optimization for ways to minimize costs.
- You send a URL, multiple URLs, or forwarded content to the Telegram bot (or call the API).
- Content extraction: Multi-provider scraper chain (Scrapling, Firecrawl, Playwright, Crawlee, direct HTML) extracts articles; platform extractors handle X, Threads, Instagram, and YouTube; Telegram-native submissions preserve message/media provenance.
- LLM summarization or synthesis: The extracted content is sent through OpenRouter to an LLM (e.g., DeepSeek, Qwen, Kimi).
- Structured output: The system returns either a strict summary JSON object or a provenance-aware aggregation bundle result.
- Storage: Requests, source items, crawl artifacts, LLM calls, and outputs are stored in SQLite.
- Reply: The bot sends formatted results back to Telegram, and the API exposes the same workflow via `/v1/*` endpoints.
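The steps above can be sketched as a minimal pipeline. This is an illustrative sketch only: the function and field names (`extract`, `summarize`, `handle_url`, `SummaryResult`) are stand-ins, not the project's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryResult:
    url: str
    tldr: str
    key_ideas: list[str] = field(default_factory=list)

def extract(url: str) -> str:
    """Stand-in for the multi-provider scraper chain (Scrapling, Firecrawl, ...)."""
    return f"article text fetched from {url}"

def summarize(content: str) -> dict:
    """Stand-in for the OpenRouter call that returns the strict JSON summary."""
    return {"tldr": content[:50], "key_ideas": ["idea 1", "idea 2"]}

def handle_url(url: str) -> SummaryResult:
    content = extract(url)        # 1. content extraction
    payload = summarize(content)  # 2. LLM summarization
    # 3. storage + reply: a real run would persist to SQLite and answer in Telegram
    return SummaryResult(url=url, tldr=payload["tldr"], key_ideas=payload["key_ideas"])

print(handle_url("https://example.com/post").tldr)
```

The same orchestration serves both the Telegram bot and the API, which is why the two interfaces return identical results.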
- Structured Output: 35+ JSON fields (TLDR, key ideas, topic tags, entities, readability scores) vs free-form text
- Persistent Storage: All summaries saved and searchable (semantic + full-text search)
- Multi-Interface: Telegram, mobile app, CLI, MCP server access the same data
- Self-Hosted: Your data never leaves your server
- YouTube Support: Extract and summarize video transcripts
- Bundle Synthesis: Compare and combine one or many mixed sources into one aggregation output
- Optimized for Reading: Designed specifically for article summarization, not general chat
Minimum:
- Python 3.13+
- 512 MB RAM (1 GB recommended with ChromaDB)
- 5 GB disk space (more if storing YouTube videos)
- Linux, macOS, or Windows (WSL recommended on Windows)
Optional (for YouTube):
- ffmpeg (video/audio merging)
Optional (for semantic search):
- ChromaDB server (or use embedded mode)
Yes, but with caveats:
- Pi 4 (4GB+): Works well, but disable ChromaDB or use CPU-only mode
- Pi 3 or older: Too slow, ChromaDB embeddings will struggle
- Docker: Recommended for easy deployment on Pi
```
# Pi-optimized config
CHROMA_DEVICE=cpu
CHROMA_REQUIRED=false       # Or use a lightweight embedding model
YOUTUBE_VIDEO_QUALITY=720   # Lower quality for smaller downloads
```
Yes. Some dependencies (chromadb, sentence-transformers) may need compilation:
```
# macOS M1/M2
brew install cmake pkg-config
pip install -r requirements.txt

# Raspberry Pi (Debian/Raspbian)
sudo apt-get install build-essential python3-dev
pip install -r requirements.txt
```
Yes. Use a Python virtual environment:
```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python bot.py
```
See tutorials/local-development.md for the full guide.
Absolutely required:
```
API_ID=...              # Telegram API ID (from https://my.telegram.org/apps)
API_HASH=...            # Telegram API hash
BOT_TOKEN=...           # Bot token (from @BotFather)
ALLOWED_USER_IDS=...    # Your Telegram user ID
OPENROUTER_API_KEY=...  # OpenRouter API key
```
Optional but common:
```
FIRECRAWL_API_KEY=...   # Only needed for cloud Firecrawl or web search (Scrapling is the free default)
```
Optional but recommended:
```
OPENROUTER_MODEL=deepseek/deepseek-v3.2
OPENROUTER_FALLBACK_MODELS=qwen/qwen3-max,moonshotai/kimi-k2.5
DB_PATH=/data/app.db
LOG_LEVEL=INFO
```
See environment_variables.md for the full reference (250+ variables).
- Message @userinfobot on Telegram
- Copy the numeric ID it replies with
- Add the ID to the `ALLOWED_USER_IDS` environment variable
- Restart the bot
Yes. Set `LLM_PROVIDER=openai`:
```
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
OPENAI_FALLBACK_MODELS=gpt-4o-mini
```
Or Anthropic:
```
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
```
Add multiple user IDs to `ALLOWED_USER_IDS`:
```
ALLOWED_USER_IDS=123456789,987654321,555666777
```
All users share the same database (no per-user isolation). This is designed for personal use or small teams, not multi-tenant SaaS.
Supported:
- ✅ Web articles (news sites, blogs, documentation)
- ✅ X/Twitter posts and X article links
- ✅ Threads posts
- ✅ Instagram posts, carousels, and reels
- ✅ YouTube videos (any format: watch, shorts, live, music)
- ✅ Forwarded Telegram channel posts
- ✅ Mixed-source aggregation bundles (one or many URLs, plus Telegram-native content in Telegram flows)
- ✅ PDFs (with embedded image analysis)
- ✅ Channel digest summaries (scheduled digests of subscribed Telegram channels)
- ✅ Long-form content (up to 256k tokens with long-context models)
Not Supported:
- ❌ Paywalled content (WSJ, NYT, Medium members-only)
- ❌ Sites with hard CAPTCHA challenges (Firecrawl proxies bypass many, but not all)
- ❌ Videos without transcripts (unless Whisper transcription enabled)
Yes. Supports English and Russian out of the box:
- Language detection (`PREFERRED_LANG=auto` by default)
- Separate prompts for English (`app/prompts/en/`) and Russian (`app/prompts/ru/`)
- Russian content gets a Russian summary and vice versa
Adding new languages: Copy `app/prompts/en/summary_system.txt` to a new language directory, translate the prompt, and update `app/core/lang.py`.
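The copy step can be scripted with `pathlib`; this sketch assumes the `app/prompts/<lang>/` layout with `*.txt` prompt files, and `scaffold_language` is a hypothetical helper, not part of the codebase.

```python
import shutil
from pathlib import Path

def scaffold_language(prompts_root: Path, new_lang: str, source_lang: str = "en") -> Path:
    """Copy every prompt file from the source language as a starting point."""
    src = prompts_root / source_lang
    dst = prompts_root / new_lang
    dst.mkdir(parents=True, exist_ok=True)
    for prompt_file in src.glob("*.txt"):
        # These copies still need hand translation before use.
        shutil.copy(prompt_file, dst / prompt_file.name)
    return dst
```

After copying and translating, the new language still has to be registered in `app/core/lang.py` so detection can route to it.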
Yes. Three search modes:
- Full-Text Search (FTS5): Fast keyword search
  ```
  SELECT * FROM topic_search_index WHERE topic_search_index MATCH 'python tutorial';
  ```
- Semantic Search (ChromaDB): Natural language queries
  ```
  # Via CLI
  python -m app.cli.search --query "machine learning basics"
  # Via Telegram
  /search machine learning
  ```
- Hybrid Search: Combines full-text + semantic + reranking

See SPEC.md § Search for details.
Every summary follows a strict schema with 35+ fields:
- Core: `summary_250` (≤250 chars), `summary_1000`, `tldr`, `key_ideas`
- Metadata: `title`, `url`, `word_count`, `estimated_reading_time_min`
- Semantic: `topic_tags`, `entities`, `semantic_chunks`, `seo_keywords`
- Quality: `confidence`, `readability`, `hallucination_risk`
See reference/summary-contract.md for full specification.
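Client-side validation of the contract can be sketched like this. Only a handful of the 35+ fields are modeled, and every constraint other than the `summary_250` length limit is an assumption for illustration.

```python
from typing import TypedDict

class SummarySubset(TypedDict):
    summary_250: str
    tldr: str
    topic_tags: list[str]
    confidence: float

def validate_summary(payload: SummarySubset) -> list[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors: list[str] = []
    if len(payload["summary_250"]) > 250:
        errors.append("summary_250 exceeds 250 characters")
    if not payload["tldr"]:
        errors.append("tldr is empty")
    if not 0.0 <= payload["confidence"] <= 1.0:  # assumed range
        errors.append("confidence outside [0, 1]")
    return errors
```

A strict schema like this is what makes downstream tooling (search indexing, exports, mobile sync) reliable: consumers can assume every field is present and well-formed.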
Yes. Multiple export formats:
- JSON: Via the mobile API (`GET /v1/summaries`)
- PDF: Via `weasyprint` (roadmap: not yet implemented)
- Markdown: Via CLI export (roadmap: not yet implemented)
- SQLite: Direct database access (`data/app.db`)
Yes. The Telegram bot exposes /aggregate, and the API exposes POST /v1/aggregations.
- Telegram: `/aggregate` accepts one or more links and can include the current forwarded/attached message context when present.
- API: `POST /v1/aggregations` accepts a bundle of 1-25 URL items.
- Output: the result includes per-item extraction status plus one synthesized aggregation payload with source coverage, duplicates, contradictions, and provenance-aware claims.
Yes. All URLs are normalized and hashed (SHA-256) before processing:
- `https://example.com/article?utm_source=twitter` → deduplicated
- `http://example.com/article` → deduplicated (HTTPS normalized)
- The same article reposted won't be processed twice (the cached summary is returned)
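The dedup behaviour can be reproduced with the standard library. Treat this as a sketch: the exact set of tracking parameters stripped and the normalization rules used by the project are assumptions here.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PREFIXES = ("utm_", "fbclid", "gclid")  # assumed strip-list

def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    # Drop tracking parameters and the fragment, lowercase the host, force HTTPS.
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.startswith(TRACKING_PREFIXES)]
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    return urlunsplit((scheme, parts.netloc.lower(), parts.path, urlencode(query), ""))

def dedup_key(url: str) -> str:
    return hashlib.sha256(normalize_url(url).encode()).hexdigest()

# Both variants normalize to https://example.com/article and share one key.
a = dedup_key("https://example.com/article?utm_source=twitter")
b = dedup_key("http://example.com/article")
print(a == b)  # True
```

Hashing the normalized URL (rather than the raw one) is what lets a reposted link hit the cache even when trackers or the scheme differ.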
All major formats:
- ✅ Standard videos (`youtube.com/watch?v=...`)
- ✅ Shorts (`youtube.com/shorts/...`)
- ✅ Live streams (`youtube.com/live/...`)
- ✅ Embedded videos (`youtube.com/embed/...`)
- ✅ Mobile links (`m.youtube.com/...`, `youtu.be/...`)
- ✅ YouTube Music (`music.youtube.com/...`)
1. Try youtube-transcript-api (fast, no download)
   - Fetches auto-generated or manual captions
   - Works for 90%+ of videos
2. Fall back to yt-dlp (slower, downloads video)
   - Downloads the video and extracts the audio
   - Sends the audio to the Whisper API for transcription (if `WHISPER_API_KEY` is set)
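The two-step fallback amounts to trying providers in order until one succeeds. In this sketch the providers are stubs standing in for youtube-transcript-api and for the yt-dlp + Whisper path; the real implementations make network calls.

```python
from collections.abc import Callable

class TranscriptUnavailable(Exception):
    pass

def fetch_transcript(video_id: str,
                     providers: list[Callable[[str], str]]) -> str:
    """Try each provider in order; raise only if every one fails."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider(video_id)
        except TranscriptUnavailable as exc:
            errors.append(str(exc))
    raise TranscriptUnavailable("; ".join(errors))

def captions_api(video_id: str) -> str:
    # Stand-in for youtube-transcript-api: fast, but fails when captions are off.
    raise TranscriptUnavailable("no captions")

def whisper_fallback(video_id: str) -> str:
    # Stand-in for yt-dlp download + Whisper transcription.
    return f"whisper transcript for {video_id}"

print(fetch_transcript("dQw4w9WgXcQ", [captions_api, whisper_fallback]))
```

Ordering providers cheap-first keeps the common case fast while the expensive download path only runs when captions are missing.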
Default behavior: Fails with error message.
Workaround: Enable Whisper transcription (requires API key or local Whisper model):
```
ENABLE_WHISPER_TRANSCRIPTION=true
WHISPER_API_KEY=...   # Or leave empty for local Whisper
```
Depends on video quality and length:
- 1080p, 10-minute video: ~200 MB
- 720p, 10-minute video: ~100 MB
- Audio-only (if video not needed): ~10 MB
Storage management:
```
YOUTUBE_CLEANUP_AFTER_DAYS=7   # Delete after 7 days
YOUTUBE_MAX_STORAGE_GB=10      # Max 10 GB total
```
Not yet, but planned. Current workaround: set a lower quality:
```
YOUTUBE_VIDEO_QUALITY=480   # Smaller files
```
Or disable YouTube entirely:
```
ENABLE_YOUTUBE=false
```
Optional feature that queries the web for current context before summarizing:
- Extract keywords from article (e.g., "climate change 2025")
- Search DuckDuckGo (or Google if API key provided)
- Add top 3 results to LLM prompt as additional context
- LLM generates summary with up-to-date information
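The enrichment steps above reduce to keyword extraction plus prompt assembly. The keyword heuristic and snippet format below are illustrative assumptions; the real flow queries DuckDuckGo (or Google) for the snippets rather than taking them as input.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "as"}

def extract_keywords(text: str, top_n: int = 3) -> list[str]:
    """Naive frequency-based keyword extraction (stand-in for the real one)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS and len(w) > 3]
    return [w for w, _ in Counter(words).most_common(top_n)]

def build_enriched_prompt(article: str, search_results: list[str]) -> str:
    """Append the top search snippets as extra context for the LLM."""
    context = "\n".join(f"- {snippet}" for snippet in search_results[:3])
    return f"Recent context from web search:\n{context}\n\nArticle:\n{article}"

keywords = extract_keywords("Climate change accelerated in 2025 as climate records fell.")
print(keywords)  # 'climate' appears twice, so it ranks first
```

Capping the context at three snippets is what keeps the token overhead near the ~500 tokens quoted below.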
Benefits: Corrects outdated info, adds recent developments, fact-checks claims.
Enable for:
- News articles (time-sensitive topics)
- Research papers (need latest findings)
- Tutorial articles (check if still relevant)
Disable for:
- Timeless content (classic literature, historical docs)
- Privacy-sensitive content (internal docs, private blogs)
- Cost sensitivity (adds ~500 tokens per summary)
```
# Enable
WEB_SEARCH_ENABLED=true
WEB_SEARCH_TIMEOUT_SEC=10
```
Yes, but minimal:
- Web search API: DuckDuckGo is free, Google costs ~$0.005 per query
- LLM tokens: Adds ~500 tokens ($0.005-0.01 depending on model)
Total extra cost: ~$0.01 per summary.
Typical:
- Articles: 5-10 seconds (2-3s Firecrawl + 3-5s LLM)
- YouTube videos: 10-20 seconds (transcript extraction + LLM)
- Long articles (20k+ words): 15-30 seconds (chunking + longer LLM processing)
Factors:
- Model speed (DeepSeek fast, GPT-4 slower)
- Network latency
- Article length
- Web search enabled/disabled
Yes. Several optimizations:
- Use a faster model:
  ```
  OPENROUTER_MODEL=qwen/qwen3-max   # Faster than DeepSeek
  ```
- Increase concurrency:
  ```
  MAX_CONCURRENT_CALLS=5   # Default: 4
  ```
- Disable optional features:
  ```
  WEB_SEARCH_ENABLED=false
  SUMMARY_TWO_PASS_ENABLED=false
  ```
- Reduce content length:
  ```
  MAX_CONTENT_LENGTH_TOKENS=30000   # Default: 50000
  ```
Common causes:
- ChromaDB embeddings (sentence-transformers model): 500 MB - 1 GB
- YouTube downloads in memory before writing to disk
- LLM response buffering
Solutions:
```
# Use a smaller embedding model
CHROMA_EMBEDDING_MODEL=all-MiniLM-L6-v2   # ~100 MB
# Disable ChromaDB if not using search
CHROMA_REQUIRED=false
# Limit ChromaDB memory
CHROMA_MAX_MEMORY_MB=512
```
Yes, via CLI:
```
# From a file (one URL per line)
python -m app.cli.summary --accept-multiple --url-file urls.txt
# Output to JSON
python -m app.cli.summary --accept-multiple --url-file urls.txt --json-path summaries.json
```
Note: Respects rate limits and concurrency settings.
Yes, if self-hosted:
- No data leaves your server (except API calls to Firecrawl/OpenRouter)
- API calls redacted: Authorization headers never logged
- SQLite database: Stored locally (`data/app.db`)
- No telemetry: No usage analytics sent anywhere
Privacy considerations:
- Firecrawl sees your URLs (use trafilatura fallback for sensitive sites)
- OpenRouter/OpenAI see article content (use on-premise LLM if needed)
- Telegram sees your bot interactions (use Telegram's privacy settings)
Shared-instance access (supported):
- Multiple allowed users can use the same deployment through Telegram, the JWT API, the CLI, and request-scoped MCP access.
- Aggregation bundle API and MCP operations are scoped to the authenticated user.
- This is suitable for personal use or small trusted teams.
What is not provided:
- This is not a fully isolated multi-tenant SaaS deployment.
- The app still runs as one shared instance and database.
- If you need strict tenant isolation, run separate deployments or add row-level isolation around the remaining shared surfaces.
Environment variables only (.env file):
```
# .env file (not committed to git)
FIRECRAWL_API_KEY=fc-...
OPENROUTER_API_KEY=sk-or-...
BOT_TOKEN=1234567890:ABCDEF...
```
Never stored in:
- Database
- Logs (Authorization headers redacted)
- Git repository (`.env` is in `.gitignore`)
Impact: Attacker can send messages as your bot, but can't:
- See your summaries (database access required)
- Trigger summarization (ALLOWED_USER_IDS whitelist blocks them)
- Access Mobile API (separate JWT authentication)
Response:
- Revoke token via @BotFather on Telegram
- Generate new token
- Update `BOT_TOKEN` in `.env`
- Restart the bot
Yes. Mobile API (FastAPI) provides:
- JWT authentication (Telegram login exchange)
- Summary fetching (`GET /v1/summaries`)
- Multi-device sync (full + delta modes)
- Collection management
- Offline-first support
See MOBILE_API_SPEC.md for API reference.
Note: Mobile app UI not included (API only). Build your own client or use Telegram bot.
Yes. Multiple integration options:
- REST API (FastAPI): Build custom clients
- MCP Server: Expose to Claude Desktop (or any MCP client)
- SQLite Database: Direct database access for custom scripts
- CLI Tools: Batch processing, search, export
See mcp_server.md for MCP integration.
Not directly, but you can:
- Export to Markdown (CLI tool, roadmap)
- Sync via Mobile API (build custom sync script)
- Direct Database Access (query SQLite, convert to Markdown)
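Direct-database export can be sketched like this. The `summaries` table name and its columns are assumptions for illustration — check the actual schema in `data/app.db` before relying on them.

```python
import sqlite3
from pathlib import Path

def export_markdown(db_path: str, out_dir: str) -> int:
    """Write one Markdown note per stored summary; returns the number exported."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, title, url, tldr FROM summaries"  # assumed schema
        ).fetchall()
    finally:
        conn.close()
    for row_id, title, url, tldr in rows:
        note = f"# {title}\n\nSource: {url}\n\n{tldr}\n"
        (out / f"summary-{row_id}.md").write_text(note, encoding="utf-8")
    return len(rows)
```

Pointing `out_dir` at an Obsidian vault folder gives a crude one-way sync; re-running overwrites notes with the same ID.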
Not out-of-box, but adaptable:
- Replace the Telegram adapter (`app/adapters/telegram/`) with a Slack adapter
- Reuse the hexagonal architecture (core logic unchanged)
- Implement Slack OAuth (instead of Telegram access control)
See HEXAGONAL_ARCHITECTURE_QUICKSTART.md for architecture guide.
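The adapter swap rests on the core depending only on a messaging port. This is a sketch under that assumption; the port and adapter names in the actual codebase may differ.

```python
from typing import Protocol

class MessagingPort(Protocol):
    """What the core needs from any chat platform (Telegram, Slack, ...)."""
    def send_text(self, chat_id: str, text: str) -> None: ...

class SlackAdapter:
    """A hypothetical Slack implementation of the same port."""
    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send_text(self, chat_id: str, text: str) -> None:
        # A real adapter would call Slack's chat.postMessage API here.
        self.sent.append((chat_id, text))

def deliver_summary(port: MessagingPort, chat_id: str, tldr: str) -> None:
    # Core logic stays platform-agnostic: it only talks to the port.
    port.send_text(chat_id, f"TL;DR: {tldr}")

slack = SlackAdapter()
deliver_summary(slack, "C123", "hexagonal keeps the core portable")
```

Because `deliver_summary` never imports anything Telegram-specific, swapping platforms means writing one new adapter class, not touching the core.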
Free tier strategies:
- Use free models (OpenRouter):
  ```
  OPENROUTER_MODEL=google/gemini-2.0-flash-001:free
  OPENROUTER_FALLBACK_MODELS=deepseek/deepseek-r1:free
  ```
- Use free content extraction (no Firecrawl costs):
  - Scrapling is the default provider (free, in-process, no API key)
  - Self-hosted Firecrawl is another free option (`FIRECRAWL_SELF_HOSTED_ENABLED=true`)
  - Cloud Firecrawl is only needed for sites that resist local scraping
  ```
  SCRAPER_ENABLED=true
  SCRAPER_SCRAPLING_ENABLED=true
  SCRAPER_PROVIDER_ORDER=["scrapling", "direct_html"]   # Skip cloud Firecrawl entirely
  ```
- Cache aggressively:
  ```
  REDIS_ENABLED=true
  REDIS_LLM_TTL_SECONDS=86400   # 24 hours
  ```
- Disable optional features:
  ```
  WEB_SEARCH_ENABLED=false
  SUMMARY_TWO_PASS_ENABLED=false
  ```
Ranked by cost/quality (as of Feb 2026):
- Free tier: `google/gemini-2.0-flash-001:free` (best free option)
- Ultra-cheap: `deepseek/deepseek-v3.2` (~$0.01/summary)
- Cheap + good: `qwen/qwen3-max` (~$0.02/summary)
- Balanced: `moonshotai/kimi-k2.5` (~$0.03/summary, great for long content)

Not recommended (too expensive for this use case):
- `gpt-4-turbo`: ~$0.20/summary
- `claude-opus-4`: ~$0.30/summary
Yes, but requires setup:
1. Run a local LLM (Ollama, LM Studio, vLLM):
   ```
   ollama run llama3.2:70b
   ```
2. Point the bot at the local endpoint:
   ```
   LLM_PROVIDER=openai    # Use OpenAI-compatible API
   OPENAI_API_KEY=dummy   # Not needed for local
   OPENAI_BASE_URL=http://localhost:11434/v1
   OPENAI_MODEL=llama3.2:70b
   ```
3. Disable the self-hosted Firecrawl provider (keep local extraction only):
   ```
   FIRECRAWL_SELF_HOSTED_ENABLED=false
   SCRAPER_PROVIDER_ORDER=["scrapling", "playwright", "crawlee", "direct_html"]
   ```
Breaking rename note: the legacy scraper variables `SCRAPLING_*` and `SCRAPER_DIRECT_HTTP_ENABLED` now fail fast at startup.
Hardware requirements: 70B model needs 40+ GB VRAM (A100, H100, or multiple GPUs).
Yes, if you re-summarize URLs often:
- Same URL sent twice → cached summary returned (0 API cost)
- Typical Redis cache hit rate: ~30-40% when following news aggregators that re-share the same links
Not worth it if:
- You never re-read articles
- Redis adds complexity you don't want
Enable caching:
```
REDIS_ENABLED=true
REDIS_LLM_TTL_SECONDS=604800   # 7 days
```
- TROUBLESHOOTING.md - Debugging guide
- environment_variables.md - Configuration reference
- DEPLOYMENT.md - Setup and deployment
- MOBILE_API_SPEC.md - REST API specification
- SPEC.md - Technical specification
- README.md - Project overview
Last Updated: 2026-03-28
Have a question not answered here? Open an issue or check TROUBLESHOOTING.md.