Common questions about Bite-Size Reader.
- General
- Installation
- Configuration
- Features
- YouTube Support
- Web Search
- Performance
- Security
- Integration
- Cost Optimization
Bite-Size Reader is an AI-powered Telegram bot that transforms long web articles, YouTube videos, forwarded Telegram posts, and mixed-source bundles into structured, searchable outputs. It uses:
- Firecrawl for clean content extraction
- OpenRouter (or OpenAI/Anthropic) for LLM-powered summarization
- yt-dlp for YouTube video downloads and transcript extraction
Key Features:
- Strict JSON summary contract (35+ fields)
- Multi-language support (English, Russian)
- Semantic search (ChromaDB, vector embeddings)
- Mobile API (JWT auth, multi-device sync)
- YouTube video + transcript support
- Mixed-source aggregation across X, Threads, Instagram, YouTube, web, and Telegram-native sources
- Optional web search enrichment
- Self-hosted, privacy-focused
- Information Workers: Researchers, analysts, students who read many articles daily
- Content Curators: People who save and organize knowledge
- Privacy-Conscious Users: Self-hosting ensures your reading history stays private
- Developers: Extensible architecture (hexagonal, multi-agent), MCP server for AI agents
The software is free and open-source (BSD 3-Clause license), but you'll need API keys:
- Content extraction: Scrapling (default, free, in-process) or self-hosted Firecrawl (free). Cloud Firecrawl is optional: 500 free credits/month (~500 articles), then $25/month for 5,000 credits
- OpenRouter: Pay-per-use ($0.01-0.05 per summary depending on model)
- Alternative: Use free models (Google Gemini 2.0, some DeepSeek R1 providers offer free tier)
- YouTube: Free (uses yt-dlp, no API costs)
Estimated Monthly Cost: $10-30 for moderate use (50-100 summaries/month).
See Cost Optimization for ways to minimize costs.
- You send a URL, multiple URLs, or forwarded content to the Telegram bot (or call the API).
- Content extraction: Multi-provider scraper chain (Scrapling, Firecrawl, Playwright, Crawlee, direct HTML) extracts articles; platform extractors handle X, Threads, Instagram, and YouTube; Telegram-native submissions preserve message/media provenance.
- LLM summarization or synthesis: The extracted content is sent through OpenRouter to an LLM (e.g., DeepSeek, Qwen, Kimi).
- Structured output: The system returns either a strict summary JSON object or a provenance-aware aggregation bundle result.
- Storage: Requests, source items, crawl artifacts, LLM calls, and outputs are stored in SQLite.
- Reply: The bot sends formatted results back to Telegram, and the API exposes the same workflow via `/v1/*` endpoints.
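The steps above can be sketched as a minimal pipeline. This is an illustrative sketch only: the function and field names (`extract`, `summarize`, `handle_url`, `SummaryResult`) are stand-ins, not the project's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryResult:
    url: str
    tldr: str
    key_ideas: list[str] = field(default_factory=list)

def extract(url: str) -> str:
    """Stand-in for the multi-provider scraper chain (Scrapling, Firecrawl, ...)."""
    return f"article text fetched from {url}"

def summarize(content: str) -> dict:
    """Stand-in for the OpenRouter call that returns the strict JSON summary."""
    return {"tldr": content[:50], "key_ideas": ["idea 1", "idea 2"]}

def handle_url(url: str) -> SummaryResult:
    content = extract(url)        # 1. content extraction
    payload = summarize(content)  # 2. LLM summarization
    # 3. storage + reply: a real run would persist to SQLite and answer in Telegram
    return SummaryResult(url=url, tldr=payload["tldr"], key_ideas=payload["key_ideas"])

print(handle_url("https://example.com/post").tldr)
```

The same orchestration serves both the Telegram bot and the API, which is why the two interfaces return identical results.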
- Structured Output: 35+ JSON fields (TLDR, key ideas, topic tags, entities, readability scores) vs free-form text
- Persistent Storage: All summaries saved and searchable (semantic + full-text search)
- Multi-Interface: Telegram, mobile app, CLI, MCP server access the same data
- Self-Hosted: Your data never leaves your server
- YouTube Support: Extract and summarize video transcripts
- Bundle Synthesis: Compare and combine one or many mixed sources into one aggregation output
- Optimized for Reading: Designed specifically for article summarization, not general chat
Minimum:
- Python 3.13+
- 512 MB RAM (1 GB recommended with ChromaDB)
- 5 GB disk space (more if storing YouTube videos)
- Linux, macOS, or Windows (WSL recommended on Windows)
Optional (for YouTube):
- ffmpeg (video/audio merging)
Optional (for semantic search):
- ChromaDB server (or use embedded mode)
Yes, but with caveats:
- Pi 4 (4GB+): Works well, but disable ChromaDB or use CPU-only mode
- Pi 3 or older: Too slow, ChromaDB embeddings will struggle
- Docker: Recommended for easy deployment on Pi
```
# Pi-optimized config
CHROMA_DEVICE=cpu
CHROMA_REQUIRED=false       # Or use a lightweight embedding model
YOUTUBE_VIDEO_QUALITY=720   # Lower quality for smaller downloads
```
Yes. Some dependencies (chromadb, sentence-transformers) may need compilation:
```
# macOS M1/M2
brew install cmake pkg-config
pip install -r requirements.txt

# Raspberry Pi (Debian/Raspbian)
sudo apt-get install build-essential python3-dev
pip install -r requirements.txt
```
Yes. Use a Python virtual environment:
```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python bot.py
```
See tutorials/local-development.md for the full guide.
Absolutely required:
```
API_ID=...              # Telegram API ID (from https://my.telegram.org/apps)
API_HASH=...            # Telegram API hash
BOT_TOKEN=...           # Bot token (from @BotFather)
ALLOWED_USER_IDS=...    # Your Telegram user ID
OPENROUTER_API_KEY=...  # OpenRouter API key
```
Optional but common:
```
FIRECRAWL_API_KEY=...   # Only needed for cloud Firecrawl or web search (Scrapling is the free default)
```
Optional but recommended:
```
OPENROUTER_MODEL=deepseek/deepseek-v3.2
OPENROUTER_FALLBACK_MODELS=qwen/qwen3-max,moonshotai/kimi-k2.5
DB_PATH=/data/app.db
LOG_LEVEL=INFO
```
See environment_variables.md for the full reference (250+ variables).
- Message @userinfobot on Telegram
- Copy the numeric ID it replies with
- Add the ID to the `ALLOWED_USER_IDS` environment variable
- Restart the bot
Yes. Set `LLM_PROVIDER=openai`:
```
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
OPENAI_FALLBACK_MODELS=gpt-4o-mini
```
Or Anthropic:
```
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
```
Add multiple user IDs to `ALLOWED_USER_IDS`:
```
ALLOWED_USER_IDS=123456789,987654321,555666777
```
All users share the same database (no per-user isolation). This is designed for personal use or small teams, not multi-tenant SaaS.
Supported:
- ✅ Web articles (news sites, blogs, documentation)
- ✅ X/Twitter posts and X article links
- ✅ Threads posts
- ✅ Instagram posts, carousels, and reels
- ✅ YouTube videos (any format: watch, shorts, live, music)
- ✅ Forwarded Telegram channel posts
- ✅ Mixed-source aggregation bundles (one or many URLs, plus Telegram-native content in Telegram flows)
- ✅ PDFs (with embedded image analysis)
- ✅ Channel digest summaries (scheduled digests of subscribed Telegram channels)
- ✅ Long-form content (up to 256k tokens with long-context models)
Not Supported:
- ❌ Paywalled content (WSJ, NYT, Medium members-only)
- ❌ Sites with hard CAPTCHA challenges (Firecrawl proxies bypass many, but not all)
- ❌ Videos without transcripts (unless Whisper transcription enabled)
Yes. Supports English and Russian out of the box:
- Language detection (`PREFERRED_LANG=auto` by default)
- Separate prompts for English (`app/prompts/en/`) and Russian (`app/prompts/ru/`)
- Russian content gets a Russian summary and vice versa
Adding new languages: Copy `app/prompts/en/summary_system.txt` to a new language directory, translate the prompt, and update `app/core/lang.py`.
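The copy step can be scripted with `pathlib`; this sketch assumes the `app/prompts/<lang>/` layout with `*.txt` prompt files, and `scaffold_language` is a hypothetical helper, not part of the codebase.

```python
import shutil
from pathlib import Path

def scaffold_language(prompts_root: Path, new_lang: str, source_lang: str = "en") -> Path:
    """Copy every prompt file from the source language as a starting point."""
    src = prompts_root / source_lang
    dst = prompts_root / new_lang
    dst.mkdir(parents=True, exist_ok=True)
    for prompt_file in src.glob("*.txt"):
        # These copies still need hand translation before use.
        shutil.copy(prompt_file, dst / prompt_file.name)
    return dst
```

After copying and translating, the new language still has to be registered in `app/core/lang.py` so detection can route to it.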
Yes. Three search modes:
- Full-Text Search (FTS5): Fast keyword search
  ```
  SELECT * FROM topic_search_index WHERE topic_search_index MATCH 'python tutorial';
  ```
- Semantic Search (ChromaDB): Natural language queries
  ```
  # Via CLI
  python -m app.cli.search --query "machine learning basics"
  # Via Telegram
  /search machine learning
  ```
- Hybrid Search: Combines full-text + semantic + reranking

See SPEC.md § Search for details.
Every summary follows a strict schema with 35+ fields:
- Core: `summary_250` (≤250 chars), `summary_1000`, `tldr`, `key_ideas`
- Metadata: `title`, `url`, `word_count`, `estimated_reading_time_min`
- Semantic: `topic_tags`, `entities`, `semantic_chunks`, `seo_keywords`
- Quality: `confidence`, `readability`, `hallucination_risk`
See reference/summary-contract.md for full specification.
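Client-side validation of the contract can be sketched like this. Only a handful of the 35+ fields are modeled, and every constraint other than the `summary_250` length limit is an assumption for illustration.

```python
from typing import TypedDict

class SummarySubset(TypedDict):
    summary_250: str
    tldr: str
    topic_tags: list[str]
    confidence: float

def validate_summary(payload: SummarySubset) -> list[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors: list[str] = []
    if len(payload["summary_250"]) > 250:
        errors.append("summary_250 exceeds 250 characters")
    if not payload["tldr"]:
        errors.append("tldr is empty")
    if not 0.0 <= payload["confidence"] <= 1.0:  # assumed range
        errors.append("confidence outside [0, 1]")
    return errors
```

A strict schema like this is what makes downstream tooling (search indexing, exports, mobile sync) reliable: consumers can assume every field is present and well-formed.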
Yes. Multiple export formats:
- JSON: Via the mobile API (`GET /v1/summaries`)
- PDF: Via `weasyprint` (roadmap: not yet implemented)
- Markdown: Via CLI export (roadmap: not yet implemented)
- SQLite: Direct database access (`data/app.db`)
Yes. The Telegram bot exposes /aggregate, and the API exposes POST /v1/aggregations.
- Telegram: `/aggregate` accepts one or more links and can include the current forwarded/attached message context when present.
- API: `POST /v1/aggregations` accepts a bundle of 1-25 URL items.
- Output: the result includes per-item extraction status plus one synthesized aggregation payload with source coverage, duplicates, contradictions, and provenance-aware claims.
Yes. All URLs are normalized and hashed (SHA-256) before processing:
- `https://example.com/article?utm_source=twitter` → deduplicated
- `http://example.com/article` → deduplicated (HTTPS normalized)
- The same article reposted won't be processed twice (the cached summary is returned)
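The dedup behaviour can be reproduced with the standard library. Treat this as a sketch: the exact set of tracking parameters stripped and the normalization rules used by the project are assumptions here.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PREFIXES = ("utm_", "fbclid", "gclid")  # assumed strip-list

def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    # Drop tracking parameters and the fragment, lowercase the host, force HTTPS.
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.startswith(TRACKING_PREFIXES)]
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    return urlunsplit((scheme, parts.netloc.lower(), parts.path, urlencode(query), ""))

def dedup_key(url: str) -> str:
    return hashlib.sha256(normalize_url(url).encode()).hexdigest()

# Both variants normalize to https://example.com/article and share one key.
a = dedup_key("https://example.com/article?utm_source=twitter")
b = dedup_key("http://example.com/article")
print(a == b)  # True
```

Hashing the normalized URL (rather than the raw one) is what lets a reposted link hit the cache even when trackers or the scheme differ.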
All major formats:
- ✅ Standard videos (`youtube.com/watch?v=...`)
- ✅ Shorts (`youtube.com/shorts/...`)
- ✅ Live streams (`youtube.com/live/...`)
- ✅ Embedded videos (`youtube.com/embed/...`)
- ✅ Mobile links (`m.youtube.com/...`, `youtu.be/...`)
- ✅ YouTube Music (`music.youtube.com/...`)
1. Try youtube-transcript-api (fast, no download)
   - Fetches auto-generated or manual captions
   - Works for 90%+ of videos
2. Fall back to yt-dlp (slower, downloads video)
   - Downloads the video and extracts the audio
   - Sends the audio to the Whisper API for transcription (if `WHISPER_API_KEY` is set)
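The two-step fallback amounts to trying providers in order until one succeeds. In this sketch the providers are stubs standing in for youtube-transcript-api and for the yt-dlp + Whisper path; the real implementations make network calls.

```python
from collections.abc import Callable

class TranscriptUnavailable(Exception):
    pass

def fetch_transcript(video_id: str,
                     providers: list[Callable[[str], str]]) -> str:
    """Try each provider in order; raise only if every one fails."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider(video_id)
        except TranscriptUnavailable as exc:
            errors.append(str(exc))
    raise TranscriptUnavailable("; ".join(errors))

def captions_api(video_id: str) -> str:
    # Stand-in for youtube-transcript-api: fast, but fails when captions are off.
    raise TranscriptUnavailable("no captions")

def whisper_fallback(video_id: str) -> str:
    # Stand-in for yt-dlp download + Whisper transcription.
    return f"whisper transcript for {video_id}"

print(fetch_transcript("dQw4w9WgXcQ", [captions_api, whisper_fallback]))
```

Ordering providers cheap-first keeps the common case fast while the expensive download path only runs when captions are missing.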
Default behavior: Fails with error message.
Workaround: Enable Whisper transcription (requires API key or local Whisper model):
```
ENABLE_WHISPER_TRANSCRIPTION=true
WHISPER_API_KEY=...   # Or leave empty for local Whisper
```
Depends on video quality and length:
- 1080p, 10-minute video: ~200 MB
- 720p, 10-minute video: ~100 MB
- Audio-only (if video not needed): ~10 MB
Storage management:
```
YOUTUBE_CLEANUP_AFTER_DAYS=7   # Delete after 7 days
YOUTUBE_MAX_STORAGE_GB=10      # Max 10 GB total
```
Not yet, but planned. Current workaround: set a lower quality:
```
YOUTUBE_VIDEO_QUALITY=480   # Smaller files
```
Or disable YouTube entirely:
```
ENABLE_YOUTUBE=false
```
Optional feature that queries the web for current context before summarizing:
- Extract keywords from article (e.g., "climate change 2025")
- Search DuckDuckGo (or Google if API key provided)
- Add top 3 results to LLM prompt as additional context
- LLM generates summary with up-to-date information
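The enrichment steps above reduce to keyword extraction plus prompt assembly. The keyword heuristic and snippet format below are illustrative assumptions; the real flow queries DuckDuckGo (or Google) for the snippets rather than taking them as input.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "as"}

def extract_keywords(text: str, top_n: int = 3) -> list[str]:
    """Naive frequency-based keyword extraction (stand-in for the real one)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS and len(w) > 3]
    return [w for w, _ in Counter(words).most_common(top_n)]

def build_enriched_prompt(article: str, search_results: list[str]) -> str:
    """Append the top search snippets as extra context for the LLM."""
    context = "\n".join(f"- {snippet}" for snippet in search_results[:3])
    return f"Recent context from web search:\n{context}\n\nArticle:\n{article}"

keywords = extract_keywords("Climate change accelerated in 2025 as climate records fell.")
print(keywords)  # 'climate' appears twice, so it ranks first
```

Capping the context at three snippets is what keeps the token overhead near the ~500 tokens quoted below.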
Benefits: Corrects outdated info, adds recent developments, fact-checks claims.
Enable for:
- News articles (time-sensitive topics)
- Research papers (need latest findings)
- Tutorial articles (check if still relevant)
Disable for:
- Timeless content (classic literature, historical docs)
- Privacy-sensitive content (internal docs, private blogs)
- Cost sensitivity (adds ~500 tokens per summary)
```
# Enable
WEB_SEARCH_ENABLED=true
WEB_SEARCH_TIMEOUT_SEC=10
```
Yes, but minimal:
- Web search API: DuckDuckGo is free, Google costs ~$0.005 per query
- LLM tokens: Adds ~500 tokens ($0.005-0.01 depending on model)
Total extra cost: ~$0.01 per summary.
Typical:
- Articles: 5-10 seconds (2-3s Firecrawl + 3-5s LLM)
- YouTube videos: 10-20 seconds (transcript extraction + LLM)
- Long articles (20k+ words): 15-30 seconds (chunking + longer LLM processing)
Factors:
- Model speed (DeepSeek fast, GPT-4 slower)
- Network latency
- Article length
- Web search enabled/disabled
Yes. Several optimizations:
- Use a faster model:
  ```
  OPENROUTER_MODEL=qwen/qwen3-max   # Faster than DeepSeek
  ```
- Increase concurrency:
  ```
  MAX_CONCURRENT_CALLS=5   # Default: 4
  ```
- Disable optional features:
  ```
  WEB_SEARCH_ENABLED=false
  SUMMARY_TWO_PASS_ENABLED=false
  ```
- Reduce content length:
  ```
  MAX_CONTENT_LENGTH_TOKENS=30000   # Default: 50000
  ```
Common causes:
- ChromaDB embeddings (sentence-transformers model): 500 MB - 1 GB
- YouTube downloads in memory before writing to disk
- LLM response buffering
Solutions:
```
# Use a smaller embedding model
CHROMA_EMBEDDING_MODEL=all-MiniLM-L6-v2   # ~100 MB
# Disable ChromaDB if not using search
CHROMA_REQUIRED=false
# Limit ChromaDB memory
CHROMA_MAX_MEMORY_MB=512
```
Yes, via CLI:
```
# From a file (one URL per line)
python -m app.cli.summary --accept-multiple --url-file urls.txt
# Output to JSON
python -m app.cli.summary --accept-multiple --url-file urls.txt --json-path summaries.json
```
Note: Respects rate limits and concurrency settings.
Yes, if self-hosted:
- No data leaves your server (except API calls to Firecrawl/OpenRouter)
- API calls redacted: Authorization headers never logged
- SQLite database: Stored locally (`data/app.db`)
- No telemetry: No usage analytics sent anywhere
Privacy considerations:
- Firecrawl sees your URLs (use trafilatura fallback for sensitive sites)
- OpenRouter/OpenAI see article content (use on-premise LLM if needed)
- Telegram sees your bot interactions (use Telegram's privacy settings)
Shared-instance access (supported):
- Multiple allowed users can use the same deployment through Telegram, the JWT API, the CLI, and request-scoped MCP access.
- Aggregation bundle API and MCP operations are scoped to the authenticated user.
- This is suitable for personal use or small trusted teams.
What is not provided:
- This is not a fully isolated multi-tenant SaaS deployment.
- The app still runs as one shared instance and database.
- If you need strict tenant isolation, run separate deployments or add row-level isolation around the remaining shared surfaces.
Environment variables only (.env file):
```
# .env file (not committed to git)
FIRECRAWL_API_KEY=fc-...
OPENROUTER_API_KEY=sk-or-...
BOT_TOKEN=1234567890:ABCDEF...
```
Never stored in:
- Database
- Logs (Authorization headers redacted)
- Git repository (`.env` is in `.gitignore`)
Impact: Attacker can send messages as your bot, but can't:
- See your summaries (database access required)
- Trigger summarization (ALLOWED_USER_IDS whitelist blocks them)
- Access Mobile API (separate JWT authentication)
Response:
- Revoke token via @BotFather on Telegram
- Generate new token
- Update `BOT_TOKEN` in `.env`
- Restart the bot
Yes. Mobile API (FastAPI) provides:
- JWT authentication (Telegram login exchange)
- Summary fetching (`GET /v1/summaries`)
- Multi-device sync (full + delta modes)
- Collection management
- Offline-first support
See MOBILE_API_SPEC.md for API reference.
Note: Mobile app UI not included (API only). Build your own client or use Telegram bot.
Yes. Multiple integration options:
- REST API (FastAPI): Build custom clients
- MCP Server: Expose to Claude Desktop (or any MCP client)
- SQLite Database: Direct database access for custom scripts
- CLI Tools: Batch processing, search, export
See mcp_server.md for MCP integration.
Not directly, but you can:
- Export to Markdown (CLI tool, roadmap)
- Sync via Mobile API (build custom sync script)
- Direct Database Access (query SQLite, convert to Markdown)
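Direct-database export can be sketched like this. The `summaries` table name and its columns are assumptions for illustration — check the actual schema in `data/app.db` before relying on them.

```python
import sqlite3
from pathlib import Path

def export_markdown(db_path: str, out_dir: str) -> int:
    """Write one Markdown note per stored summary; returns the number exported."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, title, url, tldr FROM summaries"  # assumed schema
        ).fetchall()
    finally:
        conn.close()
    for row_id, title, url, tldr in rows:
        note = f"# {title}\n\nSource: {url}\n\n{tldr}\n"
        (out / f"summary-{row_id}.md").write_text(note, encoding="utf-8")
    return len(rows)
```

Pointing `out_dir` at an Obsidian vault folder gives a crude one-way sync; re-running overwrites notes with the same ID.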
Not out-of-box, but adaptable:
- Replace the Telegram adapter (`app/adapters/telegram/`) with a Slack adapter
- Reuse the hexagonal architecture (core logic unchanged)
- Implement Slack OAuth (instead of Telegram access control)
See HEXAGONAL_ARCHITECTURE_QUICKSTART.md for architecture guide.
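The adapter swap rests on the core depending only on a messaging port. This is a sketch under that assumption; the port and adapter names in the actual codebase may differ.

```python
from typing import Protocol

class MessagingPort(Protocol):
    """What the core needs from any chat platform (Telegram, Slack, ...)."""
    def send_text(self, chat_id: str, text: str) -> None: ...

class SlackAdapter:
    """A hypothetical Slack implementation of the same port."""
    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send_text(self, chat_id: str, text: str) -> None:
        # A real adapter would call Slack's chat.postMessage API here.
        self.sent.append((chat_id, text))

def deliver_summary(port: MessagingPort, chat_id: str, tldr: str) -> None:
    # Core logic stays platform-agnostic: it only talks to the port.
    port.send_text(chat_id, f"TL;DR: {tldr}")

slack = SlackAdapter()
deliver_summary(slack, "C123", "hexagonal keeps the core portable")
```

Because `deliver_summary` never imports anything Telegram-specific, swapping platforms means writing one new adapter class, not touching the core.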
Free tier strategies:
- Use free models (OpenRouter):
  ```
  OPENROUTER_MODEL=google/gemini-2.0-flash-001:free
  OPENROUTER_FALLBACK_MODELS=deepseek/deepseek-r1:free
  ```
- Use free content extraction (no Firecrawl costs):
  - Scrapling is the default provider (free, in-process, no API key)
  - Self-hosted Firecrawl is another free option (`FIRECRAWL_SELF_HOSTED_ENABLED=true`)
  - Cloud Firecrawl is only needed for sites that resist local scraping
  ```
  SCRAPER_ENABLED=true
  SCRAPER_SCRAPLING_ENABLED=true
  SCRAPER_PROVIDER_ORDER=["scrapling", "direct_html"]   # Skip cloud Firecrawl entirely
  ```
- Cache aggressively:
  ```
  REDIS_ENABLED=true
  REDIS_LLM_TTL_SECONDS=86400   # 24 hours
  ```
- Disable optional features:
  ```
  WEB_SEARCH_ENABLED=false
  SUMMARY_TWO_PASS_ENABLED=false
  ```
Ranked by cost/quality (as of Feb 2026):
- Free tier: `google/gemini-2.0-flash-001:free` (best free option)
- Ultra-cheap: `deepseek/deepseek-v3.2` (~$0.01/summary)
- Cheap + good: `qwen/qwen3-max` (~$0.02/summary)
- Balanced: `moonshotai/kimi-k2.5` (~$0.03/summary, great for long content)

Not recommended (too expensive for this use case):
- `gpt-4-turbo`: ~$0.20/summary
- `claude-opus-4`: ~$0.30/summary
Yes, but requires setup:
1. Run a local LLM (Ollama, LM Studio, vLLM):
   ```
   ollama run llama3.2:70b
   ```
2. Point the bot at the local endpoint:
   ```
   LLM_PROVIDER=openai    # Use OpenAI-compatible API
   OPENAI_API_KEY=dummy   # Not needed for local
   OPENAI_BASE_URL=http://localhost:11434/v1
   OPENAI_MODEL=llama3.2:70b
   ```
3. Disable the self-hosted Firecrawl provider (keep local extraction only):
   ```
   FIRECRAWL_SELF_HOSTED_ENABLED=false
   SCRAPER_PROVIDER_ORDER=["scrapling", "playwright", "crawlee", "direct_html"]
   ```
Breaking rename note: the legacy scraper variables `SCRAPLING_*` and `SCRAPER_DIRECT_HTTP_ENABLED` now fail fast at startup.
Hardware requirements: 70B model needs 40+ GB VRAM (A100, H100, or multiple GPUs).
Yes, if you re-summarize URLs often:
- Same URL sent twice → cached summary returned (0 API cost)
- Typical Redis cache hit rate: ~30-40% when following news aggregators that re-share the same links
Not worth it if:
- You never re-read articles
- Redis adds complexity you don't want
Enable caching:
```
REDIS_ENABLED=true
REDIS_LLM_TTL_SECONDS=604800   # 7 days
```
- TROUBLESHOOTING.md - Debugging guide
- environment_variables.md - Configuration reference
- DEPLOYMENT.md - Setup and deployment
- MOBILE_API_SPEC.md - REST API specification
- SPEC.md - Technical specification
- README.md - Project overview
Last Updated: 2026-03-28
Have a question not answered here? Open an issue or check TROUBLESHOOTING.md.