🧪 Fun fact: This bot is proudly hosted on my old Android phone using Termux.
If it ever goes offline, the phone probably needed charging 🔋😄
A production-grade, job-based Telegram bot that scrapes web novels chapter-by-chapter, generates professional PDFs, and delivers them directly to users with real-time progress tracking and automatic fault recovery.
This bot provides a seamless experience for downloading web novels as properly formatted PDFs. Built with reliability and scalability in mind, it handles long-running scraping operations while providing users with real-time feedback through an intuitive Telegram interface.
- 🤖 Interactive Telegram Interface - Inline buttons and commands for effortless navigation
- 📖 Chapter-wise Scraping - Efficient, granular content retrieval
- 📄 High-Quality PDF Generation - Professional formatting and layout
- 📊 Real-time Progress Tracking - Visual progress bars via the /status command
- 🔁 Intelligent Retry Logic - Automatic recovery from network failures
- ♻️ Crash-safe Architecture - Jobs survive bot restarts
- 🗂️ Multi-user Support - Concurrent job processing per user
- 🧹 Automatic Cleanup - Self-managing file system
- 📦 Zero-database Design - File-based persistence for simplicity
┌─────────────────┐
│  Telegram Bot   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Job Creation   │
│ (novel_flow.py) │
└────────┬────────┘
         │
         ▼
┌──────────────────────┐
│  Background Scraper  │
│     (subprocess)     │
│                      │
│  • Scrapes chapters  │
│  • Updates progress  │
│  • Generates PDF     │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│   Auto-Send Worker   │
│                      │
│  • Detects new PDFs  │
│  • Sends to users    │
│  • Cleanup files     │
└──────────────────────┘
Design Principles:
- Restart-safe operations
- Database-free architecture
- Event-driven PDF delivery
- Comprehensive error handling
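As a rough illustration of how restart-safe, database-free persistence can work, the sketch below writes one JSON file per job through a temporary file and an atomic rename, and treats unreadable files as absent. The helper names `save_job` and `load_job` are hypothetical and are not taken from this project's code.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_job(job_dir: Path, job: dict) -> Path:
    """Write the job as <job_id>.json via a temp file and an atomic rename."""
    job_dir.mkdir(parents=True, exist_ok=True)
    target = job_dir / f"{job['job_id']}.json"
    # The temp file lives in the same directory so os.replace stays atomic.
    fd, tmp_name = tempfile.mkstemp(dir=job_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(job, handle)
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target


def load_job(path: Path) -> Optional[dict]:
    """Return the job dict, or None if the file is missing or not valid JSON."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, ValueError):
        return None
```

Because readers only ever see the old file or the fully written new one, a crash in the middle of an update should leave the previous job state intact.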
webnovel_pdf_bot/
│
├── main.py # Application entry point
│
├── bot/
│ ├── bot.py # Telegram bot initialization
│ ├── handlers.py # Command and message handlers
│ ├── state.py # User state management
│ └── auto_send.py # Automated PDF delivery service
│
├── scraper/
│ └── chapter_scraper.py # Chapter scraping and PDF generation
│
├── registry/
│ └── novel_registry.py # Novel catalog management
│
├── flow/
│ └── novel_flow.py # Job orchestration layer
│
├── config/
│ └── settings.py # Configuration and environment loading
│
├── utils/
│ ├── logger.py # Centralized logging
│ └── validator.py # Input validation utilities
│
├── jobs/ # Job state tracking (JSON)
├── outputs/ # Generated PDF storage
├── backups/ # Optional PDF archiving
│
├── requirements.txt # Python dependencies
├── .env # Environment configuration
└── README.md # Project documentation
- Python 3.8 or higher
- Telegram Bot Token (obtain from @BotFather)
- pip package manager
- Clone the repository

      git clone https://github.com/yourusername/webnovel-pdf-bot.git
      cd webnovel-pdf-bot

- Create virtual environment

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies

      pip install -r requirements.txt

- Configure environment variables

  Create a .env file in the project root:

      BOT_TOKEN=your_telegram_bot_token_here
      OUTPUT_DIR=outputs
      JOB_DIR=jobs
      CHECK_OUTPUT_INTERVAL=5

  ⚠️ Security Note: Never commit .env to version control. Add it to .gitignore.
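For readers curious how these four variables might be consumed, here is a minimal standard-library sketch. It is an assumption about one possible approach; the project's own config/settings.py may load the file differently, and the function names `parse_env_file` and `load_settings` are hypothetical.

```python
import os
from pathlib import Path


def parse_env_file(path: Path) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env_path: Path = Path(".env")) -> dict:
    """Merge .env values with the process environment (the environment wins)."""
    merged = {**parse_env_file(env_path), **os.environ}
    token = merged.get("BOT_TOKEN")
    if not token:
        raise RuntimeError("BOT_TOKEN is not set")
    return {
        "bot_token": token,
        "output_dir": Path(merged.get("OUTPUT_DIR", "outputs")),
        "job_dir": Path(merged.get("JOB_DIR", "jobs")),
        "check_output_interval": int(merged.get("CHECK_OUTPUT_INTERVAL", "5")),
    }
```

Failing fast on a missing BOT_TOKEN surfaces a misconfigured .env at startup rather than at the first Telegram call.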
- Launch the main bot process

      python main.py

  This initializes:
  - Telegram polling service
  - Job handler
  - Message router

- Start the auto-send worker (in a separate terminal)

      python -m bot.auto_send

  This worker:
  - Monitors for completed PDFs
  - Delivers files to users
  - Performs cleanup operations
  - Recovers unsent PDFs on restart
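The worker behavior listed above can be approximated with a directory scan. In this sketch, which is an assumption rather than the contents of bot/auto_send.py, `send` is a hypothetical callable standing in for the Telegram upload, and a PDF is deleted only after `send` returns without raising.

```python
from pathlib import Path
from typing import Callable, List


def pending_pdfs(output_dir: Path) -> List[Path]:
    """Return the PDFs currently in output_dir, oldest first."""
    return sorted(
        output_dir.glob("*.pdf"),
        key=lambda p: (p.stat().st_mtime, p.name),
    )


def deliver_pending(output_dir: Path, send: Callable[[Path], None]) -> List[str]:
    """Send every pending PDF; delete a file only after send() succeeds."""
    delivered = []
    for pdf in pending_pdfs(output_dir):
        try:
            send(pdf)
        except Exception:
            # Leave the file in place so the next pass retries it.
            continue
        pdf.unlink()
        delivered.append(pdf.name)
    return delivered
```

Calling `deliver_pending` in a loop that sleeps for CHECK_OUTPUT_INTERVAL seconds would also pick up PDFs left over from before a restart, since the scan does not depend on in-memory state. A real worker would additionally need a way to skip PDFs that are still being written.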
Begin interaction and access the novel selection menu.
- Select a novel from the inline menu
- Enter start chapter number
- Enter end chapter number
- Job launches in background
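Before a job launches, the two chapter numbers typed by the user need checking. The function below is a hypothetical sketch of that check (the name `parse_chapter_range` and the error messages are illustrative, not taken from utils/validator.py).

```python
from typing import Tuple


def parse_chapter_range(start_text: str, end_text: str) -> Tuple[int, int]:
    """Convert user input to (start, end) or raise ValueError with a reason."""
    try:
        start = int(start_text.strip())
        end = int(end_text.strip())
    except ValueError:
        raise ValueError("Chapter numbers must be whole numbers") from None
    if start < 1:
        raise ValueError("Start chapter must be 1 or greater")
    if end < start:
        raise ValueError("End chapter must not be before the start chapter")
    return start, end
```

The ValueError message can be echoed back to the user so they can re-enter the number instead of starting a job that is bound to fail.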
Monitor the progress of active jobs with the /status command.
Example output for running job:
📖 My Werewolf System
Job: 1768572917
[██████░░░░] 63%
Chapter 1520 / 1687
Status: ⏳ running
Example output for failed job:
📖 My Werewolf System
Job: 1768572917
Status: ❌ failed
Error: Chapter text not found
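A message in this shape could be produced from a job record along the following lines. This is a sketch under assumptions: it expects a dict with the novel, job_id, start, end, current, and status keys used by the job files described in this README, it measures progress relative to the requested chapter range, and both the `error` key and the ✅ icon for completed jobs are hypothetical.

```python
def render_bar(done: int, total: int, width: int = 10) -> str:
    """Render a bar like [██████░░░░] 63% using integer arithmetic."""
    if total <= 0:
        raise ValueError("total must be positive")
    done = max(0, min(done, total))
    filled = done * width // total
    percent = done * 100 // total
    return "[" + "█" * filled + "░" * (width - filled) + f"] {percent}%"


def format_status(job: dict) -> str:
    """Build the status text for one job dict."""
    lines = [f"📖 {job['novel']}", f"Job: {job['job_id']}"]
    if job["status"] == "failed":
        lines.append("Status: ❌ failed")
        # "error" is a hypothetical key, not shown in the README's job file.
        lines.append(f"Error: {job.get('error', 'unknown')}")
        return "\n".join(lines)
    # Assumes "current" is the last chapter finished within start..end.
    total = job["end"] - job["start"] + 1
    done = job["current"] - job["start"] + 1
    icon = "⏳" if job["status"] == "running" else "✅"  # ✅ is an assumption
    lines.append(render_bar(done, total))
    lines.append(f"Chapter {job['current']} / {job['end']}")
    lines.append(f"Status: {icon} {job['status']}")
    return "\n".join(lines)
```

Integer division keeps the bar and the percentage from rounding up to 100% before the last chapter is done.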
Jobs progress through the following states:
- running - Actively scraping and generating PDF
- completed - Successfully finished, PDF ready
- failed - Encountered unrecoverable error
Each job is tracked via jobs/<job_id>.json:
{
"job_id": "1768572917",
"chat_id": "7511978276",
"novel": "My Werewolf System",
"start": 1,
"end": 25,
"current": 14,
"status": "running"
}

The system is designed for reliability:
| Scenario | Behavior |
|---|---|
| Scraper crashes | Job marked as failed |
| Bot restarts | Active jobs remain queryable |
| Auto-send restarts | Unsent PDFs automatically delivered |
| Network failures | Automatic retry with exponential backoff |
| Partial job files | Safely ignored, no corruption |
No job progress is lost during failures or restarts.
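The "automatic retry with exponential backoff" row can be illustrated with a small helper. This is a generic sketch, not the scraper's actual retry code: the name `retry_with_backoff`, the default of four attempts, and the one-second base delay are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); after failure n, wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: the caller can mark the job failed.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the defaults, the waits between attempts are 1, 2, and 4 seconds. A real scraper would likely catch only network-related exceptions rather than every `Exception`, so that programming errors fail immediately.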
- Check job states

      ls jobs/
      cat jobs/<job_id>.json

- Verify PDF generation

      ls outputs/

- Review logs
  - Look for [JOB <id>] entries
  - Check error messages and stack traces
-
**Monitor active