# plebbit-indexer

A Plebbit crawler, indexer and UI.

- **Crawler**: Indexes posts from all known subplebbits and exposes them via a REST API.
- **Plebindex**: Next.js frontend to search and view indexed posts.
## Contents

- Features
- Recent Updates
- Project Structure
- Quickstart (with Docker Compose)
- How it Works
- Configuration
- SSL Certificate Configuration
- Development
- API
- Notes
- License
## Features

- **Comprehensive Reply System**: Full reply threading with parent-child relationships and nested reply support
- **Advanced Search**: Full-text search across posts, replies, authors, and subplebbit addresses with filtering
- **Multiple Sort Options**: Sort by new, top (score), reply count, or old for both posts and replies
- **Reply Control**: Toggle to include or exclude replies from search and browse results
- **Time-based Filtering**: Filter content by hour, day, week, month, year, or all time
- **Pagination**: Efficient pagination for both posts and replies with customizable limits
- **Queue Management**: Intelligent subplebbit processing queue with retry logic and error tracking
- **Parent Context**: Replies show context from their parent posts with author information
- **Modern Frontend**: Built with Next.js 15 and React 19 for optimal performance
- **Comprehensive Testing**: Full test suite covering reply threading, search, and API functionality
- Crawls all known subplebbit addresses and indexes their posts into a local SQLite database
- Exposes a REST API (`/api/posts`) to fetch indexed posts
- Dockerized for easy deployment
## Project Structure

```
.
├── crawler/             # Node.js backend: indexer and REST API
├── plebindex/           # Next.js frontend
├── plebbit-cli/         # Plebbit node daemon and configuration
├── nginx/               # Nginx configuration and SSL setup
├── certbot/             # Let's Encrypt certificate management
├── data/                # Persistent data storage (certificates, nginx config, plebbit-cli db)
├── .env                 # Environment config (API URLs, server names, node settings)
├── init-letsencrypt.sh  # SSL certificate initialization script
└── docker-compose.yml
```
## Quickstart (with Docker Compose)

```bash
git clone https://github.com/NiKrause/plebbit-indexer.git
cd plebbit-indexer
docker-compose up --build
```

This will:

- Build and start the crawler (backend/indexer) on port `3001`
- Build and start the plebindex (frontend) on port `3000`
- Build and start the plebbit-cli node (Plebbit daemon) on port `9138`, which the crawler connects to via WebSocket
Once running:

- Frontend: http://localhost:3000
- API: http://localhost:3001/api/posts
## How it Works

### Crawler

- Fetches subplebbit addresses from multiple sources:
  - A public JSON file on GitHub
  - Dune Analytics query results (executed weekly)
- For each subplebbit:
  - Fetches all posts and stores them in a local SQLite database
  - Listens for updates and re-indexes as needed
- Maintains a `known_subplebbits` table to track all discovered subplebbits and their sources
- Exposes a REST API at `/api/posts` to retrieve all indexed posts
- Includes content moderation capabilities (optional)
- Features a queue system for processing subplebbits with retry logic and error tracking
### Dune Analytics Integration

The crawler integrates with Dune Analytics to discover new subplebbit communities:

- **Weekly Query Execution**: Executes a Dune query once a week to refresh the list of `.eth` and `.sol` plebbit communities
- **Daily Results Processing**: Fetches and processes the query results once a day to check for new communities
- **Duplicate Prevention**: Maintains a record of all known subplebbits to avoid processing duplicates
- Configuration (see the example below):
  - `DUNE_API_KEY`: Your Dune Analytics API key
  - `DUNE_QUERY_EXECUTE_INTERVAL_HOURS`: Interval for query execution (default: 168 hours / 1 week)
  - `DUNE_QUERY_FETCH_INTERVAL_HOURS`: Interval for fetching results (default: 24 hours / 1 day)
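A minimal sketch of these settings, assuming they live in the top-level `.env` file (the API key is a placeholder):

```env
# Dune Analytics integration (placeholder values)
DUNE_API_KEY=your_dune_api_key
# Re-run the Dune query weekly (hours)
DUNE_QUERY_EXECUTE_INTERVAL_HOURS=168
# Fetch query results daily (hours)
DUNE_QUERY_FETCH_INTERVAL_HOURS=24
```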
### Plebindex (Frontend)

- Next.js app that fetches posts from the backend API.
- Displays posts with links to their original subplebbit and author.
## Configuration

- The crawler requires a running `plebbit-cli` node reachable via WebSocket (`PLEBBIT_WS_URL`). When you launch the stack with Docker Compose, this service starts automatically and shares its auth key through the mounted `data/plebbit/auth-key` volume, so no manual setup is needed.
- By default, the crawler/indexing API runs on port `3001` and the frontend on port `3000`.
- Environment variables can be set in `docker-compose.yml` or via `.env` files (see the example below).
- When running on a public node, nginx is configured automatically from the comma-separated domain names in the `.env` file, and an SSL certificate is generated by Let's Encrypt after running `./init-letsencrypt.sh`.
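A sketch of the relevant top-level `.env` entries (values are placeholders; with Docker Compose the WebSocket URL and auth key are normally wired up automatically):

```env
# Domains served by nginx / Let's Encrypt (comma-separated)
SERVER_NAME=example.com,www.example.com
# WebSocket URL of the plebbit-cli node (auth key from data/plebbit/auth-key)
PLEBBIT_WS_URL=ws://localhost:9138/AUTH-KEY_FROM_DATA_PLEBBIT_DIRECTORY
```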
## SSL Certificate Configuration

To enable HTTPS with Let's Encrypt certificates:

1. Configure your domain(s) in the `.env` file:

   ```env
   SERVER_NAME=example.com,www.example.com  # Comma-separated list of domains
   ```

2. Initialize the Let's Encrypt certificates:

   ```bash
   ./init-letsencrypt.sh
   ```

The initialization process:
- Creates dummy certificates and an `nginx.conf` for each domain to start nginx
- Deletes dummy certificates
- Requests real certificates from Let's Encrypt
- Creates symbolic links for all domains
- Reloads nginx with the new certificates
The setup uses:

- `nginx/nginx.conf.template`: Template for the nginx configuration
- `nginx/docker-entrypoint.d/001-parse-template.sh`: Script to generate nginx configs
- Docker containers:
  - `nginx`: Serves the application and handles SSL
  - `certbot`: Manages Let's Encrypt certificates
- Required volume mounts:
  - `./data/certbot/conf:/etc/letsencrypt`: Stores certificates
  - `./data/certbot/www:/var/www/certbot`: Webroot for Let's Encrypt validation
  - `./nginx/nginx.conf.template:/etc/nginx/nginx.conf.template`: Nginx template
  - `./nginx/docker-entrypoint.d/:/docker-entrypoint.d/`: Entrypoint scripts
Certificates auto-renew every 12 days, controlled by the certbot container.
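To inspect the issued certificates manually, one option (a sketch; assumes the `certbot` service from `docker-compose.yml` is running) is:

```bash
# List the certificates certbot manages inside the running container
docker-compose exec certbot certbot certificates
```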
## Development

### Crawler

```bash
cd crawler
npm install
PLEBBIT_WS_URL=ws://localhost:9138/AUTH-KEY_FROM_DATA_PLEBBIT_DIRECTORY npm run crawler
```

### Plebindex

```bash
cd plebindex
npm install
npm run dev
```

### Zero-Downtime Deployment

The project includes a zero-downtime deployment system using a dual-instance architecture:

- Each service (crawler and plebindex) runs two separate instances (01 and 02)
- The included `deploy.sh` script intelligently detects which components need updating
- When deploying changes, it builds and starts the inactive instance
- Once the new instance is running, it updates the nginx configuration to route traffic to the new instance
- This blue/green deployment approach ensures continuous service availability during updates
- To deploy, simply run `./deploy.sh`, which pulls the latest changes and handles the upgrade process
## API

- `GET /api/posts`

  Returns all indexed posts as JSON. Supports sorting and time filtering:

  - `?sort=<sort_type>` - Sort by: `new` (default), `top`, `replies`, or `old`
  - `?t=<time_filter>` - Filter by time: `all` (default), `hour`, `day`, `week`, `month`, `year`
  - `?page=<page_number>` - For pagination (default: 1)
  - `?limit=<count>` - Number of results per page (default: 20; set to 0 for all posts)
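  For example, assuming the stack from the quickstart is running locally:

  ```bash
  # Second page of top posts from the past week, 20 per page
  curl "http://localhost:3001/api/posts?sort=top&t=week&page=2&limit=20"
  ```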
- `GET /api/posts/search?q=<search_term>`

  Search posts by title, content, author name, or subplebbit address. Returns matching posts as JSON. Also supports the same sorting and filtering options:

  - `?sort=<sort_type>` - Sort by: `new` (default), `top`, `replies`, or `old`
  - `?t=<time_filter>` - Filter by time: `all` (default), `hour`, `day`, `week`, `month`, `year`
  - `?page=<page_number>` - For pagination (default: 1)
  - `?limit=<count>` - Number of results per page (default: 20; set to 0 for all posts)
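  For example (the search term is illustrative):

  ```bash
  # Posts matching "plebbit" from the past month, newest first
  curl "http://localhost:3001/api/posts/search?q=plebbit&sort=new&t=month"
  ```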
  The response includes pagination metadata:

  ```json
  {
    "posts": [...],
    "pagination": {
      "total": 123,  // Total number of matching posts
      "page": 1,     // Current page number
      "limit": 20,   // Posts per page
      "pages": 7     // Total number of pages
    },
    "filters": {
      "sort": "new",       // Current sort method
      "timeFilter": "all"  // Current time filter
    }
  }
  ```

- `GET /api/posts/:id`

  Returns a specific post by its ID. Response format:

  ```json
  {
    "post": {
      "id": "QmAbC123...",
      "title": "Post title",
      "content": "Post content",
      "subplebbitAddress": "example.eth",
      "authorAddress": "12D3KooW...",
      "authorDisplayName": "Username",
      "timestamp": 1234567890,
      "upvoteCount": 10,
      "downvoteCount": 2,
      "replyCount": 5
    }
  }
  ```

  Returns 404 if no post with the given ID exists.
- `GET /api/replies/:parentCid`

  Returns replies to a specific post or comment. Supports pagination and sorting:

  - `?sort=<sort_type>` - Sort by: `new` (default), `top`, or `old`
  - `?page=<page_number>` - For pagination (default: 1)
  - `?limit=<count>` - Number of results per page (default: 20)

  The response includes pagination metadata:

  ```json
  {
    "replies": [...],
    "pagination": {
      "total": 123,  // Total number of replies
      "page": 1,     // Current page number
      "limit": 20,   // Replies per page
      "pages": 7     // Total number of pages
    },
    "filters": {
      "sort": "new"  // Current sort method
    }
  }
  ```
### Authenticated Endpoints

The following endpoints require authentication using either:

- A Bearer token in the `Authorization` header: `Authorization: Bearer <token>`
- An auth key as a query parameter: `?auth=<auth_key>`
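For example, both of these fetch queue statistics (token and key values are placeholders):

```bash
curl -H "Authorization: Bearer <token>" "http://localhost:3001/api/queue/stats"
curl "http://localhost:3001/api/queue/stats?auth=<auth_key>"
```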
- `GET /api/queue`

  Returns the current subplebbit queue status. Optional query parameter: `?status=<status>` to filter by status.

- `GET /api/queue/stats`

  Returns statistics about the queue, including counts by status.

- `GET /api/queue/errors`

  Returns detailed error information for failed subplebbit processing attempts.

- `POST /api/queue/add`

  Adds a new subplebbit address to the queue. Body: `{ "address": "<subplebbit_address>" }`

- `POST /api/queue/retry`

  Retries processing a failed subplebbit. Body: `{ "address": "<subplebbit_address>" }`

  Example:

  ```bash
  curl -X POST "https://plebscan.org/api/queue/retry?auth=xyz" \
    -H "Content-Type: application/json" \
    -d '{"address": "leblore.eth"}'
  ```

- `POST /api/queue/refresh`

  Refreshes the subplebbit queue with new addresses.

- `POST /api/queue/process`

  Processes items from the queue. Optional body: `{ "limit": <number> }` to specify the batch size (default: 5).
## Notes

- The crawler automatically re-indexes when restarted, but skips posts that are already indexed.
- The index database is persisted inside the crawler container under `/app/data`, which is mounted on the host under `./crawler/data`.
- Make sure a plebbit-cli node is running and reachable by the crawler with the expected auth key.
## License

MIT