plebbit-indexer

A Plebbit crawler, indexer and UI.

Crawler: Indexes posts from all known subplebbits and exposes them via a REST API.
Plebindex: Next.js frontend to search and view indexed posts.

Features

Comprehensive Reply System: Full reply threading with parent-child relationships and nested reply support
Advanced Search: Full-text search across posts, replies, authors, and subplebbit addresses with filtering
Multiple Sort Options: Sort by new, top (score), replies count, or old for both posts and replies
Reply Control: Toggle to include or exclude replies from search and browse results
Time-based Filtering: Filter content by hour, day, week, month, year, or all time
Pagination: Efficient pagination for both posts and replies with customizable limits
Queue Management: Intelligent subplebbit processing queue with retry logic and error tracking
Parent Context: Replies show context from their parent posts with author information
Modern Frontend: Built with Next.js 15 and React 19 for optimal performance
Comprehensive Testing: Full test suite covering reply threading, search, and API functionality
Crawls all known subplebbit addresses and indexes their posts into a local SQLite database
Exposes a REST API (/api/posts) to fetch indexed posts
Dockerized for easy deployment

Project Structure

.
├── crawler/      # Node.js backend: indexer and REST API
├── plebindex/    # Next.js frontend
├── plebbit-cli/  # Plebbit node daemon and configuration
├── nginx/        # Nginx configuration and SSL setup
├── certbot/      # Let's Encrypt certificate management
├── data/         # Persistent data storage (certificates, nginx config, plebbit-cli db)
├── .env          # Environment config (API URLs, server names, node settings)
├── init-letsencrypt.sh  # SSL certificate initialization script
├── docker-compose.yml

Quickstart (with Docker Compose)

Prerequisites

1. Clone the repository

git clone https://github.com/NiKrause/plebbit-indexer.git
cd plebbit-indexer

2. Start all services

docker-compose up --build

This will:

Build and start the crawler (backend/indexer) on port 3001
Build and start the plebindex (frontend) on port 3000
Build and start the plebbit-cli node (Plebbit daemon) on port 9138, which the crawler connects to via WebSocket

3. Access the app

Frontend: http://localhost:3000
API: http://localhost:3001/api/posts

How it Works

Crawler (Backend)

Fetches subplebbit addresses from multiple sources:
- A public JSON file on GitHub
- Dune Analytics query results (executed weekly)
For each subplebbit:
- Fetches all posts and stores them in a local SQLite database
- Listens for updates and re-indexes as needed
Maintains a known_subplebbits table to track all discovered subplebbits and their sources
Exposes a REST API at /api/posts to retrieve all indexed posts
Includes content moderation capabilities (optional)
Features a queue system for processing subplebbits with retry logic and error tracking

Dune Analytics Integration

The crawler integrates with Dune Analytics to discover new subplebbit communities:

Weekly Query Execution: Executes a Dune query once a week to refresh the list of .eth and .sol plebbit communities
Daily Results Processing: Fetches and processes the query results once a day to check for new communities
Duplicate Prevention: Maintains a record of all known subplebbits to avoid processing duplicates
Configuration:
- DUNE_API_KEY: Your Dune Analytics API key
- DUNE_QUERY_EXECUTE_INTERVAL_HOURS: Interval for query execution (default: 168 hours/1 week)
- DUNE_QUERY_FETCH_INTERVAL_HOURS: Interval for fetching results (default: 24 hours/1 day)

Plebindex (Frontend)

Next.js app that fetches posts from the backend API.
Displays posts with links to their original subplebbit and author.

Configuration

The crawler requires a running plebbit-cli node reachable via WebSocket (PLEBBIT_WS_URL). When you launch the stack with Docker Compose this service is started automatically and shares its auth-key through the mounted data/plebbit/auth-key volume, so no manual setup is needed.
By default, the crawler/indexing API runs on port 3001 and the frontend on port 3000.
Environment variables can be set in the docker-compose.yml or via .env files.
When running on a public node nginx is automatically configured by the domain names (comma separated) in the .env file and a ssl-certificate is generated by letsenrypt after running ./init-letsencrypt

SSL Certificate Configuration

To enable HTTPS with Let's Encrypt certificates:

Configure your domain(s) in the .env file:

SERVER_NAME=example.com,www.example.com  # Comma-separated list of domains

Initialize Let's Encrypt certificates:

./init-letsencrypt.sh

The initialization process:

Creates dummy certificates and a nginx.conf for each domain to start nginx
Deletes dummy certificates
Requests real certificates from Let's Encrypt
Creates symbolic links for all domains
Reloads nginx with the new certificates

The setup uses:

nginx/nginx.conf.template: Template for nginx configuration
nginx/docker-entrypoint.d/001-parse-template.sh: Script to generate nginx configs
Docker containers:
- nginx: Serves the application and handles SSL
- certbot: Manages Let's Encrypt certificates
Required volume mounts:
- ./data/certbot/conf:/etc/letsencrypt: Stores certificates
- ./data/certbot/www:/var/www/certbot: Webroot for Let's Encrypt validation
- ./nginx/nginx.conf.template:/etc/nginx/nginx.conf.template: Nginx template
- ./nginx/docker-entrypoint.d/:/docker-entrypoint.d/: Entrypoint scripts

Certificates will auto-renew every 12 days controlled by certbot container.

Development

Run Crawler Locally

cd crawler
npm install
PLEBBIT_WS_URL=ws://localhost:9138/AUTH-KEY_FROM_DATA_PLEBBIT_DIRECTORY npm run crawler

Run Frontend Locally

cd plebindex
npm install
npm run dev

Zero Downtime Deployment

The project includes a zero downtime deployment system using dual-instance architecture:

Each service (crawler and plebindex) runs two separate instances (01 and 02)
The included deploy.sh script intelligently detects which components need updating
When deploying changes, it builds and starts the inactive instance
Once the new instance is running, it updates the Nginx configuration to route traffic to the new instance
This blue/green deployment approach ensures continuous service availability during updates
To deploy, simply run ./deploy.sh which will pull the latest changes and handle the upgrade process

API

Public Endpoints

GET /api/posts
Returns all indexed posts as JSON.

Supports sorting and time filtering:
- ?sort=<sort_type> - Sort by: new (default), top, replies, or old
- ?t=<time_filter> - Filter by time: all (default), hour, day, week, month, year
- ?page=<page_number> - For pagination (default: 1)
- ?limit=<count> - Number of results per page (default: 20, set to 0 for all posts)
GET /api/posts/search?q=<search_term>
Search posts by title, content, author name, or subplebbit address. Returns matching posts as JSON.

Also supports the same sorting and filtering options:
- ?sort=<sort_type> - Sort by: new (default), top, replies, or old
- ?t=<time_filter> - Filter by time: all (default), hour, day, week, month, year
- ?page=<page_number> - For pagination (default: 1)
- ?limit=<count> - Number of results per page (default: 20, set to 0 for all posts)
The response includes pagination metadata:
```
{
  "posts": [...],
  "pagination": {
    "total": 123,       // Total number of matching posts
    "page": 1,          // Current page number
    "limit": 20,        // Posts per page
    "pages": 7          // Total number of pages
  },
  "filters": {
    "sort": "new",      // Current sort method
    "timeFilter": "all" // Current time filter
  }
}
```

GET /api/posts/:id
Returns a specific post by its ID.

Response format:

{
  "post": {
    "id": "QmAbC123...",
    "title": "Post title",
    "content": "Post content",
    "subplebbitAddress": "example.eth",
    "authorAddress": "12D3KooW...",
    "authorDisplayName": "Username",
    "timestamp": 1234567890,
    "upvoteCount": 10,
    "downvoteCount": 2,
    "replyCount": 5
  }
}

Returns 404 if the post with the given ID doesn't exist.

GET /api/replies/:parentCid
Returns replies to a specific post or comment.

Supports pagination and sorting:

?sort=<sort_type> - Sort by: new (default), top, or old
?page=<page_number> - For pagination (default: 1)
?limit=<count> - Number of results per page (default: 20)

The response includes pagination metadata:

{
  "replies": [...],
  "pagination": {
    "total": 123,       // Total number of replies
    "page": 1,          // Current page number
    "limit": 20,        // Replies per page
    "pages": 7          // Total number of pages
  },
  "filters": {
    "sort": "new"       // Current sort method
  }
}

Protected Endpoints

The following endpoints require authentication using either:

Bearer token in the Authorization header: Authorization: Bearer <token>
Auth key as a query parameter: ?auth=<auth_key>
GET /api/queue
Returns the current subplebbit queue status. Optional query parameter: ?status=<status> to filter by status.
GET /api/queue/stats
Returns statistics about the queue including counts by status.
GET /api/queue/errors
Returns detailed error information for failed subplebbit processing attempts.
POST /api/queue/add
Add a new subplebbit address to the queue. Body: { "address": "<subplebbit_address>" }
POST /api/queue/retry
Retry processing a failed subplebbit. Body: { "address": "<subplebbit_address>" } Example: curl -X POST "https://plebscan.org/api/queue/retry?auth=xyz" \ -H "Content-Type: application/json" \ -d '{"address": "leblore.eth"}'
POST /api/queue/refresh
Refresh the subplebbit queue with new addresses.
POST /api/queue/process
Process items from the queue. Optional body: { "limit": <number> } to specify batch size (default: 5)

Notes

The crawler will automatically re-index if restarted but skips each already indexed post
The index database is persisted inside crawler container under /app/data which is mounted to the host computer under /crawler/data
Make sure a plebbit-cli node is running and accessible to the crawler with respected auth-key

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 210 Commits
.github		.github
contrib/media		contrib/media
crawler		crawler
nginx		nginx
plebbit-cli		plebbit-cli
plebindex		plebindex
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
deploy.sh		deploy.sh
docker-compose.dev.yml		docker-compose.dev.yml
docker-compose.yml		docker-compose.yml
init-letsencrypt.sh		init-letsencrypt.sh
toggle_plebindex.sh		toggle_plebindex.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

plebbit-indexer

Table of Contents

Features

Project Structure

Quickstart (with Docker Compose)

Prerequisites

1. Clone the repository

2. Start all services

3. Access the app

How it Works

Crawler (Backend)

Dune Analytics Integration

Plebindex (Frontend)

Configuration

SSL Certificate Configuration

Development

Run Crawler Locally

Run Frontend Locally

Zero Downtime Deployment

API

Public Endpoints

Protected Endpoints

Notes

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

plebbit-indexer

Table of Contents

Features

Project Structure

Quickstart (with Docker Compose)

Prerequisites

1. Clone the repository

2. Start all services

3. Access the app

How it Works

Crawler (Backend)

Dune Analytics Integration

Plebindex (Frontend)

Configuration

SSL Certificate Configuration

Development

Run Crawler Locally

Run Frontend Locally

Zero Downtime Deployment

API

Public Endpoints

Protected Endpoints

Notes

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages