This repository provides a crawler for the Moltbook network (agents, submolts, posts, comments, and feed snapshots) that stores the data as a temporal graph in Neo4j. The graph schema is documented in `graph-schema.md`, database-specific example queries in `database.md`, and the database-maintenance scripts in `db-maintaining.md`. The full dataset is available on HuggingFace, and the research paper on arXiv.
It supports:
- Smoke test (≈30 s) that validates the pipeline end-to-end (Moltbook API + Neo4j writes)
- Full crawl (one-time historical ingest up to "now")
- Temporal evolution via `first_seen_at`, `last_seen_at`, `ended_at`, and crawl/feed snapshots
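The temporal fields follow a simple upsert rule: set `first_seen_at` once on creation, advance `last_seen_at` on every sighting, and record `ended_at` when an entity disappears. A minimal pure-Python sketch of that idea (illustrative only; the real logic lives in `neo4j_store.py`, and the function names here are hypothetical):

```python
def upsert(store: dict, key: str, now: str) -> dict:
    """Temporal upsert rule (sketch): first_seen_at is set once;
    last_seen_at advances on every sighting."""
    node = store.setdefault(key, {"first_seen_at": now, "ended_at": None})
    node["last_seen_at"] = now
    return node

def mark_ended(store: dict, key: str, now: str) -> None:
    """When an entity is no longer observable, record ended_at once."""
    if key in store and store[key]["ended_at"] is None:
        store[key]["ended_at"] = now

store = {}
upsert(store, "post:42", "2026-02-01T00:00:00Z")
upsert(store, "post:42", "2026-02-02T00:00:00Z")   # a later crawl sees it again
mark_ended(store, "post:42", "2026-02-03T00:00:00Z")
```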
```
├── docker-compose.yml            # Neo4j + crawler services
├── credentials.json              # (local) credentials (keep secret)
├── graph-schema.md               # README for the Neo4j graph schema
├── database.md                   # README for the Neo4j database
├── db-maintaining.md             # README for backfilling and maintaining the Neo4j database
├── autorun.sh                    # autorun script for the full crawler
├── images/                       # example images of the Neo4j graph
├── moltbook-registration/
│   ├── bot_register.md           # notes / registration info
│   └── example_query_response.md # making a post or querying posts
└── crawler/
    ├── Dockerfile                # crawler container image
    ├── requirements.txt          # Python deps
    ├── moltbook_client.py        # Moltbook API client (rate limiting + retries)
    ├── neo4j_store.py            # Neo4j schema + upsert logic
    ├── html_scrape.py            # UI-only scrape (similar agents + owner X)
    ├── cypher/
    │   └── schema.cypher         # constraints + indexes
    └── scripts/
        ├── backfill/
        │   ├── comments.py       # backfills all comment contents
        │   ├── post_comments.py  # backfills all posts and their contents
        │   ├── is_deleted.py     # backfills whether posts/comments are deleted
        │   ├── is_spam.py        # backfills whether posts/comments are marked as spam
        │   └── x_accounts.py     # backfills agent info such as the X handle
        ├── init_db.py            # applies schema.cypher
        ├── smoke_test.py         # 30 s end-to-end validation
        └── full_crawl.py         # one-time full ingest
```
- Docker + Docker Compose
- A Moltbook API key (`MOLTBOOK_API_KEY`)
- Ports open (locally):
  - Neo4j Browser: `7474`
  - Bolt: `7687`
Copy the `.env.example` file from the repo root to `.env` (same directory as `docker-compose.yml`):

```
# Update API key
MOLTBOOK_API_KEY=YOUR_KEY_HERE
```

Notes:
- `REQUESTS_PER_MINUTE` controls client-side throttling.
- `FETCH_POST_DETAILS=1` calls `/posts/:id` for each post (slower).
- `SCRAPE_AGENT_HTML=1` enables UI-only scraping (slower / brittle).
- `ENRICH_SUBMOLTS=1` can be very expensive for large numbers of submolts.
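The `REQUESTS_PER_MINUTE` throttle can be pictured as a minimal client-side pacer that sleeps between calls so the configured rate is never exceeded. A sketch under that assumption (names here are hypothetical; the real implementation lives in `moltbook_client.py`):

```python
import time

class Throttle:
    """Client-side pacing: allow at most `rpm` requests per minute."""
    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm   # seconds between consecutive requests
        self.last_call = 0.0

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the delay applied."""
        now = time.monotonic()
        delay = max(0.0, self.last_call + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self.last_call = time.monotonic()
        return delay

t = Throttle(rpm=600)   # i.e. at least 0.1 s between requests
t.wait()                # first call: no sleep needed
slept = t.wait()        # second call: paced by roughly 0.1 s
```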
```
docker compose build crawler
docker compose up -d neo4j
```

Log in to the Neo4j Browser:
- user: `neo4j`
- password: the value of `NEO4J_PASSWORD`
Apply `crawler/cypher/schema.cypher`:

```
docker compose run --rm crawler python -m scripts.init_db
```

Verify in Neo4j Browser:

```
SHOW CONSTRAINTS;
SHOW INDEXES;
```

Run the smoke test:

```
docker compose run --rm crawler python -m scripts.smoke_test
```

The smoke test validates:
- Moltbook API connectivity
- Neo4j connectivity/writes
- Ingestion of at least `Agent`, `Post`, and `Submolt` (and `Comment` if available)
- Relationships: `AUTHORED`, `IN_SUBMOLT`, `ON_POST`
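These checks boil down to set assertions over the label and relationship counts returned by the count queries. A sketch of the idea (illustrative only; the real checks live in `smoke_test.py` and these names are hypothetical):

```python
REQUIRED_LABELS = {"Agent", "Post", "Submolt"}
REQUIRED_RELS = {"AUTHORED", "IN_SUBMOLT", "ON_POST"}

def check_counts(label_counts: dict, rel_counts: dict) -> list:
    """Return a list of problems; an empty list means the smoke test passes."""
    problems = []
    for label in REQUIRED_LABELS:
        if label_counts.get(label, 0) < 1:
            problems.append(f"missing nodes: {label}")
    for rel in REQUIRED_RELS:
        if rel_counts.get(rel, 0) < 1:
            problems.append(f"missing relationships: {rel}")
    return problems

# Example inputs shaped like the count-query results
problems = check_counts(
    {"Agent": 12, "Post": 40, "Submolt": 3, "Comment": 9},
    {"AUTHORED": 49, "IN_SUBMOLT": 40, "ON_POST": 9},
)
# problems == []
```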
Verify counts:

```
MATCH (n) RETURN labels(n) AS label, count(*) AS cnt ORDER BY cnt DESC;
MATCH ()-[r]->() RETURN type(r) AS rel, count(*) AS cnt ORDER BY cnt DESC;
```

A full crawl ingests "everything discoverable" up to the crawl cutoff (UTC now):
```
docker compose run --rm \
  -e USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" \
  -e DEBUG_HTTP=1 \
  -e REQUESTS_PER_MINUTE=60 \
  -e CRAWL_COMMENTS=1 \
  -e COMMENTS_LIMIT_PER_POST=1000 \
  -e FETCH_AGENT_PROFILES=1 \
  -e PROFILE_LIMIT=100000 \
  -e FETCH_POST_DETAILS=1 \
  -e SCRAPE_AGENT_HTML=0 \
  -e SUBMOLT_TOP_LIMIT=100000 \
  -e MODERATOR_SUBMOLTS_LIMIT=100000 \
  -e ENRICH_SUBMOLTS=1 \
  -e ENRICH_SUBMOLTS_LIMIT=100000 \
  crawler python -m scripts.full_crawl
```
For a faster crawl without comments:

```
docker compose run --rm \
  -e CRAWL_COMMENTS=0 \
  crawler python -m scripts.full_crawl
```

The crawl writes a `:Crawl` node with checkpoints:

```
MATCH (cr:Crawl)
RETURN cr.id, cr.mode, cr.started_at, cr.submolts_offset, cr.posts_offset, cr.last_updated_at
ORDER BY cr.started_at DESC
LIMIT 5;
```

Notes:
- Moltbook endpoints may rate-limit or occasionally return 502/503/504; the client includes retries + exponential backoff.
- HTML scraping is brittle by nature (UI changes may break parsing). Use it only if you need Similar/Owner-X edges.
- Full enrichment of all submolts/posts can be expensive; prefer staged enrichment.
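The retry behavior mentioned above amounts to retry-with-exponential-backoff around each HTTP call: retry only on 429/502/503/504 and double the delay each attempt. A sketch under those assumptions (hypothetical names; `moltbook_client.py` holds the actual logic):

```python
import time

RETRYABLE = {429, 502, 503, 504}

def fetch_with_retries(do_request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call do_request() until it returns a non-retryable status or attempts run out.
    The delay doubles on each retry: 1 s, 2 s, 4 s, ..."""
    for attempt in range(max_attempts):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status, body

# Simulated endpoint: two 503s, then success; capture delays instead of sleeping
responses = iter([(503, None), (503, None), (200, "ok")])
delays = []
status, body = fetch_with_retries(lambda: next(responses), sleep=delays.append)
# status == 200, delays == [1.0, 2.0]
```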
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
If you use this crawler in your work, please cite the paper.
```
@article{mukherjee2026moltgraph,
  title={MoltGraph: A Longitudinal Temporal Graph Dataset of Moltbook for Coordinated-Agent Detection},
  author={XXXX},
  journal={arXiv preprint arXiv:2603.00646},
  year={2026}
}
```

If you use the crawler software itself, please cite it as:
```
@software{mukherjee_moltbook_neo4j_crawler_2026,
  author  = {Mukherjee, Kunal},
  title   = {MoltGraph: Moltbook Social Network Graph},
  year    = {2026},
  month   = {2},
  version = {0.1},
  note    = {GitHub repository},
  url     = {https://github.com/kunmukh/moltgraph}
}
```

Thanks to @giordano-demarzo for creating moltbook-api-crawler.
Licensed under the MIT License.

