This repository provides a crawler for the Moltbook network (agents, submolts, posts, comments, and feed snapshots) that stores the data as a temporal graph in Neo4j. The graph schema is documented in `graph-schema.md`, database-specific example queries in `database.md`, and the database-maintenance scripts in `db-maintaining.md`. The full dataset is available on HuggingFace, and the research paper on arXiv.
It supports:
- Smoke test (≈30 s) that validates the pipeline end-to-end (Moltbook API + Neo4j writes)
- Full crawl (one-time historical ingest up to "now")
- Temporal evolution via `first_seen_at`, `last_seen_at`, `ended_at`, and crawl/feed snapshots
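The temporal fields follow a simple upsert rule: set `first_seen_at` once on creation, advance `last_seen_at` on every sighting, and record `ended_at` when an entity disappears. A minimal pure-Python sketch of that idea (illustrative only; the real logic lives in `neo4j_store.py`, and the function names here are hypothetical):

```python
def upsert(store: dict, key: str, now: str) -> dict:
    """Temporal upsert rule (sketch): first_seen_at is set once;
    last_seen_at advances on every sighting."""
    node = store.setdefault(key, {"first_seen_at": now, "ended_at": None})
    node["last_seen_at"] = now
    return node

def mark_ended(store: dict, key: str, now: str) -> None:
    """When an entity is no longer observable, record ended_at once."""
    if key in store and store[key]["ended_at"] is None:
        store[key]["ended_at"] = now

store = {}
upsert(store, "post:42", "2026-02-01T00:00:00Z")
upsert(store, "post:42", "2026-02-02T00:00:00Z")   # a later crawl sees it again
mark_ended(store, "post:42", "2026-02-03T00:00:00Z")
```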
```
├── docker-compose.yml            # Neo4j + crawler services
├── credentials.json              # (local) credentials (keep secret)
├── graph-schema.md               # README for the Neo4j graph schema
├── database.md                   # README for the Neo4j database
├── db-maintaining.md             # README for backfilling and maintaining the Neo4j database
├── autorun.sh                    # autorun script for the full crawler
├── images/                       # example images of the Neo4j graph
├── moltbook-registration/
│   ├── bot_register.md           # notes / registration info
│   └── example_query_response.md # making a post or querying posts
└── crawler/
    ├── Dockerfile                # crawler container image
    ├── requirements.txt          # Python deps
    ├── moltbook_client.py        # Moltbook API client (rate limiting + retries)
    ├── neo4j_store.py            # Neo4j schema + upsert logic
    ├── html_scrape.py            # UI-only scrape (similar agents + owner X)
    ├── cypher/
    │   └── schema.cypher         # constraints + indexes
    └── scripts/
        ├── backfill/
        │   ├── comments.py       # backfills all comment contents
        │   ├── post_comments.py  # backfills all posts and their contents
        │   ├── is_deleted.py     # backfills whether posts/comments are deleted
        │   ├── is_spam.py        # backfills whether posts/comments are marked as spam
        │   └── x_accounts.py     # backfills agent info such as the X handle
        ├── init_db.py            # applies schema.cypher
        ├── smoke_test.py         # 30 s end-to-end validation
        └── full_crawl.py         # one-time full ingest
```
- Docker + Docker Compose
- A Moltbook API key (`MOLTBOOK_API_KEY`)
- Ports open (locally):
  - Neo4j Browser: `7474`
  - Bolt: `7687`
Copy the `.env.example` file from the repo root to `.env` (same directory as `docker-compose.yml`):

```
# Update API key
MOLTBOOK_API_KEY=YOUR_KEY_HERE
```

Notes:
- `REQUESTS_PER_MINUTE` controls client-side throttling.
- `FETCH_POST_DETAILS=1` calls `/posts/:id` for each post (slower).
- `SCRAPE_AGENT_HTML=1` enables UI-only scraping (slower / brittle).
- `ENRICH_SUBMOLTS=1` can be very expensive for large numbers of submolts.
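The `REQUESTS_PER_MINUTE` throttle can be pictured as a minimal client-side pacer that sleeps between calls so the configured rate is never exceeded. A sketch under that assumption (names here are hypothetical; the real implementation lives in `moltbook_client.py`):

```python
import time

class Throttle:
    """Client-side pacing: allow at most `rpm` requests per minute."""
    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm   # seconds between consecutive requests
        self.last_call = 0.0

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the delay applied."""
        now = time.monotonic()
        delay = max(0.0, self.last_call + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self.last_call = time.monotonic()
        return delay

t = Throttle(rpm=600)   # i.e. at least 0.1 s between requests
t.wait()                # first call: no sleep needed
slept = t.wait()        # second call: paced by roughly 0.1 s
```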
```
docker compose build crawler
docker compose up -d neo4j
```

Log in to the Neo4j Browser:
- user: `neo4j`
- password: the value of `NEO4J_PASSWORD`
Apply `crawler/cypher/schema.cypher`:

```
docker compose run --rm crawler python -m scripts.init_db
```

Verify in Neo4j Browser:

```
SHOW CONSTRAINTS;
SHOW INDEXES;
```

Run the smoke test:

```
docker compose run --rm crawler python -m scripts.smoke_test
```

The smoke test validates:
- Moltbook API connectivity
- Neo4j connectivity/writes
- Ingestion of at least `Agent`, `Post`, and `Submolt` (and `Comment` if available)
- Relationships: `AUTHORED`, `IN_SUBMOLT`, `ON_POST`
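These checks boil down to set assertions over the label and relationship counts returned by the count queries. A sketch of the idea (illustrative only; the real checks live in `smoke_test.py` and these names are hypothetical):

```python
REQUIRED_LABELS = {"Agent", "Post", "Submolt"}
REQUIRED_RELS = {"AUTHORED", "IN_SUBMOLT", "ON_POST"}

def check_counts(label_counts: dict, rel_counts: dict) -> list:
    """Return a list of problems; an empty list means the smoke test passes."""
    problems = []
    for label in REQUIRED_LABELS:
        if label_counts.get(label, 0) < 1:
            problems.append(f"missing nodes: {label}")
    for rel in REQUIRED_RELS:
        if rel_counts.get(rel, 0) < 1:
            problems.append(f"missing relationships: {rel}")
    return problems

# Example inputs shaped like the count-query results
problems = check_counts(
    {"Agent": 12, "Post": 40, "Submolt": 3, "Comment": 9},
    {"AUTHORED": 49, "IN_SUBMOLT": 40, "ON_POST": 9},
)
# problems == []
```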
Verify counts:

```
MATCH (n) RETURN labels(n) AS label, count(*) AS cnt ORDER BY cnt DESC;
MATCH ()-[r]->() RETURN type(r) AS rel, count(*) AS cnt ORDER BY cnt DESC;
```

A full crawl ingests "everything discoverable" up to the crawl cutoff (UTC now):
```
docker compose run --rm \
  -e USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" \
  -e DEBUG_HTTP=1 \
  -e REQUESTS_PER_MINUTE=60 \
  -e CRAWL_COMMENTS=1 \
  -e COMMENTS_LIMIT_PER_POST=1000 \
  -e FETCH_AGENT_PROFILES=1 \
  -e PROFILE_LIMIT=100000 \
  -e FETCH_POST_DETAILS=1 \
  -e SCRAPE_AGENT_HTML=0 \
  -e SUBMOLT_TOP_LIMIT=100000 \
  -e MODERATOR_SUBMOLTS_LIMIT=100000 \
  -e ENRICH_SUBMOLTS=1 \
  -e ENRICH_SUBMOLTS_LIMIT=100000 \
  crawler python -m scripts.full_crawl
```
For a faster crawl without comments:

```
docker compose run --rm \
  -e CRAWL_COMMENTS=0 \
  crawler python -m scripts.full_crawl
```

The crawl writes a `:Crawl` node with checkpoints:

```
MATCH (cr:Crawl)
RETURN cr.id, cr.mode, cr.started_at, cr.submolts_offset, cr.posts_offset, cr.last_updated_at
ORDER BY cr.started_at DESC
LIMIT 5;
```

Notes:
- Moltbook endpoints may rate-limit or occasionally return 502/503/504; the client includes retries + exponential backoff.
- HTML scraping is brittle by nature (UI changes may break parsing). Use it only if you need Similar/Owner-X edges.
- Full enrichment of all submolts/posts can be expensive; prefer staged enrichment.
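The retry behavior mentioned above amounts to retry-with-exponential-backoff around each HTTP call: retry only on 429/502/503/504 and double the delay each attempt. A sketch under those assumptions (hypothetical names; `moltbook_client.py` holds the actual logic):

```python
import time

RETRYABLE = {429, 502, 503, 504}

def fetch_with_retries(do_request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call do_request() until it returns a non-retryable status or attempts run out.
    The delay doubles on each retry: 1 s, 2 s, 4 s, ..."""
    for attempt in range(max_attempts):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status, body

# Simulated endpoint: two 503s, then success; capture delays instead of sleeping
responses = iter([(503, None), (503, None), (200, "ok")])
delays = []
status, body = fetch_with_retries(lambda: next(responses), sleep=delays.append)
# status == 200, delays == [1.0, 2.0]
```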
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
If you use this crawler in your work, please cite the paper.
```
@article{mukherjee2026moltgraph,
  title={MoltGraph: A Longitudinal Temporal Graph Dataset of Moltbook for Coordinated-Agent Detection},
  author={XXXX},
  journal={arXiv preprint arXiv:2603.00646},
  year={2026}
}
```

If you use the crawler software itself, please cite it as:
```
@software{mukherjee_moltbook_neo4j_crawler_2026,
  author  = {Mukherjee, Kunal},
  title   = {MoltGraph: Moltbook Social Network Graph},
  year    = {2026},
  month   = {2},
  version = {0.1},
  note    = {GitHub repository},
  url     = {https://github.com/kunmukh/moltgraph}
}
```

Thanks to @giordano-demarzo for creating moltbook-api-crawler.
Licensed under the MIT License.

