serverless-memex

A serverless Model Context Protocol server for a personal AI "second brain", running entirely on Cloudflare's edge in roughly 1,000 lines of TypeScript.

Built on Cloudflare Workers + D1 + Vectorize + Workers AI — no containers, no dedicated database, no cold starts to manage. AI agents (Claude Code, Codex, custom clients) capture raw thoughts and retrieve them semantically through 5 MCP tools.

What it is

A capture-first knowledge store. Drop in markdown — voice memos transcribed from your phone, web clips, CLI snippets, half-formed ideas from an agent session — and the Worker:

  1. Deduplicates by SHA-256 of content (re-capture is a no-op).
  2. Chunks paragraph-aware at ~512 tokens with ~64-token overlap.
  3. Embeds via Workers AI (@cf/baai/bge-base-en-v1.5, 768-dim).
  4. Enriches in parallel via Llama 3.1 8B: a one-line summary plus 3-7 kebab-case tags.
  5. Writes to D1, upserts to Vectorize.
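
The chunking step can be sketched roughly as follows. This is a minimal illustration, not the project's actual chunker: the function name chunkParagraphs is hypothetical, and "tokens" are approximated by whitespace-separated words, which is an assumption (the real tokenizer may count differently).

```typescript
// Hypothetical helper: split markdown into overlapping, paragraph-aligned chunks.
// Assumption: one whitespace-separated word ~ one token. Requires overlap < maxTokens.
function chunkParagraphs(text: string, maxTokens = 512, overlap = 64): string[] {
  const paragraphs = text
    .split(/\n\s*\n/) // blank lines separate paragraphs
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current: string[] = []; // words of the chunk being built

  for (const para of paragraphs) {
    const words = para.split(/\s+/);

    // Close the current chunk at a paragraph boundary if this paragraph would overflow it.
    if (current.length > 0 && current.length + words.length > maxTokens) {
      chunks.push(current.join(" "));
      // Carry the tail of the closed chunk forward as overlap.
      current = overlap > 0 ? current.slice(-overlap) : [];
    }
    current.push(...words);

    // A single paragraph larger than the budget is split hard, still with overlap.
    while (current.length > maxTokens) {
      chunks.push(current.slice(0, maxTokens).join(" "));
      current = current.slice(maxTokens - overlap);
    }
  }

  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

Note that this sketch joins words with single spaces, so paragraph breaks inside a chunk are flattened; a real implementation would likely preserve the original markdown text.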

Retrieval is semantic_search(query, top_k?, tags?) → top-K vector hits hydrated from D1 → structured results with full document context.
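
The hydration step can be illustrated with in-memory stand-ins for the Vectorize matches and the D1 rows. All type and function names here are hypothetical, and two behaviors are assumptions rather than documented facts: a document's score is taken as the best score among its matching chunks, and the tags filter requires every requested tag to be present.

```typescript
interface VectorMatch { id: string; score: number } // chunk id + similarity score
interface ChunkRow { id: string; docId: string; text: string }
interface DocRow { id: string; summary: string; tags: string[] }
interface SearchResult {
  docId: string;
  summary: string;
  tags: string[];
  score: number;
  snippets: string[];
}

// Join chunk-level vector hits back to their parent documents.
function hydrateMatches(
  matches: VectorMatch[],
  chunks: Map<string, ChunkRow>,
  docs: Map<string, DocRow>,
  tags?: string[],
): SearchResult[] {
  const byDoc = new Map<string, SearchResult>();

  for (const m of matches) {
    const chunk = chunks.get(m.id);
    if (!chunk) continue; // vector with no matching row: skip it
    const doc = docs.get(chunk.docId);
    if (!doc) continue;
    // Assumed semantics: every requested tag must be on the document.
    if (tags && !tags.every((t) => doc.tags.includes(t))) continue;

    const existing = byDoc.get(doc.id);
    if (existing) {
      existing.score = Math.max(existing.score, m.score);
      existing.snippets.push(chunk.text);
    } else {
      byDoc.set(doc.id, {
        docId: doc.id,
        summary: doc.summary,
        tags: doc.tags,
        score: m.score,
        snippets: [chunk.text],
      });
    }
  }

  // Best-scoring document first.
  return Array.from(byDoc.values()).sort((a, b) => b.score - a.score);
}
```

Grouping by document keeps several hits from the same note from crowding out other notes in the result list.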

Design stance: captures-only

Memex is the capture-side surface — raw, in-the-moment thoughts. It is not a mirror of your Obsidian vault or your synthesized long-form notes. The vault is the canonical long-term store; memex is what feeds it.

This split is load-bearing, not incidental:

  • Search results stay clean. Mixing raw captures with edited vault notes pollutes retrieval — every query starts returning duplicates ("here's the raw thought and the polished version").
  • The enrichment vocabulary converges. The enrichment model (Llama 3.1 8B) sees only capture-side text, so tags converge on the capture corpus's own ontology instead of drifting toward whatever ad-hoc tagging exists in the vault.
  • The trust boundary is simple. One source of truth for the AI, with a clear human-in-the-loop promotion step (manual copy to vault) for anything load-bearing.

See DESIGN.md §7.4 for the full rationale.

Architecture

                                 ┌─────────────────────────────┐
        MCP / REST request       │   Cloudflare Worker         │
        ────────────────────▶    │   (V8 isolate, no warmup)   │
        capture_thought          └──┬──────────┬──────────┬────┘
        semantic_search             │          │          │
        get_thought                 ▼          ▼          ▼
        list_recent              D1        Vectorize   Workers AI
        delete_thought          (docs +   (768-dim     (bge-base +
                                chunks)   cosine)      llama-3.1-8b)

The same Worker exposes:

  • /mcp — Streamable HTTP MCP server with 5 tools.
  • /capture, /search, /thoughts, /thought/:id — small REST surface for non-agent ingest (mobile capture apps, CI/CD, backup walks).
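
One way a single Worker could dispatch between these surfaces is a small path matcher like the one below. This is a sketch only: matchRoute and the Route type are hypothetical, and the HTTP methods are assumptions except for POST /capture, which matches the curl example in this README.

```typescript
type Route =
  | { kind: "mcp" }
  | { kind: "capture" }
  | { kind: "search" }
  | { kind: "list" }
  | { kind: "thought"; id: string }
  | { kind: "not_found" };

// Map an incoming method + pathname to one of the Worker's surfaces.
function matchRoute(method: string, pathname: string): Route {
  if (pathname === "/mcp") return { kind: "mcp" }; // MCP endpoint, any method
  if (method === "POST" && pathname === "/capture") return { kind: "capture" };
  if (method === "POST" && pathname === "/search") return { kind: "search" }; // method assumed
  if (method === "GET" && pathname === "/thoughts") return { kind: "list" }; // method assumed

  // /thought/:id, assumed to serve both fetch (GET) and removal (DELETE)
  const m = /^\/thought\/([^/]+)$/.exec(pathname);
  if (m && (method === "GET" || method === "DELETE")) {
    return { kind: "thought", id: decodeURIComponent(m[1]) };
  }
  return { kind: "not_found" };
}
```

In a Worker, the fetch handler would call something like matchRoute(request.method, new URL(request.url).pathname) and switch on the result.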

Authentication is Cloudflare Access with service tokens — no inbound ports, no shared bearer secret. See docs/access-setup.md.

MCP tools

| Tool | Inputs | Behavior |
| --- | --- | --- |
| capture_thought | content, source?, metadata? | Ingest a markdown note. Idempotent on content hash. |
| semantic_search | query, top_k?, tags? | Embed query, retrieve nearest chunks, hydrate from D1. |
| get_thought | id | Fetch a single document and its chunks. |
| list_recent | limit?, before? | Recent docs by created_at, cursor-paginated. |
| delete_thought | id | Remove a document, its chunks, and all its vectors. |
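
To make "idempotent on content hash" concrete, here is a minimal sketch using the Web Crypto API, which is available in Workers and in Node 20+. The Map is an in-memory stand-in for the D1 documents table; contentHash, captureThought, the id scheme, and the choice to hash the content verbatim (no normalization) are all assumptions made for illustration.

```typescript
// SHA-256 of the UTF-8 bytes of the content, as lowercase hex.
async function contentHash(content: string): Promise<string> {
  const bytes = new TextEncoder().encode(content);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// In-memory stand-in for the documents table, keyed by content hash.
const store = new Map<string, { id: string; content: string }>();

// Capturing the same content twice returns the existing document instead of a new one.
async function captureThought(
  content: string,
): Promise<{ id: string; created: boolean }> {
  const hash = await contentHash(content);
  const existing = store.get(hash);
  if (existing) return { id: existing.id, created: false };

  const id = hash.slice(0, 16); // hypothetical id scheme for this sketch
  store.set(hash, { id, content });
  return { id, created: true };
}
```

With a real database, a unique index on the hash column would give the same guarantee even under concurrent captures.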

Run your own

What you'll need

  • A Cloudflare account with Workers enabled (free plan works for personal-scale use).
  • Workers AI access (enabled by default on all accounts; usage-priced — embeddings and Llama 3.1 8B are both cheap at personal scale, expect cents per thousand captures).
  • Vectorize (included with Workers; at the time of writing the free plan covers 30M queried and 5M stored vector dimensions per month).
  • D1 (free tier covers 5GB storage and ~5M reads/day — easily enough for a personal corpus).
  • Cloudflare Access (free for up to 50 users) for service-token auth in front of the Worker.
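  • Node 20+ and pnpm; wrangler is installed as a dev dependency, so prefix the wrangler commands below with pnpm exec unless you also have it installed globally.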

Bring-up

# 1. Fork or clone, then:
pnpm install
wrangler login

# 2. Create Cloudflare resources (one-time):
wrangler d1 create serverless-memex-db
wrangler vectorize create serverless-memex --dimensions=768 --metric=cosine

# 3. Update wrangler.jsonc with the returned database_id.
#    (The Vectorize binding uses index_name, no ID swap needed.)

# 4. Apply schema:
wrangler d1 migrations apply serverless-memex-db --remote

# 5. Deploy:
wrangler deploy

Your Worker is now reachable at https://serverless-memex.<your-subdomain>.workers.dev, but it is unauthenticated. Don't capture anything sensitive until you have completed the "Lock it down" step below.

Lock it down

  6. Put Cloudflare Access in front of the Worker and issue a service token for your machine clients. The full walkthrough is in docs/access-setup.md: about 10 minutes of dashboard clicking, no code changes required.

Connect a client

  7. Wire up an MCP client (Claude Code, etc.); see docs/mcp-client.md. Or skip MCP entirely and use the REST endpoints directly from a shell script, iOS Shortcut, or CI job.

Cost expectations

For a personal-scale corpus (a few thousand captures over a year), you should expect to stay inside the free tiers for everything except Workers AI, where embedding + enrichment cost on the order of single-digit cents per month. The whole stack is designed to scale to zero — no idle cost.

Capture a thought (REST)

curl -X POST "$MEMEX_URL/capture" \
  -H "CF-Access-Client-Id: $MEMEX_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $MEMEX_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"content": "the bug was in the retry loop, not the timeout"}'
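
The same request can be built from TypeScript, for example in a script or a custom client. The helper below is a hypothetical sketch (buildCaptureRequest is not part of this repository); it only mirrors the URL, headers, and JSON body of the curl call above.

```typescript
interface CaptureRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the URL and fetch options for POST /capture behind Cloudflare Access.
function buildCaptureRequest(
  baseUrl: string,
  clientId: string,
  clientSecret: string,
  content: string,
): CaptureRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/capture`, // tolerate a trailing slash
    init: {
      method: "POST",
      headers: {
        "CF-Access-Client-Id": clientId,
        "CF-Access-Client-Secret": clientSecret,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ content }),
    },
  };
}

// Usage (performs a network call, so it is left as a comment here):
//   const { url, init } = buildCaptureRequest(memexUrl, id, secret, "a thought");
//   const res = await fetch(url, init);
```

Keeping request construction separate from the fetch call makes the client easy to test without touching the network.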

Wire up Claude Code (MCP)

claude mcp add --transport http --scope user \
  memex "$MEMEX_URL/mcp" \
  --header "CF-Access-Client-Id: $MEMEX_CLIENT_ID" \
  --header "CF-Access-Client-Secret: $MEMEX_CLIENT_SECRET"

See docs/mcp-client.md for the full setup.
