The knowledge store for the agentic era.
Knowmarks is an open-source, local-first knowledge store designed as a shared data substrate for humans and AI agents. Save content from any source — URLs, PDFs, YouTube videos, GitHub repos, local files — and the system extracts, understands, classifies, and connects every item automatically. Your bookmarks don't just sit there — agents put them to work.
Download the latest .dmg from Releases, drag to Applications, launch. No Python, no terminal, no Docker. A setup wizard walks you through API key configuration and importing your first bookmarks.
macOS Gatekeeper: The app is unsigned for beta. On first launch, right-click → Open → Open to bypass Gatekeeper. Only needed once.
```shell
# Full install (includes local ONNX embeddings)
pip install 'knowmarks[all,embeddings]'

# Slim install (use Ollama or API for embeddings)
pip install 'knowmarks[all]'
```

Or run with Docker:

```shell
docker compose up -d
# Dashboard at http://localhost:3749
```

Save from anywhere. Desktop app, browser extension, CLI (km save), web dashboard, REST API, MCP, or automated connectors. Every surface is a peer — what you can do through one, you can do through any other.
Understand any format. HTML pages get clean text extraction with metadata cascading (JSON-LD, OpenGraph, meta tags). PDFs become searchable text. YouTube videos become transcripts. GitHub repos become READMEs with structured metadata (stars, language, topics, last commit). Local files go through the same pipeline.
Search intelligently. Every query hits both semantic (vector) and keyword (full-text) indexes simultaneously, merged via Reciprocal Rank Fusion. You don't pick a search mode — the system finds the best results regardless of how you phrase the query.
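Reciprocal Rank Fusion fits in a few lines. The sketch below is an illustration rather than Knowmarks' exact implementation; the constant k=60 is the conventional default from the original RRF paper:

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists: each item scores 1/(k + rank)
    for every list it appears in, and higher totals sort first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["a", "b", "c"]   # semantic index order
keyword_hits = ["b", "c", "d"]  # full-text index order
print(rrf_merge([vector_hits, keyword_hits]))  # ['b', 'c', 'a', 'd']
```

Items found by both indexes ("b", "c") outrank items found by only one, without either index's raw scores needing to be comparable.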
Organize automatically. Items self-organize into semantic clusters with plain-language labels. A post-clustering consolidation pass merges clusters that refer to the same concept under different names (acronyms, synonyms, variant phrasings). No tags, no folders, no manual sorting.
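One way to picture the consolidation pass (a toy sketch under assumed mechanics, not the actual algorithm): embed each cluster label and merge clusters whose label vectors nearly coincide. Real vectors would come from the configured embedding provider; the 2-D vectors here are stand-ins:

```python
from itertools import combinations

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))

def consolidate(clusters: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Merge clusters whose label embeddings are nearly parallel,
    using a simple union-find over pairwise cosine similarity."""
    parent = {label: label for label in clusters}
    def find(x: str) -> str:
        while parent[x] != x:
            x = parent[x]
        return x
    for a, b in combinations(clusters, 2):
        if cosine(clusters[a], clusters[b]) >= threshold:
            parent[find(b)] = find(a)
    groups: dict[str, list[str]] = {}
    for label in clusters:
        groups.setdefault(find(label), []).append(label)
    return list(groups.values())

clusters = {
    "LLM": [1.0, 0.0],
    "Large Language Models": [0.99, 0.1],  # near-duplicate label
    "Gardening": [0.0, 1.0],
}
print(consolidate(clusters))  # [['LLM', 'Large Language Models'], ['Gardening']]
```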
Track freshness. Source-aware staleness probes check real signals — GitHub last commit dates, npm publish timestamps, HTTP Last-Modified headers — not just elapsed time. You know when content has genuinely changed, not just when it's old.
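As an illustration of source-aware probing (a sketch: the HTTP header and GitHub API field names are standard, but the function names are hypothetical, not the Knowmarks probe API):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def http_changed_since(last_modified_header: str, saved_at: datetime) -> bool:
    """True if the HTTP Last-Modified timestamp postdates our saved copy."""
    return parsedate_to_datetime(last_modified_header) > saved_at

def github_changed_since(pushed_at_iso: str, saved_at: datetime) -> bool:
    """True if the repo's last push (GitHub API `pushed_at`) postdates our saved copy."""
    pushed = datetime.fromisoformat(pushed_at_iso.replace("Z", "+00:00"))
    return pushed > saved_at

saved = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(http_changed_since("Wed, 01 Jan 2025 00:00:00 GMT", saved))  # True: page edited after saving
print(github_changed_since("2024-01-15T12:00:00Z", saved))         # False: last push predates the save
```

An item that is merely old but whose source reports no newer change stays fresh, which is the distinction the probes are after.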
Discover RSS feeds. Three-strategy feed discovery: standard <link> tag detection (6 MIME types), anchor heuristic scanning, and RSSHub radar rules lookup against 1,200+ domains. Finds feeds for sites that don't advertise them.
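The first strategy, standard link-tag detection, looks roughly like this. A standard-library sketch, not the real extractor; only two of the six MIME types are shown:

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate"> tags."""
    def __init__(self):
        super().__init__()
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") in FEED_TYPES and a.get("href")):
            self.feeds.append(a["href"])

finder = FeedLinkFinder()
finder.feed('<head><link rel="alternate" type="application/rss+xml" href="/feed.xml"></head>')
print(finder.feeds)  # ['/feed.xml']
```

The other two strategies kick in when a page omits this tag: scanning anchors for feed-like URLs, and matching the domain against RSSHub's radar rules.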
Serve agents. 27 MCP tools give AI agents full CRUD access — search, save, update, delete, browse clusters, manage projects, curated collections, governance actions. Progressive detail levels (minimal/summary/full) keep token budgets tight. Works with Claude Code, Claude Desktop, and any MCP client.
| Category | What You Get |
|---|---|
| Search | Hybrid vector + keyword search, query expansion via LLM, topic suggestions, scoped search by cluster/project/time |
| Dashboard | Pulse overview, cluster browsing, project views, curated collections, detail panel with re-fetch/reclassify, unified chat input |
| Chat | Generative answers grounded in your collection, persistent multi-turn conversations, agentic actions (create collections, bulk delete, add to project via natural language) |
| Import | Chrome, Firefox, Safari bookmarks, GitHub stars, Karakeep API, Readwise Reader, bookmark files (Netscape HTML, Karakeep JSON), local files (PDF, etc.) |
| Projects | Active project grouping with multi-facet scoring, score-gap detection, prospective flagging, manual pin/unpin, spec file auto-import |
| Collections | Keyword-seeded topic lenses that auto-populate and update as your collection grows |
| Governance | Dead link auto-trash, near-duplicate detection with LLM comparison summaries, stale content triage, bulk acknowledge/delete/refresh |
| Connectors | Polling-based sync with configurable schedules, health monitoring, incremental sync via high-water marks |
| Extraction | HTML, PDF, YouTube transcripts, GitHub repos, local files. Structured metadata from JSON-LD, OpenGraph, and meta tags |
| Embeddings | Pluggable: local ONNX (fastembed, default), Ollama, or any OpenAI-compatible API. Model-locked per collection |
| LLM | Auto-summaries, auto-connections, classification, relevance explanations, collection insights, query expansion. OpenRouter (default), Ollama, or any OpenAI-compatible API. Works without LLM — degrades gracefully |
| API | REST /api/v1/ (27 routes, Bearer auth), MCP (27 tools, stdio transport), CLI (km command) |
| Desktop | Native macOS app (Electron + PyInstaller), auto-updates via GitHub Releases, macOS Keychain for API key storage |
| Observability | Sentry error tracking (no user content), PostHog anonymous analytics (opt-out), km doctor diagnostics, km feedback bug reporting |
```shell
# Save something
km save https://simonwillison.net/2024/Oct/17/knowledge-graph/ --note "Great overview of knowledge graphs"

# Search
km find "knowledge graph architectures"

# Start the dashboard
km serve

# Check collection health
km status

# Import Chrome bookmarks
km import chrome

# Run diagnostics
km doctor
```

Knowmarks exposes 27 tools via the Model Context Protocol for AI agent integration.
Add to your Claude Code or Claude Desktop MCP config:
```json
{
  "mcpServers": {
    "knowmarks": {
      "command": "km-mcp"
    }
  }
}
```

Tools:

- pulse (collection overview)
- search (hybrid search with filters)
- get_knowmark / save / delete_knowmark / update_note (CRUD)
- related (find similar items)
- stale (freshness governance)
- refine (LLM-powered curation)
- list_projects / get_project / create_project / update_project / add_to_project / remove_from_project / delete_project (project management)
- list_clusters / get_cluster (cluster browsing)
- list_collections / get_collection / create_collection / delete_collection (curated collections)
- stats (collection statistics)
- reembed / retry_fetch / rebuild_clusters_tool / connector_health (governance actions)
Full REST API at /api/v1/ with optional Bearer token authentication. Complete parity with the MCP server.
```shell
# Search
curl 'http://localhost:3749/api/v1/search?q=knowledge+graphs'

# Save
curl -X POST http://localhost:3749/api/v1/knowmarks \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "note": "Good read"}'

# Collection pulse
curl http://localhost:3749/api/v1/pulse

# With auth
export KNOWMARKS_API_KEY=your-secret-key
curl -H "Authorization: Bearer $KNOWMARKS_API_KEY" \
  http://localhost:3749/api/v1/stats
```

Chrome extension for one-click saving with optional notes:
- Open chrome://extensions
- Enable Developer mode
- Click Load unpacked → select the extension/ folder
- Click the Knowmarks icon on any page to save it
Configure via Settings screen (desktop app / dashboard), km config set, or environment variables. Env vars take precedence.
| Variable | Default | Description |
|---|---|---|
| KNOWMARKS_DATA_DIR | OS default | Data directory path |
| KNOWMARKS_HOST | 127.0.0.1 | Server bind address |
| KNOWMARKS_PORT | 3749 | Server port |
| KNOWMARKS_API_KEY | (empty) | Bearer token for REST API auth |
| KNOWMARKS_EMBEDDING_PROVIDER | fastembed | fastembed, ollama, or openai |
| KNOWMARKS_EMBEDDING_MODEL | BAAI/bge-small-en-v1.5 | Embedding model name |
| KNOWMARKS_EMBEDDING_ENDPOINT | (empty) | Endpoint URL for ollama/openai providers |
| KM_LLM_URL | https://openrouter.ai/api/v1 | OpenAI-compatible LLM endpoint |
| KM_LLM_MODEL | google/gemini-2.5-flash | LLM model name |
| KM_LLM_API_KEY | (empty) | LLM API key (OpenRouter, etc.) |
| KM_LLM_ENABLED | 1 | Set to 0 to disable all LLM features |
```
km save <url|path>                 Save a URL or local file
km find <query>                    Hybrid search
km stale                           Show stale items
km status                          Collection stats
km serve                           Start web dashboard + API
km mcp                             Start MCP server (stdio)
km import chrome|firefox|safari    Import browser bookmarks
km import github <username>        Import GitHub stars
km import karakeep -u URL -k KEY   Import from Karakeep
km import readwise -t TOKEN        Import from Readwise Reader
km import file <path>              Import bookmark file
km connector list                  Show connectors and sync schedules
km connector schedule <n> <int>    Set polling interval (e.g., 6h, 1d)
km project add <name>              Create a project
km project list                    List projects
km project show <name>             Show project details with members
km reclassify                      Retroactive content type classification
km config show                     Show effective configuration
km config set KEY VALUE            Set persistent config value
km doctor                          Full system health check
km feedback "description"          Submit bug report with auto-diagnostics
km service install                 Install launchd background service (macOS)
km service uninstall               Remove background service
km service status                  Check service health + uptime
km service logs                    Show recent service output
```
```
src/knowmarks/
├── cli.py               # Click CLI (km command)
├── config.py            # Configuration (env > config file > defaults)
├── core/
│   ├── db.py            # SQLite (WAL + FTS5), schema, CRUD, vector storage
│   ├── embed.py         # Pluggable embedding providers
│   ├── extract.py       # Content extraction (HTML, PDF, YouTube, GitHub, local)
│   ├── search.py        # Hybrid search + Reciprocal Rank Fusion
│   ├── cluster.py       # Semantic clustering with synonym consolidation
│   ├── freshness.py     # Vitality scoring + cluster-relative decay
│   ├── classify.py      # Content type classification (LLM + heuristic)
│   ├── projects.py      # Multi-facet project association
│   ├── curated.py       # Keyword-seeded topic collections
│   ├── llm.py           # LLM client (OpenAI-compatible)
│   ├── probes.py        # Source-aware staleness probes
│   ├── radar_rules.py   # RSSHub radar rules for feed discovery
│   ├── telemetry.py     # Sentry + PostHog (privacy-first)
│   └── connectors/      # Browser, GitHub, Karakeep, Readwise, file import
├── web/
│   ├── app.py           # FastAPI routes + dashboard
│   ├── api_v1.py        # REST API v1 (27 routes, Bearer auth)
│   └── static/          # Dashboard (HTML/CSS/JS, no build step)
├── mcp/
│   └── server.py        # MCP tools (27 tools, progressive detail)
└── desktop/
    └── electron/        # Electron shell + PyInstaller bundling
```
Stack: Python 3.11+, SQLite (WAL + FTS5), FastAPI, FastMCP, httpx, trafilatura, numpy, Electron, PyInstaller.
```shell
git clone https://github.com/spaceshipmike/knowmarks.git
cd knowmarks
uv sync --all-extras

# Run tests
uv run pytest

# Lint
uv run ruff check src/

# Dev server
uv run km serve
```

See CONTRIBUTING.md for detailed guidelines.
Full documentation at spaceshipmike.github.io/knowmarks.