
Knowmarks

The knowledge store for the agentic era.

Knowmarks is an open-source, local-first knowledge store designed as a shared data substrate for humans and AI agents. Save content from any source — URLs, PDFs, YouTube videos, GitHub repos, local files — and the system extracts, understands, classifies, and connects every item automatically. Your bookmarks don't just sit there — agents put them to work.


Install

Desktop App (recommended)

Download the latest .dmg from Releases, drag to Applications, launch. No Python, no terminal, no Docker. A setup wizard walks you through API key configuration and importing your first bookmarks.

macOS Gatekeeper: The app is unsigned for beta. On first launch, right-click → Open → Open to bypass Gatekeeper. Only needed once.

pip

# Full install (includes local ONNX embeddings)
pip install 'knowmarks[all,embeddings]'

# Slim install (use Ollama or API for embeddings)
pip install 'knowmarks[all]'

Docker

docker compose up -d
# Dashboard at http://localhost:3749

What It Does

Save from anywhere. Desktop app, browser extension, CLI (km save), web dashboard, REST API, MCP, or automated connectors. Every surface is a peer — what you can do through one, you can do through any other.

Understand any format. HTML pages get clean text extraction with metadata cascading (JSON-LD, OpenGraph, meta tags). PDFs become searchable text. YouTube videos become transcripts. GitHub repos become READMEs with structured metadata (stars, language, topics, last commit). Local files go through the same pipeline.

Search intelligently. Every query hits both semantic (vector) and keyword (full-text) indexes simultaneously, merged via Reciprocal Rank Fusion. You don't pick a search mode — the system finds the best results regardless of how you phrase the query.
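The fusion step is simple to sketch. In the snippet below, `k = 60` is the conventional constant from the RRF literature, not a documented Knowmarks setting, and the function name is illustrative:

```python
def rrf_merge(vector_hits, keyword_hits, k=60):
    """Merge two ranked result lists with Reciprocal Rank Fusion.

    Each input is a list of item IDs ordered best-first; an item's fused
    score is the sum of 1/(k + rank) over every list it appears in.
    """
    scores = {}
    for hits in (vector_hits, keyword_hits):
        for rank, item_id in enumerate(hits, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# An item ranked well in both lists beats one ranked first in only one:
rrf_merge(["a", "b", "c"], ["b", "c", "d"])  # → ['b', 'c', 'a', 'd']
```

Because the score depends only on rank, not on the incomparable raw scores of vector and FTS backends, no per-query weighting or mode selection is needed.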

Organize automatically. Items self-organize into semantic clusters with plain-language labels. A post-clustering consolidation pass merges clusters that refer to the same concept under different names (acronyms, synonyms, variant phrasings). No tags, no folders, no manual sorting.
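The consolidation idea can be illustrated without the LLM: clusters whose labels normalize to the same canonical form get merged. This is a toy sketch under that assumption; the real pass uses semantic comparison, and all names here are hypothetical:

```python
def consolidate_labels(labels, synonyms):
    """Group cluster labels that map to the same canonical concept.

    synonyms: mapping from a normalized variant to its canonical label,
    e.g. {"ml": "machine learning"}. Unknown labels are their own canon.
    """
    groups = {}
    for label in labels:
        canon = synonyms.get(label.strip().lower(), label.strip().lower())
        groups.setdefault(canon, []).append(label)
    return groups

consolidate_labels(
    ["Machine Learning", "ML", "RSS Feeds"],
    {"ml": "machine learning"},
)  # → {'machine learning': ['Machine Learning', 'ML'], 'rss feeds': ['RSS Feeds']}
```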

Track freshness. Source-aware staleness probes check real signals — GitHub last commit dates, npm publish timestamps, HTTP Last-Modified headers — not just elapsed time. You know when content has genuinely changed, not just when it's old.
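The HTTP probe reduces to a timestamp comparison. A minimal sketch, assuming the probe receives the raw Last-Modified header and the item's last fetch time (function and parameter names are illustrative, not the project's actual API):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def probe_http_freshness(last_modified_header, fetched_at):
    """Return True if the source changed after our last successful fetch.

    last_modified_header: raw HTTP Last-Modified response header, or None.
    fetched_at: timezone-aware datetime of the last fetch.
    """
    if not last_modified_header:
        return False  # no signal from this probe; fall back to others
    modified = parsedate_to_datetime(last_modified_header)
    return modified > fetched_at

fetched = datetime(2024, 6, 1, tzinfo=timezone.utc)
probe_http_freshness("Wed, 21 Oct 2024 07:28:00 GMT", fetched)  # → True
```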

Discover RSS feeds. Three-strategy feed discovery: standard <link> tag detection (6 MIME types), anchor heuristic scanning, and RSSHub radar rules lookup against 1,200+ domains. Finds feeds for sites that don't advertise them.
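The first strategy can be sketched with the standard library's HTML parser. The MIME types below are an illustrative subset of the six the project checks, and the class name is hypothetical:

```python
from html.parser import HTMLParser

# Illustrative subset; the project checks six feed MIME types in total.
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/feed+json"}

class FeedLinkFinder(HTMLParser):
    """Collect hrefs from <link rel="alternate"> tags with a feed MIME type."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link"
                and "alternate" in (a.get("rel") or "").lower()
                and (a.get("type") or "").lower() in FEED_TYPES
                and a.get("href")):
            self.feeds.append(a["href"])

finder = FeedLinkFinder()
finder.feed('<link rel="alternate" type="application/rss+xml" href="/feed.xml">')
finder.feeds  # → ['/feed.xml']
```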

Serve agents. 27 MCP tools give AI agents full CRUD access — search, save, update, delete, browse clusters, manage projects, curated collections, governance actions. Progressive detail levels (minimal/summary/full) keep token budgets tight. Works with Claude Code, Claude Desktop, and any MCP client.

Features

| Category | What You Get |
| --- | --- |
| Search | Hybrid vector + keyword search, query expansion via LLM, topic suggestions, scoped search by cluster/project/time |
| Dashboard | Pulse overview, cluster browsing, project views, curated collections, detail panel with re-fetch/reclassify, unified chat input |
| Chat | Generative answers grounded in your collection, persistent multi-turn conversations, agentic actions (create collections, bulk delete, add to project via natural language) |
| Import | Chrome, Firefox, Safari bookmarks, GitHub stars, Karakeep API, Readwise Reader, bookmark files (Netscape HTML, Karakeep JSON), local files (PDF, etc.) |
| Projects | Active project grouping with multi-facet scoring, score-gap detection, prospective flagging, manual pin/unpin, spec file auto-import |
| Collections | Keyword-seeded topic lenses that auto-populate and update as your collection grows |
| Governance | Dead link auto-trash, near-duplicate detection with LLM comparison summaries, stale content triage, bulk acknowledge/delete/refresh |
| Connectors | Polling-based sync with configurable schedules, health monitoring, incremental sync via high-water marks |
| Extraction | HTML, PDF, YouTube transcripts, GitHub repos, local files; structured metadata from JSON-LD, OpenGraph, and meta tags |
| Embeddings | Pluggable: local ONNX (fastembed, default), Ollama, or any OpenAI-compatible API; model-locked per collection |
| LLM | Auto-summaries, auto-connections, classification, relevance explanations, collection insights, query expansion; OpenRouter (default), Ollama, or any OpenAI-compatible API; works without an LLM (degrades gracefully) |
| API | REST /api/v1/ (27 routes, Bearer auth), MCP (27 tools, stdio transport), CLI (km command) |
| Desktop | Native macOS app (Electron + PyInstaller), auto-updates via GitHub Releases, macOS Keychain for API key storage |
| Observability | Sentry error tracking (no user content), PostHog anonymous analytics (opt-out), km doctor diagnostics, km feedback bug reporting |

Quick Start

# Save something
km save https://simonwillison.net/2024/Oct/17/knowledge-graph/ --note "Great overview of knowledge graphs"

# Search
km find "knowledge graph architectures"

# Start the dashboard
km serve

# Check collection health
km status

# Import Chrome bookmarks
km import chrome

# Run diagnostics
km doctor

MCP Server

Knowmarks exposes 27 tools via the Model Context Protocol for AI agent integration.

Add to your Claude Code or Claude Desktop MCP config:

{
  "mcpServers": {
    "knowmarks": {
      "command": "km-mcp"
    }
  }
}

Tools:

- Overview: pulse (collection overview), stats (collection statistics)
- Search: search (hybrid search with filters), related (find similar items)
- CRUD: get_knowmark, save, delete_knowmark, update_note
- Curation: stale (freshness governance), refine (LLM-powered curation)
- Projects: list_projects, get_project, create_project, update_project, add_to_project, remove_from_project, delete_project
- Clusters: list_clusters, get_cluster
- Curated collections: list_collections, get_collection, create_collection, delete_collection
- Governance: reembed, retry_fetch, rebuild_clusters_tool, connector_health

REST API

Full REST API at /api/v1/ with optional Bearer token authentication. Complete parity with the MCP server.

# Search
curl 'http://localhost:3749/api/v1/search?q=knowledge+graphs'

# Save
curl -X POST http://localhost:3749/api/v1/knowmarks \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "note": "Good read"}'

# Collection pulse
curl http://localhost:3749/api/v1/pulse

# With auth
export KNOWMARKS_API_KEY=your-secret-key
curl -H "Authorization: Bearer $KNOWMARKS_API_KEY" \
  http://localhost:3749/api/v1/stats
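For scripting against the API from Python, a request can be assembled with the standard library alone. The endpoint path follows the curl examples above; the helper name is illustrative:

```python
import json
import urllib.request

BASE = "http://localhost:3749/api/v1"

def save_request(url, note, api_key=None):
    """Build the POST request that saves a bookmark via the REST API."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"url": url, "note": note}).encode()
    return urllib.request.Request(f"{BASE}/knowmarks", data=body,
                                  headers=headers, method="POST")

req = save_request("https://example.com", "Good read", api_key="secret")
# urllib.request.urlopen(req) performs the save against a running server.
```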

Browser Extension

Chrome extension for one-click saving with optional notes:

  1. Open chrome://extensions
  2. Enable Developer mode
  3. Click Load unpacked → select the extension/ folder
  4. Click the Knowmarks icon on any page to save it

Configuration

Configure via Settings screen (desktop app / dashboard), km config set, or environment variables. Env vars take precedence.

| Variable | Default | Description |
| --- | --- | --- |
| KNOWMARKS_DATA_DIR | OS default | Data directory path |
| KNOWMARKS_HOST | 127.0.0.1 | Server bind address |
| KNOWMARKS_PORT | 3749 | Server port |
| KNOWMARKS_API_KEY | (empty) | Bearer token for REST API auth |
| KNOWMARKS_EMBEDDING_PROVIDER | fastembed | fastembed, ollama, or openai |
| KNOWMARKS_EMBEDDING_MODEL | BAAI/bge-small-en-v1.5 | Embedding model name |
| KNOWMARKS_EMBEDDING_ENDPOINT | (empty) | Endpoint URL for ollama/openai providers |
| KM_LLM_URL | https://openrouter.ai/api/v1 | OpenAI-compatible LLM endpoint |
| KM_LLM_MODEL | google/gemini-2.5-flash | LLM model name |
| KM_LLM_API_KEY | (empty) | LLM API key (OpenRouter, etc.) |
| KM_LLM_ENABLED | 1 | Set to 0 to disable all LLM features |
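As one possible combination, here is a hypothetical all-local setup pointing both embeddings and the LLM at Ollama (the model name is illustrative; substitute a model you have pulled):

```shell
# All-local: no API keys leave the machine.
export KNOWMARKS_EMBEDDING_PROVIDER=ollama
export KNOWMARKS_EMBEDDING_ENDPOINT=http://localhost:11434
export KM_LLM_URL=http://localhost:11434/v1   # Ollama's OpenAI-compatible endpoint
export KM_LLM_MODEL=llama3.1
km serve
```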

CLI Reference

km save <url|path>                Save a URL or local file
km find <query>                   Hybrid search
km stale                          Show stale items
km status                         Collection stats
km serve                          Start web dashboard + API
km mcp                            Start MCP server (stdio)
km import chrome|firefox|safari   Import browser bookmarks
km import github <username>       Import GitHub stars
km import karakeep -u URL -k KEY  Import from Karakeep
km import readwise -t TOKEN       Import from Readwise Reader
km import file <path>             Import bookmark file
km connector list                 Show connectors and sync schedules
km connector schedule <n> <int>   Set polling interval (e.g., 6h, 1d)
km project add <name>             Create a project
km project list                   List projects
km project show <name>            Show project details with members
km reclassify                     Retroactive content type classification
km config show                    Show effective configuration
km config set KEY VALUE           Set persistent config value
km doctor                         Full system health check
km feedback "description"         Submit bug report with auto-diagnostics
km service install                Install launchd background service (macOS)
km service uninstall              Remove background service
km service status                 Check service health + uptime
km service logs                   Show recent service output

Architecture

src/knowmarks/
├── cli.py              # Click CLI (km command)
├── config.py           # Configuration (env > config file > defaults)
├── core/
│   ├── db.py           # SQLite (WAL + FTS5), schema, CRUD, vector storage
│   ├── embed.py        # Pluggable embedding providers
│   ├── extract.py      # Content extraction (HTML, PDF, YouTube, GitHub, local)
│   ├── search.py       # Hybrid search + Reciprocal Rank Fusion
│   ├── cluster.py      # Semantic clustering with synonym consolidation
│   ├── freshness.py    # Vitality scoring + cluster-relative decay
│   ├── classify.py     # Content type classification (LLM + heuristic)
│   ├── projects.py     # Multi-facet project association
│   ├── curated.py      # Keyword-seeded topic collections
│   ├── llm.py          # LLM client (OpenAI-compatible)
│   ├── probes.py       # Source-aware staleness probes
│   ├── radar_rules.py  # RSSHub radar rules for feed discovery
│   ├── telemetry.py    # Sentry + PostHog (privacy-first)
│   └── connectors/     # Browser, GitHub, Karakeep, Readwise, file import
├── web/
│   ├── app.py          # FastAPI routes + dashboard
│   ├── api_v1.py       # REST API v1 (27 routes, Bearer auth)
│   └── static/         # Dashboard (HTML/CSS/JS, no build step)
├── mcp/
│   └── server.py       # MCP tools (27 tools, progressive detail)
└── desktop/
    └── electron/       # Electron shell + PyInstaller bundling

Stack: Python 3.11+, SQLite (WAL + FTS5), FastAPI, FastMCP, httpx, trafilatura, numpy, Electron, PyInstaller.

Development

git clone https://github.com/spaceshipmike/knowmarks.git
cd knowmarks
uv sync --all-extras

# Run tests
uv run pytest

# Lint
uv run ruff check src/

# Dev server
uv run km serve

See CONTRIBUTING.md for detailed guidelines.

Documentation

Full documentation at spaceshipmike.github.io/knowmarks.

License

Apache License 2.0
