A local-first SNOMED CT toolchain that's 10-100x faster than IHTSDO Snowstorm. One binary — from raw RF2 release to NDJSON, then SQL, Parquet, Markdown, TUI, GUI, graphs and MCP/LLM tool use. All on your machine, no network calls, REST APIs, or external servers required.
This is very much a work in progress, but it's ready to use, and I'd welcome feedback on how it performs for you.
```
RF2 Snapshot SNOMED CT release
        │
        ▼  sct ndjson  (~10s for 831k concepts)
        │
canonical NDJSON artefact
        │
        ├── sct sqlite   ──▶ snomed.db (SQL + FTS5, MCP backend)
        │     │
        │     ├── sct lexical ──▶ keyword search (FTS5)
        │     ├── sct tct     ──▶ adds a transitive closure table for faster hierarchy queries
        │     ├── sct refset  ──▶ query reference sets loaded into the SQLite database
        │     └── sct mcp     ──▶ stdio MCP server (Claude Desktop / Claude Code)
        ├── sct parquet  ──▶ snomed.parquet (DuckDB / analytics)
        ├── sct markdown ──▶ snomed-concepts/ (RAG / LLM file reading) (untested)
        └── sct embed    ──▶ snomed-embeddings.arrow (semantic vector search)
              │
              └── sct semantic ──▶ cosine similarity search (requires Ollama)
```

Utility subcommands:

- sct info <file> — inspect any artefact for more information
- sct diff --old <f> --new <f> — compare two NDJSON releases (untested)
- sct gui — browser-based UI served over localhost, with graph visualisation and point-and-click exploration
- sct tui — experimental terminal UI to explore concepts and relationships
- sct completions <shell> — generate shell completions (optional)
The NDJSON artefact at the centre is a stable, versionable, greppable file. All other outputs are derived from it and can be regenerated at any time.
sct joins the relatively incomprehensible RF2 files into a single NDJSON artefact. For the UK Monolith Edition this file is over 1 GB, yet it can still be opened in VS Code to get a feel for the data structure, which is impossible with the original RF2 files. It also means you can use standard tools like jq or ripgrep to query the data without needing a custom server or API.
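Since the artefact is one JSON object per line, any line-oriented tool can stream it. As an illustration, here is the same jq/ripgrep-style filtering in plain Python; the field names (`id`, `preferred_term`, `active`) are assumptions for the sake of the example, not guaranteed to match the real artefact schema (check `sct info` for that):

```python
import json
import io

# Two made-up NDJSON lines standing in for the real artefact
# (field names are illustrative, not sct's actual schema).
sample = io.StringIO(
    '{"id": "22298006", "preferred_term": "Myocardial infarction", "active": true}\n'
    '{"id": "38341003", "preferred_term": "Hypertensive disorder", "active": false}\n'
)

# Stream line by line -- no need to hold a 1 GB file in memory.
active_terms = [
    rec["preferred_term"]
    for rec in map(json.loads, sample)
    if rec["active"]
]
print(active_terms)  # ['Myocardial infarction']
```

The same filter in jq would be a one-liner over the real file; the point is that no custom server sits between you and the data.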
SNOMED CT is distributed as RF2 — a set of tab-separated files that require joining across multiple tables to get anything useful. Much of the healthcare industry relies on remote terminology servers for this, with the overhead of network calls and REST APIs. sct performs the join once, creating an NDJSON artefact, and produces standard files you can query locally with sqlite3, duckdb, jq, ripgrep, or an LLM. No server, no API key, no network.
| Operation | sct + SQLite | Snowstorm Lite | sct speedup |
|---|---|---|---|
| Import - Clinical Edition | 22s | 209s | ~10x faster |
| Import - Full UK Monolith | ~57s | Failed (OOM)* | ∞ |
| Single concept lookup (SCTID) | 6ms | 491ms | ~80x faster |
| Free-text search (10 results) | 2ms | 202ms | ~100x faster |
- Snowstorm Lite running in Docker with 24 GB of Java heap allocation ran out of memory on the full UK Monolith, which has 831k concepts. sct handled it in under a minute.
For more detailed benchmarks, see docs/benchmarks.md. Feel free to run the benchmarks yourself and share your results, perhaps as an Issue.
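The single-lookup numbers above are just timed queries, so they are easy to reproduce. As an illustration of the measurement pattern only (not the actual harness in docs/benchmarks.md), here is a Python sketch that times one indexed lookup against a throwaway in-memory database; the table and column names are made up:

```python
import sqlite3
import time

# Throwaway in-memory database standing in for snomed.db;
# the schema here is illustrative, not sct's actual schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE concepts (id TEXT PRIMARY KEY, preferred_term TEXT)")
db.execute("INSERT INTO concepts VALUES ('22298006', 'Myocardial infarction')")

start = time.perf_counter()
row = db.execute(
    "SELECT preferred_term FROM concepts WHERE id = ?", ("22298006",)
).fetchone()
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{row[0]} in {elapsed_ms:.3f} ms")
```

For meaningful numbers, run the real benchmarks in docs/benchmarks.md against a fully loaded snomed.db and average over many queries.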
Prebuilt binaries are published for Linux (x86_64, aarch64), macOS (Apple Silicon, Intel), and Windows (x86_64) on every release, with SHA-256 checksums you can verify against the SHA256SUMS file on the Releases page.
macOS / Linux:

```shell
curl -fsSL https://raw.githubusercontent.com/pacharanero/sct/main/install.sh | sh
```

Windows (PowerShell):

```powershell
iwr -useb https://raw.githubusercontent.com/pacharanero/sct/main/install.ps1 | iex
```

Both installers auto-detect your OS and architecture, download the matching binary, verify its SHA-256 checksum against the published SHA256SUMS, and install to ~/.local/bin (macOS / Linux) or %LOCALAPPDATA%\sct\bin (Windows). Override the destination with SCT_INSTALL_DIR, or pin a specific version with SCT_VERSION=v0.3.9.
Homebrew (macOS / Linux):

```shell
brew tap pacharanero/sct
brew install sct
```

Scoop (Windows):

```powershell
scoop bucket add sct https://github.com/pacharanero/scoop-sct
scoop install sct
```

If you already have a Rust toolchain (via rustup, stable 1.70+):
```shell
# Compile from crates.io
cargo install sct-rs
```

Or, if you have the cargo-binstall plugin installed (it is not bundled with cargo itself), grab a prebuilt binary instead of compiling from source:

```shell
# One-time: install the binstall plugin (or follow the one-liner installers in its README)
cargo install cargo-binstall
# Then install sct without compilation
cargo binstall sct-rs
```

To build from source instead:

```shell
git clone https://github.com/pacharanero/sct
cd sct
cargo install --path .                 # default build
cargo install --path . --features tui  # with terminal UI
cargo install --path . --features gui  # with browser UI
cargo install --path . --features full # both
```

| Feature | What it adds | Extra dependencies |
|---|---|---|
| (default) | All non-interactive subcommands | — |
| tui | Keyboard-driven terminal UI (sct tui) | ratatui, crossterm |
| gui | Browser-based graph UI (sct gui) | axum, tokio, open |
| full | Both of the above | all of the above |
Grab the appropriate archive from the Releases page, verify its SHA-256 against SHA256SUMS, extract, and drop sct somewhere on your PATH.
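The installers and the manual route rely on the same integrity check: hash the archive and compare it against the matching line in SHA256SUMS (`sha256sum -c SHA256SUMS` on Linux, or `shasum -a 256 -c SHA256SUMS` on macOS, does this for you). For illustration, the whole check in Python using the standard library's hashlib; the archive filename in the comment is hypothetical:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in 1 MiB chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the digest listed in SHA256SUMS, e.g.:
#   expected = "..."  # copied from the SHA256SUMS file on the Releases page
#   assert sha256_of("sct-<target>.tar.gz") == expected  # filename illustrative
```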
```shell
# 1. Download a distribution of SNOMED CT
#    UK: https://isd.digital.nhs.uk/ → Monolith Edition, RF2: Snapshot
#    (free under NHS England national licence — access is immediate)
#    NB: You need to Subscribe to a release before you can see the Download option 🤯
#    International: https://mlds.ihtsdotools.org/ (allow up to a week for approval)

# 2. Convert RF2 → NDJSON (~10s for 831k concepts)
#    Pass the .zip directly — no manual extraction needed
sct ndjson --rf2 SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z.zip
# ✓ 831,487 concepts written → snomedct-monolithrf2-production-20260311t120000z.ndjson

# 3. Load into SQLite with FTS5
sct sqlite --input snomedct-monolithrf2-production-20260311t120000z.ndjson

# 4. Query with standard tools — no custom binary needed
sqlite3 snomed.db \
  "SELECT id, preferred_term FROM concepts_fts WHERE concepts_fts MATCH 'heart attack' LIMIT 5"

# 5. Start the MCP server for Claude Desktop
sct mcp --db snomed.db
```

UK users can automate steps 1–3 with a single command once the TRUD API integration is set up:

```shell
sct trud download --edition uk_monolith --pipeline
```

For all further information, see the full documentation: explore the docs/ directory, run the docs site locally with s/docs, or visit the GitHub Pages site at https://pacharanero.github.io/sct/
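Step 4's FTS5 query isn't tied to the sqlite3 shell: any SQLite driver can run it against snomed.db. As a sketch, here is the same MATCH query from Python's built-in sqlite3 module, against a toy in-memory table, since the real schema produced by sct sqlite may differ from this simplified one:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Toy stand-in for the table sct sqlite builds; the real schema may differ.
db.execute("CREATE VIRTUAL TABLE concepts_fts USING fts5(id, preferred_term)")
db.executemany(
    "INSERT INTO concepts_fts VALUES (?, ?)",
    [
        ("22298006", "Myocardial infarction"),
        ("57054005", "Acute myocardial infarction"),
        ("38341003", "Hypertensive disorder"),
    ],
)

# Full-text MATCH, exactly as in quickstart step 4.
rows = db.execute(
    "SELECT id, preferred_term FROM concepts_fts "
    "WHERE concepts_fts MATCH ? LIMIT 5",
    ("myocardial",),
).fetchall()
print(rows)
```

Against a real snomed.db, point `sqlite3.connect` at the file instead of `:memory:` and skip the table creation.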
- sct ndjson — convert an RF2 Snapshot directory to a canonical NDJSON artefact
- sct sqlite — load NDJSON into a SQLite database with FTS5
- sct parquet — export NDJSON to a Parquet file for DuckDB / analytics
- sct markdown — export NDJSON to per-concept Markdown files (or per-hierarchy with --mode hierarchy)
- sct mcp — start a local MCP server over stdio backed by the SQLite database
- sct embed — generate Ollama vector embeddings and write an Arrow IPC file
- sct lexical — keyword (FTS5) search over the SQLite database
- sct semantic — semantic similarity search over the Arrow IPC embeddings file (requires Ollama)
- sct info <file> — inspect any .ndjson, .db, or .arrow artefact and print a summary
- sct diff --old <file> --new <file> — compare two NDJSON releases and report what changed
- sct completions — print shell completion scripts (bash, zsh, fish, powershell, elvish)
- sct tui — keyboard-driven terminal UI for interactive SNOMED CT exploration (optional feature)
- sct gui — browser-based UI served over localhost for point-and-click exploration (optional feature)
Run any subcommand with --help for a full option reference.
| Goal | Command |
|---|---|
| Query with SQL / keyword search | sct sqlite then sct lexical |
| Analytics / DuckDB | sct parquet |
| RAG / LLM file ingestion | sct markdown |
| Semantic / meaning-based search | sct embed then sct semantic |
| Claude Desktop or Claude Code | sct sqlite then sct mcp |
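Under sct semantic, results are ranked by cosine similarity between the query embedding and each stored vector (see the pipeline diagram above). A minimal sketch of that ranking, using made-up 3-dimensional vectors; real embeddings produced by sct embed via Ollama have hundreds of dimensions:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Made-up 3-d embeddings; real ones come from `sct embed` via Ollama.
concepts = {
    "Myocardial infarction": [0.9, 0.1, 0.0],
    "Hypertensive disorder": [0.2, 0.8, 0.1],
    "Fracture of femur":     [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of e.g. "heart attack"

ranked = sorted(
    concepts,
    key=lambda name: cosine(query, concepts[name]),
    reverse=True,
)
print(ranked[0])  # Myocardial infarction
```

This is why semantic search finds "Myocardial infarction" for "heart attack" even though the strings share no keywords, which keyword (FTS5) search cannot do.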
SNOMED CT is licensed. Download the RF2 Snapshot for your region:
- UK: NHS Digital TRUD → SNOMED CT Monolith Edition, RF2: Snapshot. Covered by the NHS England national licence.
- International: MLDS or NLM.
Download the Monolith Snapshot if available — it bundles the international base, clinical extension, and drug extension in one directory.
Please try it out and let me know how it performs for you, especially if you have a use case that isn't well supported by the current subcommands. Open an Issue for anything you want to report, from bugs to feature requests to general feedback.
A devcontainer configuration is included in .devcontainer/. Open the project in VS Code and select "Reopen in Container" to get a ready-to-go environment with Rust, sqlite3, duckdb, jq, and ripgrep pre-installed. Also included are python3 and Ollama, for working with the embeddings and semantic search features.
Store SNOMED data files (zips, NDJSON, databases) in the data-volume/ directory inside the container — it's backed by a Docker volume for faster I/O than the default bind mount.
Please see CONTRIBUTING.md for guidelines on how to contribute, report issues, or request features.
See the ROADMAP for planned features, improvements, and long-term vision for the project.
SNOMED CT® is a registered trademark of SNOMED International. This project is an independent implementation and is not affiliated with SNOMED International. All SNOMED CT data is sourced from the official RF2 releases and remains copyright of SNOMED International. Please refer to the license terms for your use of SNOMED CT data. You must ensure you have an appropriate license to use SNOMED CT data in your jurisdiction.
sct is not trademarked. The source code and binaries are copyright Marcus Baw and Baw Medical Ltd, and provided to you under the terms of the AGPL-3.0 license.