A local-first SNOMED CT toolchain that's 10-100x faster than IHTSDO Snowstorm. One binary — from raw RF2 release to NDJSON, then SQL, Parquet, Markdown, TUI, GUI, graphs and MCP/LLM tool use. All on your machine, no network calls, REST APIs, or external servers required.
This is very much a work in progress, but it's ready to use, and I'd welcome feedback on how it performs for you.
```
RF2 Snapshot SNOMED CT release
        │
        ▼  sct ndjson  (~10s for 831k concepts)
        │
canonical NDJSON artefact
        │
        ├── sct sqlite   ──▶ snomed.db (SQL + FTS5, MCP backend)
        │     │
        │     ├── sct lexical ──▶ keyword search (FTS5)
        │     ├── sct tct     ──▶ adds a transitive closure table for faster hierarchy queries
        │     ├── sct refset  ──▶ query reference sets loaded into the SQLite database
        │     └── sct mcp     ──▶ stdio MCP server (Claude Desktop / Claude Code)
        ├── sct parquet  ──▶ snomed.parquet (DuckDB / analytics)
        ├── sct markdown ──▶ snomed-concepts/ (RAG / LLM file reading) (untested)
        └── sct embed    ──▶ snomed-embeddings.arrow (semantic vector search)
              │
              └── sct semantic ──▶ cosine similarity search (requires Ollama)
```

Utility subcommands:

- sct info <file> — inspect any artefact for more information
- sct diff --old <f> --new <f> — compare two NDJSON releases (untested)
- sct gui — browser-based UI served over localhost, with graph visualisation and point-and-click exploration
- sct tui — experimental terminal UI to explore concepts and relationships
- sct completions <shell> — generate shell completions (optional)
The NDJSON artefact at the centre is a stable, versionable, greppable file. All other outputs are derived from it and can be regenerated at any time.
sct joins the relatively incomprehensible RF2 files into a single NDJSON artefact. For the UK Monolith Edition this file is over 1 GB, yet it can still be opened in VS Code to get a feel for the data structure, which is impossible with the original RF2 files. It also means you can use standard tools like jq or ripgrep to query the data without needing a custom server or API.
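Since the artefact is one JSON object per line, any line-oriented tool can stream it. As an illustration, here is the same jq/ripgrep-style filtering in plain Python; the field names (`id`, `preferred_term`, `active`) are assumptions for the sake of the example, not guaranteed to match the real artefact schema (check `sct info` for that):

```python
import json
import io

# Two made-up NDJSON lines standing in for the real artefact
# (field names are illustrative, not sct's actual schema).
sample = io.StringIO(
    '{"id": "22298006", "preferred_term": "Myocardial infarction", "active": true}\n'
    '{"id": "38341003", "preferred_term": "Hypertensive disorder", "active": false}\n'
)

# Stream line by line -- no need to hold a 1 GB file in memory.
active_terms = [
    rec["preferred_term"]
    for rec in map(json.loads, sample)
    if rec["active"]
]
print(active_terms)  # ['Myocardial infarction']
```

The same filter in jq would be a one-liner over the real file; the point is that no custom server sits between you and the data.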
SNOMED CT is distributed as RF2 — a set of tab-separated files that require joining across multiple tables to get anything useful. Much of the healthcare industry relies on remote terminology servers for this, with the overhead of network calls and REST APIs. sct performs the join once, creating an NDJSON artefact, and produces standard files you can query locally with sqlite3, duckdb, jq, ripgrep, or an LLM. No server, no API key, no network.
| Operation | sct + SQLite | Snowstorm Lite | sct speedup |
|---|---|---|---|
| Import - Clinical Edition | 22s | 209s | ~10x faster |
| Import - Full UK Monolith | ~57s | Failed (OOM)* | ∞ |
| Single concept lookup (SCTID) | 6ms | 491ms | ~80x faster |
| Free-text search (10 results) | 2ms | 202ms | ~100x faster |
- Snowstorm Lite running in Docker with 24 GB of Java heap allocation ran out of memory on the full UK Monolith, which has 831k concepts. sct handled it in under a minute.
For more detailed benchmarks, see docs/benchmarks.md. Feel free to run the benchmarks yourself and share your results, perhaps as an Issue.
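The single-lookup numbers above are just timed queries, so they are easy to reproduce. As an illustration of the measurement pattern only (not the actual harness in docs/benchmarks.md), here is a Python sketch that times one indexed lookup against a throwaway in-memory database; the table and column names are made up:

```python
import sqlite3
import time

# Throwaway in-memory database standing in for snomed.db;
# the schema here is illustrative, not sct's actual schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE concepts (id TEXT PRIMARY KEY, preferred_term TEXT)")
db.execute("INSERT INTO concepts VALUES ('22298006', 'Myocardial infarction')")

start = time.perf_counter()
row = db.execute(
    "SELECT preferred_term FROM concepts WHERE id = ?", ("22298006",)
).fetchone()
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{row[0]} in {elapsed_ms:.3f} ms")
```

For meaningful numbers, run the real benchmarks in docs/benchmarks.md against a fully loaded snomed.db and average over many queries.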
Prebuilt binaries are published for Linux (x86_64, aarch64), macOS (Apple Silicon, Intel), and Windows (x86_64) on every release, with SHA-256 checksums you can verify against the SHA256SUMS file on the Releases page.
macOS / Linux:

```shell
curl -fsSL https://raw.githubusercontent.com/pacharanero/sct/main/install.sh | sh
```

Windows (PowerShell):

```powershell
iwr -useb https://raw.githubusercontent.com/pacharanero/sct/main/install.ps1 | iex
```

Both installers auto-detect your OS and architecture, download the matching binary, verify its SHA-256 checksum against the published SHA256SUMS, and install to ~/.local/bin (macOS / Linux) or %LOCALAPPDATA%\sct\bin (Windows). Override the destination with SCT_INSTALL_DIR, or pin a specific version with SCT_VERSION=v0.3.9.
Homebrew (macOS / Linux):

```shell
brew tap pacharanero/sct
brew install sct
```

Scoop (Windows):

```powershell
scoop bucket add sct https://github.com/pacharanero/scoop-sct
scoop install sct
```

If you already have a Rust toolchain (via rustup, stable 1.70+):
```shell
# Compile from crates.io
cargo install sct-rs
```

Or, if you have the cargo-binstall plugin installed (it is not bundled with cargo itself), grab a prebuilt binary instead of compiling from source:

```shell
# One-time: install the binstall plugin (or follow the one-liner installers in its README)
cargo install cargo-binstall
# Then install sct without compilation
cargo binstall sct-rs
```

To build from source instead:

```shell
git clone https://github.com/pacharanero/sct
cd sct
cargo install --path .                 # default build
cargo install --path . --features tui  # with terminal UI
cargo install --path . --features gui  # with browser UI
cargo install --path . --features full # both
```

| Feature | What it adds | Extra dependencies |
|---|---|---|
| (default) | All non-interactive subcommands | — |
| tui | Keyboard-driven terminal UI (sct tui) | ratatui, crossterm |
| gui | Browser-based graph UI (sct gui) | axum, tokio, open |
| full | Both of the above | all of the above |
Grab the appropriate archive from the Releases page, verify its SHA-256 against SHA256SUMS, extract, and drop sct somewhere on your PATH.
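The installers and the manual route rely on the same integrity check: hash the archive and compare it against the matching line in SHA256SUMS (`sha256sum -c SHA256SUMS` on Linux, or `shasum -a 256 -c SHA256SUMS` on macOS, does this for you). For illustration, the whole check in Python using the standard library's hashlib; the archive filename in the comment is hypothetical:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in 1 MiB chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the digest listed in SHA256SUMS, e.g.:
#   expected = "..."  # copied from the SHA256SUMS file on the Releases page
#   assert sha256_of("sct-<target>.tar.gz") == expected  # filename illustrative
```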
```shell
# 1. Download a distribution of SNOMED CT
#    UK: https://isd.digital.nhs.uk/ → Monolith Edition, RF2: Snapshot
#    (free under NHS England national licence — access is immediate)
#    NB: You need to Subscribe to a release before you can see the Download option 🤯
#    International: https://mlds.ihtsdotools.org/ (allow up to a week for approval)

# 2. Convert RF2 → NDJSON (~10s for 831k concepts)
#    Pass the .zip directly — no manual extraction needed
sct ndjson --rf2 SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z.zip
# ✓ 831,487 concepts written → snomedct-monolithrf2-production-20260311t120000z.ndjson

# 3. Load into SQLite with FTS5
sct sqlite --input snomedct-monolithrf2-production-20260311t120000z.ndjson

# 4. Query with standard tools — no custom binary needed
sqlite3 snomed.db \
  "SELECT id, preferred_term FROM concepts_fts WHERE concepts_fts MATCH 'heart attack' LIMIT 5"

# 5. Start the MCP server for Claude Desktop
sct mcp --db snomed.db
```

UK users can automate steps 1–3 with a single command once the TRUD API integration is set up:

```shell
sct trud download --edition uk_monolith --pipeline
```

For all further information, see the full documentation: explore the docs/ directory, run the docs site locally with s/docs, or visit the GitHub Pages site at https://pacharanero.github.io/sct/
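Step 4's FTS5 query isn't tied to the sqlite3 shell: any SQLite driver can run it against snomed.db. As a sketch, here is the same MATCH query from Python's built-in sqlite3 module, against a toy in-memory table, since the real schema produced by sct sqlite may differ from this simplified one:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Toy stand-in for the table sct sqlite builds; the real schema may differ.
db.execute("CREATE VIRTUAL TABLE concepts_fts USING fts5(id, preferred_term)")
db.executemany(
    "INSERT INTO concepts_fts VALUES (?, ?)",
    [
        ("22298006", "Myocardial infarction"),
        ("57054005", "Acute myocardial infarction"),
        ("38341003", "Hypertensive disorder"),
    ],
)

# Full-text MATCH, exactly as in quickstart step 4.
rows = db.execute(
    "SELECT id, preferred_term FROM concepts_fts "
    "WHERE concepts_fts MATCH ? LIMIT 5",
    ("myocardial",),
).fetchall()
print(rows)
```

Against a real snomed.db, point `sqlite3.connect` at the file instead of `:memory:` and skip the table creation.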
- sct ndjson — convert an RF2 Snapshot directory to a canonical NDJSON artefact
- sct sqlite — load NDJSON into a SQLite database with FTS5
- sct parquet — export NDJSON to a Parquet file for DuckDB / analytics
- sct markdown — export NDJSON to per-concept Markdown files (or per-hierarchy with --mode hierarchy)
- sct mcp — start a local MCP server over stdio backed by the SQLite database
- sct embed — generate Ollama vector embeddings and write an Arrow IPC file
- sct lexical — keyword (FTS5) search over the SQLite database
- sct semantic — semantic similarity search over the Arrow IPC embeddings file (requires Ollama)
- sct info <file> — inspect any .ndjson, .db, or .arrow artefact and print a summary
- sct diff --old <file> --new <file> — compare two NDJSON releases and report what changed
- sct completions — print shell completion scripts (bash, zsh, fish, powershell, elvish)
- sct tui — keyboard-driven terminal UI for interactive SNOMED CT exploration (optional feature)
- sct gui — browser-based UI served over localhost for point-and-click exploration (optional feature)
Run any subcommand with --help for a full option reference.
| Goal | Command |
|---|---|
| Query with SQL / keyword search | sct sqlite then sct lexical |
| Analytics / DuckDB | sct parquet |
| RAG / LLM file ingestion | sct markdown |
| Semantic / meaning-based search | sct embed then sct semantic |
| Claude Desktop or Claude Code | sct sqlite then sct mcp |
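Under sct semantic, results are ranked by cosine similarity between the query embedding and each stored vector (see the pipeline diagram above). A minimal sketch of that ranking, using made-up 3-dimensional vectors; real embeddings produced by sct embed via Ollama have hundreds of dimensions:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Made-up 3-d embeddings; real ones come from `sct embed` via Ollama.
concepts = {
    "Myocardial infarction": [0.9, 0.1, 0.0],
    "Hypertensive disorder": [0.2, 0.8, 0.1],
    "Fracture of femur":     [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of e.g. "heart attack"

ranked = sorted(
    concepts,
    key=lambda name: cosine(query, concepts[name]),
    reverse=True,
)
print(ranked[0])  # Myocardial infarction
```

This is why semantic search finds "Myocardial infarction" for "heart attack" even though the strings share no keywords, which keyword (FTS5) search cannot do.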
SNOMED CT is licensed. Download the RF2 Snapshot for your region:
- UK: NHS Digital TRUD → SNOMED CT Monolith Edition, RF2: Snapshot. Covered by the NHS England national licence.
- International: MLDS or NLM.
Download the Monolith Snapshot if available — it bundles the international base, clinical extension, and drug extension in one directory.
Please try it out and let me know how it performs for you, especially if you have a use case that isn't well supported by the current subcommands. Open an Issue for anything you want to report, from bugs to feature requests to general feedback.
A devcontainer configuration is included in .devcontainer/. Open the project in VS Code and select "Reopen in Container" to get a ready-to-go environment with Rust, sqlite3, duckdb, jq, and ripgrep pre-installed. Also included are python3 and Ollama, for working with the embeddings and semantic search features.
Store SNOMED data files (zips, NDJSON, databases) in the data-volume/ directory inside the container — it's backed by a Docker volume for faster I/O than the default bind mount.
Please see CONTRIBUTING.md for guidelines on how to contribute, report issues, or request features.
See the ROADMAP for planned features, improvements, and long-term vision for the project.
SNOMED CT® is a registered trademark of SNOMED International. This project is an independent implementation and is not affiliated with SNOMED International. All SNOMED CT data is sourced from the official RF2 releases and remains copyright of SNOMED International. Please refer to the license terms for your use of SNOMED CT data. You must ensure you have an appropriate license to use SNOMED CT data in your jurisdiction.
sct is not trademarked. The source code and binaries are copyright Marcus Baw and Baw Medical Ltd, and provided to you under the terms of the AGPL-3.0 license.