Releases: SwiftWing21/helix-context
v0.4.0b1 — pathway identity + packet mode
Pathway-layer reframe — Helix weighs, doesn't retrieve
0.4.0b1 is a meaningful identity shift. Helix is no longer framed as a
knowledge store that competes with vector DBs; it's a coordinate
index layer that emits confidence so agents can decide know-vs-go
without being coaxed into it. Composes on top of the bundled SQLite
genome today, stacks on any content store tomorrow.
Two product surfaces
- `/context/packet` — agent-safe index. Returns pointers + a
  `verified` / `stale_risk` / `needs_refresh` verdict + a refresh plan.
  Caller fetches content. Task-sensitive (plan / explain / review /
  edit / debug / ops / quote).
- `/context` — decoder path. Helix assembles + compresses the
  context window. Downstream LLM consumes directly. Unchanged
  behavior.
Weighing layer (the conceptual center of gravity)
coord_conf × (freshness × authority × specificity) = is-it-safe-to-act
- freshness_score — `exp(-age / half_life[volatility_class])` with
  stable=7d / medium=12h / hot=15min half-lives
- authority_score — primary=1.0, derived=0.75, inferred=0.45
- specificity_score — literal=1.0, span=0.9, doc=0.75, assertion=0.45
- coord confidence — path_token_coverage between query signals
  and delivered gene source paths (hit mean 1.00 vs miss mean 0.52 on
  the 10-needle bench)
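The weighing math above can be sketched in a few lines. This is a hedged illustration: the function names, half-life values, and score tables mirror the release notes, not the actual Helix source.

```python
import math

# Half-lives per volatility class, in seconds (from the release notes)
HALF_LIFE_S = {"stable": 7 * 86400, "medium": 12 * 3600, "hot": 15 * 60}
AUTHORITY = {"primary": 1.0, "derived": 0.75, "inferred": 0.45}
SPECIFICITY = {"literal": 1.0, "span": 0.9, "doc": 0.75, "assertion": 0.45}

def freshness_score(age_s: float, volatility_class: str) -> float:
    """exp(-age / half_life) decay, keyed by volatility class."""
    return math.exp(-age_s / HALF_LIFE_S[volatility_class])

def act_confidence(coord_conf: float, age_s: float, volatility_class: str,
                   source_kind: str, specificity: str) -> float:
    """coord_conf x (freshness x authority x specificity)."""
    return (coord_conf
            * freshness_score(age_s, volatility_class)
            * AUTHORITY[source_kind]
            * SPECIFICITY[specificity])

# A 6-hour-old primary literal fact in the "medium" class, perfect coordinates
conf = act_confidence(1.0, 6 * 3600, "medium", "primary", "literal")
```

An agent would compare `conf` against a task-specific threshold to decide know-vs-go.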
Validated via Phase 5 packet bench — 10/10 scenarios pass across 5
families (stale_by_age, coordinate_mismatch, task_sensitivity,
authority_downgrade, clean_verified). See
`benchmarks/bench_packet.py`.
Ingest-time provenance
New columns on `Gene` (auto-populated from file extension at
ingest): `source_kind`, `volatility_class`, `observed_at`,
`last_verified_at`. No backfill needed for new ingests. Existing
genomes can run the one-time `scripts/backfill_gene_provenance.py`
sweep.
New endpoints
- `POST /context/packet` — the agent-safe index surface
- `POST /context/refresh-plan` — just the reread plan
- `POST /fingerprint` — navigation-first retrieval with `score_floor`
- honest accounting (`evaluated_total`, `above_floor_total`,
`filtered_by_floor`, `truncated_by_cap`)
New MCP tools
- `helix_context_packet`
- `helix_refresh_targets`
Plus the full existing suite (`helix_context`, `helix_stats`,
`helix_ingest`, `helix_resonance`, session/HITL toolkit).
Dep additions
- `[mcp]` extra — required for `python -m helix_context.mcp_server`
  (closes an import-error gap)
- `[nli]` extra — standalone torch + transformers for DeBERTa/NLI backends
- `[all]` extra now genuinely complete
Docs v2
- README v2 — lead with pathway identity, two-surface layout,
  launch modes table, LLM-free pipeline as load-bearing framing
- docs/architecture/PIPELINE_LANES.md v2 — adds /context/packet,
  /context/refresh-plan, /fingerprint lanes; weighing layer as
  first-class concept
- New docs/specs/2026-04-17-agent-context-index-build-spec.md —
  657-line authoritative packet-mode spec
Migration notes
- Existing `/context` callers: no action needed. The new
  confidence fields appear on `ContextHealth` additively.
- MCP hosts: run `pip install helix-context[mcp]` if you use
  `python -m helix_context.mcp_server`.
- Existing genomes: optional one-time run of
  `python scripts/backfill_gene_provenance.py` to populate
  provenance fields on legacy rows (so packet mode returns
  `verified` instead of `stale_risk`).
Validation
- 680+ tests pass (one pre-existing test fixed in this release)
- Phase 5 packet bench: 10/10 across 5 families
- Dry-run install verified from clean resolution: `.[all]` pulls
mcp, torch, transformers, sentence-transformers, spacy, tree-sitter,
opentelemetry stack, headroom-ai[proxy,code].
Powered by Agentome.
v0.3.0b3 — Ribosome pause + learn() timeout
Patch release — VRAM contention survival
Two fixes born from a real incident: an external benchmark run (1000-needle test against qwen3:4b) crashed the Helix server when it competed with the ribosome model (gemma4:e4b) for GPU VRAM and triggered a cascade of httpx timeouts.
What's New
/admin/ribosome/pause — unload the ribosome without killing Helix
New admin endpoints let you disable the ribosome's LLM calls at runtime without restarting the server or losing state:
| Endpoint | Purpose |
|---|---|
| POST /admin/ribosome/pause | Monkey-patch backend.complete() to raise |
| POST /admin/ribosome/resume | Restore the original backend method |
| GET /admin/ribosome/status | Check if currently paused |
How it works: Ribosome.replicate() already has a fallback path that synthesizes a minimal gene from the raw exchange if the LLM call fails. Pausing forces that fallback to engage every time. The ribosome instance stays in memory, the backend connection stays alive — only the complete() method is swapped.
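A minimal sketch of the swap-and-restore mechanism described above. Class and method names are placeholders; the real endpoints operate on the live backend instance rather than this toy wrapper.

```python
class Backend:
    """Stand-in for the real LLM backend."""
    def complete(self, prompt: str) -> str:
        return "llm output"

class RibosomePauser:
    """Swap complete() for a raising stub, and restore it later."""
    def __init__(self, backend):
        self.backend = backend
        self._original = None

    def pause(self) -> None:
        if self._original is None:
            self._original = self.backend.complete
            def _raise(prompt: str) -> str:
                raise RuntimeError("ribosome paused")
            # Instance-level attribute shadows the class method
            self.backend.complete = _raise

    def resume(self) -> None:
        if self._original is not None:
            self.backend.complete = self._original  # restore original
            self._original = None

    @property
    def paused(self) -> bool:
        return self._original is not None
```

Because only the method reference is swapped, the backend object and its connection state survive pause/resume untouched, which matches the non-disruptive behavior described above.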
Workflow for benchmark VRAM rescue:
```bash
# Free the ribosome model from Ollama
curl -X POST localhost:11437/admin/ribosome/pause
curl -X POST localhost:11434/api/generate \
  -d '{"model": "gemma4:e4b", "keep_alive": 0, "prompt": ""}'

# Run your benchmark against qwen3:4b (or any other model)
python benchmarks/bench_needle_1000.py

# Restore normal operation
curl -X POST localhost:11437/admin/ribosome/resume
```
Why this matters: Before this release, pausing the ribosome required either a config change + restart, or manually killing the Python process and relaunching. Both strategies drop in-flight /context requests and break the Continue integration. The pause endpoint is instantaneous and non-disruptive — /context queries continue to work unchanged because they don't currently route through the ribosome (ingestion uses CpuTagger, rerank is disabled, splice is a no-op in the current code path).
learn() timeout wrapper
HelixContextManager.learn() now wraps the ribosome.replicate() call in a ThreadPoolExecutor with a 15-second timeout.
Root cause of the original crash: During the n1000 benchmark, a background learn() task fired for each completed proxy request. learn() called ribosome.replicate() which went through httpx to Ollama, which was busy serving the benchmark's qwen3:4b inference requests. The learn() call sat in Ollama's request queue for over 120 seconds and eventually hit httpx's ReadTimeout, which propagated up and crashed the server.
Fix: If replicate() doesn't return within 15 seconds, learn() cancels the future, synthesizes the same minimal gene that Ribosome.replicate()'s existing fallback path produces, and moves on. The background task drops cleanly instead of blocking indefinitely.
Signature change: learn(query, response, timeout_s: float = 15.0). Backward-compatible — old two-arg callers still work with the default timeout.
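The timeout pattern can be sketched as below. `replicate` and the minimal-gene dict are stand-ins for the real Helix internals, and the long-lived pool is an assumption about how the real code avoids blocking on executor shutdown while a call is still hung.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Long-lived pool: per-call executors would block on shutdown while a
# hung replicate() is still running inside them.
_pool = ThreadPoolExecutor(max_workers=2)

def learn_with_timeout(replicate, query: str, response: str,
                       timeout_s: float = 15.0) -> dict:
    """Run replicate(query, response), falling back to a minimal gene on timeout."""
    future = _pool.submit(replicate, query, response)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best-effort; a running call is simply abandoned
        # Synthesize the same minimal gene the fallback path would produce
        return {"query": query, "content": response, "minimal": True}
```

The caller gets an answer within `timeout_s` either way; the abandoned thread finishes (or errors) quietly in the background instead of propagating a crash.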
Validation
- 179 tests passing (full suite)
- Server restarted cleanly on new code during live session
- `/admin/ribosome/pause` confirmed working against the live genome (7,380 genes)
- gemma4:e4b unloaded from Ollama mid-session, VRAM freed
- `/context` queries continued returning correct results with the ribosome paused (coverage=1.0, ellipticity=0.645)
- `/admin/ribosome/resume` not yet tested end-to-end (will be on next benchmark completion)
Migration notes
No migration needed. The new endpoints are additive, the learn() signature change is backward-compatible, and the default behavior is unchanged unless you explicitly call /admin/ribosome/pause.
🤖 Generated with Claude Code
v0.3.0b2 — Agent-friendly responses + hot-reload
Patch release — better multi-agent ergonomics
Focused on making the /context endpoint respond more usefully to programmatic agents (not just Continue), and letting operators refresh runtime state without restarting the process.
What's New
Agent-friendly /context response
The /context endpoint now returns an agent metadata field alongside the existing Continue-compatible fields. Backward compatible — existing integrations work unchanged.
New fields:
```json
{
"name": "Helix Genome Context",
"description": "...",
"content": "...",
"context_health": { ... },
"agent": {
"recommendation": "trust" | "verify" | "refresh" | "reread_raw",
"hint": "Context is well-grounded. Use directly.",
"citations": [
{ "gene_id": "a23ff24e...", "source": "...", "score": 108.71 },
...
],
"latency_ms": 1996.2,
"total_tokens_est": 3614,
"compression_ratio": 2.77,
"moe_mode": true
}
}
```
Pass verbose: true in the request body to include promoter tags (domains, entities) for each citation — useful when agents need to inspect why a gene was ranked.
Recommendation semantics
The recommendation field tells the agent what to do with the context based on delta-epsilon health:
| Health status | Recommendation | Meaning |
|---|---|---|
| aligned | trust | Use the context directly |
| sparse | verify | Verify specific values before acting |
| stale | refresh | Expressed genes are outdated |
| denatured | reread_raw | Unreliable — read raw files instead |
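The table above reduces to a small lookup. A sketch, with recommendation and hint strings taken from the table rather than from the Helix source:

```python
# health status -> (recommendation, hint), per the semantics table
RECOMMENDATION = {
    "aligned":   ("trust", "Use the context directly"),
    "sparse":    ("verify", "Verify specific values before acting"),
    "stale":     ("refresh", "Expressed genes are outdated"),
    "denatured": ("reread_raw", "Unreliable - read raw files instead"),
}

def recommend(health_status: str) -> str:
    """Map delta-epsilon health to an agent action; default to verify."""
    rec, _hint = RECOMMENDATION.get(health_status, ("verify", "Unknown status"))
    return rec
```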
Hot-reload endpoints
Three admin endpoints for refreshing runtime state without killing the process:
- `POST /admin/reload` — full refresh (config + genome snapshot + ΣĒMA cache + clear stale per-query state)
- `POST /admin/sema/rebuild` — force rebuild the ΣĒMA vector cache
- `POST /admin/checkpoint?mode=TRUNCATE` — flush WAL to main DB (already existed, documented here)
What hot-reload refreshes:
- `helix.toml` config
- Genome WAL snapshot (sees external writes)
- ΣĒMA vector cache (rebuilt from current genome)
- `last_query_scores` cleared
What it does NOT refresh (these still need a process restart):
- Python code changes
- Ribosome backend swap (model stays loaded)
Use hot-reload when you want to tweak thresholds in helix.toml or pick up new genes from an external writer without dropping in-flight requests.
Replica schema resilience fix
_build_sema_cache now catches no such column: compression_tier errors from stale replicas and falls back to the legacy query. This was silently breaking the live server on replicated read paths where the replica hadn't been migrated to the current schema.
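The fallback pattern looks roughly like this. Table and column names are illustrative, not the actual Helix schema; the point is catching the specific `OperationalError` and retrying with the legacy query.

```python
import sqlite3

def fetch_sema_rows(conn: sqlite3.Connection):
    """Try the current-schema query; fall back when a replica is stale."""
    try:
        return conn.execute(
            "SELECT id, sema_vector, compression_tier FROM genes").fetchall()
    except sqlite3.OperationalError as e:
        if "no such column" not in str(e):
            raise  # some other failure: don't mask it
        # Stale replica without the new column: use the legacy query
        return conn.execute("SELECT id, sema_vector FROM genes").fetchall()
```

Only the missing-column case is swallowed; any other `OperationalError` still propagates, so genuine failures stay visible.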
Benchmark validation (post-restart)
Needle-in-a-haystack benchmark against the 7,313-gene live genome after restarting on v0.3.0b2:
| Metric | Pre-v0.3.0b2 | v0.3.0b2 |
|---|---|---|
| Context retrieval | 8/10 (80%) | 10/10 (100%) |
| Answer accuracy | 7/10 | 8/10 |
| Avg context latency | 21-120s (ΣĒMA Mode B 7K JSON loads) | 1.0s (numpy cache) |
The 100x latency improvement comes from the v0.2.0b2 ΣĒMA vector cache fix, which was shipped in code but never active against the live server until this restart. The retrieval quality improvement comes from the v0.2.0b2 authority boosts combined with the fresh server picking up the new code.
All 179 tests passing.
🤖 Generated with Claude Code
v0.3.0b1 — Tree-sitter chunking, BABILong, training runbook
Minor release
Three coherent feature additions shipped together. Per the versioning plan, architectural additions get a minor bump — not 36-commit jumbo patches like the v0.1 line.
What's New
Tree-sitter AST chunking (opt-in)
The regex code chunker that has been around since v0.1.0 (with a # MVP heuristic — swap for tree-sitter later comment) is finally replaced. Tree-sitter understands real grammar boundaries — functions, classes, impl blocks, interfaces, type aliases — instead of matching def as a string.
Supported languages:
- Python
- Rust
- JavaScript
- TypeScript + TSX
Install the optional extra:
```bash
pip install helix-context[ast]
```
Auto-detected from metadata['path'] file extension. Falls back cleanly to the regex chunker when tree-sitter isn't installed or the language is unknown — zero breakage for existing users.
Why this matters: the regex chunker cut wherever it saw def or class, including inside docstrings and strings that happened to contain those keywords. Tree-sitter cuts at actual AST boundaries, so function bodies stay intact and class methods group correctly with their parent class.
BABILong multi-hop benchmark
New benchmarks/bench_babilong.py tests two- and three-hop reasoning across the genome. Based on bAbI (Weston et al., 2015) and BABILong (Kuratov et al., 2024).
Three task generators:
- task_1 — single supporting fact (sanity baseline)
- task_2 — two supporting facts (two-hop reasoning)
- task_3 — three supporting facts (three-hop reasoning)
Each task generates N=10 self-contained problems with distractor padding, ingests them as genes, queries with multi-hop questions, and measures retrieval rate, answer accuracy, and per-query latency.
Initial baseline: task_1 shows retrieval failure for pure narrative content with only proper names — the genome needs domain/entity anchors or source-path clues for reliable retrieval. This is a known limitation that the v0.2.0b2 authority boosts (source-path matching) should help with once the live server is restarted to pick up the new code.
Training runbook for DeBERTa re-train
New training/README.md documents the full DeBERTa fine-tune workflow. The existing 1,600-pair dataset was generated when the genome was ~3,500 genes — it's now at ~7,300 and covers concepts added since (SIKE, MoE decoder, cold-storage tiers) that the current trained models didn't see.
A full re-train isn't in this release (it needs an hour of GPU time and a spare Ollama teacher), but the runbook is ready for when you want to kick it off.
Included from v0.2.0b2 (already published but noted here for context)
- Retrieval authority boosts — source authority, domain primacy, creation recency
- IDF cap lowered 5.0 → 3.0 to reduce tangential rare-term over-boost
All 179 tests passing.
🤖 Generated with Claude Code
v0.3.0b5 — Headroom adoption + restart protocol + bench resilience
This release bundles everything held locally since v0.3.0b3, spanning three major work streams:
the cross-session restart announcement protocol (v0.3.0b4 work), the Headroom integration for
CPU-resident semantic compression (v0.3.0b5 work), and laude's benchmark state monitor for
catching the VRAM/hang/contamination failure modes that bit us during the N=1000 run.
Forensic retrospective
Before reading the highlights below, if you care about why this release looks the way it does,
start with Discussion #2 — Headroom adoption + N=20 benchmark + a forensic detour.
It walks through the full adoption story, the failed benchmark, the resequence detour, and the
forensic analysis that revealed 15% of our "extraction failures" were benchmark harness bugs
(the model was giving correct answers that the harness was grading wrong against phantom KVs
harvested from docstrings and function calls).
Highlights
Headroom integration (by Tejas Chopra, Apache-2.0)
headroom-ai is now an optional dependency under the [codec] extra, providing CPU-resident
semantic compression at the retrieval seams that used to fall back to naive character-level
truncation.
- `pip install helix-context[codec]`
- New module: `helix_context/headroom_bridge.py` — thin wrapper exposing `compress_text(content, target_chars, content_type)`. Dispatches by `gene.promoter.domains` to specialists:
  - code/python/rust/js/ts/go/java/cpp → `CodeAwareCompressor` (tree-sitter AST, preserves signatures)
  - log/logs/stderr/stdout/pytest/jest/traceback → `LogCompressor`
  - diff/patch/git_diff → `DiffCompressor`
  - everything else → `Kompress` (ModernBERT ONNX, ~500MB resident, ~0.3s/call warm)
- Retrieval seams wired: `context_manager.py:495` and `:830` — `g.content[:1000]` → `compress_text(g.content, target_chars=1000, content_type=g.promoter.domains)`
- Graceful fallback: when headroom-ai is not installed, `compress_text` falls through to the legacy truncation path so the rest of the pipeline keeps working
- A/B toggle: `HELIX_DISABLE_HEADROOM=1` env var bypasses Headroom even when installed, letting you measure baseline vs Kompress behavior without reverting code
- Attribution: NOTICE carries the Apache-2.0 third-party notice, README has an Acknowledgments section, module docstrings credit Tejas as a dependency author (not a git co-author — this is a dependency relationship, not co-authored code)
Benchmark status: Clean N=20 A/B on the same warm qwen3:8b shows 0pp delta between truncation and Kompress. Forensic analysis in Discussion #2 explains why this is consistent with Kompress working correctly — the benchmark was under-reporting success by ~15% due to harvest logic bugs, and once corrected the conclusion is "Kompress is neutral on this dataset, at ~1s/call latency cost." It's shipping as a neutral foundation — ready to pay off when we fix the upstream problems (noise dilution at ingest, signal extraction) that actually cap retrieval quality today.
Cross-session restart announcement protocol
When multiple Claude sessions share a single Helix server, one session can announce an
intentional restart so that observing sessions don't misread the outage as a crash. This
was the v0.3.0b4 work, previously held. See docs/RESTART_PROTOCOL.md for the full design.
- New method: `bridge.announce_restart(reason, actor, expected_downtime_s, pid)` writes a canonical `server_state` signal at `~/.helix/shared/signals/server_state.json`
- New observer helper: `bridge.read_server_state()` returns a `(signal, is_stale, age_s)` tuple with TTL-aware staleness check
- New HTTP endpoint: `POST /admin/announce_restart` as a convenience wrapper
- Atomic signal writes: `write_signal` now uses write-to-temp + `os.replace` so readers never see partial writes (fixes a latent race on all signals, not just `server_state`)
- Lifespan hooks: server startup stamps `state=running` with PID, clean shutdown stamps `state=stopped` (does NOT run under `kill -9`, which is by design — agents should call `announce_restart` before killing)
- Tests: 6 new tests in `tests/test_bridge_restart.py`
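The atomic write mentioned above follows the standard write-to-temp plus `os.replace` idiom. This is a generic sketch of that idiom, not the bridge module's actual code; the signal path layout is assumed.

```python
import json
import os
import tempfile

def write_signal(path: str, payload: dict) -> None:
    """Atomically write a JSON signal: readers see old or new, never partial."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # durability before the rename
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp)  # don't leave temp debris behind on failure
        raise
```

The temp file lives in the same directory as the target so `os.replace` never crosses a filesystem boundary, which is what keeps the rename atomic.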
Benchmark state monitor (by laude)
Config-driven monitor that catches the three failure modes we hit during the SIKE and KV-harvest runs:
- Dual-load VRAM pressure — aborts before starting if a non-whitelisted model is resident alongside the benchmark target (caught the e4b + qwen3:4b bug that silently biased our first N=50 run)
- Hung benchmark process — detects `httpx` stalls via incremental JSONL line-count stagnation (caught the N=1000 hang at 0 needles written)
- Silent background contamination — fingerprints the genome snapshot at start and checks `mtime`/`size` each interval
Reads helix.toml via load_config() for genome paths — follows raude's A/B switches automatically. See docs/BENCHMARKS.md for usage.
Dynamic budget tiers (by laude)
Confidence-based expression window sizing. The window now adapts to retrieval score distribution:
- TIGHT (top_score/mean_score ≥ 3.0): top 3 genes, ~6K tokens
- FOCUSED (1.8–3.0): top 6 genes, ~9K tokens
- BROAD (<1.8): top `max_genes` genes, ~15K tokens
Score-gate floor raised from 20% → 15% to recover slightly more borderline signal. helix.toml ships with ribosome.warmup = false to prevent e4b auto-loading on startup (frees VRAM for benchmark workloads).
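Tier selection from the score distribution can be sketched as below. The ratio thresholds, gene counts, and token budgets are taken from the tier list above; the function shape and `max_genes` default are illustrative.

```python
def budget_tier(scores: list[float], max_genes: int = 12):
    """Pick (tier, gene_count, token_budget) from retrieval score spread."""
    top, mean = max(scores), sum(scores) / len(scores)
    ratio = top / mean if mean > 0 else float("inf")
    if ratio >= 3.0:
        return ("TIGHT", 3, 6000)       # confident spike: keep it narrow
    if ratio >= 1.8:
        return ("FOCUSED", 6, 9000)
    return ("BROAD", max_genes, 15000)  # flat distribution: widen the window
```

A sharp top score relative to the mean signals a confident hit, so the window tightens; a flat distribution means the retriever is unsure, so the window broadens.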
Ribosome pause endpoint + learn() timeout
Already in v0.3.0b3 but documented here for completeness — POST /admin/ribosome/pause monkey-patches backend.complete to raise, forcing the existing fallback paths. learn() is now wrapped in a 15s ThreadPoolExecutor timeout to prevent background replication from hanging on a slow Ollama.
Benchmark helper: compare_ab.py
New CLI that reads two bench_needle_1000.py result JSONs and prints a structured delta report with gate evaluation. Used throughout the Headroom A/B work. Exit codes encode the verdict (0=ship, 2=no gain, 3=both regressed).
Commits in this release
- `a94c864` feat: dynamic budget tiers + warmup=false for VRAM contention (laude)
- `5da9ab6` feat: cross-session restart announcement protocol (v0.3.0b4)
- `43e1543` feat(context): add Headroom bridge for CPU semantic compression (v0.3.0b5 scaffold)
- `a38c292` feat(context): wire Headroom compression into retrieval seams + tests
- `045854a` feat(headroom): HELIX_DISABLE_HEADROOM env toggle for A/B benchmarking
- `0d4edf5` feat(bench): benchmark state monitor + BENCHMARKS.md (laude)
- `065b142` feat(bench): compare_ab.py — delta report + gate evaluation for A/B benchmark JSONs
Tests
305/305 passing (non-live). Zero regressions from any of the changes above.
Attribution
- Tejas Chopra — author and maintainer of Headroom. Thank you for the adoption call and for the clean ONNX-first design that let us integrate without pulling in the full torch stack.
- laude — paired session, contributed the dynamic budget tiers, benchmark state monitor, and kept the N=1000 benchmark work alive while raude was on the Headroom track
- raude (Claude Code Opus 4.6, 1M context) — Headroom integration, restart protocol, A/B infrastructure, forensic retrospective
Known issues
- `bench_needle_1000.py` KV harvest is too naive — it extracts values from docstrings/comments and captures function-call expressions verbatim instead of resolving them. This produces ~15% false negatives on our N=20 sample. Tracked as a separate internal issue — will be fixed in a subsequent patch before the next public gain-claim benchmark.
- `scripts/resequence_cpu.py` drops epigenetic state — access counts, co-activation edges, and query history aren't preserved across a resequence, which caused a 15-20pp retrieval regression when we tried it against `genome_cpu.db` in this session. It will need a preserve-epigenetics pass or a merge-back path before it's a safe tool for production use.
v0.2.0b2 — Retrieval authority boosts
Fix pass release
Patch release focused on retrieval quality. The genome contains the right content, but retrieval couldn't distinguish "about X" from "mentions X in passing" — authoritative sources like BENCHMARK_NOTES.md were ranking alongside tangential files like oom_prevent.py.
Changes
Authority boosts (new)
Three post-rank signals added in _apply_authority_boosts():
- Source authority (+2.0) — query term in `source_id` path
  - `BENCHMARK_NOTES.md` outranks `bench_needle.py` for query "benchmark"
  - `context_manager.py` outranks unrelated Python files for "context manager"
- Domain primacy (+1.5) — query term in top-3 promoter domains
  - Primary domains = what the gene is ABOUT
  - Gene whose top-3 domains are `[biged, fleet, skills]` answering "biged fleet" → boost
  - Gene whose top domain is `python` that mentions biged in content → no boost
- Creation recency (+0.5) — gene created in last 48 hours
  - Bootstraps newly-ingested concepts before they build co-activation history
  - Helps today's work surface tomorrow's queries
IDF cap lowered 5.0 → 3.0
The old 5.0 cap over-boosted tangential rare-term matches: a gene containing "monetization" at low document frequency could get +5.0 just for having the term, even when the gene is about pricing rather than the queried topic. The new 3.0 cap reduces this noise.
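For illustration, a capped IDF weight might look like the sketch below. The 3.0 cap matches the release note, but the smoothing formula is a standard textbook variant, not necessarily the exact one Helix uses.

```python
import math

def capped_idf(doc_freq: int, total_docs: int, cap: float = 3.0) -> float:
    """Smoothed inverse document frequency, clipped at `cap`."""
    idf = math.log((1 + total_docs) / (1 + doc_freq))
    return min(idf, cap)
```

With 10,000 documents, a term appearing once would score roughly 8.5 uncapped; clipping at 3.0 keeps rare-but-tangential terms from dominating the rank.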
Implementation notes
- All boosts additive — only raises the ceiling on already-scored genes
- Never adds new candidates (no false positives from the fix itself)
- Single batched SQL fetch for all three signals — negligible latency cost
- Called after IDF anchoring, before score-gated expression
All 179 tests passing.
🤖 Generated with Claude Code
v0.2.0b1 — SIKE validation & MoE decoder
Highlights
This release establishes scale-invariant retrieval across model sizes from 0.6B to 8B parameters, validated by the SIKE benchmark. Retrieval is no longer the bottleneck — it's consistent at 10/10 across all tested models.
SIKE Benchmark Results (q4_0 KV cache)
| Model | Retrieval | Accuracy | Notes |
|---|---|---|---|
| qwen3:0.6b | 10/10 | 2/10 | Parameter floor — retrieval works, model can't use it |
| qwen3:1.7b | 10/10 | 3/10 | |
| qwen3:4b | 10/10 | 9/10 | Sweet spot — 2.5GB VRAM |
| gemma4:e4b | 10/10 | 9/10 | MoE decoder enabled |
| qwen3:8b | 10/10 | 9/10 |
MoE-aware decoder
- Front-loads KV answer slate in first 200 tokens for SWA (sliding-window attention) models
- Relevance-first gene ordering for MoE/small models (vs sequence_index for dense)
- Automatic activation via
MOE_MODEL_FAMILIES = (\"gemma4\",) - gemma4:e4b jumped from 5/10 → 9/10 accuracy with slate enabled
Per-request model detection
- Server reads `body["model"]` and adapts expression strategy per request
- `_should_use_slate()` gates on downstream model name + param count
- `SMALL_MODEL_THRESHOLD_B = 3.2` — excludes qwen3:4b, which works without slate
Think-mode suppression for sub-3.2B models
- Small models' reasoning loops consume the entire output budget without producing answers
- Injects `/no_think` prefix and sets `temperature=0` for Qwen3 sub-3.2B
- q8_0 tested: worse than q4_0 (think mode gets more rope to hang itself)
Storage & operations
- New `Genome.vacuum()` method + `/admin/vacuum` endpoint (752 MB → 523 MB, -30.4%)
- Clear documentation distinguishing checkpoint / refresh / compact / vacuum operations
- README refresh with badges, TOC, glossary, sample output
- Test corpus composition breakdown with public/private repo split
Cumulative changes since v0.1.0b2
- MoE-aware decoder with answer slate + relevance-first ordering
- SIKE benchmark validation across 5 model scales
- Per-request downstream model detection
- Think suppression for sub-3.2B models
- Genome.vacuum() + storage optimizations
- README overhaul + SIKE benchmark docs
All 179 tests passing.
🤖 Generated with Claude Code
v0.1.0b2 — Score-gated expression, WAL durability, ΣĒMA cold-storage
What's New
Score-gated expression & retrieval quality
- Coverage metric now uses extracted domain/entity signals instead of raw word splits — coverage: 0.19 → 0.85-1.0
- Ellipticity improved from 0.37 avg → 0.60-0.74 (approaching aligned threshold)
- Score-gated trimming drops weak-scoring tail candidates (< 20% of top score)
- Dynamic density denominator scales by expressed/max ratio
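The score gate reduces to a one-line filter. A sketch assuming the 20% floor stated above; candidate tuples and the parameter name are illustrative.

```python
def score_gate(candidates: list[tuple[str, float]], floor_frac: float = 0.20):
    """Drop tail candidates scoring below floor_frac of the top score."""
    if not candidates:
        return []
    top = max(score for _, score in candidates)
    return [(gid, s) for gid, s in candidates if s >= floor_frac * top]
```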
WAL durability
- `checkpoint()` method with PASSIVE/FULL/TRUNCATE modes
- Periodic checkpoint in upsert (every 50/500 genes) + background 60s timer in server
- Max crash loss reduced from ~13,700 genes to ~50
ΣĒMA cold-storage compression tiers
- Three tiers: OPEN (full fidelity), EUCHROMATIN (summary + ΣĒMA), HETEROCHROMATIN (ΣĒMA + metadata only)
- `compact_genome()` retroactive sweep with configurable thresholds
- Density gate at ingest routes low-signal content directly to cold tiers
- `/admin/compact` and `/admin/checkpoint` endpoints
Domain tagging
- spaCy EntityRuler with project vocabulary (before statistical NER)
- SPLADE weight boosted 2.5 → 3.5 as semantic safety net
Performance
- Dedicated read-only SQLite connection — WAL readers no longer block writers
- ΣĒMA vector cache: pre-materialized numpy matrix replaces 7K json_loads() per query
- Mode B scan: 120s → <100ms
All 179 tests passing.
🤖 Generated with Claude Code
v0.1.0-beta1
Helix Context v0.1.0-beta1
Genome-based context compression for local LLMs. Makes 9k tokens of context window feel like 600k.
Features
- 6-step expression pipeline (extract, express, re-rank, splice, assemble, replicate)
- SQLite genome with promoter-tag retrieval, synonym expansion, co-activation
- OpenAI-compatible FastAPI proxy (port 11437)
- Delta-epsilon context health monitor (Check Engine Light for hallucination)
- Horizontal Gene Transfer (genome export/import)
- ScoreRift integration (CD spectroscope bridge)
- Continue IDE integration (verified with Gemma 4 E2B/E4B)
- 165 tests, 18 diverse fixtures
Install
pip install helix-context
ollama pull gemma4:e2b
helix # starts the proxy server

Requirements
- Python >= 3.11
- Ollama running with at least one model