Releases: SwiftWing21/helix-context

v0.4.0b1 — pathway identity + packet mode

18 Apr 21:52

Pathway-layer reframe — Helix weighs, doesn't retrieve

0.4.0b1 is a meaningful identity shift. Helix is no longer framed as a
knowledge store that competes with vector DBs; it's a coordinate
index layer
that emits confidence so agents can decide know-vs-go
without being coaxed into it. Composes on top of the bundled SQLite
genome today, stacks on any content store tomorrow.

Two product surfaces

  • /context/packet — agent-safe index. Returns pointers +
    verified / stale_risk / needs_refresh verdict + refresh plan.
    Caller fetches content. Task-sensitive (plan / explain / review /
    edit / debug / ops / quote).
  • /context — decoder path. Helix assembles + compresses the
    context window. Downstream LLM consumes directly. Unchanged
    behavior.

Weighing layer (the conceptual center of gravity)

coord_conf × (freshness × authority × specificity) = is-it-safe-to-act
  • freshness_score = exp(-age / half_life[volatility_class]) with
    stable=7d / medium=12h / hot=15min half-lives
  • authority_score — primary=1.0, derived=0.75, inferred=0.45
  • specificity_score — literal=1.0, span=0.9, doc=0.75,
    assertion=0.45
  • coord confidence — path_token_coverage between query signals
    and delivered gene source paths (hit mean 1.00 vs miss mean 0.52 on
    the 10-needle bench)
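The product above can be sketched in a few lines. This is a minimal illustration, not the shipped API: the half-life table and score maps come straight from the bullets, but `act_confidence` and its signature are assumptions, and the formula reproduces the release's own `exp(-age / half_life)` decay.

```python
import math

# Values quoted in the release notes
HALF_LIFE_S = {"stable": 7 * 86400, "medium": 12 * 3600, "hot": 15 * 60}
AUTHORITY = {"primary": 1.0, "derived": 0.75, "inferred": 0.45}
SPECIFICITY = {"literal": 1.0, "span": 0.9, "doc": 0.75, "assertion": 0.45}

def act_confidence(age_s: float, volatility: str, source_kind: str,
                   match_kind: str, coord_conf: float) -> float:
    """coord_conf x (freshness x authority x specificity) -> is-it-safe-to-act."""
    freshness = math.exp(-age_s / HALF_LIFE_S[volatility])
    return coord_conf * freshness * AUTHORITY[source_kind] * SPECIFICITY[match_kind]
```

A brand-new primary literal match at full coordinate confidence scores 1.0; anything older, weaker, or off-path decays multiplicatively from there.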

Validated via Phase 5 packet bench — 10/10 scenarios pass across 5
families (stale_by_age, coordinate_mismatch, task_sensitivity,
authority_downgrade, clean_verified). See
`benchmarks/bench_packet.py`.

Ingest-time provenance

New columns on `Gene` (auto-populated from file extension at
ingest): `source_kind`, `volatility_class`, `observed_at`,
`last_verified_at`. No backfill needed for new ingests. Existing
genomes can run the one-time `scripts/backfill_gene_provenance.py`
sweep.

New endpoints

  • `POST /context/packet` — the agent-safe index surface
  • `POST /context/refresh-plan` — just the reread plan
  • `POST /fingerprint` — navigation-first retrieval with `score_floor`
    • honest accounting (`evaluated_total`, `above_floor_total`,
      `filtered_by_floor`, `truncated_by_cap`)

New MCP tools

  • `helix_context_packet`
  • `helix_refresh_targets`

Plus the full existing suite (`helix_context`, `helix_stats`,
`helix_ingest`, `helix_resonance`, session/HITL toolkit).

Dep additions

  • `[mcp]` extra — required for `python -m helix_context.mcp_server`
    (closes an import-error gap)
  • `[nli]` extra — standalone torch + transformers for DeBERTa/NLI
    backends
  • `[all]` extra now genuinely complete

Docs v2

  • README v2 — lead with pathway identity, two-surface layout,
    launch modes table, LLM-free pipeline as load-bearing framing
  • docs/architecture/PIPELINE_LANES.md v2 — adds /context/packet,
    /context/refresh-plan, /fingerprint lanes; weighing layer as
    first-class concept
  • New docs/specs/2026-04-17-agent-context-index-build-spec.md
    657-line authoritative packet-mode spec

Migration notes

  • Existing `/context` callers: no action needed. The new
    confidence fields appear on `ContextHealth` additively.
  • MCP hosts: run `pip install helix-context[mcp]` if you use
    `python -m helix_context.mcp_server`.
  • Existing genomes: optional one-time run of
    `python scripts/backfill_gene_provenance.py` to populate
    provenance fields on legacy rows (so packet mode returns
    `verified` instead of `stale_risk`).

Validation

  • 680+ tests pass (one pre-existing test fixed in this release)
  • Phase 5 packet bench: 10/10 across 5 families
  • Dry-run install verified from clean resolution: `.[all]` pulls
    mcp, torch, transformers, sentence-transformers, spacy, tree-sitter,
    opentelemetry stack, headroom-ai[proxy,code].

Powered by Agentome.

v0.3.0b3 — Ribosome pause + learn() timeout

10 Apr 08:39

Patch release — VRAM contention survival

Two fixes born from a real incident: an external benchmark run (1000-needle test against qwen3:4b) crashed the Helix server when it competed with the ribosome model (gemma4:e4b) for GPU VRAM and triggered a cascade of httpx timeouts.

What's New

/admin/ribosome/pause — unload the ribosome without killing Helix

New admin endpoints let you disable the ribosome's LLM calls at runtime without restarting the server or losing state:

| Endpoint | Purpose |
| --- | --- |
| `POST /admin/ribosome/pause` | Monkey-patch `backend.complete()` to raise |
| `POST /admin/ribosome/resume` | Restore the original backend method |
| `GET /admin/ribosome/status` | Check if currently paused |

How it works: Ribosome.replicate() already has a fallback path that synthesizes a minimal gene from the raw exchange if the LLM call fails. Pausing forces that fallback to engage every time. The ribosome instance stays in memory, the backend connection stays alive — only the complete() method is swapped.
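In miniature, the swap might look like this — a hedged sketch, not the actual admin handler; `Backend`, `RibosomePaused`, and the module-level save slot are all illustrative:

```python
# Sketch of the pause mechanism: save the real complete(), swap in a
# raiser so the ribosome's existing fallback path engages, restore on
# resume. All names here are illustrative, not the shipped API.
class RibosomePaused(RuntimeError):
    pass

class Backend:
    def complete(self, prompt: str) -> str:
        return f"llm({prompt})"

_original_complete = None

def pause(backend: Backend) -> None:
    global _original_complete
    if _original_complete is None:
        _original_complete = backend.complete
        def _raise(prompt: str) -> str:
            raise RibosomePaused("paused via /admin/ribosome/pause")
        backend.complete = _raise  # instance stays alive; only the method is swapped

def resume(backend: Backend) -> None:
    global _original_complete
    if _original_complete is not None:
        backend.complete = _original_complete
        _original_complete = None
```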

Workflow for benchmark VRAM rescue:
```bash
# Free the ribosome model from Ollama
curl -X POST localhost:11437/admin/ribosome/pause
curl -X POST localhost:11434/api/generate \
  -d '{"model": "gemma4:e4b", "keep_alive": 0, "prompt": ""}'

# Run your benchmark against qwen3:4b (or any other model)
python benchmarks/bench_needle_1000.py

# Restore normal operation
curl -X POST localhost:11437/admin/ribosome/resume
```

Why this matters: Before this release, pausing the ribosome required either a config change + restart, or manually killing the Python process and relaunching. Both strategies drop in-flight /context requests and break the Continue integration. The pause endpoint is instantaneous and non-disruptive — /context queries continue to work unchanged because they don't currently route through the ribosome (ingestion uses CpuTagger, rerank is disabled, splice is a no-op in the current code path).

learn() timeout wrapper

HelixContextManager.learn() now wraps the ribosome.replicate() call in a ThreadPoolExecutor with a 15-second timeout.

Root cause of the original crash: During the n1000 benchmark, a background learn() task fired for each completed proxy request. learn() called ribosome.replicate() which went through httpx to Ollama, which was busy serving the benchmark's qwen3:4b inference requests. The learn() call sat in Ollama's request queue for over 120 seconds and eventually hit httpx's ReadTimeout, which propagated up and crashed the server.

Fix: If replicate() doesn't return within 15 seconds, learn() cancels the future, synthesizes the same minimal gene that Ribosome.replicate()'s existing fallback path produces, and moves on. The background task drops cleanly instead of blocking indefinitely.

Signature change: learn(query, response, timeout_s: float = 15.0). Backward-compatible — old two-arg callers still work with the default timeout.
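The timeout pattern itself is standard `concurrent.futures`; here is a hedged sketch of the shape described above, with `replicate_fn` and the minimal-gene fallback standing in for the real Ribosome internals:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=1)

def learn(query: str, response: str, replicate_fn, timeout_s: float = 15.0) -> dict:
    """Run replication in a worker; on timeout, synthesize a minimal
    gene instead of blocking the server indefinitely."""
    future = _executor.submit(replicate_fn, query, response)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # drop the hung call cleanly
        return {"query": query, "response": response, "fallback": True}
```

The key property is that a slow Ollama queue turns into a bounded 15-second wait plus a fallback gene, rather than a propagating `ReadTimeout`.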

Validation

  • 179 tests passing (full suite)
  • Server restarted cleanly on new code during live session
  • /admin/ribosome/pause confirmed working against the live genome (7,380 genes)
  • gemma4:e4b unloaded from Ollama mid-session, VRAM freed
  • /context queries continued returning correct results with the ribosome paused (coverage=1.0, ellipticity=0.645)
  • /admin/ribosome/resume not yet tested end-to-end (will be on next benchmark completion)

Migration notes

No migration needed. The new endpoints are additive, the learn() signature change is backward-compatible, and the default behavior is unchanged unless you explicitly call /admin/ribosome/pause.

🤖 Generated with Claude Code

v0.3.0b2 — Agent-friendly responses + hot-reload

10 Apr 07:54

Patch release — better multi-agent ergonomics

Focused on making the /context endpoint respond more usefully to programmatic agents (not just Continue), and letting operators refresh runtime state without restarting the process.

What's New

Agent-friendly /context response

The /context endpoint now returns an agent metadata field alongside the existing Continue-compatible fields. Backward compatible — existing integrations work unchanged.

New fields:
```json
{
"name": "Helix Genome Context",
"description": "...",
"content": "...",
"context_health": { ... },
"agent": {
"recommendation": "trust" | "verify" | "refresh" | "reread_raw",
"hint": "Context is well-grounded. Use directly.",
"citations": [
{ "gene_id": "a23ff24e...", "source": "...", "score": 108.71 },
...
],
"latency_ms": 1996.2,
"total_tokens_est": 3614,
"compression_ratio": 2.77,
"moe_mode": true
}
}
```

Pass verbose: true in the request body to include promoter tags (domains, entities) for each citation — useful when agents need to inspect why a gene was ranked.

Recommendation semantics

The recommendation field tells the agent what to do with the context based on delta-epsilon health:

| Health status | Recommendation | Meaning |
| --- | --- | --- |
| aligned | trust | Use the context directly |
| sparse | verify | Verify specific values before acting |
| stale | refresh | Expressed genes are outdated |
| denatured | reread_raw | Unreliable — read raw files instead |
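Agents consuming the field can treat it as a simple dispatch; a minimal sketch, with the dict names and conservative default being illustrative choices rather than the server's code:

```python
# Health status -> (recommendation, hint), per the table above.
RECOMMENDATION = {
    "aligned": ("trust", "Use the context directly"),
    "sparse": ("verify", "Verify specific values before acting"),
    "stale": ("refresh", "Expressed genes are outdated"),
    "denatured": ("reread_raw", "Unreliable - read raw files instead"),
}

def recommend(health_status: str) -> str:
    # Unknown statuses fall back to the most conservative action.
    action, _hint = RECOMMENDATION.get(health_status, ("reread_raw", "unknown"))
    return action
```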

Hot-reload endpoints

Three admin endpoints for refreshing runtime state without killing the process:

  • POST /admin/reload — full refresh (config + genome snapshot + ΣĒMA cache + clear stale per-query state)
  • POST /admin/sema/rebuild — force rebuild the ΣĒMA vector cache
  • POST /admin/checkpoint?mode=TRUNCATE — flush WAL to main DB (already existed, documented here)

What hot-reload refreshes:

  • helix.toml config
  • Genome WAL snapshot (sees external writes)
  • ΣĒMA vector cache (rebuilt from current genome)
  • last_query_scores cleared

What it does NOT refresh (these still need a process restart):

  • Python code changes
  • Ribosome backend swap (model stays loaded)

Use hot-reload when you want to tweak thresholds in helix.toml or pick up new genes from an external writer without dropping in-flight requests.

Replica schema resilience fix

_build_sema_cache now catches `no such column: compression_tier` errors from stale replicas and falls back to the legacy query. This was silently breaking the live server on replicated read paths where the replica hadn't been migrated to the current schema.
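The resilience pattern is a narrow try/except; the column name below mirrors the error message, while the table name and row shape are illustrative:

```python
import sqlite3

def fetch_sema_rows(conn: sqlite3.Connection):
    """Try the current-schema query; on a stale replica missing the
    column, fall back to the legacy query with a None placeholder."""
    try:
        return conn.execute(
            "SELECT id, sema_vec, compression_tier FROM genes"
        ).fetchall()
    except sqlite3.OperationalError as exc:
        if "no such column" not in str(exc):
            raise  # only swallow the known schema-skew failure
        return [(gid, vec, None) for gid, vec in
                conn.execute("SELECT id, sema_vec FROM genes").fetchall()]
```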

Benchmark validation (post-restart)

Needle-in-a-haystack benchmark against the 7,313-gene live genome after restarting on v0.3.0b2:

| Metric | Pre-v0.3.0b2 | v0.3.0b2 |
| --- | --- | --- |
| Context retrieval | 8/10 (80%) | 10/10 (100%) |
| Answer accuracy | 7/10 | 8/10 |
| Avg context latency | 21-120s (ΣĒMA Mode B 7K JSON loads) | 1.0s (numpy cache) |

The 100x latency improvement comes from the v0.2.0b2 ΣĒMA vector cache fix, which was shipped in code but never active against the live server until this restart. The retrieval quality improvement comes from the v0.2.0b2 authority boosts combined with the fresh server picking up the new code.

All 179 tests passing.

🤖 Generated with Claude Code

v0.3.0b1 — Tree-sitter chunking, BABILong, training runbook

10 Apr 07:31

Minor release

Three coherent feature additions shipped together. Per the versioning plan, architectural additions get a minor bump — not 36-commit jumbo patches like the v0.1 line.

What's New

Tree-sitter AST chunking (opt-in)

The regex code chunker that has been around since v0.1.0 (with a `# MVP heuristic — swap for tree-sitter later` comment) is finally replaced. Tree-sitter understands real grammar boundaries — functions, classes, impl blocks, interfaces, type aliases — instead of matching `def` as a string.

Supported languages:

  • Python
  • Rust
  • JavaScript
  • TypeScript + TSX

Install the optional extra:
```bash
pip install helix-context[ast]
```

Auto-detected from metadata['path'] file extension. Falls back cleanly to the regex chunker when tree-sitter isn't installed or the language is unknown — zero breakage for existing users.

Why this matters: the regex chunker cut wherever it saw def or class, including inside docstrings and strings that happened to contain those keywords. Tree-sitter cuts at actual AST boundaries, so function bodies stay intact and class methods group correctly with their parent class.
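The fallback behavior can be sketched as a dispatch that degrades to the regex splitter. `EXT_TO_LANG`, the injected `ast_chunker`, and `regex_chunks` are simplified stand-ins for the real chunkers, not the shipped code:

```python
import re

EXT_TO_LANG = {".py": "python", ".rs": "rust", ".js": "javascript",
               ".ts": "typescript", ".tsx": "tsx"}

def regex_chunks(source: str):
    # Legacy heuristic: cut wherever a top-level def/class appears.
    return [c for c in re.split(r"\n(?=def |class )", source) if c.strip()]

def chunk(source: str, path: str, ast_chunker=None):
    """Use the AST chunker when available and the language is known;
    otherwise fall back to the regex splitter — zero breakage."""
    lang = EXT_TO_LANG.get("." + path.rsplit(".", 1)[-1])
    if lang is not None and ast_chunker is not None:
        try:
            return ast_chunker(source, lang)
        except Exception:
            pass  # any grammar/load failure degrades to regex
    return regex_chunks(source)
```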

BABILong multi-hop benchmark

New benchmarks/bench_babilong.py tests two- and three-hop reasoning across the genome. Based on bAbI (Weston et al., 2015) and BABILong (Kuratov et al., 2024).

Three task generators:

  • task_1 — single supporting fact (sanity baseline)
  • task_2 — two supporting facts (two-hop reasoning)
  • task_3 — three supporting facts (three-hop reasoning)

Each task generates N=10 self-contained problems with distractor padding, ingests them as genes, queries with multi-hop questions, and measures retrieval rate, answer accuracy, and per-query latency.

Initial baseline: task_1 shows retrieval failure for pure narrative content with only proper names — the genome needs domain/entity anchors or source-path clues for reliable retrieval. This is a known limitation that the v0.2.0b2 authority boosts (source-path matching) should help with once the live server is restarted to pick up the new code.

Training runbook for DeBERTa re-train

New training/README.md documents the full DeBERTa fine-tune workflow. The existing 1,600-pair dataset was generated when the genome was ~3,500 genes — it's now at ~7,300 and covers concepts added since (SIKE, MoE decoder, cold-storage tiers) that the current trained models didn't see.

A full re-train isn't in this release (it needs an hour of GPU time and a spare Ollama teacher), but the runbook is ready for when you want to kick it off.

Included from v0.2.0b2 (already published but noted here for context)

  • Retrieval authority boosts — source authority, domain primacy, creation recency
  • IDF cap lowered 5.0 → 3.0 to reduce tangential rare-term over-boost

All 179 tests passing.

🤖 Generated with Claude Code

v0.3.0b5 — Headroom adoption + restart protocol + bench resilience

10 Apr 20:57

v0.3.0b5 — Headroom adoption + cross-session restart protocol + benchmark resilience

This release bundles everything held locally since v0.3.0b3, spanning three major work streams:
the cross-session restart announcement protocol (v0.3.0b4 work), the Headroom integration for
CPU-resident semantic compression (v0.3.0b5 work), and laude's benchmark state monitor for
catching the VRAM/hang/contamination failure modes that bit us during the N=1000 run.

Forensic retrospective

Before reading the highlights below, if you care about why this release looks the way it does,
start with Discussion #2 — Headroom adoption + N=20 benchmark + a forensic detour.
It walks through the full adoption story, the failed benchmark, the resequence detour, and the
forensic analysis that revealed 15% of our "extraction failures" were benchmark harness bugs
(the model was giving correct answers that the harness was grading wrong against phantom KVs
harvested from docstrings and function calls).


Highlights

Headroom integration (by Tejas Chopra, Apache-2.0)

headroom-ai is now an optional dependency under the [codec] extra, providing CPU-resident
semantic compression at the retrieval seams that used to fall back to naive character-level
truncation.

pip install helix-context[codec]
  • New module: helix_context/headroom_bridge.py — thin wrapper exposing compress_text(content, target_chars, content_type). Dispatches by gene.promoter.domains to specialists:
    • code/python/rust/js/ts/go/java/cpp → CodeAwareCompressor (tree-sitter AST, preserves signatures)
    • log/logs/stderr/stdout/pytest/jest/traceback → LogCompressor
    • diff/patch/git_diff → DiffCompressor
    • everything else → Kompress (ModernBERT ONNX, ~500MB resident, ~0.3s/call warm)
  • Retrieval seams wired: context_manager.py:495 and :830 — g.content[:1000] replaced by compress_text(g.content, target_chars=1000, content_type=g.promoter.domains)
  • Graceful fallback: when headroom-ai is not installed, compress_text falls through to the legacy truncation path so the rest of the pipeline keeps working
  • A/B toggle: HELIX_DISABLE_HEADROOM=1 env var bypasses Headroom even when installed, letting you measure baseline vs Kompress behavior without reverting code
  • Attribution: NOTICE carries the Apache-2.0 third-party notice, README has an Acknowledgments section, module docstrings credit Tejas as a dependency author (not a git co-author — this is a dependency relationship, not co-authored code)
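The seam plus the toggle reduce to a small decision; a sketch under assumptions — the injected `compressor` stands in for the specialist dispatch, and only the env var name is taken from the notes:

```python
import os

def compress_text(content: str, target_chars: int, compressor=None) -> str:
    """Use the Headroom specialist when available and not disabled via
    HELIX_DISABLE_HEADROOM=1; otherwise the legacy truncation path."""
    disabled = os.environ.get("HELIX_DISABLE_HEADROOM") == "1"
    if compressor is not None and not disabled:
        try:
            return compressor(content, target_chars)
        except Exception:
            pass  # graceful fallback keeps the pipeline working
    return content[:target_chars]  # legacy character-level truncation
```

The A/B toggle means baseline vs Kompress can be measured on a live server without reverting code.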

Benchmark status: Clean N=20 A/B on the same warm qwen3:8b shows 0pp delta between truncation and Kompress. Forensic analysis in Discussion #2 explains why this is consistent with Kompress working correctly — the benchmark was under-reporting success by ~15% due to harvest logic bugs, and once corrected the conclusion is "Kompress is neutral on this dataset, at ~1s/call latency cost." It's shipping as a neutral foundation — ready to pay off when we fix the upstream problems (noise dilution at ingest, signal extraction) that actually cap retrieval quality today.

Cross-session restart announcement protocol

When multiple Claude sessions share a single Helix server, one session can announce an
intentional restart so that observing sessions don't misread the outage as a crash. This
was the v0.3.0b4 work, previously held. See docs/RESTART_PROTOCOL.md for the full design.

  • New method: bridge.announce_restart(reason, actor, expected_downtime_s, pid) writes a canonical server_state signal at ~/.helix/shared/signals/server_state.json
  • New observer helper: bridge.read_server_state() returns (signal, is_stale, age_s) tuple with TTL-aware staleness check
  • New HTTP endpoint: POST /admin/announce_restart as a convenience wrapper
  • Atomic signal writes: write_signal now uses write-to-temp + os.replace so readers never see partial writes (fixes a latent race on all signals, not just server_state)
  • Lifespan hooks: server startup stamps state=running with PID, clean shutdown stamps state=stopped (does NOT run under kill -9, which is by design — agents should call announce_restart before killing)
  • Tests: 6 new tests in tests/test_bridge_restart.py
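The atomic-write fix follows the standard temp-file-then-rename pattern; `write_signal`'s real signature lives in the bridge, so treat this as a sketch:

```python
import json
import os
import tempfile

def write_signal(path: str, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then atomically
    rename over the target — readers never see a partial write."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".signal-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on any failure
        raise
```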

Benchmark state monitor (by laude)

Config-driven monitor that catches the three failure modes we hit during the SIKE and KV-harvest runs:

  1. Dual-load VRAM pressure — aborts before starting if a non-whitelisted model is resident alongside the benchmark target (caught the e4b + qwen3:4b bug that silently biased our first N=50 run)
  2. Hung benchmark process — detects httpx stalls via incremental JSONL line-count stagnation (caught the N=1000 hang at 0 needles written)
  3. Silent background contamination — fingerprints the genome snapshot at start and checks mtime/size each interval

Reads helix.toml via load_config() for genome paths — follows raude's A/B switches automatically. See docs/BENCHMARKS.md for usage.

Dynamic budget tiers (by laude)

Confidence-based expression window sizing. The window now adapts to retrieval score distribution:

  • TIGHT (top_score/mean_score ≥ 3.0): top 3 genes, ~6K tokens
  • FOCUSED (1.8–3.0): top 6 genes, ~9K tokens
  • BROAD (<1.8): top max_genes genes, ~15K tokens

Score-gate floor lowered from 20% → 15% to recover slightly more borderline signal. helix.toml ships with ribosome.warmup = false to prevent e4b auto-loading on startup (frees VRAM for benchmark workloads).
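The tier decision is a pure function of the score ratio; a sketch using the thresholds, gene caps, and token budgets quoted above (the function name and return shape are illustrative):

```python
def budget_tier(top_score: float, mean_score: float, max_genes: int = 12):
    """Return (tier_name, gene_cap, approx_token_budget) from the
    top/mean retrieval-score ratio."""
    ratio = top_score / mean_score if mean_score > 0 else float("inf")
    if ratio >= 3.0:
        return ("TIGHT", 3, 6_000)     # one gene clearly dominates
    if ratio >= 1.8:
        return ("FOCUSED", 6, 9_000)
    return ("BROAD", max_genes, 15_000)  # flat distribution: cast wide
```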

Ribosome pause endpoint + learn() timeout

Already in v0.3.0b3 but documented here for completeness — POST /admin/ribosome/pause monkey-patches backend.complete to raise, forcing the existing fallback paths. learn() is now wrapped in a 15s ThreadPoolExecutor timeout to prevent background replication from hanging on a slow Ollama.

Benchmark helper: compare_ab.py

New CLI that reads two bench_needle_1000.py result JSONs and prints a structured delta report with gate evaluation. Used throughout the Headroom A/B work. Exit codes encode the verdict (0=ship, 2=no gain, 3=both regressed).


Commits in this release

  • a94c864 feat: dynamic budget tiers + warmup=false for VRAM contention (laude)
  • 5da9ab6 feat: cross-session restart announcement protocol (v0.3.0b4)
  • 43e1543 feat(context): add Headroom bridge for CPU semantic compression (v0.3.0b5 scaffold)
  • a38c292 feat(context): wire Headroom compression into retrieval seams + tests
  • 045854a feat(headroom): HELIX_DISABLE_HEADROOM env toggle for A/B benchmarking
  • 0d4edf5 feat(bench): benchmark state monitor + BENCHMARKS.md (laude)
  • 065b142 feat(bench): compare_ab.py — delta report + gate evaluation for A/B benchmark JSONs

Tests

305/305 passing (non-live). Zero regressions from any of the changes above.

Attribution

  • Tejas Chopra — author and maintainer of Headroom. Thank you for the adoption call and for the clean ONNX-first design that let us integrate without pulling in the full torch stack.
  • laude — paired session, contributed the dynamic budget tiers, benchmark state monitor, and kept the N=1000 benchmark work alive while raude was on the Headroom track
  • raude (Claude Code Opus 4.6, 1M context) — Headroom integration, restart protocol, A/B infrastructure, forensic retrospective

Known issues

  • bench_needle_1000.py KV harvest is too naive — extracts values from docstrings/comments and captures function-call expressions verbatim instead of resolving them. This produces ~15% false negatives on our N=20 sample. Tracked as a separate internal issue — will be fixed in a subsequent patch before the next public gain-claim benchmark.
  • scripts/resequence_cpu.py drops epigenetic state — access counts, co-activation edges, and query history aren't preserved across a resequence, which caused a 15-20pp retrieval regression when we tried it against genome_cpu.db in this session. Will need a preserve-epigenetics pass or a merge-back path before it's a safe tool for production use.

v0.2.0b2 — Retrieval authority boosts

10 Apr 07:05

Fix pass release

Patch release focused on retrieval quality. The genome contains the right content, but retrieval couldn't distinguish "about X" from "mentions X in passing" — authoritative sources like BENCHMARK_NOTES.md were ranking alongside tangential files like oom_prevent.py.

Changes

Authority boosts (new)

Three post-rank signals added in _apply_authority_boosts():

  1. Source authority (+2.0) — query term in source_id path

    • BENCHMARK_NOTES.md outranks bench_needle.py for query "benchmark"
    • context_manager.py outranks unrelated Python files for "context manager"
  2. Domain primacy (+1.5) — query term in top-3 promoter domains

    • Primary domains = what the gene is ABOUT
    • Gene whose top-3 domains are [biged, fleet, skills] answering "biged fleet" → boost
    • Gene whose top domain is python that mentions biged in content → no boost
  3. Creation recency (+0.5) — gene created in last 48 hours

    • Bootstraps newly-ingested concepts before they build co-activation history
    • Helps today's work surface tomorrow's queries

IDF cap lowered 5.0 → 3.0

The old 5.0 cap over-boosted tangential rare-term matches. A gene with "monetization" at low document frequency could get +5.0 just for having the term, even if the gene is about pricing not the actual topic. New 3.0 cap reduces this noise.

Implementation notes

  • All boosts additive — only raises the ceiling on already-scored genes
  • Never adds new candidates (no false positives from the fix itself)
  • Single batched SQL fetch for all three signals — negligible latency cost
  • Called after IDF anchoring, before score-gated expression
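The three boosts and the new cap condense into a short sketch. The gene fields and the real `_apply_authority_boosts()` signature differ (the shipped version batches SQL); the constants are the values quoted above:

```python
IDF_CAP = 3.0  # lowered from 5.0 in this release

def capped_idf(raw_idf: float) -> float:
    return min(raw_idf, IDF_CAP)

def apply_authority_boosts(score: float, query_terms: set, gene: dict,
                           age_hours: float) -> float:
    if any(t in gene["source_id"].lower() for t in query_terms):
        score += 2.0   # source authority: query term in the source path
    if query_terms & set(gene["domains"][:3]):
        score += 1.5   # domain primacy: term among top-3 promoter domains
    if age_hours <= 48:
        score += 0.5   # creation recency: bootstrap fresh genes
    return score
```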

All 179 tests passing.

🤖 Generated with Claude Code

v0.2.0b1 — SIKE validation & MoE decoder

10 Apr 06:52

Highlights

This release establishes scale-invariant retrieval across model sizes from 0.6B to 8B parameters, validated by the SIKE benchmark. Retrieval is no longer the bottleneck — it's consistent at 10/10 across all tested models.

SIKE Benchmark Results (q4_0 KV cache)

| Model | Retrieval | Accuracy | Notes |
| --- | --- | --- | --- |
| qwen3:0.6b | 10/10 | 2/10 | Parameter floor — retrieval works, model can't use it |
| qwen3:1.7b | 10/10 | 3/10 | |
| qwen3:4b | 10/10 | 9/10 | Sweet spot — 2.5GB VRAM |
| gemma4:e4b | 10/10 | 9/10 | MoE decoder enabled |
| qwen3:8b | 10/10 | 9/10 | |

MoE-aware decoder

  • Front-loads KV answer slate in first 200 tokens for SWA (sliding-window attention) models
  • Relevance-first gene ordering for MoE/small models (vs sequence_index for dense)
  • Automatic activation via MOE_MODEL_FAMILIES = ("gemma4",)
  • gemma4:e4b jumped from 5/10 → 9/10 accuracy with slate enabled

Per-request model detection

  • Server reads body["model"] and adapts expression strategy per request
  • _should_use_slate() gates on downstream model name + param count
  • SMALL_MODEL_THRESHOLD_B = 3.2 — excludes qwen3:4b which works without slate
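The gate reduces to a family check plus a parameter threshold; a sketch assuming the `family:variant` model-name convention, with parsing details that are illustrative:

```python
MOE_MODEL_FAMILIES = ("gemma4",)
SMALL_MODEL_THRESHOLD_B = 3.2

def should_use_slate(model: str, param_b: float) -> bool:
    """Slate for MoE families and for small dense models under the
    parameter threshold; larger dense models skip it."""
    family = model.split(":", 1)[0]
    if family in MOE_MODEL_FAMILIES:
        return True
    return param_b < SMALL_MODEL_THRESHOLD_B
```

The 3.2B threshold is what excludes qwen3:4b, which answers well without the slate.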

Think-mode suppression for sub-3.2B models

  • Small models' reasoning loops consume the entire output budget without producing answers
  • Injects /no_think prefix and sets temperature=0 for Qwen3 sub-3.2B
  • q8_0 tested: worse than q4_0 (think mode gets more rope to hang itself)

Storage & operations

  • New Genome.vacuum() method + /admin/vacuum endpoint (752 MB → 523 MB, -30.4%)
  • Clear documentation distinguishing checkpoint / refresh / compact / vacuum operations
  • README refresh with badges, TOC, glossary, sample output
  • Test corpus composition breakdown with public/private repo split

Cumulative changes since v0.1.0b2

  • MoE-aware decoder with answer slate + relevance-first ordering
  • SIKE benchmark validation across 5 model scales
  • Per-request downstream model detection
  • Think suppression for sub-3.2B models
  • Genome.vacuum() + storage optimizations
  • README overhaul + SIKE benchmark docs

All 179 tests passing.

🤖 Generated with Claude Code

v0.1.0b2 — Score-gated expression, WAL durability, ΣĒMA cold-storage

10 Apr 03:24

What's New

Score-gated expression & retrieval quality

  • Coverage metric now uses extracted domain/entity signals instead of raw word splits — coverage: 0.19 → 0.85-1.0
  • Ellipticity improved from 0.37 avg → 0.60-0.74 (approaching aligned threshold)
  • Score-gated trimming drops weak-scoring tail candidates (< 20% of top score)
  • Dynamic density denominator scales by expressed/max ratio

WAL durability

  • checkpoint() method with PASSIVE/FULL/TRUNCATE modes
  • Periodic checkpoint in upsert (every 50/500 genes) + background 60s timer in server
  • Max crash loss reduced from ~13,700 genes to ~50
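The checkpoint call itself is plain SQLite; PASSIVE/FULL/TRUNCATE are SQLite's own `wal_checkpoint` modes, while this wrapper's name and return shape are a sketch, not the shipped method:

```python
import sqlite3

def checkpoint(conn: sqlite3.Connection, mode: str = "TRUNCATE"):
    """Run a WAL checkpoint; returns SQLite's (busy, wal_pages, moved)
    row. TRUNCATE also resets the WAL file to zero length."""
    assert mode in ("PASSIVE", "FULL", "TRUNCATE")
    return conn.execute(f"PRAGMA wal_checkpoint({mode})").fetchone()
```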

ΣĒMA cold-storage compression tiers

  • Three tiers: OPEN (full fidelity), EUCHROMATIN (summary + ΣĒMA), HETEROCHROMATIN (ΣĒMA + metadata only)
  • compact_genome() retroactive sweep with configurable thresholds
  • Density gate at ingest routes low-signal content directly to cold tiers
  • /admin/compact and /admin/checkpoint endpoints

Domain tagging

  • spaCy EntityRuler with project vocabulary (before statistical NER)
  • SPLADE weight boosted 2.5 → 3.5 as semantic safety net

Performance

  • Dedicated read-only SQLite connection — WAL readers no longer block writers
  • ΣĒMA vector cache: pre-materialized numpy matrix replaces 7K json_loads() per query
  • Mode B scan: 120s → <100ms
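The cache trick is pre-normalizing every vector into one dense matrix so each query is a single matmul; a sketch assuming numpy is available, with `SemaCache` and its methods being illustrative names:

```python
import numpy as np

class SemaCache:
    def __init__(self, vectors):
        # Build the matrix once at load time, not per query.
        self.matrix = np.asarray(vectors, dtype=np.float32)
        norms = np.linalg.norm(self.matrix, axis=1, keepdims=True)
        self.matrix /= np.clip(norms, 1e-9, None)

    def top_k(self, query, k=3):
        q = np.asarray(query, dtype=np.float32)
        q /= max(np.linalg.norm(q), 1e-9)
        scores = self.matrix @ q  # cosine similarity in one matmul
        return np.argsort(scores)[::-1][:k].tolist()
```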

All 179 tests passing.

🤖 Generated with Claude Code

v0.1.0-beta1

08 Apr 01:20

Helix Context v0.1.0-beta1

Genome-based context compression for local LLMs. Makes 9k tokens of context window feel like 600k.

Features

  • 6-step expression pipeline (extract, express, re-rank, splice, assemble, replicate)
  • SQLite genome with promoter-tag retrieval, synonym expansion, co-activation
  • OpenAI-compatible FastAPI proxy (port 11437)
  • Delta-epsilon context health monitor (Check Engine Light for hallucination)
  • Horizontal Gene Transfer (genome export/import)
  • ScoreRift integration (CD spectroscope bridge)
  • Continue IDE integration (verified with Gemma 4 E2B/E4B)
  • 165 tests, 18 diverse fixtures

Install

```bash
pip install helix-context
ollama pull gemma4:e2b
helix  # starts the proxy server
```

Requirements

  • Python >= 3.11
  • Ollama running with at least one model