Replace the old workspace (compute, executor, p2p, utils) with a single-crate binary featuring embedded GGUF inference via llama-cpp-2, QUIC networking via quinn, HuggingFace model management, and a backpressure-aware worker pool.
…allenge-response

- Add dkn-protocol path dependency; re-export wire types via thin modules
- Remove local copies of protocol types, proof, and chat template (now in dkn-protocol)
- Worker holds per-model engines/templates; TPS is a HashMap&lt;String, f64&gt;
- Implement challenge signing in the main event loop
- Add try_reconnect() with exponential backoff on stream close/error
- Introduce NodeContext for shared state across event handlers
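A minimal std-only sketch of an exponential backoff schedule like the one try_reconnect() uses. The base delay, doubling factor, and 30s cap here are illustrative assumptions; the actual constants are not shown in this log.

```rust
use std::time::Duration;

// Hypothetical schedule: 500ms base, doubling per attempt, capped at 30s.
// The shift amount is clamped so large attempt counts cannot overflow.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let ms = base_ms.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(ms.min(30_000))
}

fn main() {
    for attempt in 0..8 {
        println!("attempt {attempt}: wait {:?}", backoff_delay(attempt));
    }
}
```

In the real loop this delay would be slept between reconnect attempts, resetting the attempt counter once a stream is re-established.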
- Replace old models (gemma, llama, mistral) with lfm2.5, qwen3.5, nanbeige, locooperator
- Add a ModelType field to ModelSpec and propagate it through worker/main
- Worker rejects tasks with image/audio content when the model lacks that modality
- Re-export ModelType and MessageContent from dkn-protocol
Allows users to specify a quantization level (e.g. Q8_0, Q5_K_M) instead of always downloading the registry default (Q4_K_M). This avoids redundant downloads when a different quantization is already cached locally via the HuggingFace hub.

Also fix dkn-protocol type mismatches: add the ModelType enum, the MessageContent enum, and a model_type field to ModelRegistryEntry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rguments

Router requires ALPN "dkn" on QUIC connections. Without it, the handshake fails with "peer doesn't support any known protocol". Set ALPN on the client config and on all mock server configs in tests.
Refactor RouterConnection to use mpsc write channel for concurrent sends. Extend Worker with streaming path: run_inference_streaming sends StreamToken messages per token via sync_channel bridge, then StreamEnd/StreamError on completion. Update main.rs to pass stream flag and connection sender to worker.
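The sync_channel bridge described above can be sketched with std only. The StreamMsg enum and function names here are illustrative stand-ins, not the actual dkn wire types; the point is that a bounded channel gives backpressure between the blocking generation thread and the async sender.

```rust
use std::sync::mpsc;
use std::thread;

// Stand-ins for the streaming wire messages (illustrative, not the real types).
#[derive(Debug, PartialEq)]
enum StreamMsg {
    Token(String),
    End,
}

// Generation runs on a blocking thread; a bounded sync_channel applies
// backpressure: the producer blocks once `cap` tokens sit unconsumed.
fn run_inference_streaming(tokens: Vec<&'static str>, cap: usize) -> Vec<StreamMsg> {
    let (tx, rx) = mpsc::sync_channel(cap);
    let producer = thread::spawn(move || {
        for t in tokens {
            // send() blocks while the channel is full.
            tx.send(StreamMsg::Token(t.to_string())).unwrap();
        }
        tx.send(StreamMsg::End).unwrap();
        // tx dropped here; the receiver's iterator then terminates.
    });
    let collected: Vec<StreamMsg> = rx.iter().collect();
    producer.join().unwrap();
    collected
}

fn main() {
    let msgs = run_inference_streaming(vec!["Hello", ",", " world"], 2);
    println!("{msgs:?}");
}
```

In the real worker the consumer side forwards each token to the connection's mpsc write channel instead of collecting into a Vec.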
Add InferenceEngine::apply_template(), which extracts the Jinja2 chat template from GGUF metadata and applies it via llama.cpp's built-in engine. Remove all hardcoded template plumbing: the chat_template field from ModelSpec, the template parameter from Worker/add_engine/run_inference, and the models::template re-export module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable the llama-cpp-2 mtmd feature, add MtmdContext to InferenceEngine with a generate_multimodal() path, mmproj download/cache, hf_mmproj_file on ModelSpec (populated for lfm2.5-vl and lfm2.5-audio), and multimodal branching in the worker sync/streaming paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add an ignored test that downloads lfm2.5-vl:1.6b plus its mmproj and runs generate_multimodal() with a synthetic BMP, or with a user-provided image via the TEST_IMAGE_PATH env var.
- Add `dria-node setup` interactive command: detects RAM, shows models that fit, downloads the selection, runs a test inference
- Update release workflow: auto-update the homebrew-dkn formula with SHA256s
- Update CI workflow for the v2 branch structure
- Add install scripts for macOS/Linux (curl) and Windows (PowerShell)
- Update README with quick start, model table, and CLI reference
- Bump version to 1.0.0-alpha.1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bump version from 1.0.0-alpha.2 to 0.7.1
- Replace number input with arrow-key selection (dialoguer)
- Add a RAM-aware 4-bit/8-bit quantization picker
- Retry loop on download/load failure instead of crashing
- Fix qwen3.5:35b-a3b GGUF filename (was a 404)
Checks GitHub releases on startup: patch bumps warn, minor/major bumps download and replace the binary. Includes a --skip-update flag and makeLatest: true in the release workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
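The patch-warns / minor-major-replaces policy above can be sketched as a pure version comparison. This is a simplified std-only model: it parses bare `major.minor.patch` strings (pre-release tags like `-alpha.2` would need extra handling) and the names are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum UpdateAction {
    None,
    Warn,       // patch bump: log a warning, keep running
    SelfUpdate, // minor/major bump: download and replace the binary
}

// Parse "v1.2.3" or "1.2.3" into a comparable tuple; None on anything else.
fn parse(v: &str) -> Option<(u64, u64, u64)> {
    let mut it = v.trim_start_matches('v').splitn(3, '.');
    Some((
        it.next()?.parse().ok()?,
        it.next()?.parse().ok()?,
        it.next()?.parse().ok()?,
    ))
}

fn update_action(current: &str, latest: &str) -> UpdateAction {
    let (Some(c), Some(l)) = (parse(current), parse(latest)) else {
        return UpdateAction::None; // unparseable: do nothing
    };
    if l <= c {
        UpdateAction::None
    } else if l.0 > c.0 || l.1 > c.1 {
        UpdateAction::SelfUpdate
    } else {
        UpdateAction::Warn
    }
}

fn main() {
    println!("{:?}", update_action("0.7.0", "0.7.1"));
}
```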
Pin to rev ce026e3 so CI and other machines can build without a local checkout of dkn-protocol.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… sampler bug

Plumb the OpenAI-compatible response_format parameter through the stack:

- ResponseFormat enum (JsonObject, JsonSchema) from dkn-protocol
- Worker converts response_format to a GBNF grammar via json_schema_to_grammar
- Grammar sampler is inserted first in the chain (masks invalid tokens before sampling)
- Both generate() and generate_multimodal() support grammar constraints

Fix a critical double-accept bug: sample() internally calls accept(), so the explicit sampler.accept() after sample() was advancing the grammar stacks twice, causing a GGML_ASSERT(!stacks.empty()) crash on any grammar-constrained generation.

Switch the dkn-protocol dependency from rev to branch = "main".
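The double-accept bug can be modeled with a toy sampler. The type below is illustrative, not the llama-cpp-2 API; it only shows why an explicit accept() after a sample() that accepts internally advances the grammar state twice per token.

```rust
// Toy grammar sampler: `position` tracks how many tokens the grammar
// believes have been emitted. Drifting ahead of the real token count is
// what eventually empties the grammar stacks in the real crash.
struct GrammarSampler {
    position: usize,
}

impl GrammarSampler {
    fn accept(&mut self) {
        self.position += 1;
    }
    // sample() picks a token AND internally calls accept(), mirroring the
    // behavior described in the fix above.
    fn sample(&mut self) -> u32 {
        let token = self.position as u32;
        self.accept();
        token
    }
}

fn main() {
    let mut buggy = GrammarSampler { position: 0 };
    for _ in 0..3 {
        buggy.sample();
        buggy.accept(); // the redundant explicit accept from the old code
    }
    let mut fixed = GrammarSampler { position: 0 };
    for _ in 0..3 {
        fixed.sample(); // accept() already happens inside sample()
    }
    // After 3 tokens the buggy chain has advanced the grammar 6 steps.
    println!("buggy: {}, fixed: {}", buggy.position, fixed.position);
}
```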
…ampling

generate_multimodal() was setting logits_idx to the sequence position (current_pos) after a single-token decode, but get_logits_ith expects a batch output index. A single-token decode has only one output slot (index 0), so passing e.g. 55 caused a panic. Always use -1 (the C API sentinel for "last logits"), matching the pattern in generate().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace hardcoded n_ctx=2048/4096 with the model-native context window (auto-detected from GGUF metadata via n_ctx_train)
- Add --context-size / DRIA_CONTEXT_SIZE to optionally cap context for limited VRAM (uses the min of the model-native size and the cap)
- Add a pre-flight check in the worker: reject tasks where prompt_tokens + max_tokens exceeds the context before consuming a capacity slot
- Propagate inference errors back to the router via StreamError so it can retry on another node instead of waiting for a timeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allocate prompt_tokens + max_tokens per request instead of the full ctx_limit (e.g. 32k). This avoids OOM on machines where the model fits in RAM but a full-size KV cache does not. The ctx_limit ceiling still gates pre-flight rejection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
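The pre-flight gate and per-request sizing combine into one small check. The function below is a sketch with illustrative names, not the actual dkn worker code: it rejects over-budget requests before a capacity slot is taken, and otherwise returns the right-sized KV allocation.

```rust
// Returns the per-request context size (prompt + generation budget), or an
// error if the request cannot fit under the ctx_limit ceiling.
fn preflight(prompt_tokens: u32, max_tokens: u32, ctx_limit: u32) -> Result<u32, String> {
    let needed = prompt_tokens
        .checked_add(max_tokens)
        .ok_or_else(|| "token count overflow".to_string())?;
    if needed > ctx_limit {
        // Rejected before consuming a capacity slot; the router can retry
        // on another node instead of waiting for a timeout.
        return Err(format!(
            "prompt+max_tokens ({needed}) exceeds context limit ({ctx_limit})"
        ));
    }
    // Allocate only what the request needs, not a full ctx_limit KV cache.
    Ok(needed)
}

fn main() {
    println!("{:?}", preflight(1000, 512, 32_768));
}
```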
The KV cache now defaults to Q8_0 quantization instead of F16, roughly halving KV memory with negligible quality loss. Operators can override with --kv-quant (f16, q8_0, q4_0, etc.) or the DRIA_KV_QUANT env var.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LlamaBackend::init() is a global singleton that errors on a second call. Use OnceLock to ensure it is initialized exactly once and shared across all InferenceEngine instances.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
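The OnceLock pattern looks like this (with a stand-in Backend type, since the real LlamaBackend lives in llama-cpp-2): get_or_init runs the fallible init exactly once, and every later caller gets the same 'static reference.

```rust
use std::sync::OnceLock;

// Stand-in for LlamaBackend: init() must not be called twice.
struct Backend;
impl Backend {
    fn init() -> Result<Backend, &'static str> {
        Ok(Backend)
    }
}

static BACKEND: OnceLock<Backend> = OnceLock::new();

// Every engine goes through this accessor; init runs exactly once, even
// when multiple InferenceEngine instances are constructed concurrently.
fn backend() -> &'static Backend {
    BACKEND.get_or_init(|| Backend::init().expect("backend init failed"))
}

fn main() {
    let a = backend();
    let b = backend();
    println!("same instance: {}", std::ptr::eq(a, b));
}
```

One trade-off of this sketch: a failed init panics inside get_or_init rather than propagating an error, which matches "initialize once or abort" semantics.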
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use PowerShell Get-CimInstance instead of the deprecated wmic for RAM detection
- Verify the SHA-256 of cached model files before loading
- Auto-delete and re-download corrupt files instead of failing permanently
with_quant() used rfind('-'), which broke for repos using dot separators (e.g. LocoOperator-4B.Q4_K_M.gguf → LocoOperator-Q8_0.gguf instead of LocoOperator-4B.Q8_0.gguf). Now it finds the quant portion by matching known quant prefixes, preserving the original separator.
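A std-only sketch of the fixed approach: search the filename for a known quant token and splice in the replacement, leaving whatever separator precedes it untouched. The prefix list here is a partial, illustrative set (ordered longest-first so e.g. "F16" cannot match inside "BF16"), and ASCII filenames are assumed so byte offsets from the uppercased copy are valid.

```rust
// Illustrative subset of quant tags, longest/most-specific first.
const QUANT_PREFIXES: &[&str] = &["Q4_K_M", "Q5_K_M", "BF16", "Q8_0", "Q4_0", "Q6_K", "F16"];

fn with_quant(filename: &str, quant: &str) -> String {
    let upper = filename.to_ascii_uppercase();
    for q in QUANT_PREFIXES {
        // Case-insensitive match; the original separator ('-' or '.')
        // sits outside the matched span, so it is preserved as-is.
        if let Some(pos) = upper.find(*q) {
            let mut out = String::with_capacity(filename.len());
            out.push_str(&filename[..pos]);
            out.push_str(quant);
            out.push_str(&filename[pos + q.len()..]);
            return out;
        }
    }
    filename.to_string() // no known quant tag found: leave unchanged
}

fn main() {
    println!("{}", with_quant("LocoOperator-4B.Q4_K_M.gguf", "Q8_0"));
}
```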
- Prefix cached mmproj filenames with the model name to avoid collisions between models sharing the same mmproj filename (e.g. multiple Qwen models all using mmproj-BF16.gguf from different repos)
- Add LLVM/Clang to the Windows prerequisites (needed by bindgen)
- Add a libclang troubleshooting entry
Chunk prompt evaluation into batches of 2048 tokens instead of decoding everything at once, which triggers GGML_ASSERT(n_tokens_all <= n_batch). Applied to both generate() and validate_prefill().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
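The chunking itself is a simple loop over fixed-size slices. In this sketch `decode` is a stand-in for the llama.cpp decode call; the real code builds a batch per chunk, but the slicing logic is the same.

```rust
// Feed the prompt in slices of at most n_batch tokens so no single decode
// call exceeds the batch limit (avoiding GGML's n_tokens_all <= n_batch assert).
fn eval_prompt_chunked(prompt: &[i32], n_batch: usize, decode: &mut impl FnMut(&[i32])) {
    for chunk in prompt.chunks(n_batch) {
        decode(chunk); // each call sees at most n_batch tokens
    }
}

fn main() {
    let prompt: Vec<i32> = (0..5000).collect();
    let mut sizes = Vec::new();
    eval_prompt_chunked(&prompt, 2048, &mut |c| sizes.push(c.len()));
    println!("{sizes:?}"); // [2048, 2048, 904]
}
```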