
Releases: hivellm/vectorizer

v3.3.0

04 May 21:38


Security

  • Hardened dashboard cookies + CSRF. Closes the cookie + CSRF gaps phase8 enumerated (audit sections 1.9 + 6.5–6.6).

    • POST /auth/login and POST /auth/refresh now set a hardened vectorizer_session cookie carrying the JWT, and a sibling XSRF-TOKEN cookie carrying a 32-byte random CSRF token. Both cookies are emitted with SameSite=Strict; Path=/; Max-Age=<jwt_exp>. The session cookie is HttpOnly; Secure; the CSRF cookie is Secure but readable so the SPA can echo it.
    • New require_csrf_middleware rejects POST/PUT/PATCH/DELETE requests under /auth/* and /admin/* with HTTP 403 when the X-CSRF-Token header is missing or does not match the token bound to the caller's session JWT. GET/HEAD/OPTIONS requests bypass the check; /auth/login and /auth/validate-password are exempt; X-API-Key requests are exempt (header-bearer credentials are not subject to the cross-origin attack the CSRF token defends against).
    • POST /auth/logout emits expired Set-Cookie headers for both cookies and drops the CSRF binding.
    • New auth.cookies.insecure_dev config flag (default false) drops only the Secure attribute for plain-HTTP 127.0.0.1 development. Boot fails with a clear error when the flag is true while binding to any non-loopback host (most importantly 0.0.0.0).
    • dashboard/src/lib/api-middleware.ts adds a csrfMiddleware that reads the XSRF-TOKEN cookie and echoes it in X-CSRF-Token on every mutating request.
    • Backward compatibility: the legacy access_token field in the login response body is preserved so existing SDK callers and the dashboard's Authorization: Bearer middleware continue to work; the cookie is purely additive for browser-side use.
  • Loopback dev-mode auth bypass. Closes phase8 audit gap 7.5. New auth.dev_mode_skip_loopback config flag (default false) lets local developers run the dashboard / SDK against 127.0.0.1 without minting a JWT or echoing tokens on every cURL.

    • When the flag is true, the auth middleware short-circuits with a synthetic local-dev-admin principal (Role::Admin, empty scopes) and every response carries X-Vectorizer-Dev-Mode: true so tooling, logs, and integration tests can spot the bypass. The CSRF middleware no-ops in the same mode (no session JWT to bind a token to).
    • Boot fails fast with a clear error when the flag is true and the bind host is anything other than 127.0.0.1, ::1, or localhost. The loopback predicate is now centralised so the cookie boot guard and the dev-mode boot guard apply the same definition.
    • Boot logs a multi-line WARN banner ("AUTH IS DISABLED FOR LOOPBACK — DO NOT EXPOSE THIS BUILD") whenever the flag is engaged so the operator sees the security posture immediately.
    • Documented in docs/users/api/AUTHENTICATION.md under "Local Development".
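The CSRF check and the loopback boot guard described above can be sketched in a few lines. This is a minimal Python illustration of the stated rules, not the server's Rust middleware: the function names, the hex token encoding, and the plain-dict header lookup are all assumptions made for the example.

```python
import hmac
import secrets

MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
PROTECTED_PREFIXES = ("/auth/", "/admin/")
EXEMPT_PATHS = {"/auth/login", "/auth/validate-password"}
LOOPBACK_HOSTS = {"127.0.0.1", "::1", "localhost"}


def new_csrf_token() -> str:
    # 32 random bytes; the hex encoding is an assumption of this sketch.
    return secrets.token_hex(32)


def csrf_rejection(method, path, headers, bound_token):
    """Return 403 if the request should be rejected, otherwise None.

    `headers` is assumed to be a dict keyed by canonical header names;
    `bound_token` is the token tied to the caller's session (or None).
    """
    if method.upper() not in MUTATING_METHODS:
        return None  # GET/HEAD/OPTIONS bypass the check
    if not path.startswith(PROTECTED_PREFIXES) or path in EXEMPT_PATHS:
        return None
    if "X-API-Key" in headers:
        return None  # header-bearer credentials are exempt
    sent = headers.get("X-CSRF-Token")
    if sent is None or bound_token is None:
        return 403
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(sent.encode(), bound_token.encode()):
        return 403
    return None


def check_dev_flag(bind_host: str, flag_enabled: bool) -> None:
    """Boot guard: a dev-only flag may only be set on a loopback bind."""
    if flag_enabled and bind_host not in LOOPBACK_HOSTS:
        raise RuntimeError(f"dev-only flag enabled on non-loopback host {bind_host!r}")
```

A browser client would read the `XSRF-TOKEN` cookie and send its value back in `X-CSRF-Token`; the check above then passes only when that value matches the token bound to the session.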

Added

  • Dashboard metrics endpoints (phase25). Three additive REST surfaces so the v3.3.0 console renders real numbers instead of synthetic generators:

    • GET /metrics/runtime (admin-gated) — single JSON snapshot refreshed once per second. Carries process CPU / memory / RSS / total / active_connections / uptime_seconds, rolling 60 s qps_window_60s, per-route throughput_by_route[] (route + qps + p50_ms + p99_ms, sorted desc by QPS), error_rate_5xx_60s, and a wal { current_seq, size_bytes, last_checkpoint_at, last_checkpoint_seq } block. Standalone (non-replicated) servers report a zero-initialised WAL block — honest about not having one. Implementation: RuntimeSampler ticks every second using sysinfo::ProcessRefreshKind::nothing().with_cpu().with_memory() (sysinfo 0.38); LatencyAggregator keeps a 60 s rolling window per route; an axum middleware increments / decrements an Arc<AtomicUsize> connection counter on every request.
    • GET /stats extension — additive default_quantization: String (most-common quantization label across active collections — one of none, binary, sq-4bit, sq-8bit, sq-16bit, sq, pq) and compression_ratio: f32 (mean static ratio across collections sharing that label; PQ ratio is dimension-aware, the others are dimension-independent). Empty store reports ("none", 1.0).
    • GET /collections/{name} extension — additive vector_count_history: [{at, count}] ring buffer, capped at 60 entries, sampled lazily on each read (Collection::record_vector_count_sample() — no-op if the last sample is < 60 s old, so a static collection produces zero ongoing CPU). Sharded / GPU / DistributedSharded variants always report [].
    • WAL plumbing. DurableReplicationLog::wal_snapshot() returns WalSnapshot (re-exported from vectorizer::replication); mark_replicated() only stamps last_checkpoint_at when min_confirmed_offset actually advances, so retried ACKs do not lie. MasterNode::wal_snapshot() is the public surface; RuntimeSampler::set_master_node() is the wiring point.
    • docs/specs/API_REFERENCE.md documents both the route table entry and the full response shape (/metrics/runtime + the /stats extension), with the standalone-mode caveat called out.
  • SDK parity for the phase25 surface (phase27). Every supported SDK gains the same typed wrappers — RuntimeMetrics, RouteStats, WalSnapshot, VectorCountSample, plus Stats.{default_quantization, compression_ratio} and Collection.vector_count_history:

    • Rust SDK: VectorizerClient::get_runtime_metrics(), models in sdks/rust/src/models.rs. Defaults zero-valued so older servers and standalone-mode payloads parse unchanged.
    • TypeScript SDK: AdminClient.getRuntimeMetrics(), interfaces in sdks/typescript/src/models/admin.ts. Every field is optional so partial payloads decode without runtime errors.
    • Python SDK: AdminClient.get_runtime_metrics(), dataclasses in sdks/python/models.py with from_dict classmethods that tolerate missing keys. CollectionInfo.__post_init__ hydrates dict entries from **data kwargs into typed VectorCountSample instances.
    • Go SDK: Client.GetRuntimeMetrics(), structs in sdks/go/models.go with omitempty tags on every field.
    • C# SDK: VectorizerClient.GetRuntimeMetricsAsync(), classes in sdks/csharp/Models/AdminModels.cs with JsonPropertyName and sensible default initialisers.
    • At least 4 new unit tests per SDK (Rust 4, TS 7, Python 8, Go 7, C# 7 = 33 total) cover the route + decode, full + partial payloads, the new Stats quantization fields, and the vector_count_history round-trip.
  • Cluster admin endpoints. Five new server routes for production cluster operations:

    • POST /cluster/failover — promote a replica to primary with a pre-flight WAL-lag check (returns 409 when replica lag exceeds max_lag_segments, default 1). Residual loss window documented in src/replication/state.rs.
    • POST /cluster/replicas/{id}/resync — force a full snapshot + WAL replay on a lagging replica.
    • POST /cluster/peers — add a peer node (member or observer role) to the cluster; complements the existing Qdrant-compatible remove-peer endpoint.
    • POST /cluster/rebalance — trigger shard rebalance across all active nodes using insert-before-delete invariant; returns a job ID immediately while the moves complete asynchronously.
    • GET /cluster/rebalance/status — poll progress of the active or last completed rebalance job.
  • Auth/RBAC admin endpoints. Four new server routes for production multi-tenant deployments:

    • POST /auth/keys/{id}/rotate — atomic key rotation with a configurable grace window (default 300 s). Both the old and new token are accepted during the window; only the new token is accepted after. Returns { old_key_id, new_key_id, new_token, grace_until }.
    • POST /auth/keys (extended body) — existing endpoint now accepts an optional scopes: [{ collection, permissions }] array. Keys with a non-empty scopes list are collection-scoped and are denied on collections not listed. Keys with an empty list have no implicit access (default-deny). Existing global-key callers that omit scopes are unaffected.
    • POST /auth/introspect — RFC 7662 token introspection. Accepts any JWT or API key in the request body and returns { active, scope, sub, exp }.
    • GET /auth/audit — admin-only audit log. Returns the most recent admin-action entries from the in-memory ring buffer (capped at 4096 entries, flushed to daily-rotated JSONL files under the backup directory every 30 s). Filterable by from, to, actor, action query parameters.
  • AuditLogger (vectorizer::auth::audit). Non-blocking audit logger: record() sends to an unbounded mpsc channel and never blocks the handler hot-path. Background flusher drains the channel, maintains the in-memory VecDeque ring, and writes to daily-rotated files. Durability SLO: at-most-once, best-effort — entries in the buffer at crash time are lost. Operators requiring a durable audit ledger should ship the JSONL files to an external sink.

  • Scoped API keys (TokenScope, create_scoped_api_key on AuthManager). Per-collection permission scopes attached to API keys and propagated through UserClaims. Default-deny when the scopes list is empty.

  • Key rotation (rotate_api_key on AuthManager, set_rotation_metadata on ApiKeyManager). Old keys remain valid through their grace window without an active = false revocation, enabling zero-downtime credential rollover.

  • Token introspection (introspect_token on AuthManager). Tries JWT then API key; returns a typed TokenIntrospection value regardless of token type.

  • Failover / resync helpers (vectorizer::replication::state). failover_to() and force_resync() operate on MasterNode references. LagTooHigh variant added to ReplicationError.

  • Cluster peer-add / rebalance (vectorizer::cluster::rebalance). add_peer(), rebalance(), rebalance_status() wi...
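The scoped-key and key-rotation rules from the Auth/RBAC entries in the list above can be illustrated with a short sketch. It is a Python model of the stated semantics only: the function names, the use of `None` to represent a key created without `scopes`, the permission strings, and the exclusive boundary at `grace_until` are assumptions, not the `AuthManager` implementation.

```python
def key_allows(scopes, collection, permission):
    """Decide whether a key may perform `permission` on `collection`.

    `scopes` is None for a global key created without a scopes field,
    or a list of {"collection": ..., "permissions": [...]} entries.
    """
    if scopes is None:
        return True  # legacy global key: behaviour unchanged
    for scope in scopes:
        if scope["collection"] == collection and permission in scope["permissions"]:
            return True
    # Reached for an empty list (default-deny) and for unlisted collections.
    return False


def token_accepted(token, old_token, new_token, grace_until, now):
    """Model the rotation window: both tokens before grace_until, new only after."""
    if token == new_token:
        return True
    # Treating the boundary instant as already expired is an assumption.
    return token == old_token and now < grace_until
```

With the default 300 s window, a client that still holds the old token keeps working for five minutes after rotation, which is what allows a rollover without downtime.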


vectorizer-3.3.0

04 May 21:38


A Helm chart for Vectorizer - High-performance vector database

v3.2.0

01 May 22:22


Added

  • Bulk-upsert backpressure (#263). Three layers of bounded-resource enforcement so a fan-out producer (Cortex cortex-embedder-worker, Synap consumers, etc.) can no longer drive the server into the unbounded-CPU restart loop documented in the issue:
    1. Bounded BM25 vocabulary-build concurrency. A shared tokio::sync::Semaphore gates the CPU-heavy section of every vocab build. Default num_cpus::get(), configurable via backpressure.max_concurrent_vocab_builds.
    2. Per-collection upsert admission (REST / gRPC / MCP / UMICP). Once a collection's in-flight depth crosses backpressure.upsert_queue_hard_limit (default 1024) new upserts are refused with HTTP 429 Too Many Requests + Retry-After, gRPC RESOURCE_EXHAUSTED + retry-after metadata, or a structured MCP error { code: "queue_full", retryAfterSeconds: N }. Crossing the lower backpressure.upsert_queue_high_water threshold (default 256) emits a structured warning and bumps vectorizer_upsert_rejected_total{reason="queue_high_water_warn"}, but the request is still admitted.
    3. Log rate-limiting for the WARN "BM25 vocabulary is empty …" line: at most one emit per collection per 5 s window, while the new vectorizer_bm25_empty_vocab_fallback_total{collection} counter retains the true volume signal.
  • Five new Prometheus metrics under vectorizer_*: upsert_queue_depth{collection}, upsert_in_flight{collection}, vocab_build_permits_available, upsert_rejected_total{reason}, bm25_empty_vocab_fallback_total{collection}. All registered automatically; surface on GET /prometheus/metrics.
  • All five first-party SDKs honor Retry-After — Rust vectorizer-sdk, Python sdks/python/, TypeScript sdks/typescript/, Go sdks/go/, and C# sdks/csharp/. Each parses the header with identical semantics (1 s default, 30 s cap, 3 retries) and surfaces a typed RateLimit / RateLimitError / VectorizerError(status=429) only after retry exhaustion. Pre-v3.2.0 clients bounced 429s into a generic 5xx.
  • Operator runbook at docs/deployment/backpressure.md and ready-to-import Grafana panels at docs/grafana/backpressure-panels.json.
  • Docker image hivehub/vectorizer:3.2.0 validated end-to-end with a smoke test (scripts/docker_smoke.py): 200 concurrent inserts against hard_limit=2 produce ~65 well-formed 429 responses (every one carrying Retry-After: 1); the vectorizer_upsert_rejected_total{reason="queue_full"} counter delta matches the observed 429 count exactly; /health stays healthy throughout the flood.
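The Retry-After handling described for the SDKs (1 s default, 30 s cap, 3 retries) can be sketched as follows. This is an illustrative Python version, not the code in any of the SDKs: `parse_retry_after`, `call_with_retry`, and the `send` callable are hypothetical names, and the sketch only understands the delay-in-seconds form of the header.

```python
import time


class RateLimitError(Exception):
    """Raised only after every retry has been answered with 429."""


def parse_retry_after(value, default=1.0, cap=30.0):
    """Turn a Retry-After header value into a bounded sleep in seconds."""
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        return default  # missing header, or a form this sketch does not parse
    if not seconds >= 0:  # also catches NaN
        return default
    return min(seconds, cap)


def call_with_retry(send, max_retries=3, sleep=time.sleep):
    """Call send() until it stops returning 429 or retries run out.

    `send` is a hypothetical callable returning (status, headers, body).
    Three retries means up to four attempts in total (an assumption).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(parse_retry_after(headers.get("Retry-After")))
    raise RateLimitError("still rate limited after %d retries" % max_retries)
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays.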

Build

  • Dashboard pnpm-lock.yaml refreshed to track the happy-dom: ">=20.8.9" override that was added in the v3.2.0 dependency refresh. Without this the docker image build fails at the dashboard stage with ERR_PNPM_OUTDATED_LOCKFILE.

Configuration

New backpressure: block in config.example.yml plus four env-var overrides: CORTEX_VECTORIZER_BACKPRESSURE_ENABLED, CORTEX_VECTORIZER_MAX_CONCURRENT_BUILDS, CORTEX_VECTORIZER_UPSERT_HIGH_WATER, CORTEX_VECTORIZER_UPSERT_HARD_LIMIT. Defaults are conservative — existing deployments inherit safe limits without touching their YAML.
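How the two queue thresholds configured here interact can be sketched as a small decision function. This is a Python illustration under assumptions: the function name and return labels are made up for the example, and whether the server compares with `>=` or `>` at each threshold is not stated in these notes.

```python
def admission_decision(in_flight, high_water=256, hard_limit=1024):
    """Classify an incoming upsert by the collection's in-flight depth.

    Defaults mirror the documented values; the >= comparisons are an
    assumption of this sketch.
    """
    if in_flight >= hard_limit:
        return "reject"  # e.g. HTTP 429 + Retry-After on REST
    if in_flight >= high_water:
        return "admit_with_warning"  # logged and counted, but still served
    return "admit"
```

For example, with the defaults a depth of 300 is admitted with a warning, while a depth of 1024 is refused.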

vectorizer-3.2.0

01 May 22:23


A Helm chart for Vectorizer - High-performance vector database

v3.1.0 — honor client ids + flat chunk payload + /insert_vectors

29 Apr 21:03


[3.1.0] - 2026-04-29

Added

  • POST /insert_vectors — bulk-insert pre-computed embeddings with caller-supplied vector ids. Skips the embedding pipeline entirely; the request body carries the vectors as raw Vec<f32>. Useful when the client already has its own embedder, needs deterministic ids for idempotent re-ingest, or wants to upsert without going through the chunk-and-embed path. Request shape: {collection, vectors: [{id?, embedding, payload?, metadata?}], public_key?}. Response shape mirrors /insert_texts: {collection, inserted, failed, count, results: [{index, client_id, status, vector_ids}]}. Per-entry validation rejects with HTTP 400 when embedding.len() != collection.dimension, when the embedding array contains a non-numeric value, or when an explicit id violates the client-id contract (see below). Quota / Raft / cache-invalidation post-processing is shared with /insert_texts.
  • POST /insert and POST /insert_texts honor the request id field as the resulting Vector.id. Previously the id was parsed and silently echoed back as client_id in the response, but the actual stored Vector always got a fresh Uuid::new_v4() — re-ingesting the same document produced duplicates and there was no path from a logical client id to the (multiple) UUIDs spawned by chunking. Now: non-chunked inputs use the client id verbatim; chunked inputs derive <client_id>#<chunk_index> (e.g. doc:42#0, doc:42#1, ...) so re-running the same /insert_texts payload upserts in place instead of duplicating, and DELETE / POST /qdrant/.../points round-trips by client id work without a UUID lookup. Falling back to a server UUID still works when the request omits id, so existing callers that never sent the field are unchanged. Client-id contract: non-empty, length ≤ 256, no leading / trailing whitespace, must not contain # (reserved as the chunk-id separator) — violations return HTTP 400 with error_type: "validation_error".
  • payload.parent_id on chunked vectors links chunks back to the source document. Set to the request's id when provided; otherwise a single freshly-minted UUID v4 is shared across every chunk of the same /insert_texts entry. Lets clients group, count, or delete every chunk of a logical document without re-deriving membership from the _id-in-payload defensive duplicate.
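The client-id contract and the chunk-id derivation above can be sketched as follows. This is a Python illustration, not the server code: the function names are hypothetical, the length limit is interpreted in characters, and the per-chunk ids used when no `id` is sent are assumed to be independent UUIDs.

```python
import uuid


def validate_client_id(client_id: str) -> None:
    """Raise ValueError when the id violates the documented contract."""
    if not client_id:
        raise ValueError("id must be non-empty")
    if len(client_id) > 256:  # counted in characters (assumption)
        raise ValueError("id must be at most 256 characters")
    if client_id != client_id.strip():
        raise ValueError("id must not have leading or trailing whitespace")
    if "#" in client_id:
        raise ValueError("'#' is reserved as the chunk-id separator")


def derive_ids(client_id=None, chunk_count=None):
    """Return (vector_ids, parent_id); chunk_count=None means non-chunked."""
    if client_id is not None:
        validate_client_id(client_id)
    if chunk_count is None:
        only = client_id if client_id is not None else str(uuid.uuid4())
        return [only], None
    if client_id is None:
        # No client id: server-side UUIDs, one shared parent_id per entry.
        return [str(uuid.uuid4()) for _ in range(chunk_count)], str(uuid.uuid4())
    return [f"{client_id}#{i}" for i in range(chunk_count)], client_id
```

Because the derived ids are deterministic, sending the same `id` twice produces the same vector ids, which is what turns a re-ingest into an upsert.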

Changed

  • /insert_texts chunked payload layout flipped from nested to flat — BREAKING for clients that read payload.metadata.<field> directly. Pre-3.1.0 chunks landed as {content, metadata: {file_path, chunk_index, _id, casa, ...}} — file-navigation fields and user metadata buried under a metadata sub-object. Qdrant payload filters (payload.x = "X") silently missed every chunked row because the user fields weren't at the path the filter expected, and MCP search_semantic consumers had to read result.metadata.metadata.x (two levels of nesting) instead of the obvious result.metadata.x. 3.1.0 emits a flat shape: {content, file_path, chunk_index, parent_id, _id, casa, x, ...} with every key at the root. Server-provided keys (content, file_path, chunk_index, parent_id) take precedence over any colliding keys in user metadata. Non-chunked inputs already stored metadata flat — no change there. Migration: see "Migrating from 3.0.x chunked payloads" below.

    Readers tolerate both shapes during the deprecation window. FileOperations::{get_file_content, list_files_in_collection, get_file_chunks_ordered} and file_watcher's discovery loops accept the legacy nested shape, log a tracing::debug!("…via legacy nested payload shape (deprecated since phase9 in favor of flat layout, will be removed in a future major release)") once per call, and resolve file_path from either path. mcp_tools.rs::flatten_payload_metadata (used by all four MCP search tools) lifts legacy nested keys to the root of the returned metadata map so MCP consumers can read result.metadata.<field> uniformly across new and legacy collections; the original nested object is preserved alongside the lifted keys, so consumers that explicitly read result.metadata.metadata.<field> keep working too.

    No automatic on-disk rewrite ships in 3.1.0 — collections written by ≤ 3.0.13 stay nested on disk and rely on the tolerant readers. To migrate to the flat shape, re-ingest the source data through /insert_texts against a fresh collection or use /insert_vectors if you already hold the embeddings.
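The flat layout and the tolerant reader can be illustrated with a short sketch. This is Python written for the example, not the `flatten_payload_metadata` code itself: the function names are hypothetical, and letting an existing root key win over a nested key of the same name on read is an assumption.

```python
SERVER_KEYS = ("content", "file_path", "chunk_index", "parent_id")


def flat_chunk_payload(server_fields: dict, user_metadata: dict) -> dict:
    """Write side: every key at the root, server-provided keys win collisions."""
    payload = dict(user_metadata)
    payload.update({k: v for k, v in server_fields.items() if k in SERVER_KEYS})
    return payload


def lift_nested_metadata(payload: dict) -> dict:
    """Read side: copy legacy payload['metadata'] keys up to the root.

    The nested object is kept, so readers of metadata.<field> still work.
    """
    lifted = dict(payload)
    nested = payload.get("metadata")
    if isinstance(nested, dict):
        for key, value in nested.items():
            lifted.setdefault(key, value)  # root key wins (assumption)
    return lifted
```

A legacy payload such as `{"content": "...", "metadata": {"x": "X"}}` comes back with `x` readable at the root and the nested `metadata` object still present; a payload that is already flat passes through unchanged.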

Migrating from 3.0.x chunked payloads

If your client uses Qdrant payload filters or reads payload.metadata.<field> directly on chunked vectors:

  1. Audit filter paths. payload.x = "X" matched zero chunked rows on 3.0.x because the field lived at payload.metadata.x. On 3.1.0 the same filter matches new writes correctly. Old data still lives at payload.metadata.x until re-ingested.
  2. MCP consumers. Reads of result.metadata.<field> work on both new and legacy data after 3.1.0 — the MCP layer lifts nested keys to the root automatically. No code change required.
  3. Re-ingest is optional. Tolerant readers cover the legacy shape during the deprecation window. To converge a collection on the new layout, drop and re-create with /insert_texts, or use /insert_vectors with embeddings you already computed.
  4. Idempotent re-ingest. Send id in each /insert_texts entry to upsert by client id (doc:42 non-chunked, doc:42#N per chunk). Re-running the same payload now replaces in place instead of duplicating.
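For the /insert_vectors route mentioned in step 3, a request body might be assembled as in the sketch below. It follows the request shape given in the 3.1.0 notes; the helper name is hypothetical, the client-side dimension check simply mirrors the server's documented HTTP 400 rule, and sending the body (host, port, auth headers) is left out because those details are deployment-specific.

```python
import json


def build_insert_vectors_body(collection, dimension, items):
    """Serialise {collection, vectors: [{id?, embedding, payload?, metadata?}]}.

    `items` is a list of dicts; only `embedding` is required in each.
    """
    vectors = []
    for index, item in enumerate(items):
        embedding = [float(x) for x in item["embedding"]]  # rejects non-numeric values
        if len(embedding) != dimension:
            raise ValueError(
                f"entry {index}: expected {dimension} dimensions, got {len(embedding)}"
            )
        entry = {"embedding": embedding}
        for optional in ("id", "payload", "metadata"):
            if optional in item:
                entry[optional] = item[optional]
        vectors.append(entry)
    return json.dumps({"collection": collection, "vectors": vectors})
```

Reusing the same `id` values on every run keeps the re-ingest idempotent.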

vectorizer-3.1.0

29 Apr 21:03


A Helm chart for Vectorizer - High-performance vector database

v3.0.13 — kill false-positive Raft warn + pin K8s data dir

25 Apr 14:32


What changes

Two small follow-ups from a live debugging session on the ermes prod cluster after the v3.0.11 upgrade.

1. Bootstrap election warn was a false positive

Post-bootstrap retry loop checked `metrics().current_leader.is_some()` to decide whether to nudge openraft. That field only populates after the leader gets a quorum-ack, but openraft elects + follows a leader (state = Follower, vote committed) for a real interval before `current_leader` lands. Healthy clusters logged `No leader after Ns — triggering election` repeatedly and called `trigger().elect()` for nothing.

Gate the nudge on `state == ServerState::Candidate` instead — the actual "no quorum" signal openraft exposes.

2. K8s StatefulSet must pin `VECTORIZER_DATA_DIR` to the PVC

`vectorizer-core::paths::data_dir()` resolves `$VECTORIZER_DATA_DIR` → `dirs::data_dir().join("vectorizer")` → `./data`. Without the env var, the per-OS Linux default (`~/.local/share/vectorizer/`) wins — and that path is inside the container's writable layer, not on the PVC. Every restart looks like first-time setup; the actual `vectorizer.vecdb` on the PVC at `/data/data/` never gets read.
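The resolution order can be modelled in a few lines. This is a Python sketch of the fallback chain described above, not the Rust function: `resolve_data_dir` and its parameters are hypothetical, and treating an empty environment variable as unset is an assumption.

```python
from pathlib import PurePosixPath


def resolve_data_dir(env, platform_data_dir):
    """Mimic the chain: env override, then per-OS data dir, then ./data.

    `env` is a mapping of environment variables; `platform_data_dir` stands
    in for what the per-OS lookup returns (None when it cannot be resolved).
    """
    override = env.get("VECTORIZER_DATA_DIR")
    if override:  # empty string treated as unset (assumption)
        return PurePosixPath(override)
    if platform_data_dir is not None:
        return PurePosixPath(platform_data_dir) / "vectorizer"
    return PurePosixPath("./data")
```

With no override and a home directory inside the container, the second branch wins, which is the path that silently bypasses the PVC.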

Live observation: 15 collections / 1992 vectors sat unread on the ermes PVC for two release cycles. Adding `VECTORIZER_DATA_DIR=/data/data` to the StatefulSet env block recovered all of it on next restart (the legacy `auth.enc` had to be wiped so the current `VECTORIZER_JWT_SECRET` could log back in — runbook documents the recovery flow).

`k8s/statefulset-ha.yaml` now sets the env var by default and `docs/deployment/HA_KUBERNETES_RUNBOOK.md` ships with a Data directory pitfall section covering both the trap and the recovery sequence.

Versioning

5 Rust crates + Rust SDK 3.0.12 → 3.0.13 (Cargo.lock refreshed); Helm chart `appVersion 3.0.12` → `3.0.13`, chart `version 1.5.9` → `1.5.10`.

🤖 Cut by Claude Code

v3.0.12 — fastembed as vectorizer-server default + lint fix

25 Apr 04:26


What changes

  • `fastembed` is now a real `[features] default` of `vectorizer-server` instead of a hard-coded entry on the `vectorizer = { features = [...] }` dep declaration. Behaviour is unchanged — `cargo build --bin vectorizer` still ships with FastEmbed enabled — but the dependency now follows normal feature plumbing. Slim builds without the ONNX runtime can opt out with `cargo build --no-default-features`.
  • Format fix under `cargo +nightly fmt` for the two `resolve_leader_addr` call sites in `raft_watcher.rs` (the calls were over the 100-col limit after the v3.0.9 signature change). The `Formatter and linter` workflow has been red since 3.0.9; this restores it to green.

Versioning

5 Rust crates + Rust SDK 3.0.11 → 3.0.12 (Cargo.lock refreshed); Helm chart `appVersion 3.0.11` → `3.0.12`, chart `version 1.5.8` → `1.5.9`. The HA fixes from 3.0.9 / 3.0.10 / 3.0.11 are unchanged.

🤖 Cut by Claude Code

v3.0.11 — Stop forced elections sabotaging stable Raft leaders

25 Apr 03:29


What changes

fix(raft): stop sabotaging stable elections with the AddNode retry loop.

Live test of v3.0.10 with verbose openraft logs proved the cluster was electing leaders cleanly (current_leader settles, AppendEntries flow), but the leader rotated every ~10 seconds. Pattern: vt3-1 → vt3-2 → vt3-1 …, always 10s after the previous transition.

Root cause: the post-bootstrap "register all nodes" task in `bootstrap.rs` ran on every pod, and its inner loop called `raft().trigger().elect()` unconditionally on every non-leader iteration. That forced a fresh election every 10 seconds on every follower, kicking the existing leader out before its first heartbeat could renew its lease.

Gate the trigger on `current_leader.is_none()` and a 30 s warm-up. If openraft already has a leader visible, followers just wait. The trigger only fires when openraft has genuinely stalled — the only situation the original comment was trying to handle.
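The gate introduced in this release amounts to a two-part predicate. The sketch below is a Python model with hypothetical names, not the Rust code in `bootstrap.rs`; note that v3.0.13 later replaced the leader check with a check on the openraft server state.

```python
def should_trigger_election(current_leader, seconds_since_bootstrap, warmup_s=30.0):
    """v3.0.11 rule: nudge only when no leader is visible after the warm-up.

    `current_leader` is None when no leader is known to this node.
    """
    if current_leader is not None:
        return False  # a leader exists: followers just wait
    return seconds_since_bootstrap >= warmup_s
```

A follower that already sees a leader never triggers an election under this rule, regardless of how long it has been running.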

Versioning

5 Rust crates + Rust SDK 3.0.10 → 3.0.11 (Cargo.lock refreshed); Helm chart `appVersion 3.0.10` → `3.0.11`, chart `version 1.5.7` → `1.5.8`. Server-side only fix.

🤖 Cut by Claude Code

v3.0.10 — Raft single-node bootstrap (no split-init)

25 Apr 02:56


What changes

fix(raft): only the lowest-ordinal node bootstraps Raft cluster.

Live test of v3.0.9 showed `current_leader` never settles — bootstrap log says "Raft cluster initialized successfully" on every node, but no election ever wins.

Root cause: `initialize_cluster` was called on every pod simultaneously. openraft's `initialize` writes to the local log/vote independently on each node — there is no cross-cluster atomicity until membership replicates via AppendEntries. With N pods racing through `initialize`, each one writes a divergent term-1 log entry naming itself as the initial voter. Subsequent Vote RPCs are rejected on log-mismatch grounds and the cluster never converges.

Fix: gate the `initialize_cluster` call on the lowest-ordinal server id (`-0` in a Kubernetes StatefulSet). Exactly one node bootstraps; the others wait, accept the membership log entry the bootstrap node propagates, and join cleanly.
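One way to picture the gate is a check on the StatefulSet pod ordinal. The sketch below is a Python illustration with hypothetical helper names; whether the server derives the ordinal from the pod name or compares numeric node ids directly is not stated in these notes.

```python
import re


def pod_ordinal(pod_name):
    """Extract the trailing StatefulSet ordinal, e.g. 'vt3-1' gives 1."""
    match = re.search(r"-(\d+)$", pod_name)
    return int(match.group(1)) if match else None


def should_bootstrap(pod_name):
    """Only the '-0' pod initialises the cluster; all others wait to be joined."""
    return pod_ordinal(pod_name) == 0
```

Anchoring the pattern at the end of the name keeps a pod such as `vt3-10` from being mistaken for ordinal 0.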

Versioning

5 Rust crates + Rust SDK 3.0.9 → 3.0.10 (Cargo.lock refreshed); Helm chart `appVersion 3.0.9` → `3.0.10`, chart `version 1.5.6` → `1.5.7`.

🤖 Cut by Claude Code