Releases: hivellm/vectorizer
v3.3.0
Security
- Hardened dashboard cookies + CSRF. Closes the cookie + CSRF gaps phase8 enumerated (audit sections 1.9 + 6.5–6.6).
  - `POST /auth/login` and `POST /auth/refresh` now set a hardened `vectorizer_session` cookie carrying the JWT, and a sibling `XSRF-TOKEN` cookie carrying a 32-byte random CSRF token. Both cookies are emitted with `SameSite=Strict; Path=/; Max-Age=<jwt_exp>`. The session cookie is `HttpOnly; Secure`; the CSRF cookie is `Secure` but readable so the SPA can echo it.
  - New `require_csrf_middleware` rejects `POST` / `PUT` / `PATCH` / `DELETE` requests under `/auth/*` and `/admin/*` with HTTP 403 when the `X-CSRF-Token` header is missing or does not match the token bound to the caller's session JWT. `GET` / `HEAD` / `OPTIONS` requests bypass the check; `/auth/login` and `/auth/validate-password` are exempt; `X-API-Key` requests are exempt (header-bearer credentials are not subject to the cross-origin attack the CSRF token defends against).
  - `POST /auth/logout` emits expired `Set-Cookie` headers for both cookies and drops the CSRF binding.
  - New `auth.cookies.insecure_dev` config flag (default `false`) drops only the `Secure` attribute for plain-HTTP `127.0.0.1` development. Boot fails with a clear error when the flag is `true` while binding to any non-loopback host (most importantly `0.0.0.0`).
  - `dashboard/src/lib/api-middleware.ts` adds a `csrfMiddleware` that reads the `XSRF-TOKEN` cookie and echoes it in `X-CSRF-Token` on every mutating request.
  - Backward compatibility: the legacy `access_token` field in the login response body is preserved so existing SDK callers and the dashboard's `Authorization: Bearer` middleware continue to work; the cookie is purely additive for browser-side use.
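The CSRF rules above can be sketched as a small decision function. This is an illustrative Python model, not the server's Rust middleware: the function names, the hex encoding of the token, the case-sensitive header lookup, and the `/admin/users` path used in the usage note are all assumptions made for the example.

```python
import hmac
import secrets
from typing import Optional

MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
PROTECTED_PREFIXES = ("/auth/", "/admin/")
EXEMPT_PATHS = {"/auth/login", "/auth/validate-password"}


def new_csrf_token() -> str:
    """32 random bytes; hex encoding is an assumption of this sketch."""
    return secrets.token_bytes(32).hex()


def csrf_status(method: str, path: str, headers: dict,
                bound_token: Optional[str]) -> int:
    """Return 200 if the request may proceed, 403 if the CSRF check fails."""
    # Safe methods bypass the check.
    if method.upper() not in MUTATING_METHODS:
        return 200
    # Only /auth/* and /admin/* are guarded, minus the exempt routes.
    if not path.startswith(PROTECTED_PREFIXES) or path in EXEMPT_PATHS:
        return 200
    # Header-bearer API keys are exempt.
    if "X-API-Key" in headers:
        return 200
    presented = headers.get("X-CSRF-Token")
    if presented is None or bound_token is None:
        return 403
    # Constant-time comparison against the token bound to the session.
    matches = hmac.compare_digest(presented.encode(), bound_token.encode())
    return 200 if matches else 403
```

With a token `t = new_csrf_token()`, `csrf_status("POST", "/admin/users", {"X-CSRF-Token": t}, t)` passes, while the same request without the header is rejected.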
- Loopback dev-mode auth bypass. Closes phase8 audit gap 7.5. New `auth.dev_mode_skip_loopback` config flag (default `false`) lets local developers run the dashboard / SDK against `127.0.0.1` without minting a JWT or echoing tokens on every cURL.
  - When the flag is `true`, the auth middleware short-circuits with a synthetic `local-dev-admin` principal (`Role::Admin`, empty scopes) and every response carries `X-Vectorizer-Dev-Mode: true` so tooling, logs, and integration tests can spot the bypass. The CSRF middleware no-ops in the same mode (no session JWT to bind a token to).
  - Boot fails fast with a clear error when the flag is `true` and the bind host is anything other than `127.0.0.1`, `::1`, or `localhost`. The loopback predicate is now centralised so the cookie boot guard and the dev-mode boot guard apply the same definition.
  - Boot logs a multi-line `WARN` banner ("AUTH IS DISABLED FOR LOOPBACK — DO NOT EXPOSE THIS BUILD") whenever the flag is engaged so the operator sees the security posture immediately.
  - Documented in `docs/users/api/AUTHENTICATION.md` under "Local Development".
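A minimal Python sketch of the shared loopback predicate and the two boot guards that rely on it. The function names and error messages are hypothetical, and the exact-match comparison (no case folding or address parsing) is an assumption; the server implements this in Rust.

```python
# The three hosts the release notes list as loopback.
LOOPBACK_HOSTS = {"127.0.0.1", "::1", "localhost"}


def is_loopback_host(host: str) -> bool:
    """Single predicate shared by both boot guards (exact match assumed)."""
    return host in LOOPBACK_HOSTS


def check_boot_config(bind_host: str,
                      insecure_dev_cookies: bool = False,
                      dev_mode_skip_loopback: bool = False) -> None:
    """Raise ValueError if a dev-only flag is set on a non-loopback bind."""
    if is_loopback_host(bind_host):
        return
    errors = []
    if insecure_dev_cookies:
        errors.append("auth.cookies.insecure_dev requires a loopback bind host")
    if dev_mode_skip_loopback:
        errors.append("auth.dev_mode_skip_loopback requires a loopback bind host")
    if errors:
        raise ValueError("; ".join(errors) + f" (got {bind_host!r})")
```

Calling `check_boot_config("0.0.0.0", dev_mode_skip_loopback=True)` raises, while the same flags on `127.0.0.1` pass.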
Added
- Dashboard metrics endpoints (phase25). Three additive REST surfaces so the v3.3.0 console renders real numbers instead of synthetic generators:
  - `GET /metrics/runtime` (admin-gated): single JSON snapshot refreshed once per second. Carries process CPU / memory / RSS / total / `active_connections` / `uptime_seconds`, rolling 60 s `qps_window_60s`, per-route `throughput_by_route[]` (route + qps + p50_ms + p99_ms, sorted desc by QPS), `error_rate_5xx_60s`, and a `wal { current_seq, size_bytes, last_checkpoint_at, last_checkpoint_seq }` block. Standalone (non-replicated) servers report a zero-initialised WAL block, which is honest about not having one. Implementation: `RuntimeSampler` ticks every second using `sysinfo::ProcessRefreshKind::nothing().with_cpu().with_memory()` (sysinfo 0.38); `LatencyAggregator` keeps a 60 s rolling window per route; an axum middleware increments / decrements an `Arc<AtomicUsize>` connection counter on every request.
  - `GET /stats` extension: additive `default_quantization: String` (most-common quantization label across active collections, one of `none`, `binary`, `sq-4bit`, `sq-8bit`, `sq-16bit`, `sq`, `pq`) and `compression_ratio: f32` (mean static ratio across collections sharing that label; PQ ratio is dimension-aware, the others are dimension-independent). Empty store reports `("none", 1.0)`.
  - `GET /collections/{name}` extension: additive `vector_count_history: [{at, count}]` ring buffer, capped at 60 entries, sampled lazily on each read (`Collection::record_vector_count_sample()` is a no-op if the last sample is < 60 s old, so a static collection produces zero ongoing CPU). Sharded / GPU / DistributedSharded variants always report `[]`.
  - WAL plumbing. `DurableReplicationLog::wal_snapshot()` returns `WalSnapshot` (re-exported from `vectorizer::replication`); `mark_replicated()` only stamps `last_checkpoint_at` when `min_confirmed_offset` actually advances, so retried ACKs do not lie. `MasterNode::wal_snapshot()` is the public surface; `RuntimeSampler::set_master_node()` is the wiring point.
  - `docs/specs/API_REFERENCE.md` documents both the route table entry and the full response shape (`/metrics/runtime` + the `/stats` extension), with the standalone-mode caveat called out.
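The lazily sampled `vector_count_history` ring buffer can be modelled in a few lines. This Python sketch only mirrors the behaviour described above (60-entry cap, no-op when the last sample is under 60 s old); the class name, the method names, and the use of plain float seconds for `at` are assumptions, not the Rust implementation.

```python
from collections import deque


class VectorCountHistory:
    """Ring buffer of {at, count} samples, recorded lazily on read."""

    def __init__(self, capacity: int = 60, min_interval_s: float = 60.0):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._samples = deque(maxlen=capacity)
        self._min_interval_s = min_interval_s

    def record_sample(self, now_s: float, count: int) -> bool:
        """Append a sample unless the previous one is still fresh."""
        if self._samples and now_s - self._samples[-1]["at"] < self._min_interval_s:
            return False  # no-op: last sample is younger than the interval
        self._samples.append({"at": now_s, "count": count})
        return True

    def snapshot(self) -> list:
        return list(self._samples)
```

Because sampling happens only when a read arrives, a collection that nobody queries records nothing and costs nothing.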
- SDK parity for the phase25 surface (phase27). Every supported SDK gains the same typed wrappers: `RuntimeMetrics`, `RouteStats`, `WalSnapshot`, `VectorCountSample`, plus `Stats.{default_quantization, compression_ratio}` and `Collection.vector_count_history`:
  - Rust SDK: `VectorizerClient::get_runtime_metrics()`, models in `sdks/rust/src/models.rs`. Defaults zero-valued so older servers and standalone-mode payloads parse unchanged.
  - TypeScript SDK: `AdminClient.getRuntimeMetrics()`, interfaces in `sdks/typescript/src/models/admin.ts`. Every field is optional so partial payloads decode without runtime errors.
  - Python SDK: `AdminClient.get_runtime_metrics()`, dataclasses in `sdks/python/models.py` with `from_dict` classmethods that tolerate missing keys. `CollectionInfo.__post_init__` hydrates dict entries from `**data` kwargs into typed `VectorCountSample` instances.
  - Go SDK: `Client.GetRuntimeMetrics()`, structs in `sdks/go/models.go` with `omitempty` tags on every field.
  - C# SDK: `VectorizerClient.GetRuntimeMetricsAsync()`, classes in `sdks/csharp/Models/AdminModels.cs` with `JsonPropertyName` and sensible default initialisers.
  - New unit tests in every SDK (Rust 4, TS 7, Python 8, Go 7, C# 7 = 33 total) cover the route + decode, full + partial payloads, the new `Stats` quantization fields, and the `vector_count_history` round-trip.
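The "tolerate missing keys" decoding pattern mentioned for the Python SDK might look roughly like the following. This is a sketch and not the contents of `sdks/python/models.py`: only a subset of fields is shown, and the field types and zero defaults are assumptions based on the payload description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WalSnapshot:
    current_seq: int = 0
    size_bytes: int = 0
    last_checkpoint_seq: int = 0

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "WalSnapshot":
        # Missing keys fall back to zero, matching the standalone-mode payload.
        return cls(
            current_seq=int(data.get("current_seq", 0)),
            size_bytes=int(data.get("size_bytes", 0)),
            last_checkpoint_seq=int(data.get("last_checkpoint_seq", 0)),
        )


@dataclass
class RuntimeMetrics:
    active_connections: int = 0
    uptime_seconds: int = 0
    error_rate_5xx_60s: float = 0.0
    wal: WalSnapshot = field(default_factory=WalSnapshot)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RuntimeMetrics":
        return cls(
            active_connections=int(data.get("active_connections", 0)),
            uptime_seconds=int(data.get("uptime_seconds", 0)),
            error_rate_5xx_60s=float(data.get("error_rate_5xx_60s", 0.0)),
            # An absent or null "wal" block decodes to the zero snapshot.
            wal=WalSnapshot.from_dict(data.get("wal") or {}),
        )
```

Decoding `{}` from an older server yields an all-zero `RuntimeMetrics`, so callers never need to branch on server version.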
- Cluster admin endpoints. Five new server routes for production cluster operations:
  - `POST /cluster/failover`: promote a replica to primary with a pre-flight WAL-lag check (returns 409 when replica lag exceeds `max_lag_segments`, default 1). Residual loss window documented in `src/replication/state.rs`.
  - `POST /cluster/replicas/{id}/resync`: force a full snapshot + WAL replay on a lagging replica.
  - `POST /cluster/peers`: add a peer node (member or observer role) to the cluster; complements the existing Qdrant-compatible remove-peer endpoint.
  - `POST /cluster/rebalance`: trigger shard rebalance across all active nodes using insert-before-delete invariant; returns a job ID immediately while the moves complete asynchronously.
  - `GET /cluster/rebalance/status`: poll progress of the active or last completed rebalance job.
- Auth/RBAC admin endpoints. Four new server routes for production multi-tenant deployments:
  - `POST /auth/keys/{id}/rotate`: atomic key rotation with a configurable grace window (default 300 s). Both the old and new token are accepted during the window; only the new token is accepted after. Returns `{ old_key_id, new_key_id, new_token, grace_until }`.
  - `POST /auth/keys` (extended body): existing endpoint now accepts an optional `scopes: [{ collection, permissions }]` array. Keys with a non-empty scopes list are collection-scoped and are denied on collections not listed. Keys with an empty list have no implicit access (default-deny). Existing global-key callers that omit `scopes` are unaffected.
  - `POST /auth/introspect`: RFC 7662 token introspection. Accepts any JWT or API key in the request body and returns `{ active, scope, sub, exp }`.
  - `GET /auth/audit`: admin-only audit log. Returns the most recent admin-action entries from the in-memory ring buffer (capped at 4096 entries, flushed to daily-rotated JSONL files under the backup directory every 30 s). Filterable by `from`, `to`, `actor`, `action` query parameters.
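The grace-window and scope semantics described above can be illustrated with a small Python model. All names here are hypothetical, the inclusive `grace_until` boundary is an assumption, and a real implementation would compare tokens in constant time; the server-side logic lives in the Rust `AuthManager`.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_GRACE_S = 300.0  # default grace window from the release notes


@dataclass
class RotatedKey:
    old_token: str
    new_token: str
    grace_until: float  # unix seconds


def rotate(old_token: str, new_token: str, now_s: float,
           grace_s: float = DEFAULT_GRACE_S) -> RotatedKey:
    return RotatedKey(old_token, new_token, now_s + grace_s)


def is_accepted(key: RotatedKey, presented: str, now_s: float) -> bool:
    """New token always works; old token only inside the grace window."""
    if presented == key.new_token:
        return True
    return presented == key.old_token and now_s <= key.grace_until


def scope_allows(scopes: Optional[List[dict]], collection: str,
                 permission: str) -> bool:
    """None models a global key that omitted `scopes`; [] is default-deny."""
    if scopes is None:
        return True
    return any(
        s["collection"] == collection and permission in s["permissions"]
        for s in scopes
    )
```

Clients can therefore roll credentials without downtime: deploy the new token any time before `grace_until`, after which the old one stops working.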
- AuditLogger (`vectorizer::auth::audit`). Non-blocking audit logger: `record()` sends to an unbounded `mpsc` channel and never blocks the handler hot-path. Background flusher drains the channel, maintains the in-memory `VecDeque` ring, and writes to daily-rotated files. Durability SLO: at-most-once, best-effort; entries in the buffer at crash time are lost. Operators requiring a durable audit ledger should ship the JSONL files to an external sink.
- Scoped API keys (`TokenScope`, `create_scoped_api_key` on `AuthManager`). Per-collection permission scopes attached to API keys and propagated through `UserClaims`. Default-deny when the scopes list is empty.
- Key rotation (`rotate_api_key` on `AuthManager`, `set_rotation_metadata` on `ApiKeyManager`). Old keys remain valid through their grace window without an `active = false` revocation, enabling zero-downtime credential rollover.
- Token introspection (`introspect_token` on `AuthManager`). Tries JWT then API key; returns a typed `TokenIntrospection` value regardless of token type.
- Failover / resync helpers (`vectorizer::replication::state`). `failover_to()` and `force_resync()` operate on `MasterNode` references. `LagTooHigh` variant added to `ReplicationError`.
- Cluster peer-add / rebalance (`vectorizer::cluster::rebalance`). `add_peer()`, `rebalance()`, `rebalance_status()` wi...
vectorizer-3.3.0
A Helm chart for Vectorizer - High-performance vector database
v3.2.0
Added
- Bulk-upsert backpressure (#263). Three layers of bounded-resource enforcement so a fan-out producer (Cortex `cortex-embedder-worker`, Synap consumers, etc.) can no longer drive the server into the unbounded-CPU restart loop documented in the issue:
  - Bounded BM25 vocabulary-build concurrency. A shared `tokio::sync::Semaphore` gates the CPU-heavy section of every vocab build. Default `num_cpus::get()`, configurable via `backpressure.max_concurrent_vocab_builds`.
  - Per-collection upsert admission (REST / gRPC / MCP / UMICP). Once a collection's in-flight depth crosses `backpressure.upsert_queue_hard_limit` (default 1024), new upserts are refused with HTTP `429 Too Many Requests` + `Retry-After`, gRPC `RESOURCE_EXHAUSTED` + `retry-after` metadata, or a structured MCP error `{ code: "queue_full", retryAfterSeconds: N }`. Crossing `backpressure.upsert_queue_high_water` (default 256) emits a structured warn and bumps `vectorizer_upsert_rejected_total{reason="queue_high_water_warn"}` but still admits the request.
  - Log rate-limiting for the `WARN BM25 vocabulary is empty …` line: at most one emit per collection per 5 s window, while the new `vectorizer_bm25_empty_vocab_fallback_total{collection}` counter retains the true volume signal.
- Five new Prometheus metrics under `vectorizer_*`: `upsert_queue_depth{collection}`, `upsert_in_flight{collection}`, `vocab_build_permits_available`, `upsert_rejected_total{reason}`, `bm25_empty_vocab_fallback_total{collection}`. All are registered automatically and surface on `GET /prometheus/metrics`.
- All five first-party SDKs honor `Retry-After`: Rust `vectorizer-sdk`, Python `sdks/python/`, TypeScript `sdks/typescript/`, Go `sdks/go/`, and C# `sdks/csharp/`. Each parses the header with identical semantics (1 s default, 30 s cap, 3 retries) and surfaces a typed `RateLimit` / `RateLimitError` / `VectorizerError(status=429)` only after retry exhaustion. Pre-v3.2.0 clients bounced 429s into a generic 5xx.
- Operator runbook at `docs/deployment/backpressure.md` and ready-to-import Grafana panels at `docs/grafana/backpressure-panels.json`.
- Docker image `hivehub/vectorizer:3.2.0` validated end-to-end with a smoke test (`scripts/docker_smoke.py`): 200 concurrent inserts against `hard_limit=2` produce ~65 well-formed 429 responses (every one carrying `Retry-After: 1`); the `vectorizer_upsert_rejected_total{reason="queue_full"}` counter delta matches the observed 429 count exactly; `/health` stays `healthy` throughout the flood.
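The admission thresholds and the client-side `Retry-After` handling described above can be sketched together. This is a simplified Python model with hypothetical function names: whether the limits trigger at `>=` or `>` is an assumption, and only the delta-seconds form of `Retry-After` is handled (the header may also carry an HTTP date).

```python
import logging
from typing import Optional, Tuple

log = logging.getLogger("backpressure-sketch")


def admit_upsert(in_flight: int, high_water: int = 256,
                 hard_limit: int = 1024,
                 retry_after_s: int = 1) -> Tuple[int, dict]:
    """Server side: return (http_status, headers) for a new upsert."""
    if in_flight >= hard_limit:
        # Refuse and tell the client when to come back.
        return 429, {"Retry-After": str(retry_after_s)}
    if in_flight >= high_water:
        # Warn but still admit the request.
        log.warning("upsert queue above high water: %d", in_flight)
    return 200, {}


def retry_delay_s(retry_after: Optional[str], default_s: float = 1.0,
                  cap_s: float = 30.0) -> float:
    """Client side: parse Retry-After with a 1 s default and a 30 s cap."""
    try:
        delay = float(retry_after)
    except (TypeError, ValueError):
        return default_s
    if not delay >= 0:  # rejects negatives and NaN
        return default_s
    return min(delay, cap_s)
```

A client loop would call `retry_delay_s` on each 429, sleep for that long, and raise its typed rate-limit error after the third failed retry.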
Build
- Dashboard `pnpm-lock.yaml` refreshed to track the `happy-dom: ">=20.8.9"` override that was added in the v3.2.0 dependency refresh. Without this the docker image build fails at the dashboard stage with `ERR_PNPM_OUTDATED_LOCKFILE`.
Configuration
New `backpressure:` block in `config.example.yml` plus four env-var overrides: `CORTEX_VECTORIZER_BACKPRESSURE_ENABLED`, `CORTEX_VECTORIZER_MAX_CONCURRENT_BUILDS`, `CORTEX_VECTORIZER_UPSERT_HIGH_WATER`, `CORTEX_VECTORIZER_UPSERT_HARD_LIMIT`. Defaults are conservative, so existing deployments inherit safe limits without touching their YAML.
vectorizer-3.2.0
A Helm chart for Vectorizer - High-performance vector database
v3.1.0 — honor client ids + flat chunk payload + /insert_vectors
[3.1.0] - 2026-04-29
Added
- `POST /insert_vectors`: bulk-insert pre-computed embeddings with caller-supplied vector ids. Skips the embedding pipeline entirely; the request body carries the vectors as raw `Vec<f32>`. Useful when the client already has its own embedder, needs deterministic ids for idempotent re-ingest, or wants to upsert without going through the chunk-and-embed path. Request shape: `{collection, vectors: [{id?, embedding, payload?, metadata?}], public_key?}`. Response shape mirrors `/insert_texts`: `{collection, inserted, failed, count, results: [{index, client_id, status, vector_ids}]}`. Per-entry validation rejects with HTTP 400 when `embedding.len() != collection.dimension`, when the embedding array contains a non-numeric value, or when an explicit `id` violates the client-id contract (see below). Quota / Raft / cache-invalidation post-processing is shared with `/insert_texts`.
- `POST /insert` and `POST /insert_texts` honor the request `id` field as the resulting `Vector.id`. Previously the `id` was parsed and silently echoed back as `client_id` in the response, but the actual stored Vector always got a fresh `Uuid::new_v4()`; re-ingesting the same document produced duplicates and there was no path from a logical client id to the (multiple) UUIDs spawned by chunking. Now: non-chunked inputs use the client `id` verbatim; chunked inputs derive `<client_id>#<chunk_index>` (e.g. `doc:42#0`, `doc:42#1`, ...) so re-running the same `/insert_texts` payload upserts in place instead of duplicating, and `DELETE` / `POST /qdrant/.../points` round-trips by client id work without a UUID lookup. Falling back to a server UUID still works when the request omits `id`, so existing callers that never sent the field are unchanged. Client-id contract: non-empty, length ≤ 256, no leading / trailing whitespace, must not contain `#` (reserved as the chunk-id separator); violations return HTTP 400 with `error_type: "validation_error"`.
- `payload.parent_id` on chunked vectors links chunks back to the source document. Set to the request's `id` when provided; otherwise a single freshly-minted UUID v4 is shared across every chunk of the same `/insert_texts` entry. Lets clients group, count, or delete every chunk of a logical document without re-deriving membership from the `_id`-in-payload defensive duplicate.
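The client-id contract and the chunk-id derivation above are easy to check on the client before sending a request. The following Python sketch uses hypothetical helper names and assumes the 256 limit counts characters; the server remains the authority and may validate differently (for example, by bytes).

```python
from typing import List

MAX_CLIENT_ID_LEN = 256
CHUNK_SEPARATOR = "#"  # reserved by the server as the chunk-id separator


def validate_client_id(client_id: str) -> None:
    """Raise ValueError when the id breaks the documented contract."""
    if not client_id:
        raise ValueError("id must be non-empty")
    if len(client_id) > MAX_CLIENT_ID_LEN:
        raise ValueError("id must be at most 256 characters")
    if client_id != client_id.strip():
        raise ValueError("id must not have leading or trailing whitespace")
    if CHUNK_SEPARATOR in client_id:
        raise ValueError("id must not contain '#'")


def stored_ids(client_id: str, chunk_count: int, chunked: bool) -> List[str]:
    """Predict the vector ids the server stores for one entry."""
    validate_client_id(client_id)
    if not chunked:
        return [client_id]  # used verbatim
    return [f"{client_id}{CHUNK_SEPARATOR}{i}" for i in range(chunk_count)]
```

For instance, `stored_ids("doc:42", 3, chunked=True)` yields `doc:42#0`, `doc:42#1`, `doc:42#2`, which is what makes re-ingest idempotent.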
Changed
- `/insert_texts` chunked payload layout flipped from nested to flat. BREAKING for clients that read `payload.metadata.<field>` directly. Pre-3.1.0 chunks landed as `{content, metadata: {file_path, chunk_index, _id, casa, ...}}`, with file-navigation fields and user metadata buried under a `metadata` sub-object. Qdrant payload filters (`payload.x = "X"`) silently missed every chunked row because the user fields weren't at the path the filter expected, and MCP `search_semantic` consumers had to read `result.metadata.metadata.x` (two levels of nesting) instead of the obvious `result.metadata.x`. 3.1.0 emits a flat shape: `{content, file_path, chunk_index, parent_id, _id, casa, x, ...}` with every key at the root. Server-provided keys (`content`, `file_path`, `chunk_index`, `parent_id`) take precedence over any colliding keys in user metadata. Non-chunked inputs already stored metadata flat, so nothing changes there. Migration: see "Migrating from 3.0.x chunked payloads" below.
- Readers tolerate both shapes during the deprecation window. `FileOperations::{get_file_content, list_files_in_collection, get_file_chunks_ordered}` and `file_watcher`'s discovery loops accept the legacy nested shape, log a `tracing::debug!("…via legacy nested payload shape (deprecated since phase9 in favor of flat layout, will be removed in a future major release)")` once per call, and resolve `file_path` from either path. `mcp_tools.rs::flatten_payload_metadata` (used by all four MCP search tools) lifts legacy nested keys to the root of the returned `metadata` map so MCP consumers can read `result.metadata.<field>` uniformly across new and legacy collections; the original nested object is preserved alongside the lifted keys, so consumers that explicitly read `result.metadata.metadata.<field>` keep working too.
- No automatic on-disk rewrite ships in 3.1.0: collections written by ≤ 3.0.13 stay nested on disk and rely on the tolerant readers. To migrate to the flat shape, re-ingest the source data through `/insert_texts` against a fresh collection or use `/insert_vectors` if you already hold the embeddings.
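To make the two payload shapes concrete, here is a Python sketch of how a flat payload could be assembled and how a legacy nested payload could be lifted for readers. The helper names are hypothetical, and letting an existing root key win over a nested key during lifting is an assumption; the server's Rust code (`flatten_payload_metadata`) may resolve collisions differently.

```python
def build_flat_payload(server_fields: dict, user_metadata: dict) -> dict:
    """3.1.0 shape: every key at the root, server-provided keys win."""
    payload = dict(user_metadata)
    payload.update(server_fields)  # content, file_path, chunk_index, parent_id
    return payload


def lift_legacy_metadata(payload: dict) -> dict:
    """Tolerant read: copy nested `metadata` keys to the root.

    The nested object is kept, so `metadata.<field>` readers still work.
    """
    lifted = dict(payload)
    nested = payload.get("metadata")
    if isinstance(nested, dict):
        for key, value in nested.items():
            lifted.setdefault(key, value)  # assumption: root key wins
    return lifted
```

After lifting, a consumer can read `x` at the root whether the chunk was written before or after 3.1.0.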
Migrating from 3.0.x chunked payloads
If your client uses Qdrant payload filters or reads payload.metadata.<field> directly on chunked vectors:
- Audit filter paths. `payload.x = "X"` matched zero chunked rows on 3.0.x because the field lived at `payload.metadata.x`. On 3.1.0 the same filter matches new writes correctly. Old data still lives at `payload.metadata.x` until re-ingested.
- MCP consumers. Reads of `result.metadata.<field>` work on both new and legacy data after 3.1.0, because the MCP layer lifts nested keys to the root automatically. No code change required.
- Re-ingest is optional. Tolerant readers cover the legacy shape during the deprecation window. To converge a collection on the new layout, drop and re-create with `/insert_texts`, or use `/insert_vectors` with embeddings you already computed.
- Idempotent re-ingest. Send `id` in each `/insert_texts` entry to upsert by client id (`doc:42` non-chunked, `doc:42#N` per chunk). Re-running the same payload now replaces in place instead of duplicating.
vectorizer-3.1.0
A Helm chart for Vectorizer - High-performance vector database
v3.0.13 — kill false-positive Raft warn + pin K8s data dir
What changes
Two small follow-ups from a live debugging session on the ermes prod cluster after the v3.0.11 upgrade.
1. Bootstrap election warn was a false positive
The post-bootstrap retry loop checked `metrics().current_leader.is_some()` to decide whether to nudge openraft. That field only populates after the leader gets a quorum-ack, but openraft elects and follows a leader (state = Follower, vote committed) for a real interval before `current_leader` lands. As a result, healthy clusters repeatedly logged `No leader after Ns — triggering election` and called `trigger().elect()` for nothing.
The nudge is now gated on `state == ServerState::Candidate` instead, which is the actual "no quorum" signal openraft exposes.
2. K8s StatefulSet must pin `VECTORIZER_DATA_DIR` to the PVC
`vectorizer-core::paths::data_dir()` resolves `$VECTORIZER_DATA_DIR` → `dirs::data_dir().join("vectorizer")` → `./data`. Without the env var, the per-OS Linux default (`~/.local/share/vectorizer/`) wins — and that path is inside the container's writable layer, not on the PVC. Every restart looks like first-time setup; the actual `vectorizer.vecdb` on the PVC at `/data/data/` never gets read.
Live observation: 15 collections / 1992 vectors sat unread on the ermes PVC for two release cycles. Adding `VECTORIZER_DATA_DIR=/data/data` to the StatefulSet env block recovered all of it on next restart (the legacy `auth.enc` had to be wiped so the current `VECTORIZER_JWT_SECRET` could log back in — runbook documents the recovery flow).
`k8s/statefulset-ha.yaml` now sets the env var by default and `docs/deployment/HA_KUBERNETES_RUNBOOK.md` ships with a Data directory pitfall section covering both the trap and the recovery sequence.
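The resolution order described above explains why the PVC was skipped. A Python sketch of the same fallback chain, with a hypothetical function name; treating an empty env value as unset is an assumption, and `os_data_dir` stands in for whatever `dirs::data_dir()` returns on the host:

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional


def resolve_data_dir(env: Mapping[str, str],
                     os_data_dir: Optional[PurePosixPath]) -> PurePosixPath:
    """$VECTORIZER_DATA_DIR, then <os data dir>/vectorizer, then ./data."""
    override = env.get("VECTORIZER_DATA_DIR")
    if override:
        return PurePosixPath(override)
    if os_data_dir is not None:
        # On Linux this is typically ~/.local/share, which sits in the
        # container's writable layer rather than on the PVC.
        return os_data_dir / "vectorizer"
    return PurePosixPath("data")
```

With the env var unset the second branch wins inside the container, which is why the StatefulSet now pins `VECTORIZER_DATA_DIR=/data/data`.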
Versioning
5 Rust crates + Rust SDK 3.0.12 → 3.0.13 (Cargo.lock refreshed); Helm chart `appVersion 3.0.12` → `3.0.13`, chart `version 1.5.9` → `1.5.10`.
🤖 Cut by Claude Code
v3.0.12 — fastembed as vectorizer-server default + lint fix
What changes
- `fastembed` is now a real `[features] default` of `vectorizer-server` instead of a hard-coded entry on the `vectorizer = { features = [...] }` dep declaration. Behaviour is unchanged — `cargo build --bin vectorizer` still ships with FastEmbed enabled — but the dependency now follows normal feature plumbing. Slim builds without the ONNX runtime can opt out with `cargo build --no-default-features`.
- Format fix under `cargo +nightly fmt` for the two `resolve_leader_addr` call sites in `raft_watcher.rs` (the calls were over the 100-col limit after the v3.0.9 signature change). The `Formatter and linter` workflow has been red since 3.0.9; this restores it to green.
Versioning
5 Rust crates + Rust SDK 3.0.11 → 3.0.12 (Cargo.lock refreshed); Helm chart `appVersion 3.0.11` → `3.0.12`, chart `version 1.5.8` → `1.5.9`. The HA fixes from 3.0.9 / 3.0.10 / 3.0.11 are unchanged.
🤖 Cut by Claude Code
v3.0.11 — Stop forced elections sabotaging stable Raft leaders
What changes
fix(raft): stop sabotaging stable elections with the AddNode retry loop.
Live test of v3.0.10 with verbose openraft logs proved the cluster was electing leaders cleanly (current_leader settles, AppendEntries flow), but the leader rotated every ~10 seconds. Pattern: vt3-1 → vt3-2 → vt3-1 …, always 10s after the previous transition.
Root cause: the post-bootstrap "register all nodes" task in `bootstrap.rs` ran on every pod, and its inner loop called `raft().trigger().elect()` unconditionally on every non-leader iteration. That forced a fresh election every 10 seconds on every follower, kicking the existing leader out before its first heartbeat could renew its lease.
Gate the trigger on `current_leader.is_none()` and a 30 s warm-up. If openraft already has a leader visible, followers just wait. The trigger only fires when openraft has genuinely stalled — the only situation the original comment was trying to handle.
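As a rough illustration of the v3.0.11 gate (the condition was later moved to `ServerState::Candidate` in v3.0.13), the decision reduces to a two-part predicate. The function and parameter names below are hypothetical, and the inclusive warm-up boundary is an assumption; the real check lives in `bootstrap.rs`.

```python
from typing import Optional

WARMUP_S = 30.0  # warm-up period from the release notes


def should_trigger_election(current_leader: Optional[int],
                            seconds_since_bootstrap: float,
                            warmup_s: float = WARMUP_S) -> bool:
    """Nudge only when no leader is visible and the warm-up has elapsed."""
    if current_leader is not None:
        return False  # a leader exists: followers just wait
    return seconds_since_bootstrap >= warmup_s
```

Before the fix, the equivalent of this function returned `True` on every non-leader iteration, which is what forced a new election every 10 seconds.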
Versioning
5 Rust crates + Rust SDK 3.0.10 → 3.0.11 (Cargo.lock refreshed); Helm chart `appVersion 3.0.10` → `3.0.11`, chart `version 1.5.7` → `1.5.8`. Server-side only fix.
🤖 Cut by Claude Code
v3.0.10 — Raft single-node bootstrap (no split-init)
What changes
fix(raft): only the lowest-ordinal node bootstraps Raft cluster.
Live test of v3.0.9 showed `current_leader` never settles — bootstrap log says "Raft cluster initialized successfully" on every node, but no election ever wins.
Root cause: `initialize_cluster` was called on every pod simultaneously. openraft's `initialize` writes to the local log/vote independently on each node — there is no cross-cluster atomicity until membership replicates via AppendEntries. With N pods racing through `initialize`, each one writes a divergent term-1 log entry naming itself as the initial voter. Subsequent Vote RPCs are rejected on log-mismatch grounds and the cluster never converges.
Fix: gate the `initialize_cluster` call on the lowest-ordinal server id (`-0` in a Kubernetes StatefulSet). Exactly one node bootstraps; the others wait, accept the membership log entry the bootstrap node propagates, and join cleanly.
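One way to picture the lowest-ordinal gate is to derive the ordinal from the StatefulSet pod name. This Python sketch is an assumption about how the ordinal could be obtained (the release notes only say the gate keys off the `-0` server id), and the pod names in the usage note are hypothetical.

```python
import re
from typing import Optional


def statefulset_ordinal(pod_name: str) -> Optional[int]:
    """Extract the trailing ordinal from a name like 'vectorizer-2'."""
    match = re.search(r"-(\d+)$", pod_name)
    return int(match.group(1)) if match else None


def should_bootstrap(pod_name: str) -> bool:
    """Only ordinal 0 initializes the cluster; everyone else waits to join."""
    return statefulset_ordinal(pod_name) == 0
```

`should_bootstrap("vectorizer-0")` is true and `should_bootstrap("vectorizer-1")` is false, so exactly one node writes the initial membership entry.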
Versioning
5 Rust crates + Rust SDK 3.0.9 → 3.0.10 (Cargo.lock refreshed); Helm chart `appVersion 3.0.9` → `3.0.10`, chart `version 1.5.6` → `1.5.7`.
🤖 Cut by Claude Code