
feat: Add ALTER statement support and enhance collection configuration #34

Merged
pavanjava merged 2 commits into pavanjava:main from srimon12:feat/alter-enchance
May 18, 2026
@srimon12
Collaborator

End-to-End Collection Configuration Cleanup and ALTER Support

Overview

Replaces the legacy flat CollectionConfig dataclass with clean nested block configs (VectorsConfig, HnswRuntimeConfig, OptimizersRuntimeConfig, CollectionParamsConfig), introduces first-class ALTER COLLECTION support with the same block structure, and aligns the dumper, SHOW diagnostics, script runner, and CLI help surface to match. The contract is now a single clean path: CREATE and ALTER share identical config block syntax but enforce create-only vs alter-only param boundaries at parse time.
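As a rough illustration of the nested block structure described above, the config types might look something like the sketch below. Only the four block class names come from this PR; the field lists are inferred from the keys shown in the syntax examples, and the container name `BlockConfigSketch` is hypothetical, so the actual definitions in src/qql/ast_nodes.py may differ.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class VectorsConfig:
    on_disk: Optional[bool] = None


@dataclass(frozen=True)
class HnswRuntimeConfig:
    m: Optional[int] = None
    ef_construct: Optional[int] = None
    full_scan_threshold: Optional[int] = None
    payload_m: Optional[int] = None
    inline_storage: Optional[bool] = None


@dataclass(frozen=True)
class OptimizersRuntimeConfig:
    indexing_threshold: Optional[int] = None
    memmap_threshold: Optional[int] = None
    deleted_threshold: Optional[float] = None
    # Either an integer thread count or the string 'auto'.
    max_optimization_threads: Optional[Union[int, str]] = None


@dataclass(frozen=True)
class CollectionParamsConfig:
    replication_factor: Optional[int] = None
    write_consistency_factor: Optional[int] = None
    on_disk_payload: Optional[bool] = None
    # Alter-only keys according to the PR description.
    read_fan_out_factor: Optional[int] = None
    read_fan_out_delay_ms: Optional[int] = None


@dataclass(frozen=True)
class BlockConfigSketch:
    """Hypothetical container: one optional slot per WITH block."""
    vectors: Optional[VectorsConfig] = None
    hnsw: Optional[HnswRuntimeConfig] = None
    optimizers: Optional[OptimizersRuntimeConfig] = None
    params: Optional[CollectionParamsConfig] = None


# A block that was not written in the statement simply stays None.
cfg = BlockConfigSketch(hnsw=HnswRuntimeConfig(full_scan_threshold=5000))
```

Keeping every field optional lets the same types describe both a full CREATE and a partial ALTER, where only the keys that were actually written carry a value.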


New Syntax

On CREATE:

CREATE COLLECTION docs
WITH VECTORS { on_disk: true }
WITH HNSW { m: 32, ef_construct: 200, full_scan_threshold: 10000, payload_m: 24, inline_storage: false }
WITH OPTIMIZERS { indexing_threshold: 10000, memmap_threshold: 20000, deleted_threshold: 0.2, max_optimization_threads: 'auto' }
WITH PARAMS { replication_factor: 2, write_consistency_factor: 1, on_disk_payload: true }
QUANTIZE SCALAR ALWAYS RAM

Config blocks come before optional QUANTIZE. WITH COLLECTION { ... } is removed; the block name is now WITH PARAMS { ... }.

On ALTER:

ALTER COLLECTION docs
WITH HNSW { full_scan_threshold: 5000 }
WITH PARAMS { read_fan_out_factor: 4, on_disk_payload: false }
QUANTIZE DISABLED

At least one WITH block or QUANTIZE clause is required. read_fan_out_factor and read_fan_out_delay_ms are rejected at parse time on CREATE (alter-only params). QUANTIZE DISABLED disables quantization via Disabled.DISABLED in the SDK.
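The two rules above (alter-only keys rejected on CREATE, and ALTER requiring at least one clause) could be checked along these lines. This is a minimal sketch, not the parser's actual code: the function names, the exception class, and the dict-based inputs are all assumptions made for illustration.

```python
from typing import Mapping, Optional

# Keys the PR describes as alter-only.
ALTER_ONLY_PARAMS = frozenset({"read_fan_out_factor", "read_fan_out_delay_ms"})


class QQLParseErrorSketch(ValueError):
    """Hypothetical stand-in for the parser's own error type."""


def check_params_block(params: Mapping[str, object], for_alter: bool) -> None:
    """Reject alter-only keys when the statement being parsed is a CREATE."""
    if for_alter:
        return
    offending = sorted(ALTER_ONLY_PARAMS.intersection(params))
    if offending:
        raise QQLParseErrorSketch(
            f"{', '.join(offending)} may only be used with ALTER COLLECTION"
        )


def check_alter_has_clause(
    blocks: Mapping[str, object], quantize: Optional[object]
) -> None:
    """ALTER needs at least one WITH block or a QUANTIZE clause."""
    if not blocks and quantize is None:
        raise QQLParseErrorSketch(
            "ALTER COLLECTION requires at least one WITH block or QUANTIZE clause"
        )
```

Raising at parse time means a script containing an invalid CREATE fails before any request reaches the server.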


Architecture

| File | Change |
| --- | --- |
| src/qql/ast_nodes.py | Replaced flat CollectionConfig with nested frozen dataclasses: VectorsConfig, HnswRuntimeConfig, OptimizersRuntimeConfig, CollectionParamsConfig, QuantizationUpdate. AlterCollectionStmt now carries optional config and quantization. |
| src/qql/lexer.py | Added keyword tokens: VECTORS, OPTIMIZERS, PARAMS, DISABLED. |
| src/qql/parser.py | Parses WITH <block> { ... } blocks in any order (deduplicated by block type). for_alter flag propagates from statement parser through config-block parsing to toggle create-vs-alter validation. Range validation added for hnsw.m >= 4, deleted_threshold [0.0–1.0], max_optimization_threads int or 'auto'. QUANTIZE is always parsed last after all WITH blocks, enforcing a clean ordering contract. |
| src/qql/executor.py | Block-level SDK routing: _build_hnsw_config → HnswConfigDiff, _build_optimizers_config → OptimizersConfigDiff (with MaxOptimizationThreadsSetting.AUTO), _build_collection_params_diff → CollectionParamsDiff, _build_alter_quantization_config → Disabled.DISABLED or typed quantization. _build_vectors_config_diff looks up the dense vector name for named-vector collections. _build_dense_point_vector wraps plain vectors as {"dense": ...} for named-vector collections during INSERT. _collection_is_hybrid now checks sparse-vector presence, not isinstance(vectors, dict), aligned with the dumper. |
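The range checks listed for src/qql/parser.py can be pictured roughly as follows. The bounds (m >= 4, deleted_threshold in [0.0, 1.0], max_optimization_threads as an int or 'auto') come from the PR; the function names are hypothetical, and rejecting booleans is an extra assumption made here because bool is a subclass of int in Python.

```python
def _is_plain_int(value: object) -> bool:
    # bool is a subclass of int, so exclude it explicitly (assumption of this sketch).
    return isinstance(value, int) and not isinstance(value, bool)


def validate_hnsw_m(value: object) -> int:
    """hnsw.m must be an integer of at least 4."""
    if not _is_plain_int(value) or value < 4:
        raise ValueError(f"hnsw.m must be an integer >= 4, got {value!r}")
    return value


def validate_deleted_threshold(value: object) -> float:
    """deleted_threshold must lie in the closed interval [0.0, 1.0]."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not 0.0 <= value <= 1.0:
        raise ValueError(
            f"deleted_threshold must be between 0.0 and 1.0, got {value!r}"
        )
    return float(value)


def validate_max_optimization_threads(value: object):
    """max_optimization_threads is either an integer or the string 'auto'."""
    if value == "auto" or _is_plain_int(value):
        return value
    raise ValueError(
        f"max_optimization_threads must be an integer or 'auto', got {value!r}"
    )
```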

Fixes & Improvements

| Area | Description |
| --- | --- |
| SHOW COLLECTION | Now exposes per-vector on_disk, hnsw.inline_storage, read_fan_out_factor, read_fan_out_delay_ms, on_disk_payload. "Payload indexes: none" correctly gated on empty payload_schema, not on_disk_payload. Removed duplicate replication/write-consistency lines. |
| Dumper | dump_collection now emits the full CREATE COLLECTION ... WITH ... WITH ... QUANTIZE ... line preserving HNSW, OPTIMIZERS, PARAMS, VECTORS, and quantization config. Hybrid detection uses sparse-vector presence, consistent with the executor. |
| Script runner | ALTER added to _STMT_STARTERS. strip_comments is now string-aware: -- inside string literals is preserved. |
| CLI help | ALTER COLLECTION section expanded with full key lists for all WITH blocks and QUANTIZE DISABLED. |
| CREATE vs ALTER param boundary | read_fan_out_factor and read_fan_out_delay_ms are rejected at parse time on CREATE, not at executor runtime. |
| DISABLED token | Promoted from raw identifier check to a proper lexer TokenKind. |
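To make the string-aware comment stripping concrete, here is one way such a scanner could work. It is a sketch under stated assumptions, not the script runner's actual strip_comments: it assumes comments start with -- and run to end of line, that string literals use single or double quotes, and that there are no backslash escapes.

```python
def strip_comments_sketch(text: str) -> str:
    """Remove `--` line comments while leaving `--` inside quoted strings alone."""
    out = []
    quote = None  # the quote character of the string we are inside, if any
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if quote is not None:
            # Inside a string literal: copy everything, watch for the closing quote.
            out.append(ch)
            if ch == quote:
                quote = None
            i += 1
        elif ch in ("'", '"'):
            quote = ch
            out.append(ch)
            i += 1
        elif text.startswith("--", i):
            # Outside a string: drop everything up to (not including) the newline.
            while i < n and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


cleaned = strip_comments_sketch("INSERT 'a -- b' -- trailing note\nNEXT")
```

Here `cleaned` keeps the quoted `'a -- b'` intact and drops only the trailing note; the newline that ends the comment is preserved so statement boundaries do not shift.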

Behavioral Changes

| Before | After |
| --- | --- |
| WITH COLLECTION { ... } with flat config | WITH PARAMS { ... } with nested block config |
| No ALTER support | Full ALTER COLLECTION with WITH blocks and QUANTIZE |
| Named-vector INSERT assumed any dict-vectors collection was hybrid | Correctly distinguishes hybrid (sparse present) from named-dense-only; sends {"dense": ...} payload for named-vector collections |
| Dumper wrote bare CREATE COLLECTION name HYBRID | Dumper emits full config + quantization in CREATE statement |
| -- inside string literals truncated by comment stripping | Preserved |
| Hybrid detection: executor checked isinstance(vectors, dict), dumper checked sparse_vectors | Both now check sparse-vector presence |
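The hybrid-detection change can be sketched without the SDK by using simple stand-in objects. The attribute names `vectors` and `sparse_vectors` and the helper names below are assumptions for illustration; the real executor works on the collection info returned by the client.

```python
from types import SimpleNamespace
from typing import Any, List


def collection_is_hybrid_sketch(params: Any) -> bool:
    """Hybrid means sparse vectors are configured, not merely that vectors is a dict."""
    return bool(getattr(params, "sparse_vectors", None))


def build_dense_point_vector_sketch(params: Any, vector: List[float]) -> Any:
    """Wrap a plain vector as {"dense": ...} when the collection uses named vectors."""
    if isinstance(getattr(params, "vectors", None), dict):
        return {"dense": vector}
    return vector


# Stand-ins for three collection shapes (values are placeholders).
unnamed = SimpleNamespace(vectors="single-vector-params", sparse_vectors=None)
named_dense_only = SimpleNamespace(vectors={"dense": "params"}, sparse_vectors=None)
hybrid = SimpleNamespace(vectors={"dense": "params"}, sparse_vectors={"sparse": "params"})
```

Under the old isinstance(vectors, dict) check, `named_dense_only` would have been treated as hybrid; checking sparse-vector presence separates the two cases.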

Regression Coverage

  • tests/test_parser.py — Multi-block CREATE/ALTER parsing, duplicate-block rejection, parse-time rejection of alter-only PARAMS on CREATE, unknown-key errors per block, range validation.
  • tests/test_executor.py — All config blocks routed to correct SDK models for CREATE and ALTER, quantization create/alter/disable, inline_storage visibility, named-dense INSERT payload format, sparse-vector-based hybrid detection, no duplicate SHOW formatting.
  • tests/test_dumper.py — Config clauses and quantization clauses emitted in generated CREATE statement.
  • tests/test_lexer.py — VECTORS, OPTIMIZERS, PARAMS, DISABLED tokenization.
  • tests/test_script.py — ALTER as top-level statement split, string-aware comment stripping.
  • Full suite: 573 tests passing.

Intentional Boundaries (Not In Scope)

  • No WAL, strict-mode, or sharding-key syntax added.
  • No named-vector generalization beyond QQL's hardcoded "dense" convention (roadmap gap).
  • Dumper intentionally omits read_fan_out_factor / read_fan_out_delay_ms from the emitted CREATE line — those are alter-only runtime controls in the current contract.
  • VECTORS block only exposes on_disk; per-vector HNSW, quantization, and multivector config remain underexposed.
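As an illustration of the dumper boundary noted above, the sketch below formats a CREATE statement from block dictionaries and leaves out the alter-only keys. The function names, the dict-based input, and the fixed block order are assumptions of this sketch; the actual dump_collection may lay out its output differently.

```python
from typing import Mapping, Optional

ALTER_ONLY_KEYS = frozenset({"read_fan_out_factor", "read_fan_out_delay_ms"})
BLOCK_ORDER = ("VECTORS", "HNSW", "OPTIMIZERS", "PARAMS")  # assumed emission order


def format_value(value: object) -> str:
    """Render a Python value using the literal style shown in the syntax examples."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return f"'{value}'"
    return str(value)


def dump_create_sketch(
    name: str,
    blocks: Mapping[str, Mapping[str, object]],
    quantize: Optional[str] = None,
) -> str:
    """Build a single-line CREATE COLLECTION statement, config blocks before QUANTIZE."""
    parts = [f"CREATE COLLECTION {name}"]
    for block in BLOCK_ORDER:
        items = {
            key: value
            for key, value in blocks.get(block, {}).items()
            if value is not None and key not in ALTER_ONLY_KEYS
        }
        if items:
            body = ", ".join(f"{k}: {format_value(v)}" for k, v in items.items())
            parts.append(f"WITH {block} {{ {body} }}")
    if quantize:
        parts.append(f"QUANTIZE {quantize}")
    return " ".join(parts)


statement = dump_create_sketch(
    "docs",
    {
        "PARAMS": {"replication_factor": 2, "read_fan_out_factor": 4, "on_disk_payload": True},
        "HNSW": {"m": 32},
    },
    quantize="SCALAR ALWAYS RAM",
)
```

Because read_fan_out_factor is filtered out, the generated statement stays valid under the parse-time rule that rejects alter-only keys on CREATE.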

- Introduced ALTER statement handling in the lexer and parser, allowing modifications to existing collections.
- Expanded the TokenKind enum to include new keywords: VECTORS, OPTIMIZERS, PARAMS, and DISABLED.
- Enhanced collection configuration parsing to support new configuration blocks for VECTORS, OPTIMIZERS, and PARAMS.
- Updated the CreateCollectionStmt and AlterCollectionStmt to accept and process new configuration parameters.
- Improved error handling for duplicate configuration blocks and invalid parameter usage.
- Enhanced tests to cover new functionality, including parsing and execution of ALTER statements and collection configurations.
- Updated comment stripping logic to preserve double dashes within string literals.
@srimon12 srimon12 requested a review from pavanjava May 18, 2026 11:40
@pavanjava pavanjava merged commit 7b7d434 into pavanjava:main May 18, 2026
2 checks passed
@srimon12 srimon12 deleted the feat/alter-enchance branch May 18, 2026 13:12