feat: Add ALTER statement support and enhance collection configuration #34
Merged
- Introduced ALTER statement handling in the lexer and parser, allowing modifications to existing collections.
- Expanded the TokenKind enum to include new keywords: VECTORS, OPTIMIZERS, PARAMS, and DISABLED.
- Enhanced collection configuration parsing to support new configuration blocks for VECTORS, OPTIMIZERS, and PARAMS.
- Updated the CreateCollectionStmt and AlterCollectionStmt to accept and process new configuration parameters.
- Improved error handling for duplicate configuration blocks and invalid parameter usage.
- Enhanced tests to cover new functionality, including parsing and execution of ALTER statements and collection configurations.
- Updated comment stripping logic to preserve double dashes within string literals.
pavanjava approved these changes on May 18, 2026
End-to-End Collection Configuration Cleanup and ALTER Support
Overview
Replaces the legacy flat CollectionConfig dataclass with nested block configs (VectorsConfig, HnswRuntimeConfig, OptimizersRuntimeConfig, CollectionParamsConfig), introduces first-class ALTER COLLECTION support with the same block structure, and aligns the dumper, SHOW diagnostics, script runner, and CLI help surface to match. CREATE and ALTER now share identical config block syntax, and the parser enforces the boundary between create-only and alter-only params at parse time.
New Syntax
On CREATE:
    CREATE COLLECTION docs WITH VECTORS { on_disk: true } WITH HNSW { m: 32, ef_construct: 200, full_scan_threshold: 10000, payload_m: 24, inline_storage: false } WITH OPTIMIZERS { indexing_threshold: 10000, memmap_threshold: 20000, deleted_threshold: 0.2, max_optimization_threads: 'auto' } WITH PARAMS { replication_factor: 2, write_consistency_factor: 1, on_disk_payload: true } QUANTIZE SCALAR ALWAYS RAM
Config blocks come before the optional QUANTIZE clause. WITH COLLECTION { ... } is removed; the block name is now WITH PARAMS { ... }.
On ALTER:
    ALTER COLLECTION docs WITH HNSW { full_scan_threshold: 5000 } WITH PARAMS { read_fan_out_factor: 4, on_disk_payload: false } QUANTIZE DISABLED
At least one WITH block or QUANTIZE clause is required. read_fan_out_factor and read_fan_out_delay_ms are alter-only params and are rejected at parse time on CREATE. QUANTIZE DISABLED disables quantization via Disabled.DISABLED in the SDK.
Architecture
- src/qql/ast_nodes.py: replaces CollectionConfig with nested frozen dataclasses: VectorsConfig, HnswRuntimeConfig, OptimizersRuntimeConfig, CollectionParamsConfig, QuantizationUpdate. AlterCollectionStmt now carries optional config and quantization.
- src/qql/lexer.py: new keyword tokens VECTORS, OPTIMIZERS, PARAMS, DISABLED.
- src/qql/parser.py: parses WITH <block> { ... } blocks in any order (a second block of the same type is rejected). The for_alter flag propagates from the statement parser through config-block parsing to toggle create-vs-alter validation. Range validation added for hnsw.m >= 4, deleted_threshold in [0.0, 1.0], and max_optimization_threads (int or 'auto'). QUANTIZE is always parsed last, after all WITH blocks, which enforces a clean ordering contract.
- src/qql/executor.py: _build_hnsw_config → HnswConfigDiff, _build_optimizers_config → OptimizersConfigDiff (with MaxOptimizationThreadsSetting.AUTO), _build_collection_params_diff → CollectionParamsDiff, _build_alter_quantization_config → Disabled.DISABLED or typed quantization. _build_vectors_config_diff looks up the dense vector name for named-vector collections. _build_dense_point_vector wraps plain vectors as {"dense": ...} for named-vector collections during INSERT. _collection_is_hybrid now checks sparse-vector presence, not isinstance(vectors, dict), aligned with the dumper.
Fixes & Improvements
- SHOW diagnostics: now report on_disk, hnsw.inline_storage, read_fan_out_factor, read_fan_out_delay_ms, and on_disk_payload. "Payload indexes: none" is correctly gated on an empty payload_schema, not on on_disk_payload. Removed duplicate replication/write-consistency lines.
- Dumper: dump_collection now emits the full CREATE COLLECTION ... WITH ... WITH ... QUANTIZE ... line, preserving HNSW, OPTIMIZERS, PARAMS, VECTORS, and quantization config. Hybrid detection uses sparse-vector presence, consistent with the executor.
- Script runner: ALTER added to _STMT_STARTERS. strip_comments is now string-aware: -- inside string literals is preserved.
- CLI help: covers WITH blocks and QUANTIZE DISABLED.
- Parser: read_fan_out_factor and read_fan_out_delay_ms are rejected at parse time on CREATE, not at executor runtime.
- Lexer: DISABLED token added to TokenKind.
Behavioral Changes
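- Collection config syntax: WITH COLLECTION { ... } with flat config is replaced by WITH PARAMS { ... } with nested block config.
- ALTER COLLECTION with WITH blocks and QUANTIZE is now supported.
- INSERT: {"dense": ...} payload for named-vector collections.
- CREATE COLLECTION name HYBRID
- Comment stripping: -- inside string literals was truncated by comment stripping; it is now preserved.
- Hybrid detection: executor checked isinstance(vectors, dict), dumper checked sparse_vectors; both now check sparse-vector presence.
Regression Coverage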
- tests/test_parser.py: multi-block CREATE/ALTER parsing, duplicate-block rejection, parse-time rejection of alter-only PARAMS on CREATE, unknown-key errors per block, range validation.
- tests/test_executor.py: all config blocks routed to correct SDK models for CREATE and ALTER, quantization create/alter/disable, inline_storage visibility, named-dense INSERT payload format, sparse-vector-based hybrid detection, no duplicate SHOW formatting.
- tests/test_dumper.py: config clauses and quantization clauses emitted in the generated CREATE statement.
- tests/test_lexer.py: VECTORS, OPTIMIZERS, PARAMS, DISABLED tokenization.
- tests/test_script.py: ALTER as top-level statement split, string-aware comment stripping.
Intentional Boundaries (Not In Scope)
"dense"convention (roadmap gap ).read_fan_out_factor/read_fan_out_delay_msfrom the emitted CREATE line — those are alter-only runtime controls in the current contract.VECTORSblock only exposeson_disk; per-vector HNSW, quantization, and multivector config remain underexposed.