| 1.5: TurboQuantBitSize.BITS1_5, | ||
| 1.0: TurboQuantBitSize.BITS1, | ||
| } | ||
| bits_enum = _BITS_MAP.get(qc.turbo_bits or 4.0, TurboQuantBitSize.BITS4) |
`_build_quantization_config()` currently treats missing and invalid Turbo bit depths the same by defaulting both to `BITS4`:

`bits_enum = _BITS_MAP.get(qc.turbo_bits or 4.0, TurboQuantBitSize.BITS4)`

That creates two issues here. First, if `BITS` was omitted, QQL is explicitly forcing `BITS4` instead of preserving omission and letting the SDK/server default apply, even though the SDK model makes `bits` optional. Second, if an unexpected `turbo_bits` value ever reaches the executor, it gets silently coerced to `BITS4` instead of failing loudly. The parser validates current CLI input, but I'd still prefer the executor to distinguish these cases: preserve `None` when the user omitted `BITS`, and raise explicitly for unsupported values rather than silently downgrading them.
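A rough sketch of the distinction described above, not the actual executor code. The `TurboQuantBitSize` enum here is a local stand-in that lists only the members visible in this diff (its values are invented), and `resolve_turbo_bits` is a hypothetical helper name:

```python
from enum import Enum
from typing import Optional


class TurboQuantBitSize(Enum):
    # Stand-in for the SDK enum; only members visible in the diff are
    # listed, and the string values are placeholders.
    BITS4 = "bits4"
    BITS1_5 = "bits1_5"
    BITS1 = "bits1"


_BITS_MAP = {
    4.0: TurboQuantBitSize.BITS4,
    1.5: TurboQuantBitSize.BITS1_5,
    1.0: TurboQuantBitSize.BITS1,
}


def resolve_turbo_bits(turbo_bits: Optional[float]) -> Optional[TurboQuantBitSize]:
    """Map a parsed BITS value to the enum without masking bad input."""
    if turbo_bits is None:
        # BITS was omitted: pass None through so the SDK/server default applies.
        return None
    try:
        return _BITS_MAP[turbo_bits]
    except KeyError:
        supported = ", ".join(str(k) for k in sorted(_BITS_MAP))
        # Unsupported value: fail loudly instead of coercing to BITS4.
        raise ValueError(
            f"Unsupported TURBO bit depth {turbo_bits!r}; expected one of: {supported}"
        ) from None
```

Checking `is None` rather than using `or 4.0` also means a falsy value such as `0` is rejected instead of being treated as an omission.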
Line 10: The README intro still lists quantization support as scalar, binary, product, but this PR adds TURBO as a fourth option. The top-level feature summary should be updated so the repo landing page and package description reflect the actual supported set.
Line 87: This still describes quantization as scalar/binary/product. Since TURBO is now supported and documented elsewhere in the PR, it should be updated to avoid inconsistency between the README summary and the actual syntax/examples below.
srimon12
left a comment
minor executor fix and docs fixes needed.
| @@ -69,25 +69,33 @@ When `USING MODEL` is omitted, the collection uses the **default embedding model | |||
| Quantization reduces the memory footprint of vector collections and speeds up search at the cost of a small, controllable accuracy loss. QQL supports all three Qdrant quantization strategies via an optional `QUANTIZE` clause appended to `CREATE COLLECTION`. | |||
This paragraph still says QQL supports “all three” quantization strategies, but the table and examples below now document four (SCALAR, TURBO, BINARY, PRODUCT). This should be updated for consistency.
| - `1` — 1-bit, **32×** compression (same ratio as BINARY, but better recall) | ||
| - **`ALWAYS RAM`** — keep the **quantized** vectors in RAM at all times, regardless of the collection's `on_disk` setting. Improves search throughput at the cost of higher RAM usage for the compressed index. The original full-precision vectors are stored and managed independently of this flag. Supported by all four quantization types. | ||
| - **`QUANTIZE`** always appears **after** all other clauses (`HYBRID`, `USING MODEL`, etc.). | ||
| - For `PRODUCT`, the compression ratio is fixed at **4×** in this version. |
“For PRODUCT, the compression ratio is fixed at 4× in this version” is implementation-specific and matches the current executor, but for TURBO the docs are making stronger behavioral claims than the QQL layer actually enforces. QQL only maps the user input to the SDK model here; the runtime behavior still depends on Qdrant. I’d keep the docs precise about syntax/support and avoid overcommitting on engine behavior unless we intend to validate those guarantees end-to-end.
Package metadata still advertises quantization support as scalar, binary, product. Since this PR adds TURBO, the published package description should be updated too so PyPI metadata stays aligned with the actual feature set.
re-checked after fixes; earlier review comments are addressed.