new quantization implementation by pavanjava · Pull Request #23 · pavanjava/qql

pavanjava · 2026-05-12T00:33:46Z

No description provided.

srimon12 · 2026-05-12T16:38:00Z

+                1.5: TurboQuantBitSize.BITS1_5,
+                1.0: TurboQuantBitSize.BITS1,
+            }
+            bits_enum = _BITS_MAP.get(qc.turbo_bits or 4.0, TurboQuantBitSize.BITS4)


`_build_quantization_config()` currently treats missing and invalid Turbo bit depths the same by defaulting both to `BITS4`:

bits_enum = _BITS_MAP.get(qc.turbo_bits or 4.0, TurboQuantBitSize.BITS4)

That creates two issues here. First, if `BITS` was omitted, QQL explicitly forces `BITS4` instead of preserving the omission and letting the SDK/server default apply, even though the SDK model makes bits optional. Second, if an unexpected `turbo_bits` value ever reaches the executor, it is silently coerced to `BITS4` instead of failing loudly. The parser validates current CLI input, but I’d still prefer the executor to distinguish these cases: preserve `None` when the user omitted `BITS`, and raise explicitly for unsupported values rather than silently downgrading them.
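
A minimal sketch of that distinction, not the actual executor code. The `TurboQuantBitSize` enum below is a local stand-in that only lists the members visible in this diff, its string values are placeholders, the `4.0` map entry is assumed from the current default, and `resolve_turbo_bits` is a hypothetical helper name:

```python
from enum import Enum
from typing import Optional


class TurboQuantBitSize(Enum):
    """Local stand-in for the SDK enum (assumption); values are placeholders."""
    BITS4 = "bits4"
    BITS1_5 = "bits1_5"
    BITS1 = "bits1"


# Only the entries visible in the diff, plus 4.0 (assumed from the default).
_BITS_MAP = {
    4.0: TurboQuantBitSize.BITS4,
    1.5: TurboQuantBitSize.BITS1_5,
    1.0: TurboQuantBitSize.BITS1,
}


def resolve_turbo_bits(turbo_bits: Optional[float]) -> Optional[TurboQuantBitSize]:
    """Hypothetical helper: map a parsed BITS value to the enum.

    None is passed through so the SDK/server default can apply;
    anything not in the map raises instead of falling back to BITS4.
    """
    if turbo_bits is None:
        return None  # BITS omitted: preserve the omission
    try:
        return _BITS_MAP[float(turbo_bits)]
    except (KeyError, TypeError, ValueError):
        supported = ", ".join(str(b) for b in sorted(_BITS_MAP))
        raise ValueError(
            f"Unsupported TURBO bit depth {turbo_bits!r}; supported: {supported}"
        ) from None
```

With this shape the caller would pass the returned value (possibly `None`) straight into the SDK config, so an omitted `BITS` stays omitted and a bad value surfaces as an error at the executor boundary.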

srimon12 · 2026-05-12T16:40:38Z

Line 10: The README intro still lists quantization support as scalar, binary, product, but this PR adds TURBO as a fourth option. The top-level feature summary should be updated so the repo landing page and package description reflect the actual supported set.
Line 87: This line still describes quantization as scalar/binary/product. Since TURBO is now supported and documented elsewhere in the PR, it should be updated to avoid inconsistency between the README summary and the actual syntax/examples below.

srimon12

Minor executor fix and docs fixes needed.

srimon12 · 2026-05-12T16:41:43Z

@@ -69,25 +69,33 @@ When `USING MODEL` is omitted, the collection uses the **default embedding model

 Quantization reduces the memory footprint of vector collections and speeds up search at the cost of a small, controllable accuracy loss. QQL supports all three Qdrant quantization strategies via an optional `QUANTIZE` clause appended to `CREATE COLLECTION`.


This paragraph still says QQL supports “all three” quantization strategies, but the table and examples below now document four (SCALAR, TURBO, BINARY, PRODUCT). This should be updated for consistency.

srimon12 · 2026-05-12T16:43:06Z

+  - `1` — 1-bit, **32×** compression (same ratio as BINARY, but better recall)
+- **`ALWAYS RAM`** — keep the **quantized** vectors in RAM at all times, regardless of the collection's `on_disk` setting. Improves search throughput at the cost of higher RAM usage for the compressed index. The original full-precision vectors are stored and managed independently of this flag. Supported by all four quantization types.
 - **`QUANTIZE`** always appears **after** all other clauses (`HYBRID`, `USING MODEL`, etc.).
 - For `PRODUCT`, the compression ratio is fixed at **4×** in this version.


“For PRODUCT, the compression ratio is fixed at 4× in this version” is implementation-specific and matches the current executor, but for TURBO the docs are making stronger behavioral claims than the QQL layer actually enforces. QQL only maps the user input to the SDK model here; the runtime behavior still depends on Qdrant. I’d keep the docs precise about syntax/support and avoid overcommitting on engine behavior unless we intend to validate those guarantees end-to-end.

srimon12 · 2026-05-12T16:43:26Z

Package metadata still advertises quantization support as scalar, binary, product. Since this PR adds TURBO, the published package description should be updated too so PyPI metadata stays aligned with the actual feature set.

srimon12 · 2026-05-12T17:21:55Z

re-checked after fixes; earlier review comments are addressed.

new quantization implementation

8de7a7e

pavanjava requested a review from srimon12 May 12, 2026 00:34

srimon12 reviewed May 12, 2026


pavanjava added 2 commits May 12, 2026 22:43

fixed the comments

ab8b7e2

fixed the version for implemented the major feature

1921755

srimon12 approved these changes May 12, 2026


srimon12 merged commit 8bc85fb into main May 12, 2026
3 checks passed

srimon12 merged 3 commits into main from qql14
