
new quantization implementation#23

Merged
srimon12 merged 3 commits into main from qql14
May 12, 2026

Conversation

@pavanjava
Owner

No description provided.

@pavanjava pavanjava requested a review from srimon12 May 12, 2026 00:34
Comment thread src/qql/executor.py Outdated
1.5: TurboQuantBitSize.BITS1_5,
1.0: TurboQuantBitSize.BITS1,
}
bits_enum = _BITS_MAP.get(qc.turbo_bits or 4.0, TurboQuantBitSize.BITS4)
Collaborator

@srimon12 srimon12 May 12, 2026

_build_quantization_config() currently treats missing and invalid Turbo bit depths the same by defaulting both to BITS4:

bits_enum = _BITS_MAP.get(qc.turbo_bits or 4.0, TurboQuantBitSize.BITS4)

That creates two issues here. First, if BITS was omitted, QQL is explicitly forcing BITS4 instead of preserving omission and letting the SDK/server default apply, even though the SDK model makes bits optional. Second, if an unexpected turbo_bits value ever reaches the executor, it gets silently coerced to BITS4 instead of failing loudly. The parser validates current CLI input, but I’d still prefer the executor to distinguish these cases: preserve None when the user omitted BITS, and raise explicitly for unsupported values rather than silently downgrading them.
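
A minimal sketch of the behavior requested above, not the actual executor code. `resolve_turbo_bits` is a hypothetical helper name, and the `TurboQuantBitSize` enum here is a local stand-in whose member values are placeholders; only the three members visible in this thread are included, and the real map may contain more entries.

```python
from enum import Enum
from typing import Optional


class TurboQuantBitSize(Enum):
    """Local stand-in for the SDK enum (assumption); values are placeholders."""
    BITS4 = "bits4"
    BITS1_5 = "bits1_5"
    BITS1 = "bits1"


# Only the entries visible in the diff are reproduced here.
_BITS_MAP = {
    4.0: TurboQuantBitSize.BITS4,
    1.5: TurboQuantBitSize.BITS1_5,
    1.0: TurboQuantBitSize.BITS1,
}


def resolve_turbo_bits(turbo_bits: Optional[float]) -> Optional[TurboQuantBitSize]:
    """Map a parsed BITS value to the enum without hiding omission or errors."""
    # BITS omitted: return None so the SDK/server default can apply.
    if turbo_bits is None:
        return None
    try:
        return _BITS_MAP[turbo_bits]
    except KeyError:
        # Unsupported value: fail loudly instead of coercing to BITS4.
        supported = ", ".join(str(k) for k in sorted(_BITS_MAP))
        raise ValueError(
            f"Unsupported TURBO bit depth {turbo_bits!r}; supported: {supported}"
        ) from None
```

With this shape, the caller would pass the result straight through as the optional `bits` argument, so an omitted `BITS` stays `None` all the way to the SDK model.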

Comment thread README.md
Collaborator

@srimon12 srimon12 May 12, 2026

Line 10: The README intro still lists quantization support as scalar, binary, product, but this PR adds TURBO as a fourth option. The top-level feature summary should be updated so the repo landing page and package description reflect the actual supported set.
Line 87: This line also still describes quantization as scalar/binary/product. Since TURBO is now supported and documented elsewhere in the PR, it should be updated to avoid inconsistency between the README summary and the actual syntax/examples below.

Collaborator

@srimon12 srimon12 left a comment

minor executor fix and docs fixes needed.

Comment thread docs/collections.md Outdated
@@ -69,25 +69,33 @@ When `USING MODEL` is omitted, the collection uses the **default embedding model

Quantization reduces the memory footprint of vector collections and speeds up search at the cost of a small, controllable accuracy loss. QQL supports all three Qdrant quantization strategies via an optional `QUANTIZE` clause appended to `CREATE COLLECTION`.
Collaborator

This paragraph still says QQL supports “all three” quantization strategies, but the table and examples below now document four (SCALAR, TURBO, BINARY, PRODUCT). This should be updated for consistency.

Comment thread docs/collections.md
- `1` — 1-bit, **32×** compression (same ratio as BINARY, but better recall)
- **`ALWAYS RAM`** — keep the **quantized** vectors in RAM at all times, regardless of the collection's `on_disk` setting. Improves search throughput at the cost of higher RAM usage for the compressed index. The original full-precision vectors are stored and managed independently of this flag. Supported by all four quantization types.
- **`QUANTIZE`** always appears **after** all other clauses (`HYBRID`, `USING MODEL`, etc.).
- For `PRODUCT`, the compression ratio is fixed at **4×** in this version.
Collaborator

“For PRODUCT, the compression ratio is fixed at 4× in this version” is implementation-specific and matches the current executor, but for TURBO the docs are making stronger behavioral claims than the QQL layer actually enforces. QQL only maps the user input to the SDK model here; the runtime behavior still depends on Qdrant. I’d keep the docs precise about syntax/support and avoid overcommitting on engine behavior unless we intend to validate those guarantees end-to-end.

Comment thread pyproject.toml
Collaborator

Package metadata still advertises quantization support as scalar, binary, product. Since this PR adds TURBO, the published package description should be updated too so PyPI metadata stays aligned with the actual feature set.

@srimon12
Collaborator

re-checked after fixes; earlier review comments are addressed.

@srimon12 srimon12 merged commit 8bc85fb into main May 12, 2026
3 checks passed
