pavanjava · srimon12 · May 12, 2026
diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@
 [![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
 [![Tests](https://img.shields.io/badge/tests-375%20passing-brightgreen)](tests/)
 
-Write `INSERT`, `SEARCH`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
+Write `INSERT`, `SEARCH`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
 
 ```
 qql> INSERT INTO COLLECTION notes VALUES {'text': 'Qdrant is a vector database', 'author': 'alice', 'year': 2024}
@@ -84,7 +84,7 @@ Full documentation lives in the [`docs/`](docs/) folder and at **[pavanjava.gith
 | [INSERT / INSERT BULK](docs/insert.md) | Adding documents, batch inserts, payload types |
 | [SEARCH / RECOMMEND / Hybrid / RERANK](docs/search.md) | Semantic search, hybrid, reranking, recommendations |
 | [WHERE Filters](docs/filters.md) | Full SQL-style filter operators |
-| [Collections & Quantization](docs/collections.md) | CREATE, DROP, QUANTIZE (scalar/binary/product), CREATE INDEX |
+| [Collections & Quantization](docs/collections.md) | CREATE, DROP, QUANTIZE (scalar/turbo/binary/product), CREATE INDEX |
 | [Scripts: EXECUTE / DUMP](docs/scripts.md) | Script files, collection backup/restore |
 | [Programmatic Usage](docs/programmatic.md) | Use QQL as a Python library |
 | [Reference: Models / Config / Errors](docs/reference.md) | Embedding models, config file, error reference |
@@ -111,6 +111,9 @@ RECOMMEND FROM articles POSITIVE IDS (1001, 1002) LIMIT 5
 CREATE COLLECTION articles
 CREATE COLLECTION articles HYBRID
 CREATE COLLECTION articles QUANTIZE SCALAR
+CREATE COLLECTION articles QUANTIZE TURBO
+CREATE COLLECTION articles QUANTIZE TURBO BITS 2
+CREATE COLLECTION articles QUANTIZE TURBO BITS 1.5 ALWAYS RAM
 CREATE INDEX ON COLLECTION articles FOR year TYPE integer
 SHOW COLLECTIONS
 DROP COLLECTION articles

diff --git a/docs/collections.md b/docs/collections.md
@@ -67,27 +67,38 @@ When `USING MODEL` is omitted, the collection uses the **default embedding model
 
 ## Quantization — QUANTIZE clause
 
-Quantization reduces the memory footprint of vector collections and speeds up search at the cost of a small, controllable accuracy loss. QQL supports all three Qdrant quantization strategies via an optional `QUANTIZE` clause appended to `CREATE COLLECTION`.
+Quantization reduces the memory footprint of vector collections and speeds up search at the cost of a small, controllable accuracy loss. QQL supports all four Qdrant quantization strategies via an optional `QUANTIZE` clause appended to `CREATE COLLECTION`.
 
-**Three strategies:**
+**Four strategies:**
 
-| Type | Compression | Accuracy Loss | Best For |
+| Type | Compression | Accuracy | Best For |
 |---|---|---|---|
-| `SCALAR` | 4× (float32 → int8) | < 1% | Most collections — best balance |
-| `BINARY` | 32× (float32 → 1-bit) | Higher | High-dimensional vectors (768+), speed priority |
+| `SCALAR` | 4× (float32 → int8) | < 1% loss | Most collections — best balance |
+| `TURBO` | 8–32× (4-bit to 1-bit) | Low–medium loss | Better recall than BINARY at the same storage budget |
+| `BINARY` | 32× (float32 → 1-bit) | Higher loss | Speed priority; centered distributions only |
 | `PRODUCT` | 4× (configurable) | Variable | Memory-constrained deployments |
 
 **Full syntax:**
 ```
 CREATE COLLECTION <name> ... QUANTIZE SCALAR [QUANTILE <0.0–1.0>] [ALWAYS RAM]
+CREATE COLLECTION <name> ... QUANTIZE TURBO  [BITS <1|1.5|2|4>]   [ALWAYS RAM]
 CREATE COLLECTION <name> ... QUANTIZE BINARY  [ALWAYS RAM]
 CREATE COLLECTION <name> ... QUANTIZE PRODUCT [ALWAYS RAM]
 ```
 
-- **`QUANTILE <float>`** — (scalar only) calibration quantile for the INT8 conversion; defaults to Qdrant's built-in default (0.99) when omitted.
-- **`ALWAYS RAM`** — keep the **quantized** vectors in RAM at all times, regardless of the collection's `on_disk` setting. Improves search throughput at the cost of higher RAM usage for the compressed index. The original full-precision vectors are stored and managed independently of this flag. Supported by all three quantization types.
+- **`QUANTILE <float>`** — (SCALAR only) calibration quantile for the INT8 conversion; defaults to Qdrant's built-in default (0.99) when omitted.
+- **`BITS <depth>`** — (TURBO only) bit depth passed to the Qdrant SDK:
+  - `4` — 4-bit (documented Qdrant default; when `BITS` is omitted, QQL sends no bit depth and the server applies its own default)
+  - `2` — 2-bit
+  - `1.5` — 1.5-bit
+  - `1` — 1-bit
+  > Compression ratios (8×, 16×, 24×, 32×) and recall characteristics are
+  > Qdrant server-side behaviors. QQL maps the `BITS` value to the SDK model and
+  > passes it to Qdrant; actual results depend on your Qdrant server version.
+- **`ALWAYS RAM`** — keep the **quantized** vectors in RAM at all times, regardless of the collection's `on_disk` setting. Improves search throughput at the cost of higher RAM usage for the compressed index. The original full-precision vectors are stored and managed independently of this flag. Supported by all four quantization types.
 - **`QUANTIZE`** always appears **after** all other clauses (`HYBRID`, `USING MODEL`, etc.).
 - For `PRODUCT`, the compression ratio is fixed at **4×** in this version.
+- For `TURBO`, the Qdrant server supports the Cosine, Dot, and Euclidean distance metrics when TurboQuant is enabled.
 - When used with `HYBRID` collections, quantization applies only to the **dense** vector.
 
 **Examples:**
@@ -102,6 +113,26 @@ Scalar with explicit calibration and quantized vectors pinned to RAM:
 CREATE COLLECTION research_papers QUANTIZE SCALAR QUANTILE 0.95 ALWAYS RAM
 ```
 
+TurboQuant — default 4-bit (8× compression, good recall):
+```sql
+CREATE COLLECTION research_papers QUANTIZE TURBO
+```
+
+TurboQuant — 2-bit (16× compression):
+```sql
+CREATE COLLECTION research_papers QUANTIZE TURBO BITS 2
+```
+
+TurboQuant — 1.5-bit (24× compression) with quantized vectors pinned to RAM:
+```sql
+CREATE COLLECTION research_papers QUANTIZE TURBO BITS 1.5 ALWAYS RAM
+```
+
+TurboQuant — 1-bit (32× compression, same ratio as BINARY but better recall):
+```sql
+CREATE COLLECTION research_papers QUANTIZE TURBO BITS 1
+```
+
 Binary quantization for large high-dimensional embeddings:
 ```sql
 CREATE COLLECTION research_papers QUANTIZE BINARY
@@ -115,22 +146,29 @@ CREATE COLLECTION research_papers QUANTIZE PRODUCT ALWAYS RAM
 Combined with hybrid collection:
 ```sql
 CREATE COLLECTION research_papers HYBRID QUANTIZE SCALAR
+CREATE COLLECTION research_papers HYBRID QUANTIZE TURBO BITS 2
 ```
 
 Combined with a pinned model:
 ```sql
 CREATE COLLECTION research_papers USING MODEL 'BAAI/bge-base-en-v1.5' QUANTIZE SCALAR QUANTILE 0.99
+CREATE COLLECTION research_papers USING MODEL 'BAAI/bge-base-en-v1.5' QUANTIZE TURBO BITS 2
+```
+
+Combined with hybrid + dense model:
+```sql
+CREATE COLLECTION research_papers USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5' QUANTIZE TURBO
 ```
 
 **Valid combinations:**
 
-| Base form | + QUANTIZE SCALAR | + QUANTIZE BINARY | + QUANTIZE PRODUCT |
-|---|---|---|---|
-| `CREATE COLLECTION name` | ✓ | ✓ | ✓ |
-| `... HYBRID` | ✓ | ✓ | ✓ |
-| `... USING MODEL 'x'` | ✓ | ✓ | ✓ |
-| `... USING HYBRID` | ✓ | ✓ | ✓ |
-| `... USING HYBRID DENSE MODEL 'x'` | ✓ | ✓ | ✓ |
+| Base form | + SCALAR | + TURBO | + BINARY | + PRODUCT |
+|---|---|---|---|---|
+| `CREATE COLLECTION name` | ✓ | ✓ | ✓ | ✓ |
+| `... HYBRID` | ✓ | ✓ | ✓ | ✓ |
+| `... USING MODEL 'x'` | ✓ | ✓ | ✓ | ✓ |
+| `... USING HYBRID` | ✓ | ✓ | ✓ | ✓ |
+| `... USING HYBRID DENSE MODEL 'x'` | ✓ | ✓ | ✓ | ✓ |
 
 > INSERT and SEARCH on quantized collections work exactly the same as on non-quantized ones — no changes to INSERT or SEARCH syntax are needed.
 

diff --git a/pyproject.toml b/pyproject.toml
@@ -1,7 +1,7 @@
 [project]
 name = "qql-cli"
-version = "2.0.0"
-description = "QQL is a SQL-like query language and CLI for Qdrant vector database. Write INSERT, SEARCH, RECOMMEND, DELETE, and CREATE COLLECTION statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, binary, product), WHERE clause filters, script execution, and collection dump/restore."
+version = "2.1.0"
+description = "QQL is a SQL-like query language and CLI for Qdrant vector database. Write INSERT, SEARCH, RECOMMEND, DELETE, and CREATE COLLECTION statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), WHERE clause filters, script execution, and collection dump/restore."
 readme = "README.md"
 license = { file = "LICENSE" }
 requires-python = ">=3.12"
@@ -37,7 +37,7 @@ classifiers = [
     "Topic :: Text Processing :: Indexing",
 ]
 dependencies = [
-    "qdrant-client[fastembed]>=1.13.0",
+    "qdrant-client[fastembed]>=1.18.0",
     "click>=8.1.0",
     "rich>=13.0.0",
     "prompt_toolkit>=3.0.0",

diff --git a/src/qql/ast_nodes.py b/src/qql/ast_nodes.py
@@ -9,14 +9,16 @@ class QuantizationType(Enum):
     SCALAR  = "scalar"
     BINARY  = "binary"
     PRODUCT = "product"
+    TURBO   = "turbo"
 
 
 @dataclass(frozen=True)
 class QuantizationConfig:
     """Quantization settings parsed from a QUANTIZE clause."""
     type: QuantizationType
-    quantile: float | None = None   # SCALAR only; None → Qdrant default (0.99)
-    always_ram: bool = False        # all types; default False
+    quantile: float | None = None    # SCALAR only; None → Qdrant default (0.99)
+    always_ram: bool = False         # all types; default False
+    turbo_bits: float | None = None  # TURBO only; None → BITS omitted, server default applies (4-bit per docs)
 
 
 @dataclass(frozen=True)

diff --git a/src/qql/executor.py b/src/qql/executor.py
@@ -41,6 +41,9 @@
     ScalarQuantization,
     ScalarQuantizationConfig,
     ScalarType,
+    TurboQuantBitSize,
+    TurboQuantization,
+    TurboQuantQuantizationConfig,
     SearchParams,
     SparseVector,
     SparseVectorParams,
@@ -846,7 +849,7 @@ def _wrap_as_filter(self, qdrant_expr: Any) -> Filter:
 
     def _build_quantization_config(
         self, qc: QuantizationConfig
-    ) -> ScalarQuantization | BinaryQuantization | ProductQuantization:
+    ) -> ScalarQuantization | BinaryQuantization | ProductQuantization | TurboQuantization:
         """Convert a parsed QuantizationConfig to a Qdrant SDK quantization object."""
         if qc.type == QuantizationType.SCALAR:
             return ScalarQuantization(
@@ -867,6 +870,28 @@ def _build_quantization_config(
                     always_ram=qc.always_ram,
                 )
             )
+        if qc.type == QuantizationType.TURBO:
+            _BITS_MAP: dict[float, TurboQuantBitSize] = {
+                4.0: TurboQuantBitSize.BITS4,
+                2.0: TurboQuantBitSize.BITS2,
+                1.5: TurboQuantBitSize.BITS1_5,
+                1.0: TurboQuantBitSize.BITS1,
+            }
+            if qc.turbo_bits is None:
+                bits_enum = None           # user omitted BITS → preserve None, server applies default
+            elif qc.turbo_bits in _BITS_MAP:
+                bits_enum = _BITS_MAP[qc.turbo_bits]
+            else:
+                raise QQLRuntimeError(
+                    f"Unsupported TURBO bit depth: {qc.turbo_bits}. "
+                    f"Valid values: 1, 1.5, 2, 4"
+                )
+            return TurboQuantization(
+                turbo=TurboQuantQuantizationConfig(
+                    bits=bits_enum,
+                    always_ram=qc.always_ram,
+                )
+            )
         raise QQLRuntimeError(f"Unknown quantization type: {qc.type}")
 
     def _collection_is_hybrid(self, name: str) -> bool:

diff --git a/src/qql/lexer.py b/src/qql/lexer.py
@@ -27,6 +27,8 @@ class TokenKind(Enum):
     QUANTILE = auto()
     ALWAYS   = auto()
     RAM      = auto()
+    TURBO    = auto()
+    BITS     = auto()
     CREATE = auto()
     INDEX = auto()
     ON = auto()
@@ -113,6 +115,8 @@ class TokenKind(Enum):
     "QUANTILE": TokenKind.QUANTILE,
     "ALWAYS":   TokenKind.ALWAYS,
     "RAM":      TokenKind.RAM,
+    "TURBO":    TokenKind.TURBO,
+    "BITS":     TokenKind.BITS,
     "CREATE": TokenKind.CREATE,
     "INDEX": TokenKind.INDEX,
     "ON": TokenKind.ON,

diff --git a/src/qql/parser.py b/src/qql/parser.py
@@ -248,8 +248,32 @@ def _parse_quantize_clause(self) -> QuantizationConfig:
                 always_ram = True
             return QuantizationConfig(type=QuantizationType.PRODUCT, always_ram=always_ram)
 
+        if tok.kind == TokenKind.TURBO:
+            self._advance()
+            turbo_bits: float | None = None
+            always_ram = False
+            if self._peek().kind == TokenKind.BITS:
+                self._advance()
+                bits_tok = self._peek()
+                raw = float(self._parse_number())
+                if raw not in (1.0, 1.5, 2.0, 4.0):
+                    raise QQLSyntaxError(
+                        f"BITS must be one of 1, 1.5, 2, or 4 for TURBO quantization, got {raw}",
+                        bits_tok.pos,
+                    )
+                turbo_bits = raw
+            if self._peek().kind == TokenKind.ALWAYS:
+                self._advance()
+                self._expect(TokenKind.RAM)
+                always_ram = True
+            return QuantizationConfig(
+                type=QuantizationType.TURBO,
+                turbo_bits=turbo_bits,
+                always_ram=always_ram,
+            )
+
         raise QQLSyntaxError(
-            f"Expected SCALAR, BINARY, or PRODUCT after QUANTIZE, got '{tok.value}'",
+            f"Expected SCALAR, BINARY, PRODUCT, or TURBO after QUANTIZE, got '{tok.value}'",
             tok.pos,
         )
 

diff --git a/tests/test_executor.py b/tests/test_executor.py
@@ -1640,3 +1640,110 @@ def test_result_message_no_quantization_suffix_when_absent(self, executor, mock_
         node = CreateCollectionStmt(collection="articles")
         result = executor.execute(node)
         assert "quantization" not in result.message
+
+
+class TestTurboQuantCreate:
+    """Executor tests for QUANTIZE TURBO — verifies correct SDK objects are built."""
+
+    @pytest.fixture
+    def executor(self, cfg, mock_client):
+        return Executor(mock_client, cfg)
+
+    # ── TurboQuantization object is produced ──────────────────────────────
+
+    def test_turbo_passes_turbo_quantization(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantization
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert isinstance(kw.get("quantization_config"), TurboQuantization)
+
+    def test_turbo_default_bits_is_none(self, executor, mock_client):
+        """When BITS is omitted, bits must be None — preserving omission so the
+        SDK/server applies its own default rather than QQL forcing BITS4."""
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.bits is None
+
+    def test_turbo_bits2(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantBitSize
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO, turbo_bits=2.0),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.bits == TurboQuantBitSize.BITS2
+
+    def test_turbo_bits1_5(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantBitSize
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO, turbo_bits=1.5),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.bits == TurboQuantBitSize.BITS1_5
+
+    def test_turbo_bits1(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantBitSize
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO, turbo_bits=1.0),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.bits == TurboQuantBitSize.BITS1
+
+    def test_turbo_always_ram_true(self, executor, mock_client):
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO, always_ram=True),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.always_ram is True
+
+    def test_turbo_always_ram_false_by_default(self, executor, mock_client):
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.always_ram is False
+
+    def test_turbo_hybrid_collection_has_both_configs(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantization
+        node = CreateCollectionStmt(
+            collection="articles",
+            hybrid=True,
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert isinstance(kw.get("quantization_config"), TurboQuantization)
+        assert "sparse_vectors_config" in kw
+
+    def test_turbo_result_message_includes_turbo(self, executor, mock_client):
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        result = executor.execute(node)
+        assert "turbo" in result.message
+
+    def test_turbo_invalid_bits_at_executor_raises(self, executor, mock_client):
+        """An unexpected turbo_bits value that bypasses parser validation must
+        raise QQLRuntimeError explicitly instead of silently coercing to BITS4."""
+        from qql.exceptions import QQLRuntimeError as QQLErr
+        qc = QuantizationConfig(type=QuantizationType.TURBO, turbo_bits=3.0)
+        with pytest.raises(QQLErr, match="Unsupported TURBO bit depth"):
+            executor._build_quantization_config(qc)