Commit cf77e7d

feat(parser): enhance hybrid model support with vector names
- Added support for specifying dense and sparse vector names in hybrid collections during INSERT and CREATE statements.
- Updated the parser to handle new syntax for DENSE VECTOR and SPARSE VECTOR clauses.
- Modified the InsertStmt and CreateCollectionStmt to include dense_vector and sparse_vector attributes.
- Adjusted related tests to verify the correct handling of vector names in hybrid scenarios.

test(tests): update tests for new hybrid vector functionality

- Enhanced tests to cover new vector name features in INSERT and CREATE statements.
- Updated assertions to check for correct vector names in hybrid configurations.
- Refactored existing tests to ensure compatibility with the new parser changes.
1 parent 7b7d434 commit cf77e7d

15 files changed

Lines changed: 674 additions & 171 deletions

docs/collections.md

Lines changed: 18 additions & 2 deletions
@@ -91,8 +91,9 @@ Explicitly creates a new empty collection. Collections are also created automati
 CREATE COLLECTION <collection_name>
 CREATE COLLECTION <collection_name> HYBRID
 CREATE COLLECTION <collection_name> USING MODEL '<model_name>'
+CREATE COLLECTION <collection_name> USING VECTOR '<dense_vector_name>'
 CREATE COLLECTION <collection_name> USING HYBRID
-CREATE COLLECTION <collection_name> USING HYBRID DENSE MODEL '<model>'
+CREATE COLLECTION <collection_name> USING HYBRID [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE VECTOR '<name>']
 CREATE COLLECTION <collection_name> WITH VECTORS { on_disk: <bool> }
 CREATE COLLECTION <collection_name> WITH HNSW { m, ef_construct, full_scan_threshold, max_indexing_threads, on_disk, payload_m, inline_storage }
 CREATE COLLECTION <collection_name> WITH OPTIMIZERS { deleted_threshold, vacuum_min_vector_number, default_segment_number, max_segment_size, memmap_threshold, indexing_threshold, flush_interval_sec, max_optimization_threads, prevent_unoptimized }
@@ -117,6 +118,11 @@ Dense-only collection (standard, uses default model dimensions):
 CREATE COLLECTION research_papers
 ```
 
+QQL-created dense collections use the configured dense vector name (`dense` by default). You can choose a different name explicitly:
+```sql
+CREATE COLLECTION research_papers USING VECTOR 'body'
+```
+
 Dense-only collection pinned to a specific model (768-dimensional):
 ```sql
 CREATE COLLECTION research_papers USING MODEL 'BAAI/bge-base-en-v1.5'
@@ -127,6 +133,11 @@ Hybrid collection (dense + sparse BM25, default models):
 CREATE COLLECTION research_papers HYBRID
 ```
 
+Hybrid collection with explicit vector names:
+```sql
+CREATE COLLECTION research_papers USING HYBRID DENSE VECTOR 'emb' SPARSE VECTOR 'lex'
+```
+
 Hybrid collection with a custom dense model:
 ```sql
 CREATE COLLECTION research_papers USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5'
@@ -380,6 +391,7 @@ Replaces the stored dense vector for a **single point** identified by its ID. Th
 ```
 UPDATE <collection> SET VECTOR WHERE id = '<point_id>' [<vector>]
 UPDATE <collection> SET VECTOR WHERE id = <integer_id> [<vector>]
+UPDATE <collection> SET VECTOR '<dense_vector_name>' WHERE id = '<point_id>' [<vector>]
 ```
 
 The vector is provided as a JSON-style float array `[v1, v2, ..., vN]`. The array length must match the collection's configured vector dimensions.
@@ -392,13 +404,17 @@ UPDATE articles SET VECTOR WHERE id = '3f2e1a4b-8c91-4d0e-b123-abc123def456' [0.
 
 -- Replace vector by integer ID
 UPDATE articles SET VECTOR WHERE id = 42 [0.1, 0.2, 0.3, 0.4]
+
+-- Replace a specific named vector
+UPDATE articles SET VECTOR 'body' WHERE id = '3f2e1a4b-8c91-4d0e-b123-abc123def456' [0.1, 0.2, 0.3, 0.4]
 ```
 
 **Notes:**
 - Only single-point updates are supported (by ID). Bulk or filter-based vector updates are not supported.
 - The point must already exist; this operation does not create new points.
 - The collection must exist; updating from a non-existent collection raises an error.
-- For hybrid collections, the dense vector named `"dense"` is updated. Sparse vectors are managed separately.
+- For named-vector collections, QQL updates the only dense vector when the target is unambiguous. Use `SET VECTOR '<name>'` when a collection has multiple dense vectors.
+- Sparse vectors are managed separately.
 
 ---
 

docs/insert.md

Lines changed: 16 additions & 2 deletions
@@ -16,8 +16,9 @@ If you include an `id` field in `VALUES`, QQL uses it as the Qdrant point ID. Su
 ```
 INSERT INTO COLLECTION <collection_name> VALUES {<dict>}
 INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING MODEL '<model_name>'
+INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING VECTOR '<dense_vector_name>'
 INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING HYBRID
-INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING HYBRID DENSE MODEL '<model>' SPARSE MODEL '<model>'
+INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING HYBRID [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>']
 ```
 
 **Examples:**
@@ -49,6 +50,17 @@ Insert into a hybrid collection (dense + sparse BM25 vectors):
 INSERT INTO COLLECTION articles VALUES {'text': 'Attention is all you need'} USING HYBRID
 ```
 
+Insert into a specific named dense vector:
+```sql
+INSERT INTO COLLECTION articles VALUES {'text': 'hello world'} USING VECTOR 'body'
+```
+
+Insert into a hybrid collection with external vector names:
+```sql
+INSERT INTO COLLECTION articles VALUES {'text': 'hello world'}
+USING HYBRID DENSE VECTOR 'emb' SPARSE VECTOR 'lex'
+```
+
 Insert with custom models for both dense and sparse:
 ```sql
 INSERT INTO COLLECTION articles VALUES {'text': 'hello world'}
@@ -67,6 +79,7 @@ INSERT INTO COLLECTION articles VALUES {'text': 'hello world'}
 - `id`, when provided, must be an unsigned integer or UUID string.
 - If the collection already exists with a different vector size (from a different model), an error is raised with a clear message.
 - Hybrid inserts require a hybrid collection (created with `CREATE COLLECTION ... HYBRID`, auto-created on the first `USING HYBRID` insert, or **auto-detected** — if you omit `USING HYBRID` but the target collection is already a hybrid collection, QQL detects this and uses the hybrid insert path automatically).
+- If a collection has multiple dense or sparse vectors, specify the target vector names explicitly.
 
 ---
 
@@ -82,8 +95,9 @@ Each record may optionally include an `id` field. This is the preferred way to k
 ```
 INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, <dict>, ...]
 INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING MODEL '<model_name>'
+INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING VECTOR '<dense_vector_name>'
 INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING HYBRID
-INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING HYBRID DENSE MODEL '<model>' SPARSE MODEL '<model>'
+INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING HYBRID [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>']
 ```
 
 **Examples:**

docs/programmatic.md

Lines changed: 3 additions & 1 deletion
@@ -139,7 +139,7 @@ with Connection("http://localhost:6333") as conn:
     # Inspect collection diagnostics
     result = conn.run_query("SHOW COLLECTION notes")
     print(result.data["topology"]) # "dense" or "hybrid"
-    print(result.data["vectors"]) # {"": {...}} or {"dense": {...}}
+    print(result.data["vectors"]) # named vectors, or {"": {...}} for unnamed external collections
     print(result.data["payload_schema"]) # field index info, or None
 ```
 
@@ -150,6 +150,8 @@ with Connection("http://localhost:6333") as conn:
 | `url` | `str` | `"http://localhost:6333"` | Qdrant instance URL |
 | `secret` | `str \| None` | `None` | API key; `None` for unauthenticated |
 | `default_model` | `str \| None` | `None` → `sentence-transformers/all-MiniLM-L6-v2` | Dense embedding model used when no `USING MODEL` clause is given |
+| `default_dense_vector_name` | `str` | `"dense"` | Dense vector name used when QQL creates a collection and no explicit `USING VECTOR` name is given |
+| `default_sparse_vector_name` | `str` | `"sparse"` | Sparse vector name used when QQL creates a hybrid collection and no explicit sparse vector name is given |
 
 ### Power-user: `executor` property
 
docs/reference.md

Lines changed: 8 additions & 0 deletions
@@ -33,6 +33,10 @@ Qdrant/bm25
 INSERT INTO docs VALUES {'text': 'hello'} USING MODEL 'BAAI/bge-small-en-v1.5'
 SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING MODEL 'BAAI/bge-small-en-v1.5'
 
+-- Explicit vector names
+INSERT INTO docs VALUES {'text': 'hello'} USING VECTOR 'body'
+SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING VECTOR 'body'
+
 -- Hybrid with custom dense model
 SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5'
 
@@ -42,6 +46,10 @@ SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING HYBRID FUSION 'dbsf'
 -- Hybrid with both custom
 SEARCH docs SIMILAR TO 'hello' LIMIT 5
 USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5' SPARSE MODEL 'prithivida/Splade_PP_en_v1'
+
+-- Hybrid with external vector names
+SEARCH docs SIMILAR TO 'hello' LIMIT 5
+USING HYBRID DENSE VECTOR 'emb' SPARSE VECTOR 'lex'
 ```
 
 ### Commonly available dense models (Fastembed)

docs/scripts.md

Lines changed: 2 additions & 2 deletions
@@ -155,8 +155,8 @@ qql execute backup.qql
 
 **Rules and notes:**
 - Points without a `'text'` payload field are **skipped** (counted in the footer comment).
-- Hybrid collections produce `CREATE COLLECTION <name> HYBRID` and `INSERT BULK ... USING HYBRID` statements.
-- Dense collections produce plain `CREATE COLLECTION <name>` and `INSERT BULK` statements.
+- Hybrid collections produce `CREATE COLLECTION <name> USING HYBRID ...` and matching `INSERT BULK ... USING HYBRID ...` statements, including vector names when the source collection uses named vectors.
+- Dense collections produce `CREATE COLLECTION <name> USING VECTOR '<name>'` for named vectors, or plain `CREATE COLLECTION <name>` for unnamed external collections.
 - All payload value types are preserved: strings, integers, floats, booleans (`true`/`false`), `null`, lists, and nested dicts.
 - Re-importing re-embeds all text using your currently configured model — use the same model as the original collection to preserve semantic accuracy.
 - Parent directories of the output path are created automatically.
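
The two rewritten rules map a source collection's vector layout onto a `CREATE COLLECTION` statement. The sketch below shows that mapping. It assumes the dumper already knows the dense and sparse vector names. `create_statement` is hypothetical. It covers only the layouts these notes describe: one dense vector, plus at most one sparse vector. It also assumes names contain no single quotes, so it does no escaping.

```python
def create_statement(name: str, dense_names: list[str], sparse_names: list[str]) -> str:
    """Render the CREATE COLLECTION line a dump could emit (hypothetical helper).

    "" in dense_names means an unnamed external dense vector.
    """
    if len(dense_names) != 1 or len(sparse_names) > 1:
        raise ValueError("sketch handles one dense and at most one sparse vector")
    dense = dense_names[0]
    if sparse_names:
        clauses = ["USING HYBRID"]
        if dense:  # assumption: an unnamed dense vector simply omits DENSE VECTOR
            clauses.append(f"DENSE VECTOR '{dense}'")
        clauses.append(f"SPARSE VECTOR '{sparse_names[0]}'")
        return f"CREATE COLLECTION {name} " + " ".join(clauses)
    if not dense:
        return f"CREATE COLLECTION {name}"  # unnamed external collection
    return f"CREATE COLLECTION {name} USING VECTOR '{dense}'"


print(create_statement("notes", ["emb"], ["lex"]))
# CREATE COLLECTION notes USING HYBRID DENSE VECTOR 'emb' SPARSE VECTOR 'lex'
```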

docs/search.md

Lines changed: 14 additions & 3 deletions
@@ -12,10 +12,11 @@ An optional `WHERE` clause filters the candidate set **before** similarity ranki
 ```
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n>
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING MODEL '<model_name>'
+SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING VECTOR '<dense_vector_name>'
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING MODEL '<model>'] WHERE <filter>
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID
-SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>'] [WHERE <filter>]
-SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING SPARSE [MODEL '<sparse_model>']
+SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>'] [WHERE <filter>]
+SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING SPARSE [MODEL '<sparse_model>'] [VECTOR '<sparse_vector_name>']
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> EXACT
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING ...] [WHERE <filter>] [RERANK] WITH { hnsw_ef: <n>, exact: true|false, acorn: true|false, indexed_only: true|false, quantization: { ignore: true|false, rescore: true|false, oversampling: <n> }, mmr_diversity: <0..1>, mmr_candidates: <n> }
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING ...] [WHERE <filter>] RERANK [MODEL '<reranker_model>']
@@ -38,7 +39,17 @@ Hybrid search (combines dense semantic + sparse BM25 keyword retrieval via RRF b
 SEARCH articles SIMILAR TO 'attention mechanism' LIMIT 10 USING HYBRID
 ```
 
-Sparse-only search (queries only the `sparse` named vector — useful for pure keyword retrieval):
+Search a specific named dense vector:
+```sql
+SEARCH articles SIMILAR TO 'attention mechanism' LIMIT 10 USING VECTOR 'body'
+```
+
+Hybrid search against external vector names:
+```sql
+SEARCH articles SIMILAR TO 'attention mechanism' LIMIT 10 USING HYBRID DENSE VECTOR 'emb' SPARSE VECTOR 'lex'
+```
+
+Sparse-only search (queries a sparse vector — useful for pure keyword retrieval):
 ```sql
 SEARCH medical_knowledge SIMILAR TO 'beta blocker contraindications' LIMIT 5 USING SPARSE
 ```
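
`FUSION 'rrf'` in the hybrid syntax refers to reciprocal rank fusion. RRF merges the ranked lists from the dense vector and the sparse vector by summing `1 / (k + rank)` for each point across the lists. This commit does not show where QQL or Qdrant performs the fusion. The sketch below computes it client-side only to illustrate the technique. The constant `k = 60` is a common choice in the RRF literature and is an assumption here, not necessarily the value Qdrant uses.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: score(p) = sum over lists of 1 / (k + rank of p)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["a", "b", "c"]   # ids ranked by the dense vector (e.g. 'emb')
sparse_hits = ["c", "a", "d"]  # ids ranked by the sparse vector (e.g. 'lex')
print([pid for pid, _ in rrf([dense_hits, sparse_hits])])  # ['a', 'c', 'b', 'd']
```

Points found by both retrievers (`a`, `c`) outrank points found by only one. RRF uses only ranks, never raw scores, so the dense and sparse similarity scales never need to be reconciled.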

src/qql/ast_nodes.py

Lines changed: 9 additions & 0 deletions
@@ -207,6 +207,8 @@ class InsertStmt:
     model: str | None # dense model; None → use config default
     hybrid: bool = False # if True, also embed + store sparse BM25 vector
     sparse_model: str | None = None # sparse model; None → SparseEmbedder.DEFAULT_MODEL
+    dense_vector: str | None = None
+    sparse_vector: str | None = None
 
 
 @dataclass(frozen=True)
@@ -216,6 +218,8 @@ class InsertBulkStmt:
     model: str | None # dense model; None → use config default
     hybrid: bool = False
     sparse_model: str | None = None
+    dense_vector: str | None = None
+    sparse_vector: str | None = None
 
 
 @dataclass(frozen=True)
@@ -225,6 +229,8 @@ class CreateCollectionStmt:
     model: str | None = None # dense model; None → use config default
     quantization: QuantizationConfig | None = None # optional QUANTIZE clause
     config: CollectionConfig | None = None
+    dense_vector: str | None = None
+    sparse_vector: str | None = None
 
 
 @dataclass(frozen=True)
@@ -287,6 +293,8 @@ class SearchStmt:
     with_clause: SearchWith | None = None
     group_by: str | None = None # GROUP BY field name; None → normal flat search
     group_size: int = 3 # max points per group (ignored when group_by is None)
+    dense_vector: str | None = None
+    sparse_vector: str | None = None
 
 
 @dataclass(frozen=True)
@@ -317,6 +325,7 @@ class UpdateVectorStmt:
     collection: str
     point_id: str | int
     vector: tuple[float, ...] # dense vector as immutable tuple (frozen=True compatible)
+    vector_name: str | None = None
 
 
 @dataclass(frozen=True)

src/qql/cli.py

Lines changed: 8 additions & 5 deletions
@@ -27,7 +27,8 @@
 Insert a point. 'text' is required and auto-vectorized.
 Optional: include [yellow]'id'[/yellow] in VALUES as an integer or UUID
 Optional: [yellow]USING MODEL[/yellow] '<model>'
-Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>']
+Optional: [yellow]USING VECTOR[/yellow] '<dense_vector>'
+Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>']
 
 [yellow]INSERT BULK INTO COLLECTION[/yellow] <name> [yellow]VALUES[/yellow] [{[yellow]'text'[/yellow]: '...', ...}, ...]
 Batch insert multiple points in a single call. Each dict must contain 'text'.
@@ -37,7 +38,8 @@
 [yellow]CREATE COLLECTION[/yellow] <name> [[yellow]HYBRID[/yellow]]
 Create a new collection. Add HYBRID for dense+sparse BM25 vectors.
 Optional: [yellow]USING MODEL[/yellow] '<model>'
-Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>']
+Optional: [yellow]USING VECTOR[/yellow] '<dense_vector>'
+Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE VECTOR '<name>']
 Optional: [yellow]WITH VECTORS[/yellow] { on_disk: <bool> }
 Optional: [yellow]WITH HNSW[/yellow] { m, ef_construct, full_scan_threshold, max_indexing_threads, on_disk, payload_m, inline_storage }
 Optional: [yellow]WITH OPTIMIZERS[/yellow] { deleted_threshold, vacuum_min_vector_number, default_segment_number, max_segment_size, memmap_threshold, indexing_threshold, flush_interval_sec, max_optimization_threads, prevent_unoptimized }
@@ -87,8 +89,9 @@
 [yellow]SEARCH[/yellow] <name> [yellow]SIMILAR TO[/yellow] '<text>' [yellow]LIMIT[/yellow] <n>
 Semantic search by vector similarity.
 Optional: [yellow]USING MODEL[/yellow] '<model>'
-Optional: [yellow]USING HYBRID[/yellow] [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>']
-Optional: [yellow]USING SPARSE[/yellow] [MODEL '<model>'] sparse-vector-only search
+Optional: [yellow]USING VECTOR[/yellow] '<dense_vector>'
+Optional: [yellow]USING HYBRID[/yellow] [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>']
+Optional: [yellow]USING SPARSE[/yellow] [MODEL '<model>'] [VECTOR '<name>'] sparse-vector-only search
 Optional: [yellow]WHERE[/yellow] <filter> (e.g. WHERE year > 2020 AND status = 'ok')
 Optional: [yellow]RERANK[/yellow] [MODEL '<model>'] rerank results with a cross-encoder
 Optional: [yellow]EXACT[/yellow] bypass HNSW and perform exact search
@@ -107,7 +110,7 @@
 [yellow]DELETE FROM[/yellow] <name> [yellow]WHERE id =[/yellow] '<id>'
 Delete a point by its ID.
 
-[yellow]UPDATE[/yellow] <name> [yellow]SET VECTOR WHERE id =[/yellow] '<id>'|<int> [<vector>]
+[yellow]UPDATE[/yellow] <name> [yellow]SET VECTOR[/yellow] ['<dense_vector>'] [yellow]WHERE id =[/yellow] '<id>'|<int> [<vector>]
 Replace the dense vector for a single point by ID.
 The point must already exist. Vector is a float array: [0.1, 0.2, ..., 0.N]
 
src/qql/config.py

Lines changed: 10 additions & 0 deletions
@@ -8,13 +8,17 @@
 CONFIG_PATH = CONFIG_DIR / "config.json"
 
 DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
+DEFAULT_DENSE_VECTOR_NAME = "dense"
+DEFAULT_SPARSE_VECTOR_NAME = "sparse"
 
 
 @dataclass
 class QQLConfig:
     url: str
     secret: str | None = None
     default_model: str = DEFAULT_MODEL
+    default_dense_vector_name: str = DEFAULT_DENSE_VECTOR_NAME
+    default_sparse_vector_name: str = DEFAULT_SPARSE_VECTOR_NAME
 
 
 def save_config(cfg: QQLConfig) -> None:
@@ -33,6 +37,12 @@ def load_config() -> QQLConfig | None:
         url=data["url"],
         secret=data.get("secret"),
         default_model=data.get("default_model", DEFAULT_MODEL),
+        default_dense_vector_name=data.get(
+            "default_dense_vector_name", DEFAULT_DENSE_VECTOR_NAME
+        ),
+        default_sparse_vector_name=data.get(
+            "default_sparse_vector_name", DEFAULT_SPARSE_VECTOR_NAME
+        ),
     )
 
 
