Merged
22 changes: 20 additions & 2 deletions docs/collections.md
Expand Up @@ -91,8 +91,9 @@ Explicitly creates a new empty collection. Collections are also created automati
CREATE COLLECTION <collection_name>
CREATE COLLECTION <collection_name> HYBRID
CREATE COLLECTION <collection_name> USING MODEL '<model_name>'
CREATE COLLECTION <collection_name> USING VECTOR '<dense_vector_name>'
CREATE COLLECTION <collection_name> USING HYBRID
CREATE COLLECTION <collection_name> USING HYBRID DENSE MODEL '<model>'
CREATE COLLECTION <collection_name> USING HYBRID [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE VECTOR '<name>']
CREATE COLLECTION <collection_name> WITH VECTORS { on_disk: <bool> }
CREATE COLLECTION <collection_name> WITH HNSW { m, ef_construct, full_scan_threshold, max_indexing_threads, on_disk, payload_m, inline_storage }
CREATE COLLECTION <collection_name> WITH OPTIMIZERS { deleted_threshold, vacuum_min_vector_number, default_segment_number, max_segment_size, memmap_threshold, indexing_threshold, flush_interval_sec, max_optimization_threads, prevent_unoptimized }
Expand All @@ -117,6 +118,11 @@ Dense-only collection (standard, uses default model dimensions):
CREATE COLLECTION research_papers
```

QQL-created dense collections use the configured dense vector name (`dense` by default). You can choose a different name explicitly:
```sql
CREATE COLLECTION research_papers USING VECTOR 'body'
```

Dense-only collection pinned to a specific model (768-dimensional):
```sql
CREATE COLLECTION research_papers USING MODEL 'BAAI/bge-base-en-v1.5'
Expand All @@ -127,6 +133,11 @@ Hybrid collection (dense + sparse BM25, default models):
CREATE COLLECTION research_papers HYBRID
```

Hybrid collection with explicit vector names:
```sql
CREATE COLLECTION research_papers USING HYBRID DENSE VECTOR 'emb' SPARSE VECTOR 'lex'
```

Hybrid collection with a custom dense model:
```sql
CREATE COLLECTION research_papers USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5'
Expand All @@ -150,6 +161,8 @@ QQL supports the same config blocks on both `CREATE COLLECTION` and `ALTER COLLE
- `WITH PARAMS { replication_factor, write_consistency_factor, read_fan_out_factor, read_fan_out_delay_ms, on_disk_payload }` on alter
- `ALTER COLLECTION ... QUANTIZE ...` supports the same quantization forms as create, plus `QUANTIZE DISABLED`

`ALTER COLLECTION ... WITH VECTORS { ... }` can update unnamed collections or named collections with one dense vector. Collections with multiple dense vectors are rejected because this syntax has no vector-name target.

Example:

```sql
Expand Down Expand Up @@ -380,6 +393,7 @@ Replaces the stored dense vector for a **single point** identified by its ID. Th
```
UPDATE <collection> SET VECTOR WHERE id = '<point_id>' [<vector>]
UPDATE <collection> SET VECTOR WHERE id = <integer_id> [<vector>]
UPDATE <collection> SET VECTOR '<dense_vector_name>' WHERE id = '<point_id>' [<vector>]
```

The vector is provided as a JSON-style float array `[v1, v2, ..., vN]`. The array length must match the collection's configured vector dimensions.
Expand All @@ -392,13 +406,17 @@ UPDATE articles SET VECTOR WHERE id = '3f2e1a4b-8c91-4d0e-b123-abc123def456' [0.

-- Replace vector by integer ID
UPDATE articles SET VECTOR WHERE id = 42 [0.1, 0.2, 0.3, 0.4]

-- Replace a specific named vector
UPDATE articles SET VECTOR 'body' WHERE id = '3f2e1a4b-8c91-4d0e-b123-abc123def456' [0.1, 0.2, 0.3, 0.4]
```

**Notes:**
- Only single-point updates are supported (by ID). Bulk or filter-based vector updates are not supported.
- The point must already exist; this operation does not create new points.
- The collection must exist; updating a point in a non-existent collection raises an error.
- For hybrid collections, the dense vector named `"dense"` is updated. Sparse vectors are managed separately.
- For named-vector collections, QQL updates the only dense vector when the target is unambiguous. Use `SET VECTOR '<name>'` when a collection has multiple dense vectors.
- Sparse vectors are managed separately.
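
The target-resolution rule in the notes above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the executor's actual code: `resolve_dense_vector` and its arguments are hypothetical names, and an unnamed vector is represented here by the empty string.

```python
from typing import List, Optional


def resolve_dense_vector(dense_names: List[str], requested: Optional[str] = None) -> str:
    """Pick the dense vector that UPDATE ... SET VECTOR should write to.

    Hypothetical helper: `dense_names` lists the collection's dense vector
    names ("" stands for an unnamed vector); `requested` is the optional
    name given in SET VECTOR '<name>'.
    """
    if requested is not None:
        # An explicit name always wins, but it has to exist.
        if requested not in dense_names:
            raise ValueError(f"unknown dense vector {requested!r}")
        return requested
    if len(dense_names) == 1:
        # Exactly one dense vector: the target is unambiguous.
        return dense_names[0]
    # Zero or several dense vectors: the target cannot be inferred.
    raise ValueError("cannot infer target; use SET VECTOR '<name>'")
```

With a single dense vector the name can be omitted; with several, the call fails until a name is supplied.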

---

18 changes: 16 additions & 2 deletions docs/insert.md
Expand Up @@ -16,8 +16,9 @@ If you include an `id` field in `VALUES`, QQL uses it as the Qdrant point ID. Su
```
INSERT INTO COLLECTION <collection_name> VALUES {<dict>}
INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING MODEL '<model_name>'
INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING VECTOR '<dense_vector_name>'
INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING HYBRID
INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING HYBRID DENSE MODEL '<model>' SPARSE MODEL '<model>'
INSERT INTO COLLECTION <collection_name> VALUES {<dict>} USING HYBRID [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>']
```

**Examples:**
Expand Down Expand Up @@ -49,6 +50,17 @@ Insert into a hybrid collection (dense + sparse BM25 vectors):
INSERT INTO COLLECTION articles VALUES {'text': 'Attention is all you need'} USING HYBRID
```

Insert into a specific named dense vector:
```sql
INSERT INTO COLLECTION articles VALUES {'text': 'hello world'} USING VECTOR 'body'
```

Insert into a hybrid collection with external vector names:
```sql
INSERT INTO COLLECTION articles VALUES {'text': 'hello world'}
USING HYBRID DENSE VECTOR 'emb' SPARSE VECTOR 'lex'
```

Insert with custom models for both dense and sparse:
```sql
INSERT INTO COLLECTION articles VALUES {'text': 'hello world'}
Expand All @@ -67,6 +79,7 @@ INSERT INTO COLLECTION articles VALUES {'text': 'hello world'}
- `id`, when provided, must be an unsigned integer or UUID string.
- If the collection already exists with a different vector size (from a different model), an error is raised with a clear message.
- Hybrid inserts require a hybrid collection (created with `CREATE COLLECTION ... HYBRID`, auto-created on the first `USING HYBRID` insert, or **auto-detected** — if you omit `USING HYBRID` but the target collection is already a hybrid collection, QQL detects this and uses the hybrid insert path automatically).
- If a collection has multiple dense or sparse vectors, specify the target vector names explicitly.
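
The `id` rule above (unsigned integer or UUID string) can be checked client-side before building a statement. The sketch below is an assumption about how such a check might look, not QQL's own validation: `is_valid_point_id` is a hypothetical name, and Python's `uuid.UUID` accepts a few spellings (for example, without hyphens) that QQL may or may not accept.

```python
import uuid


def is_valid_point_id(value: object) -> bool:
    """Return True for an unsigned integer or a UUID string (hypothetical check)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return value >= 0
    if isinstance(value, str):
        try:
            uuid.UUID(value)
        except ValueError:
            return False
        return True
    return False
```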

---

Expand All @@ -82,8 +95,9 @@ Each record may optionally include an `id` field. This is the preferred way to k
```
INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, <dict>, ...]
INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING MODEL '<model_name>'
INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING VECTOR '<dense_vector_name>'
INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING HYBRID
INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING HYBRID DENSE MODEL '<model>' SPARSE MODEL '<model>'
INSERT BULK INTO COLLECTION <collection_name> VALUES [<dict>, ...] USING HYBRID [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>']
```

**Examples:**
4 changes: 3 additions & 1 deletion docs/programmatic.md
Expand Up @@ -139,7 +139,7 @@ with Connection("http://localhost:6333") as conn:
# Inspect collection diagnostics
result = conn.run_query("SHOW COLLECTION notes")
print(result.data["topology"]) # "dense" or "hybrid"
print(result.data["vectors"]) # {"": {...}} or {"dense": {...}}
print(result.data["vectors"]) # named vectors, or {"": {...}} for unnamed external collections
print(result.data["payload_schema"]) # field index info, or None
```

Expand All @@ -150,6 +150,8 @@ with Connection("http://localhost:6333") as conn:
| `url` | `str` | `"http://localhost:6333"` | Qdrant instance URL |
| `secret` | `str \| None` | `None` | API key; `None` for unauthenticated |
| `default_model` | `str \| None` | `None` → `sentence-transformers/all-MiniLM-L6-v2` | Dense embedding model used when no `USING MODEL` clause is given |
| `default_dense_vector_name` | `str` | `"dense"` | Dense vector name used when QQL creates a collection and no explicit `USING VECTOR` name is given |
| `default_sparse_vector_name` | `str` | `"sparse"` | Sparse vector name used when QQL creates a hybrid collection and no explicit sparse vector name is given |
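
The same two defaults also apply when a saved config file predates these keys: `load_config` in `src/qql/config.py` reads them with `data.get(..., DEFAULT_...)`. The sketch below reproduces only that fallback pattern in a self-contained form. `VectorNameDefaults` and `vector_names_from_json` are hypothetical stand-ins, not part of the `qql` package.

```python
import json
from dataclasses import dataclass

DEFAULT_DENSE_VECTOR_NAME = "dense"
DEFAULT_SPARSE_VECTOR_NAME = "sparse"


@dataclass
class VectorNameDefaults:
    """Hypothetical stand-in for the two new QQLConfig fields."""
    dense: str = DEFAULT_DENSE_VECTOR_NAME
    sparse: str = DEFAULT_SPARSE_VECTOR_NAME


def vector_names_from_json(raw: str) -> VectorNameDefaults:
    """Read vector-name defaults from config JSON, tolerating missing keys."""
    data = json.loads(raw)
    return VectorNameDefaults(
        dense=data.get("default_dense_vector_name", DEFAULT_DENSE_VECTOR_NAME),
        sparse=data.get("default_sparse_vector_name", DEFAULT_SPARSE_VECTOR_NAME),
    )
```

A config written before this change, containing only `url`, resolves to `dense` and `sparse`; a config that sets either key overrides just that name.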

### Power-user: `executor` property

8 changes: 8 additions & 0 deletions docs/reference.md
Expand Up @@ -33,6 +33,10 @@ Qdrant/bm25
INSERT INTO docs VALUES {'text': 'hello'} USING MODEL 'BAAI/bge-small-en-v1.5'
SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING MODEL 'BAAI/bge-small-en-v1.5'

-- Explicit vector names
INSERT INTO docs VALUES {'text': 'hello'} USING VECTOR 'body'
SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING VECTOR 'body'

-- Hybrid with custom dense model
SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5'

Expand All @@ -42,6 +46,10 @@ SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING HYBRID FUSION 'dbsf'
-- Hybrid with both custom
SEARCH docs SIMILAR TO 'hello' LIMIT 5
USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5' SPARSE MODEL 'prithivida/Splade_PP_en_v1'

-- Hybrid with external vector names
SEARCH docs SIMILAR TO 'hello' LIMIT 5
USING HYBRID DENSE VECTOR 'emb' SPARSE VECTOR 'lex'
```

### Commonly available dense models (Fastembed)
8 changes: 4 additions & 4 deletions docs/scripts.md
Expand Up @@ -120,7 +120,7 @@ Done. 41 point(s) written.
-- configured model (see: qql connect).
-- ============================================================

CREATE COLLECTION medical_records HYBRID
CREATE COLLECTION medical_records USING HYBRID DENSE VECTOR 'dense' SPARSE VECTOR 'sparse'

-- Batch 1 / 1 (records 1–41)
INSERT BULK INTO COLLECTION medical_records VALUES [
Expand All @@ -132,7 +132,7 @@ INSERT BULK INTO COLLECTION medical_records VALUES [
'peer_reviewed': true
},
...
] USING HYBRID
] USING HYBRID DENSE VECTOR 'dense' SPARSE VECTOR 'sparse'

-- ============================================================
-- End of dump
Expand All @@ -155,8 +155,8 @@ qql execute backup.qql

**Rules and notes:**
- Points without a `'text'` payload field are **skipped** (counted in the footer comment).
- Hybrid collections produce `CREATE COLLECTION <name> HYBRID` and `INSERT BULK ... USING HYBRID` statements.
- Dense collections produce plain `CREATE COLLECTION <name>` and `INSERT BULK` statements.
- Hybrid collections produce `CREATE COLLECTION <name> USING HYBRID ...` and matching `INSERT BULK ... USING HYBRID ...` statements, including vector names when the source collection uses named vectors.
- Dense collections produce `CREATE COLLECTION <name> USING VECTOR '<vector_name>'` for named vectors, or plain `CREATE COLLECTION <name>` for unnamed external collections.
- All payload value types are preserved: strings, integers, floats, booleans (`true`/`false`), `null`, lists, and nested dicts.
- Re-importing re-embeds all text using your currently configured model — use the same model as the original collection to preserve semantic accuracy.
- Parent directories of the output path are created automatically.
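
The two statement-shape rules above can be illustrated with a small sketch. It is a guess at the mapping from vector names to the emitted `CREATE COLLECTION` line, based only on the example dump shown earlier. `dump_create_statement` is a hypothetical name, the real dump code may differ, and quote escaping is deliberately left out.

```python
from typing import Optional


def dump_create_statement(
    collection: str,
    dense_vector: Optional[str] = None,
    sparse_vector: Optional[str] = None,
) -> str:
    """Build the CREATE COLLECTION line for a dump (hypothetical helper).

    None or "" for `dense_vector` means the source collection uses an
    unnamed dense vector; a non-None `sparse_vector` marks it as hybrid.
    Names are inserted as-is; escaping is out of scope for this sketch.
    """
    stmt = f"CREATE COLLECTION {collection}"
    if sparse_vector is not None:
        stmt += " USING HYBRID"
        if dense_vector:
            stmt += f" DENSE VECTOR '{dense_vector}'"
        stmt += f" SPARSE VECTOR '{sparse_vector}'"
    elif dense_vector:
        stmt += f" USING VECTOR '{dense_vector}'"
    return stmt
```

Called with `("medical_records", "dense", "sparse")`, this yields the same `CREATE COLLECTION` line as the example dump above.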
17 changes: 14 additions & 3 deletions docs/search.md
Expand Up @@ -12,10 +12,11 @@ An optional `WHERE` clause filters the candidate set **before** similarity ranki
```
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n>
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING MODEL '<model_name>'
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING VECTOR '<dense_vector_name>'
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING MODEL '<model>'] WHERE <filter>
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>'] [WHERE <filter>]
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING SPARSE [MODEL '<sparse_model>']
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>'] [WHERE <filter>]
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING SPARSE [MODEL '<sparse_model>'] [VECTOR '<sparse_vector_name>']
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> EXACT
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING ...] [WHERE <filter>] [RERANK] WITH { hnsw_ef: <n>, exact: true|false, acorn: true|false, indexed_only: true|false, quantization: { ignore: true|false, rescore: true|false, oversampling: <n> }, mmr_diversity: <0..1>, mmr_candidates: <n> }
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING ...] [WHERE <filter>] RERANK [MODEL '<reranker_model>']
Expand All @@ -38,7 +39,17 @@ Hybrid search (combines dense semantic + sparse BM25 keyword retrieval via RRF b
SEARCH articles SIMILAR TO 'attention mechanism' LIMIT 10 USING HYBRID
```

Sparse-only search (queries only the `sparse` named vector — useful for pure keyword retrieval):
Search a specific named dense vector:
```sql
SEARCH articles SIMILAR TO 'attention mechanism' LIMIT 10 USING VECTOR 'body'
```

Hybrid search against external vector names:
```sql
SEARCH articles SIMILAR TO 'attention mechanism' LIMIT 10 USING HYBRID DENSE VECTOR 'emb' SPARSE VECTOR 'lex'
```

Sparse-only search (queries a sparse vector — useful for pure keyword retrieval):
```sql
SEARCH medical_knowledge SIMILAR TO 'beta blocker contraindications' LIMIT 5 USING SPARSE
```
11 changes: 10 additions & 1 deletion src/qql/__init__.py
Expand Up @@ -5,7 +5,13 @@
except PackageNotFoundError:
__version__ = "0.0.0+unknown"

from .config import DEFAULT_MODEL, QQLConfig, load_config
from .config import (
DEFAULT_DENSE_VECTOR_NAME,
DEFAULT_MODEL,
DEFAULT_SPARSE_VECTOR_NAME,
QQLConfig,
load_config,
)
from .connection import Connection
from .exceptions import QQLError, QQLRuntimeError, QQLSyntaxError
from .executor import ExecutionResult, Executor
Expand All @@ -15,6 +21,9 @@
__all__ = [
"__version__",
"Connection",
"DEFAULT_DENSE_VECTOR_NAME",
"DEFAULT_MODEL",
"DEFAULT_SPARSE_VECTOR_NAME",
"QQLConfig",
"QQLError",
"QQLRuntimeError",
9 changes: 9 additions & 0 deletions src/qql/ast_nodes.py
Expand Up @@ -207,6 +207,8 @@ class InsertStmt:
model: str | None # dense model; None → use config default
hybrid: bool = False # if True, also embed + store sparse BM25 vector
sparse_model: str | None = None # sparse model; None → SparseEmbedder.DEFAULT_MODEL
dense_vector: str | None = None
sparse_vector: str | None = None


@dataclass(frozen=True)
Expand All @@ -216,6 +218,8 @@ class InsertBulkStmt:
model: str | None # dense model; None → use config default
hybrid: bool = False
sparse_model: str | None = None
dense_vector: str | None = None
sparse_vector: str | None = None


@dataclass(frozen=True)
Expand All @@ -225,6 +229,8 @@ class CreateCollectionStmt:
model: str | None = None # dense model; None → use config default
quantization: QuantizationConfig | None = None # optional QUANTIZE clause
config: CollectionConfig | None = None
dense_vector: str | None = None
sparse_vector: str | None = None


@dataclass(frozen=True)
Expand Down Expand Up @@ -287,6 +293,8 @@ class SearchStmt:
with_clause: SearchWith | None = None
group_by: str | None = None # GROUP BY field name; None → normal flat search
group_size: int = 3 # max points per group (ignored when group_by is None)
dense_vector: str | None = None
sparse_vector: str | None = None


@dataclass(frozen=True)
Expand Down Expand Up @@ -317,6 +325,7 @@ class UpdateVectorStmt:
collection: str
point_id: str | int
vector: tuple[float, ...] # dense vector as immutable tuple (frozen=True compatible)
vector_name: str | None = None


@dataclass(frozen=True)
14 changes: 9 additions & 5 deletions src/qql/cli.py
Expand Up @@ -27,7 +27,8 @@
Insert a point. 'text' is required and auto-vectorized.
Optional: include [yellow]'id'[/yellow] in VALUES as an integer or UUID
Optional: [yellow]USING MODEL[/yellow] '<model>'
Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>']
Optional: [yellow]USING VECTOR[/yellow] '<dense_vector>'
Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>']

[yellow]INSERT BULK INTO COLLECTION[/yellow] <name> [yellow]VALUES[/yellow] [{[yellow]'text'[/yellow]: '...', ...}, ...]
Batch insert multiple points in a single call. Each dict must contain 'text'.
Expand All @@ -37,7 +38,8 @@
[yellow]CREATE COLLECTION[/yellow] <name> [[yellow]HYBRID[/yellow]]
Create a new collection. Add HYBRID for dense+sparse BM25 vectors.
Optional: [yellow]USING MODEL[/yellow] '<model>'
Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>']
Optional: [yellow]USING VECTOR[/yellow] '<dense_vector>'
Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE VECTOR '<name>']
Optional: [yellow]WITH VECTORS[/yellow] { on_disk: <bool> }
Optional: [yellow]WITH HNSW[/yellow] { m, ef_construct, full_scan_threshold, max_indexing_threads, on_disk, payload_m, inline_storage }
Optional: [yellow]WITH OPTIMIZERS[/yellow] { deleted_threshold, vacuum_min_vector_number, default_segment_number, max_segment_size, memmap_threshold, indexing_threshold, flush_interval_sec, max_optimization_threads, prevent_unoptimized }
Expand Down Expand Up @@ -87,8 +89,9 @@
[yellow]SEARCH[/yellow] <name> [yellow]SIMILAR TO[/yellow] '<text>' [yellow]LIMIT[/yellow] <n>
Semantic search by vector similarity.
Optional: [yellow]USING MODEL[/yellow] '<model>'
Optional: [yellow]USING HYBRID[/yellow] [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>']
Optional: [yellow]USING SPARSE[/yellow] [MODEL '<model>'] sparse-vector-only search
Optional: [yellow]USING VECTOR[/yellow] '<dense_vector>'
Optional: [yellow]USING HYBRID[/yellow] [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [DENSE VECTOR '<name>'] [SPARSE MODEL '<model>'] [SPARSE VECTOR '<name>']
Optional: [yellow]USING SPARSE[/yellow] [MODEL '<model>'] [VECTOR '<name>'] sparse-vector-only search
Optional: [yellow]WHERE[/yellow] <filter> (e.g. WHERE year > 2020 AND status = 'ok')
Optional: [yellow]RERANK[/yellow] [MODEL '<model>'] rerank results with a cross-encoder
Optional: [yellow]EXACT[/yellow] bypass HNSW and perform exact search
Expand All @@ -107,7 +110,8 @@
[yellow]DELETE FROM[/yellow] <name> [yellow]WHERE id =[/yellow] '<id>'
Delete a point by its ID.

[yellow]UPDATE[/yellow] <name> [yellow]SET VECTOR WHERE id =[/yellow] '<id>'|<int> [<vector>]
[yellow]UPDATE[/yellow] <name> [yellow]SET VECTOR[/yellow] [yellow]WHERE id =[/yellow] '<id>'|<int> [<vector>]
[yellow]UPDATE[/yellow] <name> [yellow]SET VECTOR[/yellow] '<dense_vector_name>' [yellow]WHERE id =[/yellow] '<id>'|<int> [<vector>]
Replace the dense vector for a single point by ID.
The point must already exist. Vector is a float array: [0.1, 0.2, ..., 0.N]

10 changes: 10 additions & 0 deletions src/qql/config.py
Expand Up @@ -8,13 +8,17 @@
CONFIG_PATH = CONFIG_DIR / "config.json"

DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
DEFAULT_DENSE_VECTOR_NAME = "dense"
DEFAULT_SPARSE_VECTOR_NAME = "sparse"


@dataclass
class QQLConfig:
url: str
secret: str | None = None
default_model: str = DEFAULT_MODEL
default_dense_vector_name: str = DEFAULT_DENSE_VECTOR_NAME
default_sparse_vector_name: str = DEFAULT_SPARSE_VECTOR_NAME


def save_config(cfg: QQLConfig) -> None:
Expand All @@ -33,6 +37,12 @@ def load_config() -> QQLConfig | None:
url=data["url"],
secret=data.get("secret"),
default_model=data.get("default_model", DEFAULT_MODEL),
default_dense_vector_name=data.get(
"default_dense_vector_name", DEFAULT_DENSE_VECTOR_NAME
),
default_sparse_vector_name=data.get(
"default_sparse_vector_name", DEFAULT_SPARSE_VECTOR_NAME
),
)

