Merged
21 changes: 16 additions & 5 deletions README.md
@@ -5,9 +5,9 @@
[![PyPI version](https://img.shields.io/pypi/v/qql-cli?color=blue&label=PyPI)](https://pypi.org/project/qql-cli/)
[![Python 3.12+](https://img.shields.io/pypi/pyversions/qql-cli)](https://pypi.org/project/qql-cli/)
[![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![Tests](https://img.shields.io/badge/tests-405%20passing-brightgreen)](tests/)
[![Tests](https://img.shields.io/badge/tests-500%20passing-brightgreen)](tests/)

Write `INSERT`, `SELECT`, `SEARCH`, `SCROLL`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
Write `INSERT`, `SELECT`, `SEARCH`, `SCROLL`, `RECOMMEND`, `UPDATE`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, grouped search (GROUP BY), cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.

```
qql> INSERT INTO COLLECTION notes VALUES {'text': 'Qdrant is a vector database', 'author': 'alice', 'year': 2024}
@@ -82,9 +82,9 @@ Full documentation lives in the [`docs/`](docs/) folder and at **[pavanjava.gith
|---|---|
| [Getting Started](docs/getting-started.md) | Installation, connecting, first queries |
| [INSERT / INSERT BULK](docs/insert.md) | Adding documents, batch inserts, payload types |
| [SEARCH / SELECT / SCROLL / RECOMMEND / Hybrid / RERANK](docs/search.md) | Semantic search, point retrieval, pagination, hybrid, reranking, recommendations |
| [SEARCH / SELECT / SCROLL / RECOMMEND / Hybrid / GROUP BY / RERANK](docs/search.md) | Semantic search, grouped search, point retrieval, pagination, hybrid, reranking, recommendations |
| [WHERE Filters](docs/filters.md) | Full SQL-style filter operators |
| [Collections & Quantization](docs/collections.md) | SHOW, CREATE, DROP, QUANTIZE (scalar/turbo/binary/product), CREATE INDEX |
| [Collections & Quantization](docs/collections.md) | SHOW, CREATE, DROP, QUANTIZE (scalar/turbo/binary/product), CREATE INDEX, UPDATE VECTOR, UPDATE PAYLOAD |
| [Scripts: EXECUTE / DUMP](docs/scripts.md) | Script files, collection backup/restore |
| [Programmatic Usage](docs/programmatic.md) | Use QQL as a Python library |
| [Reference: Models / Config / Errors](docs/reference.md) | Embedding models, config file, error reference |
@@ -128,6 +128,17 @@ SHOW COLLECTIONS
SHOW COLLECTION articles
DROP COLLECTION articles

-- Search with grouping
SEARCH articles SIMILAR TO 'query' LIMIT 5 GROUP BY category
SEARCH articles SIMILAR TO 'query' LIMIT 5 GROUP BY category GROUP_SIZE 3
SEARCH articles SIMILAR TO 'query' LIMIT 5 WHERE year >= 2020 GROUP BY category GROUP_SIZE 2
SEARCH articles SIMILAR TO 'query' LIMIT 5 USING HYBRID GROUP BY category

-- Update
UPDATE articles SET VECTOR WHERE id = '3f2e1a4b-...' [0.1, 0.2, 0.3, 0.4]
UPDATE articles SET PAYLOAD WHERE id = '3f2e1a4b-...' {'year': 2025, 'status': 'active'}
UPDATE articles SET PAYLOAD WHERE category = 'draft' {'status': 'published'}

-- Delete
DELETE FROM articles WHERE id = '3f2e1a4b-...'
DELETE FROM articles WHERE year < 2020
@@ -147,7 +158,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
pytest tests/ -v
```

Expected: **405 tests passing**.
Expected: **500 tests passing**.

---

65 changes: 65 additions & 0 deletions docs/collections.md
@@ -310,3 +310,68 @@ DELETE FROM articles WHERE year < 2020 AND status = 'draft'
**Notes:**
- If no points match the filter or ID, the operation succeeds silently with a count of 0.
- The collection itself must exist; deleting from a non-existent collection raises an error.

---

## UPDATE SET VECTOR — replace a point's dense vector

Replaces the stored dense vector for a **single point** identified by its ID. The point must already exist in the collection. Use this when you want to refresh an embedding without changing the payload.

**Syntax:**
```
UPDATE <collection> SET VECTOR WHERE id = '<point_id>' [<vector>]
UPDATE <collection> SET VECTOR WHERE id = <integer_id> [<vector>]
```

The vector is provided as a JSON-style float array `[v1, v2, ..., vN]`. The array length must match the collection's configured vector dimensions.

**Examples:**

```sql
-- Replace vector by UUID
UPDATE articles SET VECTOR WHERE id = '3f2e1a4b-8c91-4d0e-b123-abc123def456' [0.1, 0.2, 0.3, 0.4]

-- Replace vector by integer ID
UPDATE articles SET VECTOR WHERE id = 42 [0.1, 0.2, 0.3, 0.4]
```

**Notes:**
- Only single-point updates are supported (by ID). Bulk or filter-based vector updates are not supported.
- The point must already exist; this operation does not create new points.
- The collection must exist; updating a non-existent collection raises an error.
- For hybrid collections, the dense vector named `"dense"` is updated. Sparse vectors are managed separately.
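
The vector-array rules above can be illustrated with a small validation sketch. This is not QQL's actual implementation: `validate_vector` and its `expected_dim` parameter are hypothetical names, and the client-side length check is an assumption (the error reference attributes dimension mismatches to Qdrant itself). The error strings mirror the ones listed in the error reference.

```python
def validate_vector(raw, expected_dim):
    """Hypothetical helper: check a parsed vector literal before sending it.

    Returns the vector as an immutable tuple of floats, or raises ValueError.
    """
    out = []
    for value in raw:
        # bool is a subclass of int in Python, so float(True) == 1.0 would
        # pass silently; reject booleans before the numeric check.
        if isinstance(value, bool):
            raise ValueError(
                "Vector elements must be numeric floats; "
                "boolean values are not allowed"
            )
        if not isinstance(value, (int, float)):
            raise ValueError(
                f"Vector elements must be numeric; got invalid value: {value!r}"
            )
        out.append(float(value))
    # Assumption: checking the length locally; the real length check may
    # happen only on the Qdrant side.
    if len(out) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dims, but got {len(out)} dims"
        )
    return tuple(out)
```

Returning a tuple rather than a list keeps the result hashable and immutable, which fits a frozen statement node.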

---

## UPDATE SET PAYLOAD — merge fields into a point's payload

Merges new key/value pairs into the payload of one or more points. **Existing fields not mentioned in the update are preserved** (additive merge, not a full replace). Use a `WHERE` filter to update multiple points at once.

**Syntax:**
```
UPDATE <collection> SET PAYLOAD WHERE id = '<point_id>' {<payload>}
UPDATE <collection> SET PAYLOAD WHERE id = <integer_id> {<payload>}
UPDATE <collection> SET PAYLOAD WHERE <filter> {<payload>}
```

**Examples:**

```sql
-- Update a single point by UUID
UPDATE articles SET PAYLOAD WHERE id = '3f2e1a4b-8c91-4d0e-b123-abc123def456' {'year': 2025, 'status': 'active'}

-- Update a single point by integer ID
UPDATE articles SET PAYLOAD WHERE id = 42 {'category': 'tech'}

-- Update all points matching a filter
UPDATE articles SET PAYLOAD WHERE category = 'draft' {'status': 'published'}

-- Compound filter update
UPDATE articles SET PAYLOAD WHERE year < 2020 AND status = 'draft' {'archived': true}
```

**Notes:**
- **Merge semantics:** only the fields in `{…}` are written; all other existing payload fields are preserved.
- If no points match the filter, the operation succeeds silently with no changes.
- The collection must exist; updating a non-existent collection raises an error.
- All `WHERE` filter operators supported by `DELETE` are also supported here (see [WHERE Filters](filters.md)).
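
The merge semantics can be pictured as a dictionary merge. The sketch below is only an illustration in plain Python, not the code QQL or Qdrant runs: `merge_payload` is a hypothetical name, and it assumes a shallow, top-level merge (a nested object in the update replaces the stored one wholesale).

```python
def merge_payload(existing: dict, update: dict) -> dict:
    """Hypothetical helper: return a new payload with the update's
    top-level keys written over the existing ones."""
    # Keys absent from `update` are carried over unchanged.
    return {**existing, **update}


stored = {"text": "Qdrant is a vector database", "year": 2024, "status": "draft"}
merged = merge_payload(stored, {"year": 2025, "status": "active"})
print(merged)
# {'text': 'Qdrant is a vector database', 'year': 2025, 'status': 'active'}
```

The `text` field survives because the update never mentions it, which is the behaviour the first note describes.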
13 changes: 11 additions & 2 deletions docs/reference.md
@@ -162,7 +162,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
pytest tests/ -v
```

Expected output: **405 tests passing**.
Expected output: **500 tests passing**.

---

@@ -174,14 +174,23 @@ Expected output: **405 tests passing**.
| `Connection failed: ...` | Qdrant unreachable at given URL | Check that Qdrant is running and the URL is correct |
| `INSERT requires a 'text' field in VALUES` | `text` key missing from the VALUES dict | Add `'text': '...'` to your dict |
| `Vector dimension mismatch: collection '...' expects X dims, but model produces Y dims` | Model used in INSERT differs from the one used to create the collection | Use `USING MODEL` to specify the same model as the collection was created with |
| `Collection '...' does not exist` | SEARCH / SCROLL / SELECT / DROP / DELETE on a non-existent collection | Check name spelling or run `SHOW COLLECTIONS` |
| `Collection '...' does not exist` | SEARCH / SCROLL / SELECT / DROP / DELETE / UPDATE on a non-existent collection | Check name spelling or run `SHOW COLLECTIONS` |
| `Unexpected token '...'; expected a QQL statement keyword` | Unrecognized statement | Check the query syntax and supported statement list |
| `SELECT requires a string or integer point id, got '...'` | `SELECT` used with a non-ID filter value | Use `SELECT * FROM <collection> WHERE id = '<id>'` or an integer ID |
| `Unterminated string literal (at position N)` | A string is missing its closing quote | Close the string with a matching `'` or `"` |
| `Unexpected character '@' (at position N)` | A character not part of QQL syntax | Remove or quote the offending character |
| `Expected a filter operator after field '...'` | Unknown operator in WHERE clause | Use one of: `=`, `!=`, `>`, `>=`, `<`, `<=`, `IN`, `NOT IN`, `BETWEEN`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `MATCH` |
| `Expected ')' ...` | Unclosed parenthesis in WHERE clause | Add the missing `)` to close the group |
| `Qdrant error during SEARCH: ...` | Hybrid search on a non-hybrid collection, or wrong vector names | Ensure the collection was created with `HYBRID` before using `USING HYBRID` in INSERT/SEARCH |
| `Qdrant error during GROUP BY SEARCH: ...` | GROUP BY on an unindexed field, or unsupported field type | Ensure the group-by field is indexed as `keyword` or `integer` via `CREATE INDEX` |
| `GROUP BY and RERANK cannot be combined ...` | Both GROUP BY and RERANK specified in the same SEARCH | Remove one of the two clauses |
| `Expected VECTOR or PAYLOAD after SET, got '...'` | Unknown keyword after SET in UPDATE | Use `UPDATE ... SET VECTOR ...` or `UPDATE ... SET PAYLOAD ...` |
| `Expected a vector list [...] after point ID in UPDATE SET VECTOR` | UPDATE SET VECTOR missing the `[...]` float array | Add the vector array: `UPDATE ... SET VECTOR WHERE id = '...' [0.1, 0.2, ...]` |
| `Qdrant error during UPDATE VECTOR: ...` | Point does not exist, or vector dimensions mismatch | Verify the point ID exists and the vector length matches the collection's dimensions |
| `Qdrant error during UPDATE PAYLOAD: ...` | Qdrant rejected the payload update | Check field values and collection state |
| `Vector elements must be numeric floats; boolean values are not allowed` | A boolean (`true` or `false`) was present in the vector array for `UPDATE SET VECTOR` — `float(True)` silently equals `1.0` in Python, so this is caught explicitly | Replace booleans with numeric floats: `UPDATE … [0.1, 0.2, …, 0.N]` |
| `Vector elements must be numeric; got invalid value: ...` | A non-numeric value (string or null) was present in the vector array for `UPDATE SET VECTOR` | Ensure all vector elements are floats: `UPDATE … [0.1, 0.2, …, 0.N]` |
| `GROUP_SIZE must be a positive integer, got N` | `GROUP_SIZE 0` or a negative value was specified | Use a positive integer: `GROUP_SIZE 3` |
| `Qdrant error during SCROLL: ...` | Qdrant rejected scroll request | Verify collection state, filter, and cursor (`AFTER`) value |
| `Unknown index type '...'` | Invalid schema type in CREATE INDEX | Use one of: `keyword`, `integer`, `float`, `bool`, `text`, `geo`, `datetime` |
| `Qdrant error during CREATE INDEX: ...` | Qdrant rejected the index creation | Check field name and collection state |
65 changes: 65 additions & 0 deletions docs/search.md
@@ -343,3 +343,68 @@ SEARCH articles SIMILAR TO 'semantic search' LIMIT 5
| Large collections with keyword-heavy queries | `USING HYBRID RERANK` |

> **Note on scores:** After reranking, the `score` column shows the cross-encoder's raw logit (can be any real number, unbounded). Do not compare reranked scores to non-reranked cosine similarity scores.

---

## SEARCH … GROUP BY — grouped results

Returns the top-scoring points **grouped by a payload field value**. Instead of a single flat ranked list, results are organised into groups — each group contains the top-scoring points that share the same value for the specified field.

Useful for **result diversification**: e.g. "return the 3 best articles from each category", or "show the top 2 papers per author".

**Syntax:**
```
SEARCH <collection> SIMILAR TO '<query>' LIMIT <n> GROUP BY <field>
SEARCH <collection> SIMILAR TO '<query>' LIMIT <n> GROUP BY <field> GROUP_SIZE <m>
SEARCH <collection> SIMILAR TO '<query>' LIMIT <n> [WHERE <filter>] GROUP BY <field> [GROUP_SIZE <m>]
SEARCH <collection> SIMILAR TO '<query>' LIMIT <n> USING HYBRID GROUP BY <field> [GROUP_SIZE <m>]
```

- **`LIMIT <n>`** — maximum number of **groups** to return.
- **`GROUP_SIZE <m>`** — maximum number of points per group (default: **3**).
- **`GROUP BY <field>`** — the payload field whose values define the groups. **Must be a string (keyword) or number (integer) field** — this is enforced by Qdrant. Dot-notation is supported (e.g. `meta.author`). Array-valued fields are allowed: a point with multiple values for the field can appear in multiple groups. The field should be indexed as `keyword` or `integer` for best performance (see [CREATE INDEX](collections.md)).
- `WHERE` filters, `USING HYBRID`, and `USING MODEL` are all compatible with GROUP BY.
- **`GROUP BY` and `RERANK` cannot be combined** in the same statement — this raises a syntax error.

**Examples:**

Top 5 categories, up to 3 articles each (default group_size):
```sql
SEARCH articles SIMILAR TO 'machine learning' LIMIT 5 GROUP BY category
```

Top 3 authors, up to 2 papers each:
```sql
SEARCH papers SIMILAR TO 'neural networks' LIMIT 3 GROUP BY author GROUP_SIZE 2
```

Grouped search with a payload filter:
```sql
SEARCH articles SIMILAR TO 'deep learning' LIMIT 5 WHERE year >= 2022 GROUP BY category GROUP_SIZE 4
```

Grouped hybrid search:
```sql
SEARCH articles SIMILAR TO 'vector databases' LIMIT 4 USING HYBRID GROUP BY category GROUP_SIZE 3
```

**Output:**

```
✓ Found 2 group(s) by 'category' (grouped)
Group: machine-learning
Score │ ID │ Payload
────────┼──────────────────────────────────────┼────────────────────────────────────────
0.9312 │ 3f2e1a4b-8c91-4d0e-b123-abc123def456 │ {'text': '...', 'category': 'machine-learning'}
0.8845 │ 9a1b2c3d-4e5f-6789-abcd-ef0123456789 │ {'text': '...', 'category': 'machine-learning'}

Group: nlp
Score │ ID │ Payload
────────┼──────────────────────────────────────┼────────────────────────────────────────
0.9100 │ 1a2b3c4d-5e6f-7890-bcde-f01234567890 │ {'text': '...', 'category': 'nlp'}
```

> **Tip:** For GROUP BY to work efficiently, create a payload index on the grouping field first:
> ```sql
> CREATE INDEX ON COLLECTION articles FOR category TYPE keyword
> ```
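
How `LIMIT` and `GROUP_SIZE` interact can be sketched client-side. Qdrant performs the grouping on the server, so the function below is only a rough model of the semantics described in this section, not the actual algorithm. `group_hits` is a hypothetical name, the hit dictionaries are made up for illustration, and dot-notation fields are not handled.

```python
def group_hits(hits, field, limit, group_size=3):
    """Hypothetical helper: group scored hits by a payload field.

    `limit` caps the number of groups; `group_size` caps hits per group.
    """
    groups = {}  # insertion order follows each group's best-scoring hit
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        value = hit.get("payload", {}).get(field)
        if value is None:
            continue  # points without the field belong to no group
        # Array-valued fields place the point in one group per value.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v not in groups:
                if len(groups) >= limit:
                    continue  # group limit reached; ignore new groups
                groups[v] = []
            if len(groups[v]) < group_size:
                groups[v].append(hit)
    return [{"group_id": k, "hits": v} for k, v in groups.items()]


hits = [
    {"id": 1, "score": 0.9, "payload": {"category": "ml"}},
    {"id": 2, "score": 0.8, "payload": {"category": "nlp"}},
    {"id": 3, "score": 0.7, "payload": {"category": "ml"}},
    {"id": 4, "score": 0.6, "payload": {"category": "ml"}},
    {"id": 5, "score": 0.5, "payload": {"category": "cv"}},
]
result = group_hits(hits, "category", limit=2, group_size=2)
```

With `limit=2` and `group_size=2`, the sketch keeps the `ml` group (ids 1 and 3) and the `nlp` group (id 2); the fourth `ml` hit and the whole `cv` group are dropped.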
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "qql-cli"
version = "2.2.0"
version = "2.3.0"
description = "QQL is a SQL-like query language and CLI for Qdrant vector database. Write INSERT, SEARCH, RECOMMEND, DELETE, and CREATE COLLECTION statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), WHERE clause filters, script execution, and collection dump/restore."
readme = "README.md"
license = { file = "LICENSE" }
21 changes: 21 additions & 0 deletions src/qql/ast_nodes.py
@@ -213,6 +213,8 @@ class SearchStmt:
rerank: bool = False # if True, apply cross-encoder reranking post-Qdrant
rerank_model: str | None = None # cross-encoder model; None → CrossEncoderEmbedder.DEFAULT_MODEL
with_clause: SearchWith | None = None
group_by: str | None = None # GROUP BY field name; None → normal flat search
group_size: int = 3 # max points per group (ignored when group_by is None)


@dataclass(frozen=True)
@@ -237,6 +239,23 @@ class DeleteStmt:
query_filter: FilterExpr | None = None


@dataclass(frozen=True)
class UpdateVectorStmt:
"""UPDATE <collection> SET VECTOR WHERE id = <id> [vector...]"""
collection: str
point_id: str | int
vector: tuple[float, ...] # dense vector as immutable tuple (frozen=True compatible)


@dataclass(frozen=True)
class UpdatePayloadStmt:
"""UPDATE <collection> SET PAYLOAD WHERE <filter|id> {payload}"""
collection: str
payload: dict[str, Any]
point_id: str | int | None = None # mutually exclusive with query_filter
query_filter: FilterExpr | None = None


# Union type for all top-level statement nodes
ASTNode = (
InsertStmt
@@ -251,4 +270,6 @@ class DeleteStmt:
| SearchStmt
| RecommendStmt
| DeleteStmt
| UpdateVectorStmt
| UpdatePayloadStmt
)
38 changes: 38 additions & 0 deletions src/qql/cli.py
@@ -71,6 +71,9 @@
Optional: [yellow]RERANK[/yellow] [MODEL '<model>'] rerank results with a cross-encoder
Optional: [yellow]EXACT[/yellow] bypass HNSW and perform exact search
Optional: [yellow]WITH[/yellow] { hnsw_ef: <int>, exact: <bool>, acorn: <bool> } search parameters
Optional: [yellow]GROUP BY[/yellow] <field> [[yellow]GROUP_SIZE[/yellow] <n>]
Group results by a payload field value (default GROUP_SIZE: 3).
Field must be keyword or integer type. RERANK and GROUP BY cannot be combined.

[yellow]RECOMMEND FROM[/yellow] <name> [yellow]POSITIVE IDS[/yellow] (<id>, ...)
Find points similar to known examples.
@@ -82,6 +85,15 @@
[yellow]DELETE FROM[/yellow] <name> [yellow]WHERE id =[/yellow] '<id>'
Delete a point by its ID.

[yellow]UPDATE[/yellow] <name> [yellow]SET VECTOR WHERE id =[/yellow] '<id>'|<int> [<vector>]
Replace the dense vector for a single point by ID.
The point must already exist. Vector is a float array: [0.1, 0.2, ..., 0.N]

[yellow]UPDATE[/yellow] <name> [yellow]SET PAYLOAD WHERE id =[/yellow] '<id>'|<int> {<payload>}
[yellow]UPDATE[/yellow] <name> [yellow]SET PAYLOAD WHERE[/yellow] <filter> {<payload>}
Merge new key/value pairs into a point's payload (additive; existing fields preserved).
Supports all WHERE filter operators. Filter-based updates affect all matching points.

Script files (in-shell):
[yellow]EXECUTE[/yellow] <path> or [yellow]\\e[/yellow] <path>
Run a .qql script file. Statements are executed in order.
@@ -458,6 +470,32 @@ def _run_and_print(executor: Executor, query: str) -> None:
console.print(_format_collection_diagnostics(result.data))
return

# Pretty-print grouped search results (GROUP BY)
if (
isinstance(result.data, list)
and result.data
and isinstance(result.data[0], dict)
and "group_id" in result.data[0]
):
for group in result.data:
console.print(f"\n[bold cyan]Group: {group['group_id']}[/bold cyan]")
hits = group.get("hits", [])
if hits:
tbl = Table(show_header=True, header_style="bold")
tbl.add_column("Score", style="green", no_wrap=True, justify="right")
tbl.add_column("ID")
tbl.add_column("Payload")
for hit in hits:
tbl.add_row(
str(hit["score"]),
str(hit["id"]),
str(hit.get("payload", {})),
)
console.print(tbl)
else:
console.print(" (no hits)")
return

# Pretty-print search results
if isinstance(result.data, list) and result.data and isinstance(result.data[0], dict) and "score" in result.data[0]:
table = Table(show_header=True, header_style="bold cyan")