Commit 62c119d

Merge branch 'main' into feat/pr25-merge-fix
# Conflicts:
#   README.md
#   docs/getting-started.md
#   docs/index.html
#   docs/reference.md
#   docs/search.md
#   src/qql/ast_nodes.py
#   src/qql/cli.py
#   src/qql/executor.py
#   src/qql/parser.py
#   src/qql/script.py
#   tests/test_executor.py
#   tests/test_lexer.py
#   tests/test_parser.py
#   tests/test_script.py
2 parents db8ccb4 + 9e70cfc commit 62c119d

16 files changed

Lines changed: 353 additions & 23 deletions

README.md

Lines changed: 11 additions & 5 deletions
@@ -5,9 +5,9 @@
 [![PyPI version](https://img.shields.io/pypi/v/qql-cli?color=blue&label=PyPI)](https://pypi.org/project/qql-cli/)
 [![Python 3.12+](https://img.shields.io/pypi/pyversions/qql-cli)](https://pypi.org/project/qql-cli/)
 [![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-375%20passing-brightgreen)](tests/)
+[![Tests](https://img.shields.io/badge/tests-405%20passing-brightgreen)](tests/)

-Write `INSERT`, `SELECT`, `SEARCH`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
+Write `INSERT`, `SELECT`, `SEARCH`, `SCROLL`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.

 ```
 qql> INSERT INTO COLLECTION notes VALUES {'text': 'Qdrant is a vector database', 'author': 'alice', 'year': 2024}
@@ -48,7 +48,7 @@ Your query string
 Qdrant instance
 ```

-When you run `INSERT`, the `text` field is automatically converted into a dense vector using [Fastembed](https://github.com/qdrant/fastembed). In **hybrid mode** (`USING HYBRID`), a sparse BM25 vector is also generated alongside the dense vector, and searches use Qdrant's Reciprocal Rank Fusion (RRF) to merge the results of both retrieval methods.
+When you run `INSERT`, the `text` field is automatically converted into a dense vector using [Fastembed](https://github.com/qdrant/fastembed). In **hybrid mode** (`USING HYBRID`), a sparse BM25 vector is also generated alongside the dense vector, and searches use Qdrant's Reciprocal Rank Fusion (RRF) by default to merge the results of both retrieval methods. You can switch hybrid search to DBSF with `FUSION 'dbsf'`.

 ---

@@ -82,7 +82,7 @@ Full documentation lives in the [`docs/`](docs/) folder and at **[pavanjava.gith
 |---|---|
 | [Getting Started](docs/getting-started.md) | Installation, connecting, first queries |
 | [INSERT / INSERT BULK](docs/insert.md) | Adding documents, batch inserts, payload types |
-| [SEARCH / SELECT / RECOMMEND / Hybrid / RERANK](docs/search.md) | Semantic search, point retrieval, hybrid, reranking, recommendations |
+| [SEARCH / SELECT / SCROLL / RECOMMEND / Hybrid / RERANK](docs/search.md) | Semantic search, point retrieval, pagination, hybrid, reranking, recommendations |
 | [WHERE Filters](docs/filters.md) | Full SQL-style filter operators |
 | [Collections & Quantization](docs/collections.md) | CREATE, DROP, QUANTIZE (scalar/turbo/binary/product), CREATE INDEX |
 | [Scripts: EXECUTE / DUMP](docs/scripts.md) | Script files, collection backup/restore |
@@ -102,8 +102,14 @@ INSERT BULK INTO COLLECTION articles VALUES [{'text': '...'}, {'text': '...'}]
 SEARCH articles SIMILAR TO 'query' LIMIT 10
 SEARCH articles SIMILAR TO 'query' LIMIT 10 WHERE year >= 2020
 SEARCH articles SIMILAR TO 'query' LIMIT 10 USING HYBRID
+SEARCH articles SIMILAR TO 'query' LIMIT 10 USING HYBRID FUSION 'dbsf'
 SEARCH articles SIMILAR TO 'query' LIMIT 10 USING HYBRID RERANK

+-- Scroll
+SCROLL FROM articles LIMIT 50
+SCROLL FROM articles WHERE year >= 2024 LIMIT 50
+SCROLL FROM articles AFTER 'cursor-id' LIMIT 50
+
 -- Recommend
 RECOMMEND FROM articles POSITIVE IDS (1001, 1002) LIMIT 5

@@ -140,7 +146,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
 pytest tests/ -v
 ```

-Expected: **375 tests passing**.
+Expected: **405 tests passing**.

 ---

docs/getting-started.md

Lines changed: 5 additions & 2 deletions
@@ -24,7 +24,7 @@ Your query string
 Qdrant instance
 ```

-When you run `INSERT`, the `text` field is automatically converted into a dense vector using [Fastembed](https://github.com/qdrant/fastembed). In **hybrid mode** (`USING HYBRID`), a sparse BM25 vector is also generated alongside the dense vector, and searches use Qdrant's Reciprocal Rank Fusion (RRF) to merge the results of both retrieval methods.
+When you run `INSERT`, the `text` field is automatically converted into a dense vector using [Fastembed](https://github.com/qdrant/fastembed). In **hybrid mode** (`USING HYBRID`), a sparse BM25 vector is also generated alongside the dense vector, and searches use Qdrant's Reciprocal Rank Fusion (RRF) by default to merge the results of both retrieval methods. You can override that with `FUSION 'dbsf'` on hybrid searches.

 ---

@@ -138,6 +138,9 @@ SEARCH notes SIMILAR TO 'vector storage engines' LIMIT 3
 -- Filter results
 SEARCH notes SIMILAR TO 'vector databases' LIMIT 5 WHERE year >= 2023

+-- Browse with pagination
+SCROLL FROM notes LIMIT 10
+
 -- List all collections
 SHOW COLLECTIONS

@@ -150,7 +153,7 @@ SELECT * FROM notes WHERE id = 1
 ## Next Steps

 - [INSERT / INSERT BULK](insert.md) — adding documents
-- [SEARCH / SELECT / RECOMMEND / Hybrid / RERANK](search.md) — querying
+- [SEARCH / SELECT / SCROLL / RECOMMEND / Hybrid / RERANK](search.md) — querying
 - [WHERE Filters](filters.md) — payload filtering
 - [Collections & Quantization](collections.md) — managing collections
 - [Scripts: EXECUTE / DUMP](scripts.md) — automating with script files

docs/index.html

Lines changed: 3 additions & 3 deletions
@@ -114,7 +114,7 @@ <h1>QQL</h1>
 <a href="https://pypi.org/project/qql-cli/"><img src="https://img.shields.io/pypi/v/qql-cli?color=blue&label=PyPI" alt="PyPI version" /></a>
 <a href="https://pypi.org/project/qql-cli/"><img src="https://img.shields.io/pypi/pyversions/qql-cli" alt="Python versions" /></a>
 <a href="https://github.com/pavanjava/qql/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License" /></a>
-<a href="https://github.com/pavanjava/qql/actions"><img src="https://img.shields.io/badge/tests-375%20passing-brightgreen" alt="375 tests" /></a>
+<a href="https://github.com/pavanjava/qql/actions"><img src="https://img.shields.io/badge/tests-405%20passing-brightgreen" alt="405 tests" /></a>
 </div>

 <pre><span class="cmt"># Install</span>
@@ -148,8 +148,8 @@ <h3>INSERT / INSERT BULK</h3>
 <p>Adding documents, batch inserts, payload types</p>
 </a>
 <a class="card" href="search">
-<h3>SEARCH / SELECT / RECOMMEND</h3>
-<p>Semantic search, point retrieval, hybrid search, reranking, recommendations</p>
+<h3>SEARCH / SELECT / SCROLL / RECOMMEND</h3>
+<p>Semantic search, point retrieval, pagination, hybrid search, reranking, recommendations</p>
 </a>
 <a class="card" href="filters">
 <h3>WHERE Filters</h3>

docs/programmatic.md

Lines changed: 10 additions & 0 deletions
@@ -40,6 +40,15 @@ result = run_query(
 for hit in result.data:
     print(hit["score"], hit["payload"])

+# Scroll / pagination
+result = run_query(
+    "SCROLL FROM notes LIMIT 2",
+    url="http://localhost:6333",
+)
+for point in result.data["points"]:
+    print(point["id"], point["payload"])
+print(result.data["next_offset"])
+
 # Bulk insert (all records embedded and upserted in one call)
 result = run_query(
     """INSERT BULK INTO COLLECTION notes VALUES [
@@ -120,6 +129,7 @@ class ExecutionResult:
 | INSERT BULK | `None` (count in `result.message`) |
 | SELECT | `{"id": str, "payload": dict}` or `None` when not found |
 | SEARCH | `[{"id": str, "score": float, "payload": dict}, ...]` |
+| SCROLL | `{"points": [{"id": str, "payload": dict}, ...], "next_offset": str \| None}` |
 | RECOMMEND | `[{"id": str, "score": float, "payload": dict}, ...]` |
 | SHOW COLLECTIONS | `["name1", "name2", ...]` |
 | CREATE COLLECTION | `None` |
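
The `next_offset` field in the SCROLL result can drive a paging loop. The following is a minimal sketch, assuming a `run` callable that behaves like the `run_query` call shown in this file's diff (it takes a QQL string and returns an object whose `.data` is the SCROLL dict). The helper name `scroll_all`, the `page_size` parameter, and the rule of single-quoting string cursors are illustrative assumptions, not part of QQL.

```python
from typing import Any, Callable, Iterator


def scroll_all(run: Callable[[str], Any], collection: str, page_size: int = 50) -> Iterator[dict]:
    """Yield every point in `collection`, one SCROLL page at a time.

    `run` is assumed to accept a QQL string and return an object whose
    `.data` looks like {"points": [...], "next_offset": str | None}.
    """
    cursor = None
    while True:
        query = f"SCROLL FROM {collection}"
        if cursor is not None:
            # Assumption: string cursors are single-quoted, integer cursors are bare.
            after = f"'{cursor}'" if isinstance(cursor, str) else str(cursor)
            query += f" AFTER {after}"
        query += f" LIMIT {page_size}"

        page = run(query).data
        yield from page["points"]

        cursor = page["next_offset"]
        if cursor is None:  # no further pages
            break
```

With the documented entry point this might be called as `scroll_all(lambda q: run_query(q, url="http://localhost:6333"), "notes")`. Passing the runner in as a callable keeps the loop testable without a live Qdrant instance.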

docs/reference.md

Lines changed: 6 additions & 2 deletions
@@ -36,6 +36,9 @@ SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING MODEL 'BAAI/bge-small-en-v1.5'
 -- Hybrid with custom dense model
 SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5'

+-- Hybrid with explicit fusion strategy
+SEARCH docs SIMILAR TO 'hello' LIMIT 5 USING HYBRID FUSION 'dbsf'
+
 -- Hybrid with both custom
 SEARCH docs SIMILAR TO 'hello' LIMIT 5
 USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5' SPARSE MODEL 'prithivida/Splade_PP_en_v1'
@@ -159,7 +162,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
 pytest tests/ -v
 ```

-Expected output: **375 tests passing**.
+Expected output: **405 tests passing**.

 ---

@@ -171,13 +174,14 @@ Expected output: **375 tests passing**.
 | `Connection failed: ...` | Qdrant unreachable at given URL | Check that Qdrant is running and the URL is correct |
 | `INSERT requires a 'text' field in VALUES` | `text` key missing from the VALUES dict | Add `'text': '...'` to your dict |
 | `Vector dimension mismatch: collection '...' expects X dims, but model produces Y dims` | Model used in INSERT differs from the one used to create the collection | Use `USING MODEL` to specify the same model as the collection was created with |
-| `Collection '...' does not exist` | SEARCH / SELECT / DROP / DELETE on a non-existent collection | Check name spelling or run `SHOW COLLECTIONS` |
+| `Collection '...' does not exist` | SEARCH / SCROLL / SELECT / DROP / DELETE on a non-existent collection | Check name spelling or run `SHOW COLLECTIONS` |
 | `Unexpected token '...'; expected a QQL statement keyword` | Unrecognized statement | Check the query syntax and supported statement list |
 | `SELECT requires a string or integer point id, got '...'` | `SELECT` used with a non-ID filter value | Use `SELECT * FROM <collection> WHERE id = '<id>'` or an integer ID |
 | `Unterminated string literal (at position N)` | A string is missing its closing quote | Close the string with a matching `'` or `"` |
 | `Unexpected character '@' (at position N)` | A character not part of QQL syntax | Remove or quote the offending character |
 | `Expected a filter operator after field '...'` | Unknown operator in WHERE clause | Use one of: `=`, `!=`, `>`, `>=`, `<`, `<=`, `IN`, `NOT IN`, `BETWEEN`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `MATCH` |
 | `Expected ')' ...` | Unclosed parenthesis in WHERE clause | Add the missing `)` to close the group |
 | `Qdrant error during SEARCH: ...` | Hybrid search on a non-hybrid collection, or wrong vector names | Ensure the collection was created with `HYBRID` before using `USING HYBRID` in INSERT/SEARCH |
+| `Qdrant error during SCROLL: ...` | Qdrant rejected scroll request | Verify collection state, filter, and cursor (`AFTER`) value |
 | `Unknown index type '...'` | Invalid schema type in CREATE INDEX | Use one of: `keyword`, `integer`, `float`, `bool`, `text`, `geo`, `datetime` |
 | `Qdrant error during CREATE INDEX: ...` | Qdrant rejected the index creation | Check field name and collection state |

docs/search.md

Lines changed: 35 additions & 5 deletions
@@ -1,4 +1,4 @@
-# SEARCH, SELECT, RECOMMEND, Hybrid Search & Reranking
+# SEARCH, SELECT, SCROLL, RECOMMEND, Hybrid Search & Reranking

 ---

@@ -14,7 +14,7 @@ SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n>
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING MODEL '<model_name>'
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING MODEL '<model>'] WHERE <filter>
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID
-SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID [DENSE MODEL '<model>'] [SPARSE MODEL '<model>'] [WHERE <filter>]
+SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>'] [WHERE <filter>]
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING SPARSE [MODEL '<sparse_model>']
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> EXACT
 SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING ...] [WHERE <filter>] [RERANK] WITH { hnsw_ef: <n>, exact: true|false, acorn: true|false }
@@ -33,7 +33,7 @@ Search only papers published after 2020:
 SEARCH articles SIMILAR TO 'deep learning' LIMIT 10 WHERE year > 2020
 ```

-Hybrid search (combines dense semantic + sparse BM25 keyword retrieval via RRF):
+Hybrid search (combines dense semantic + sparse BM25 keyword retrieval via RRF by default):
 ```sql
 SEARCH articles SIMILAR TO 'attention mechanism' LIMIT 10 USING HYBRID
 ```
@@ -120,15 +120,41 @@ SEARCH articles SIMILAR TO 'RAG' LIMIT 10 WHERE tag = 'li' WITH { acorn: true }

 ---

+## SCROLL — pagination / browsing
+
+Use `SCROLL` to iterate through points in a collection page by page.
+
+**Syntax:**
+```sql
+SCROLL FROM <collection_name> LIMIT <n>
+SCROLL FROM <collection_name> WHERE <filter> LIMIT <n>
+SCROLL FROM <collection_name> AFTER '<point_id>' LIMIT <n>
+SCROLL FROM <collection_name> WHERE <filter> AFTER <point_id> LIMIT <n>
+```
+
+**Examples:**
+```sql
+SCROLL FROM articles LIMIT 50
+SCROLL FROM articles WHERE year >= 2024 LIMIT 50
+SCROLL FROM articles AFTER 'cursor-id' LIMIT 50
+```
+
+**Behavior:**
+- Returns points in ID order with payloads.
+- Returns a `next_offset` cursor when more points are available.
+- Use `AFTER <next_offset>` to fetch the next page.
+
+---
+
 ## Hybrid Search (USING HYBRID)

-Hybrid search combines **dense semantic vectors** and **sparse BM25 keyword vectors** in a single query and merges the results with Qdrant's **Reciprocal Rank Fusion (RRF)** algorithm. This typically outperforms either method alone.
+Hybrid search combines **dense semantic vectors** and **sparse BM25 keyword vectors** in a single query. By default QQL merges the two result sets with Qdrant's **Reciprocal Rank Fusion (RRF)** algorithm, and you can optionally switch to **DBSF** with a `FUSION` clause.

 ### How it works internally

 1. Both a dense vector (`TextEmbedding`) and a sparse BM25 vector (`SparseTextEmbedding`) are generated from your query text.
 2. Qdrant fetches the top candidates from each index independently (`prefetch limit = LIMIT × 4`).
-3. The two result lists are merged using RRF — a rank-based fusion that does not require score normalization.
+3. The two result lists are merged using the selected fusion strategy (`RRF` by default, or `DBSF` when requested).
 4. The final top-N results are returned.
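
For intuition about the fusion step, here is a small local sketch of the two strategies. Qdrant performs the real fusion server-side, so this is an illustration rather than QQL's or Qdrant's implementation. The function names `rrf_fuse` and `dbsf_fuse` are hypothetical, the constant `k=60` is the conventional RRF default and not a value stated in this commit, and the DBSF variant assumes scores are normalized against the mean plus or minus three standard deviations before being summed.

```python
from statistics import mean, pstdev


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: uses ranks only, so no score normalization is needed."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] = scores.get(point_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def dbsf_fuse(result_sets: list[dict[str, float]]) -> list[str]:
    """Distribution-based score fusion: normalize each result set, then sum per point."""
    fused: dict[str, float] = {}
    for scored in result_sets:
        values = list(scored.values())
        mu, sigma = mean(values), pstdev(values)
        low, high = mu - 3 * sigma, mu + 3 * sigma  # assumed normalization limits
        span = high - low
        for point_id, score in scored.items():
            # A zero span means every score in the set is identical.
            normalized = 0.5 if span == 0 else (score - low) / span
            fused[point_id] = fused.get(point_id, 0.0) + normalized
    return sorted(fused, key=fused.get, reverse=True)


dense = {"a": 0.9, "b": 0.5, "c": 0.1}    # similarity scores from the dense index
sparse = {"b": 12.0, "c": 8.0, "d": 4.0}  # BM25-style scores on a different scale
by_rank = rrf_fuse([list(dense), list(sparse)])
by_score = dbsf_fuse([dense, sparse])
```

The two inputs deliberately use different score scales: RRF ignores the raw values, while DBSF rescales each set before adding, which is why the two strategies can order the same candidates differently.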

 ### Step 1: Create a hybrid collection
@@ -161,6 +187,9 @@ SEARCH articles SIMILAR TO 'transformer architecture' LIMIT 10 USING HYBRID
 -- Hybrid search with a WHERE filter
 SEARCH articles SIMILAR TO 'attention' LIMIT 10 USING HYBRID WHERE year >= 2017

+-- Hybrid with DBSF fusion
+SEARCH articles SIMILAR TO 'hybrid retrieval' LIMIT 10 USING HYBRID FUSION 'dbsf'
+
 -- Hybrid with custom dense model
 SEARCH articles SIMILAR TO 'embeddings' LIMIT 5
 USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5'
@@ -176,6 +205,7 @@ SEARCH articles SIMILAR TO 'sparse retrieval' LIMIT 5
 |---|---|
 | Dense model | configured default (`sentence-transformers/all-MiniLM-L6-v2`) |
 | Sparse model | `Qdrant/bm25` |
+| Fusion | `rrf` |

 ### Dense vs. hybrid — when to use which

src/qql/ast_nodes.py

Lines changed: 10 additions & 0 deletions
@@ -186,13 +186,22 @@ class SelectStmt:
     point_id: str | int


+@dataclass(frozen=True)
+class ScrollStmt:
+    collection: str
+    limit: int
+    query_filter: FilterExpr | None = None
+    after: str | int | None = None
+
+
 @dataclass(frozen=True)
 class SearchStmt:
     collection: str
     query_text: str
     limit: int
     model: str | None  # dense model; None → use config default
     hybrid: bool = False  # if True, use prefetch+RRF hybrid search
+    fusion: str | None = None  # hybrid fusion strategy; None → default rrf
     sparse_only: bool = False  # if True, query only the sparse vector (no dense)
     sparse_model: str | None = None  # sparse model for hybrid/sparse-only; None → SparseEmbedder.DEFAULT_MODEL
     query_filter: FilterExpr | None = None  # optional WHERE clause; default keeps existing tests valid
@@ -232,6 +241,7 @@ class DeleteStmt:
     | DropCollectionStmt
     | ShowCollectionsStmt
     | SelectStmt
+    | ScrollStmt
     | SearchStmt
     | RecommendStmt
     | DeleteStmt
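
The executor changes for `ScrollStmt` are not shown in this view, so the following is only a sketch of how such a node could be turned into the documented SCROLL result. It assumes the executor delegates to a client with a `scroll` method shaped like `QdrantClient.scroll`, which returns a `(records, next_offset)` tuple. The `ScrollStmt` here is a simplified copy (the real `query_filter` is a `FilterExpr`), and `execute_scroll` is a hypothetical helper name.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ScrollStmt:
    # Simplified copy of the AST node from this diff, for illustration only.
    collection: str
    limit: int
    query_filter: Any | None = None
    after: str | int | None = None


def execute_scroll(client: Any, stmt: ScrollStmt) -> dict:
    """Hypothetical helper: fetch one page and shape it like the documented result."""
    records, next_offset = client.scroll(
        collection_name=stmt.collection,
        scroll_filter=stmt.query_filter,  # assumed to be converted to a Qdrant filter already
        limit=stmt.limit,
        offset=stmt.after,  # the AFTER cursor maps onto the scroll offset
        with_payload=True,
        with_vectors=False,
    )
    return {
        "points": [{"id": str(r.id), "payload": r.payload or {}} for r in records],
        "next_offset": None if next_offset is None else str(next_offset),
    }
```

Converting IDs and the cursor to strings matches the `{"id": str, ...}` and `next_offset: str | None` shapes listed in the programmatic result table; whether the real executor does the same conversion is an assumption.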

src/qql/cli.py

Lines changed: 20 additions & 1 deletion
@@ -49,13 +49,18 @@
 [yellow]SHOW COLLECTIONS[/yellow]
 List all collections in the connected Qdrant instance.

+[yellow]SCROLL FROM[/yellow] <name> [yellow]LIMIT[/yellow] <n>
+Paginate points by ID order.
+Optional: [yellow]WHERE[/yellow] <filter>
+Optional: [yellow]AFTER[/yellow] '<id>'|<int>
+
 [yellow]SELECT * FROM[/yellow] <name> [yellow]WHERE id =[/yellow] '<id>'|<int>
 Retrieve a single point by its ID and return its payload.

 [yellow]SEARCH[/yellow] <name> [yellow]SIMILAR TO[/yellow] '<text>' [yellow]LIMIT[/yellow] <n>
 Semantic search by vector similarity.
 Optional: [yellow]USING MODEL[/yellow] '<model>'
-Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>']
+Optional: [yellow]USING HYBRID[/yellow] [FUSION 'rrf|dbsf'] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>']
 Optional: [yellow]USING SPARSE[/yellow] [MODEL '<model>'] sparse-vector-only search
 Optional: [yellow]WHERE[/yellow] <filter> (e.g. WHERE year > 2020 AND status = 'ok')
 Optional: [yellow]RERANK[/yellow] [MODEL '<model>'] rerank results with a cross-encoder
@@ -403,6 +408,20 @@ def _run_and_print(executor: Executor, query: str) -> None:
         console.print(table)
         return

+    # Pretty-print scroll results
+    if isinstance(result.data, dict) and "points" in result.data and "next_offset" in result.data:
+        points = result.data["points"]
+        if points:
+            table = Table(show_header=True, header_style="bold cyan")
+            table.add_column("ID")
+            table.add_column("Payload")
+            for point in points:
+                table.add_row(point["id"], str(point["payload"]))
+            console.print(table)
+        if result.data["next_offset"] is not None:
+            console.print(f"[dim]next_offset: {result.data['next_offset']}[/dim]")
+        return
+
     # Pretty-print SELECT result
     if isinstance(result.data, dict) and "id" in result.data and "payload" in result.data:
         table = Table(show_header=True, header_style="bold cyan")
