9 changes: 7 additions & 2 deletions README.md
@@ -7,7 +7,7 @@
[![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![Tests](https://img.shields.io/badge/tests-375%20passing-brightgreen)](tests/)

Write `INSERT`, `SEARCH`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
Write `INSERT`, `SEARCH`, `SCROLL`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.

```
qql> INSERT INTO COLLECTION notes VALUES {'text': 'Qdrant is a vector database', 'author': 'alice', 'year': 2024}
@@ -82,7 +82,7 @@ Full documentation lives in the [`docs/`](docs/) folder and at **[pavanjava.gith
|---|---|
| [Getting Started](docs/getting-started.md) | Installation, connecting, first queries |
| [INSERT / INSERT BULK](docs/insert.md) | Adding documents, batch inserts, payload types |
| [SEARCH / RECOMMEND / Hybrid / RERANK](docs/search.md) | Semantic search, hybrid, reranking, recommendations |
| [SEARCH / SCROLL / RECOMMEND / Hybrid / RERANK](docs/search.md) | Semantic search, pagination, hybrid, reranking, recommendations |
| [WHERE Filters](docs/filters.md) | Full SQL-style filter operators |
| [Collections & Quantization](docs/collections.md) | CREATE, DROP, QUANTIZE (scalar/turbo/binary/product), CREATE INDEX |
| [Scripts: EXECUTE / DUMP](docs/scripts.md) | Script files, collection backup/restore |
@@ -104,6 +104,11 @@ SEARCH articles SIMILAR TO 'query' LIMIT 10 WHERE year >= 2020
SEARCH articles SIMILAR TO 'query' LIMIT 10 USING HYBRID
SEARCH articles SIMILAR TO 'query' LIMIT 10 USING HYBRID RERANK

-- Scroll
SCROLL FROM articles LIMIT 50
SCROLL FROM articles WHERE year >= 2024 LIMIT 50
SCROLL FROM articles AFTER 'cursor-id' LIMIT 50

-- Recommend
RECOMMEND FROM articles POSITIVE IDS (1001, 1002) LIMIT 5

5 changes: 4 additions & 1 deletion docs/getting-started.md
@@ -138,6 +138,9 @@ SEARCH notes SIMILAR TO 'vector storage engines' LIMIT 3
-- Filter results
SEARCH notes SIMILAR TO 'vector databases' LIMIT 5 WHERE year >= 2023

-- Browse with pagination
SCROLL FROM notes LIMIT 10

-- List all collections
SHOW COLLECTIONS
```
@@ -147,7 +150,7 @@ SHOW COLLECTIONS
## Next Steps

- [INSERT / INSERT BULK](insert.md) — adding documents
- [SEARCH / RECOMMEND / Hybrid / RERANK](search.md) — querying
- [SEARCH / SCROLL / RECOMMEND / Hybrid / RERANK](search.md) — querying
- [WHERE Filters](filters.md) — payload filtering
- [Collections & Quantization](collections.md) — managing collections
- [Scripts: EXECUTE / DUMP](scripts.md) — automating with script files
4 changes: 2 additions & 2 deletions docs/index.html
@@ -148,8 +148,8 @@ <h3>INSERT / INSERT BULK</h3>
<p>Adding documents, batch inserts, payload types</p>
</a>
<a class="card" href="search">
<h3>SEARCH / RECOMMEND</h3>
<p>Semantic search, hybrid search, reranking, recommendations</p>
<h3>SEARCH / SCROLL / RECOMMEND</h3>
<p>Semantic search, pagination, hybrid search, reranking, recommendations</p>
</a>
<a class="card" href="filters">
<h3>WHERE Filters</h3>
10 changes: 10 additions & 0 deletions docs/programmatic.md
@@ -40,6 +40,15 @@ result = run_query(
for hit in result.data:
print(hit["score"], hit["payload"])

# Scroll / pagination
result = run_query(
"SCROLL FROM notes LIMIT 2",
url="http://localhost:6333",
)
for point in result.data["points"]:
print(point["id"], point["payload"])
print(result.data["next_offset"])

# Bulk insert (all records embedded and upserted in one call)
result = run_query(
"""INSERT BULK INTO COLLECTION notes VALUES [
@@ -112,6 +121,7 @@ class ExecutionResult:
| INSERT (hybrid) | `{"id": int \| "<uuid>", "collection": "<name>"}` |
| INSERT BULK | `None` (count in `result.message`) |
| SEARCH | `[{"id": str, "score": float, "payload": dict}, ...]` |
| SCROLL | `{"points": [{"id": str, "payload": dict}, ...], "next_offset": str \| None}` |
| RECOMMEND | `[{"id": str, "score": float, "payload": dict}, ...]` |
| SHOW COLLECTIONS | `["name1", "name2", ...]` |
| CREATE COLLECTION | `None` |
3 changes: 2 additions & 1 deletion docs/reference.md
@@ -171,12 +171,13 @@ Expected output: **375 tests passing**.
| `Connection failed: ...` | Qdrant unreachable at given URL | Check that Qdrant is running and the URL is correct |
| `INSERT requires a 'text' field in VALUES` | `text` key missing from the VALUES dict | Add `'text': '...'` to your dict |
| `Vector dimension mismatch: collection '...' expects X dims, but model produces Y dims` | Model used in INSERT differs from the one used to create the collection | Use `USING MODEL` to specify the same model as the collection was created with |
| `Collection '...' does not exist` | SEARCH / DROP / DELETE on a non-existent collection | Check name spelling or run `SHOW COLLECTIONS` |
| `Collection '...' does not exist` | SEARCH / SCROLL / DROP / DELETE on a non-existent collection | Check name spelling or run `SHOW COLLECTIONS` |
| `Unexpected token '...'; expected a QQL statement keyword` | Unrecognized statement | Check the query syntax; QQL does not support SQL SELECT |
| `Unterminated string literal (at position N)` | A string is missing its closing quote | Close the string with a matching `'` or `"` |
| `Unexpected character '@' (at position N)` | A character not part of QQL syntax | Remove or quote the offending character |
| `Expected a filter operator after field '...'` | Unknown operator in WHERE clause | Use one of: `=`, `!=`, `>`, `>=`, `<`, `<=`, `IN`, `NOT IN`, `BETWEEN`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `MATCH` |
| `Expected ')' ...` | Unclosed parenthesis in WHERE clause | Add the missing `)` to close the group |
| `Qdrant error during SEARCH: ...` | Hybrid search on a non-hybrid collection, or wrong vector names | Ensure the collection was created with `HYBRID` before using `USING HYBRID` in INSERT/SEARCH |
| `Qdrant error during SCROLL: ...` | Qdrant rejected the scroll request | Verify the collection state, the filter, and the cursor (`AFTER`) value |
| `Unknown index type '...'` | Invalid schema type in CREATE INDEX | Use one of: `keyword`, `integer`, `float`, `bool`, `text`, `geo`, `datetime` |
| `Qdrant error during CREATE INDEX: ...` | Qdrant rejected the index creation | Check field name and collection state |
28 changes: 27 additions & 1 deletion docs/search.md
@@ -1,4 +1,4 @@
# SEARCH, RECOMMEND, Hybrid Search & Reranking
# SEARCH, SCROLL, RECOMMEND, Hybrid Search & Reranking

---

@@ -98,6 +98,32 @@ SEARCH articles SIMILAR TO 'RAG' LIMIT 10 WHERE tag = 'li' WITH { acorn: true }

---

## SCROLL — pagination / browsing

Use `SCROLL` to iterate through points in a collection page by page.

**Syntax:**
```sql
SCROLL FROM <collection_name> LIMIT <n>
SCROLL FROM <collection_name> WHERE <filter> LIMIT <n>
SCROLL FROM <collection_name> AFTER <point_id> LIMIT <n>
SCROLL FROM <collection_name> WHERE <filter> AFTER <point_id> LIMIT <n>
```

**Examples:**
```sql
SCROLL FROM articles LIMIT 50
SCROLL FROM articles WHERE year >= 2024 LIMIT 50
SCROLL FROM articles AFTER 'cursor-id' LIMIT 50
```

**Behavior:**
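- Returns points in ID order with their payloads; vectors are not included.
- Returns a `next_offset` cursor when more points are available, and no cursor (`None` in programmatic results) once the last page has been reached.
- Pass the returned `next_offset` as the `AFTER` value to fetch the next page. The cursor is a point ID: quote it if it is a string (such as a UUID) and leave it unquoted if it is an integer.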
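
The cursor handoff described above can be sketched as a small loop. This is a minimal sketch, not part of QQL: `scroll_all` and `format_cursor` are hypothetical helper names, and `run` stands in for any callable that executes a QQL string and returns an object whose `.data` has the documented `points` / `next_offset` shape (for example, a thin wrapper around `run_query`). It assumes that an all-digit `next_offset` is an integer point ID and should be passed unquoted.

```python
from typing import Callable, Iterator, Optional


def format_cursor(next_offset: str) -> str:
    """Render a next_offset value as an AFTER operand.

    Assumption: all-digit cursors are integer point IDs and go unquoted;
    anything else (for example a UUID) is passed as a quoted string.
    """
    if next_offset.isascii() and next_offset.isdigit():
        return next_offset
    return f"'{next_offset}'"


def scroll_all(
    run: Callable[[str], object], collection: str, page_size: int = 50
) -> Iterator[dict]:
    """Yield every point in `collection`, following next_offset until it is None."""
    cursor: Optional[str] = None
    while True:
        after = f" AFTER {format_cursor(cursor)}" if cursor is not None else ""
        result = run(f"SCROLL FROM {collection}{after} LIMIT {page_size}")
        yield from result.data["points"]
        cursor = result.data["next_offset"]
        if cursor is None:
            break
```

Because `scroll_all` is a generator, each page is requested only when the caller has consumed the previous one, so a large collection is never held in memory at once.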

---

## Hybrid Search (USING HYBRID)

Hybrid search combines **dense semantic vectors** and **sparse BM25 keyword vectors** in a single query and merges the results with Qdrant's **Reciprocal Rank Fusion (RRF)** algorithm. This typically outperforms either method alone.
9 changes: 9 additions & 0 deletions src/qql/ast_nodes.py
@@ -180,6 +180,14 @@ class ShowCollectionsStmt:
pass


@dataclass(frozen=True)
class ScrollStmt:
collection: str
limit: int
query_filter: FilterExpr | None = None
after: str | int | None = None


@dataclass(frozen=True)
class SearchStmt:
collection: str
@@ -225,6 +233,7 @@ class DeleteStmt:
| CreateIndexStmt
| DropCollectionStmt
| ShowCollectionsStmt
| ScrollStmt
| SearchStmt
| RecommendStmt
| DeleteStmt
19 changes: 19 additions & 0 deletions src/qql/cli.py
@@ -49,6 +49,11 @@
[yellow]SHOW COLLECTIONS[/yellow]
List all collections in the connected Qdrant instance.

[yellow]SCROLL FROM[/yellow] <name> [yellow]LIMIT[/yellow] <n>
Paginate through points in ID order.
Optional: [yellow]WHERE[/yellow] <filter>
Optional: [yellow]AFTER[/yellow] '<id>'|<int>

[yellow]SEARCH[/yellow] <name> [yellow]SIMILAR TO[/yellow] '<text>' [yellow]LIMIT[/yellow] <n>
Semantic search by vector similarity.
Optional: [yellow]USING MODEL[/yellow] '<model>'
@@ -400,5 +405,19 @@ def _run_and_print(executor: Executor, query: str) -> None:
console.print(table)
return

# Pretty-print scroll results
if isinstance(result.data, dict) and "points" in result.data and "next_offset" in result.data:
points = result.data["points"]
if points:
table = Table(show_header=True, header_style="bold cyan")
table.add_column("ID")
table.add_column("Payload")
for point in points:
table.add_row(point["id"], str(point["payload"]))
console.print(table)
if result.data["next_offset"] is not None:
console.print(f"[dim]next_offset: {result.data['next_offset']}[/dim]")
return

# Fallback: print data as-is
console.print(result.data)
35 changes: 35 additions & 0 deletions src/qql/executor.py
@@ -76,6 +76,7 @@
QuantizationConfig,
QuantizationType,
RecommendStmt,
ScrollStmt,
SearchStmt,
SearchWith,
ShowCollectionsStmt,
@@ -115,6 +116,8 @@ def execute(self, node: ASTNode) -> ExecutionResult:
return self._execute_drop(node)
if isinstance(node, ShowCollectionsStmt):
return self._execute_show(node)
if isinstance(node, ScrollStmt):
return self._execute_scroll(node)
if isinstance(node, SearchStmt):
return self._execute_search(node)
if isinstance(node, RecommendStmt):
@@ -412,6 +415,38 @@ def _execute_show(self, node: ShowCollectionsStmt) -> ExecutionResult:
data=names,
)

def _execute_scroll(self, node: ScrollStmt) -> ExecutionResult:
if not self._client.collection_exists(node.collection):
raise QQLRuntimeError(f"Collection '{node.collection}' does not exist")

scroll_filter: Filter | None = None
if node.query_filter is not None:
scroll_filter = self._wrap_as_filter(
self._build_qdrant_filter(node.query_filter)
)

try:
records, next_offset = self._client.scroll(
collection_name=node.collection,
scroll_filter=scroll_filter,
limit=node.limit,
offset=node.after,
with_payload=True,
with_vectors=False,
)
except UnexpectedResponse as e:
raise QQLRuntimeError(f"Qdrant error during SCROLL: {e}") from e

points = [
{"id": str(rec.id), "payload": rec.payload or {}}
for rec in records
]
return ExecutionResult(
success=True,
message=f"Scrolled {len(points)} point(s) from '{node.collection}'",
data={"points": points, "next_offset": None if next_offset is None else str(next_offset)},
)

def _execute_search(self, node: SearchStmt) -> ExecutionResult:
if not self._client.collection_exists(node.collection):
raise QQLRuntimeError(f"Collection '{node.collection}' does not exist")
4 changes: 4 additions & 0 deletions src/qql/lexer.py
@@ -35,6 +35,7 @@ class TokenKind(Enum):
DROP = auto()
SHOW = auto()
COLLECTIONS = auto()
SCROLL = auto()
SEARCH = auto()
RECOMMEND = auto()
POSITIVE = auto()
@@ -47,6 +48,7 @@ class TokenKind(Enum):
OFFSET = auto()
SCORE = auto()
THRESHOLD = auto()
AFTER = auto()
LOOKUP = auto()
VECTOR = auto()
DELETE = auto()
@@ -123,6 +125,7 @@ class TokenKind(Enum):
"DROP": TokenKind.DROP,
"SHOW": TokenKind.SHOW,
"COLLECTIONS": TokenKind.COLLECTIONS,
"SCROLL": TokenKind.SCROLL,
"SEARCH": TokenKind.SEARCH,
"RECOMMEND": TokenKind.RECOMMEND,
"POSITIVE": TokenKind.POSITIVE,
@@ -135,6 +138,7 @@ class TokenKind(Enum):
"OFFSET": TokenKind.OFFSET,
"SCORE": TokenKind.SCORE,
"THRESHOLD": TokenKind.THRESHOLD,
"AFTER": TokenKind.AFTER,
"LOOKUP": TokenKind.LOOKUP,
"VECTOR": TokenKind.VECTOR,
"DELETE": TokenKind.DELETE,
54 changes: 43 additions & 11 deletions src/qql/parser.py
@@ -26,6 +26,7 @@
QuantizationConfig,
QuantizationType,
RecommendStmt,
ScrollStmt,
SearchStmt,
SearchWith,
ShowCollectionsStmt,
@@ -61,6 +62,8 @@ def parse(self) -> ASTNode:
node = self._parse_drop()
elif tok.kind == TokenKind.SHOW:
node = self._parse_show()
elif tok.kind == TokenKind.SCROLL:
node = self._parse_scroll()
elif tok.kind == TokenKind.SEARCH:
node = self._parse_search()
elif tok.kind == TokenKind.RECOMMEND:
@@ -288,6 +291,32 @@ def _parse_show(self) -> ShowCollectionsStmt:
self._expect(TokenKind.COLLECTIONS)
return ShowCollectionsStmt()

def _parse_scroll(self) -> ScrollStmt:
self._expect(TokenKind.SCROLL)
self._expect(TokenKind.FROM)
collection = self._parse_identifier()

query_filter: FilterExpr | None = None
after: str | int | None = None

if self._peek().kind == TokenKind.WHERE:
self._advance()
query_filter = self._parse_filter_expr()

if self._peek().kind == TokenKind.AFTER:
self._advance()
after = self._parse_point_id_value("SCROLL AFTER")

self._expect(TokenKind.LIMIT)
limit = int(self._expect(TokenKind.INTEGER).value)

return ScrollStmt(
collection=collection,
limit=limit,
query_filter=query_filter,
after=after,
)
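
The clause order that `_parse_scroll` enforces (optional `WHERE`, then optional `AFTER`, then a mandatory `LIMIT`) can be illustrated with a standalone approximation. This is only a sketch under stated assumptions: it uses a regular expression instead of the project's lexer and parser, keeps the filter as raw text, accepts upper-case keywords only, and `parse_scroll_sketch` is a hypothetical name that does not exist in QQL.

```python
import re
from typing import Optional, Union

# Approximation of the SCROLL grammar; not the real QQL parser.
_SCROLL = re.compile(
    r"""
    ^\s*SCROLL\s+FROM\s+(?P<collection>\w+)
    (?:\s+WHERE\s+(?P<where>.+?))?          # optional filter, kept as raw text
    (?:\s+AFTER\s+(?P<after>'[^']*'|\d+))?  # optional cursor: quoted string or integer
    \s+LIMIT\s+(?P<limit>\d+)\s*$           # LIMIT is mandatory and comes last
    """,
    re.VERBOSE,
)


def parse_scroll_sketch(query: str) -> dict:
    """Split a SCROLL statement into its clauses, or raise ValueError."""
    match = _SCROLL.match(query)
    if match is None:
        raise ValueError(f"not a valid SCROLL statement: {query!r}")

    raw_after = match.group("after")
    after: Optional[Union[str, int]] = None
    if raw_after is not None:
        # Mirror the string-or-integer split of _parse_point_id_value.
        after = raw_after[1:-1] if raw_after.startswith("'") else int(raw_after)

    return {
        "collection": match.group("collection"),
        "where": match.group("where"),
        "after": after,
        "limit": int(match.group("limit")),
    }
```

As in the parser above, a statement that places `AFTER` before `WHERE`, or omits `LIMIT`, is rejected rather than silently reordered.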

def _parse_search(self) -> SearchStmt:
self._expect(TokenKind.SEARCH)
collection = self._parse_identifier()
@@ -457,17 +486,7 @@ def _parse_delete(self) -> DeleteStmt:
if self._peek().kind == TokenKind.ID:
self._advance()
self._expect(TokenKind.EQUALS)
tok = self._peek()
if tok.kind == TokenKind.STRING:
self._advance()
point_id: str | int = tok.value
elif tok.kind == TokenKind.INTEGER:
self._advance()
point_id = int(tok.value)
else:
raise QQLSyntaxError(
f"Expected string or integer for point id, got '{tok.value}'", tok.pos
)
point_id = self._parse_point_id_value("DELETE")
return DeleteStmt(collection=collection, point_id=point_id)

query_filter = self._parse_filter_expr()
@@ -694,6 +713,19 @@ def _parse_point_id_list(self) -> tuple[str | int, ...]:
self._expect(TokenKind.RPAREN)
return tuple(items)

def _parse_point_id_value(self, statement: str) -> str | int:
tok = self._peek()
if tok.kind == TokenKind.STRING:
self._advance()
return tok.value
if tok.kind == TokenKind.INTEGER:
self._advance()
return int(tok.value)
raise QQLSyntaxError(
f"{statement} requires a string or integer point id, got '{tok.value}'",
tok.pos,
)

# ── Dict / value parsers (for INSERT VALUES) ──────────────────────────

def _parse_identifier(self) -> str:
3 changes: 2 additions & 1 deletion src/qql/script.py
@@ -24,6 +24,7 @@
TokenKind.CREATE,
TokenKind.DROP,
TokenKind.SHOW,
TokenKind.SCROLL,
TokenKind.SEARCH,
TokenKind.RECOMMEND,
TokenKind.DELETE,
@@ -54,7 +55,7 @@ def split_statements(tokens: list[Token]) -> list[list[Token]]:
"""Split a flat token list into per-statement chunks.

A new chunk begins whenever a statement-starter keyword (INSERT, CREATE,
DROP, SHOW, SEARCH, RECOMMEND, DELETE) is encountered at
DROP, SHOW, SCROLL, SEARCH, RECOMMEND, DELETE) is encountered at
brace/bracket/paren depth 0.
The EOF sentinel is consumed and never included in any chunk.
"""
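
The splitting rule in the docstring above, now including `SCROLL` as a statement starter, can be sketched on plain strings. This is an illustrative approximation rather than the code in `script.py`: the real function works on `Token` objects and `TokenKind` values, whereas `split_tokens_sketch` (a hypothetical name) treats each token as a string and uses the literal `"EOF"` as the sentinel.

```python
STARTERS = {"INSERT", "CREATE", "DROP", "SHOW", "SCROLL", "SEARCH", "RECOMMEND", "DELETE"}
OPENERS = {"{", "[", "("}
CLOSERS = {"}", "]", ")"}


def split_tokens_sketch(tokens: list[str]) -> list[list[str]]:
    """Group tokens into statements, splitting on starter keywords at depth 0."""
    chunks: list[list[str]] = []
    current: list[str] = []
    depth = 0  # nesting level of braces, brackets, and parentheses

    for tok in tokens:
        if tok == "EOF":
            break  # the sentinel ends the input and is never stored
        if tok in OPENERS:
            depth += 1
        elif tok in CLOSERS:
            depth = max(depth - 1, 0)

        # A starter keyword only opens a new chunk outside any nesting.
        if tok in STARTERS and depth == 0 and current:
            chunks.append(current)
            current = []
        current.append(tok)

    if current:
        chunks.append(current)
    return chunks
```

Tracking depth is what keeps a keyword-like token inside a `VALUES { ... }` payload from being mistaken for the start of a new statement.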