Commit fdf4843

feat: Full Qdrant Recommend API support
Add OFFSET, SCORE THRESHOLD, WITH clause, LOOKUP FROM, and USING to RECOMMEND statements, closing the gap with the Qdrant query_points recommend surface.

Parser:
- Parse LOOKUP FROM <collection> [VECTOR '<name>'] for cross-collection recommendation
- Parse USING '<vector_name>' to target a specific named vector
- Parse OFFSET <n> for pagination
- Parse SCORE THRESHOLD <f> for minimum score filtering
- Parse WITH { exact: true, hnsw_ef: <n> } for query-time search params

Executor:
- Wire offset, score_threshold, search_params, using, and lookup_from to Qdrant query_points()
- Use LookupLocation for cross-collection ID lookups

Tests:
- Parser coverage for all new clauses and combined forms
- Executor coverage verifying forwarding to the Qdrant client

Docs:
- Update README with full RECOMMEND syntax reference
- Update sample_v2.qql with OFFSET, SCORE THRESHOLD, WITH, and USING examples
1 parent e95465d commit fdf4843

8 files changed

Lines changed: 346 additions & 10 deletions

README.md

Lines changed: 58 additions & 4 deletions
@@ -392,10 +392,16 @@ This is useful when you already know which stored points represent the kind of r
 **Syntax:**
 ```sql
-RECOMMEND FROM <collection_name> POSITIVE IDS (1001, 1002) LIMIT <n>
-RECOMMEND FROM <collection_name> POSITIVE IDS (1001, 1002) NEGATIVE IDS (1003) LIMIT <n>
-RECOMMEND FROM <collection_name> POSITIVE IDS (1001) STRATEGY 'best_score' LIMIT <n>
-RECOMMEND FROM <collection_name> POSITIVE IDS (1001) LIMIT <n> WHERE <filter>
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) LIMIT <n>
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) NEGATIVE IDS (<id>, ...) LIMIT <n>
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) STRATEGY '<strategy>' LIMIT <n>
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) LIMIT <n> WHERE <filter>
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) LIMIT <n> OFFSET <n>
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) LIMIT <n> SCORE THRESHOLD <f>
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) LIMIT <n> WITH { exact: true, hnsw_ef: <n> }
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) LOOKUP FROM <collection> LIMIT <n>
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) LOOKUP FROM <collection> VECTOR '<name>' LIMIT <n>
+RECOMMEND FROM <collection_name> POSITIVE IDS (<id>, ...) USING '<vector_name>' LIMIT <n>
 ```

 **Examples:**
@@ -420,12 +426,60 @@ Recommend only within a filtered subset:
 RECOMMEND FROM articles POSITIVE IDS (1001) LIMIT 5 WHERE year >= 2020 AND status = 'published'
 ```

+Paginate recommendations (skip first 5, return next 10):
+```sql
+RECOMMEND FROM articles POSITIVE IDS (1001) LIMIT 10 OFFSET 5
+```
+
+Filter out low-confidence recommendations:
+```sql
+RECOMMEND FROM articles POSITIVE IDS (1001) LIMIT 10 SCORE THRESHOLD 0.5
+```
+
+Exact KNN baseline for recommendations:
+```sql
+RECOMMEND FROM articles POSITIVE IDS (1001) LIMIT 5 WITH { exact: true }
+```
+
+Cross-collection recommend (look up example IDs from another collection):
+```sql
+RECOMMEND FROM target_collection
+  POSITIVE IDS ('a')
+  LOOKUP FROM source_collection VECTOR 'dense'
+  LIMIT 5
+```
+
+Recommend using a specific named vector in the target collection:
+```sql
+RECOMMEND FROM articles
+  POSITIVE IDS (1001)
+  USING 'sparse'
+  LIMIT 5
+```
+
+Full-featured recommend:
+```sql
+RECOMMEND FROM articles
+  POSITIVE IDS (1001, 1002)
+  NEGATIVE IDS (1009)
+  STRATEGY 'best_score'
+  LOOKUP FROM other_collection VECTOR 'dense'
+  USING 'dense'
+  LIMIT 10
+  OFFSET 5
+  SCORE THRESHOLD 0.5
+  WHERE year >= 2020
+  WITH { exact: true }
+```
+
 **Supported strategies:**

 - `average_vector`
 - `best_score`
 - `sum_scores`

+**Clause order:** `POSITIVE IDS` → `NEGATIVE IDS` → `STRATEGY` → `LOOKUP FROM` → `USING` → `LIMIT` → `OFFSET` → `SCORE THRESHOLD` → `WHERE` → `WITH`
+
 ---

 ### Query-Time Search Params (`EXACT`, `WITH`)
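
The clause order documented in the README hunk is fixed, so a statement can be checked against it mechanically. The following is a minimal sketch, assuming that order; `CLAUSE_ORDER` and `check_clause_order` are hypothetical names and not part of QQL.

```python
# Hypothetical helper, not part of the QQL code base.
# The list mirrors the clause order stated in the README.
CLAUSE_ORDER = [
    "POSITIVE IDS", "NEGATIVE IDS", "STRATEGY", "LOOKUP FROM", "USING",
    "LIMIT", "OFFSET", "SCORE THRESHOLD", "WHERE", "WITH",
]


def check_clause_order(clauses: list[str]) -> bool:
    """Return True if the clauses appear in the documented order, each at most once."""
    # list.index raises ValueError for a clause name that is not documented.
    positions = [CLAUSE_ORDER.index(clause) for clause in clauses]
    return all(a < b for a, b in zip(positions, positions[1:]))
```

Under this check, `["POSITIVE IDS", "USING", "LIMIT"]` is accepted, while `["POSITIVE IDS", "LIMIT", "USING"]` is rejected because `USING` must precede `LIMIT`.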

resources/sample_v2.qql

Lines changed: 19 additions & 0 deletions
@@ -74,6 +74,25 @@ RECOMMEND FROM qql_sample_v2
   LIMIT 3
   WHERE department = 'neurology'

+-- Recommend with pagination and score threshold
+RECOMMEND FROM qql_sample_v2
+  POSITIVE IDS (2001)
+  LIMIT 5
+  OFFSET 2
+  SCORE THRESHOLD 0.3
+
+-- Recommend with exact KNN baseline
+RECOMMEND FROM qql_sample_v2
+  POSITIVE IDS (2001)
+  LIMIT 3
+  WITH { exact: true }
+
+-- Recommend using sparse vector instead of dense
+RECOMMEND FROM qql_sample_v2_hybrid
+  POSITIVE IDS (4001)
+  USING 'sparse'
+  LIMIT 3
+
 -- Hybrid collection
 CREATE COLLECTION qql_sample_v2_hybrid HYBRID

src/qql/ast_nodes.py

Lines changed: 5 additions & 0 deletions
@@ -178,6 +178,11 @@ class RecommendStmt:
     limit: int = 10
     strategy: str | None = None
     query_filter: FilterExpr | None = None
+    offset: int = 0
+    score_threshold: float | None = None
+    with_clause: SearchWith | None = None
+    lookup_from: tuple[str, str | None] | None = None
+    using: str | None = None


 @dataclass(frozen=True)
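
Every new field has a default, so statements built without the new clauses keep working. A minimal sketch of that property follows, assuming a trimmed stand-in class; `RecommendStmtSketch` is hypothetical, omits the filter and `WITH` fields, and its `frozen=True` and `negative_ids` default are assumptions rather than facts shown in the hunk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)  # assumption: the real RecommendStmt may or may not be frozen
class RecommendStmtSketch:
    """Trimmed, hypothetical stand-in for RecommendStmt."""

    collection: str
    positive_ids: tuple
    negative_ids: tuple = ()  # assumed default, not shown in the diff
    limit: int = 10
    strategy: str | None = None
    # Fields added by this commit, with the defaults from the hunk above.
    offset: int = 0
    score_threshold: float | None = None
    lookup_from: tuple[str, str | None] | None = None
    using: str | None = None


# Only the clauses that were written need to be supplied.
stmt = RecommendStmtSketch(collection="articles", positive_ids=(1001,), offset=5)
```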

src/qql/executor.py

Lines changed: 15 additions & 0 deletions
@@ -17,6 +17,7 @@
     HasIdCondition,
     IsEmptyCondition,
     IsNullCondition,
+    LookupLocation,
     MatchAny,
     MatchExcept,
     MatchPhrase,
@@ -509,12 +510,26 @@ def _execute_recommend(self, node: RecommendStmt) -> ExecutionResult:
             strategy=self._parse_recommend_strategy(node.strategy),
         )

+        search_params = self._build_search_params(node.with_clause)
+
+        lookup_from: LookupLocation | None = None
+        if node.lookup_from is not None:
+            lookup_from = LookupLocation(
+                collection=node.lookup_from[0],
+                vector=node.lookup_from[1],
+            )
+
         try:
             response = self._client.query_points(
                 collection_name=node.collection,
                 query=RecommendQuery(recommend=recommend_input),
                 limit=node.limit,
+                offset=node.offset or None,
                 query_filter=qdrant_filter,
+                search_params=search_params,
+                score_threshold=node.score_threshold,
+                using=node.using,
+                lookup_from=lookup_from,
             )
         except UnexpectedResponse as e:
             raise QQLRuntimeError(f"Qdrant error during RECOMMEND: {e}") from e
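
Two details of the executor hunk are easy to miss: `node.offset or None` turns an offset of `0` into `None`, and the `(collection, vector)` tuple from the AST is unpacked into a `LookupLocation`. The sketch below isolates that mapping without needing a Qdrant client; `LookupLocationStub` and `build_query_kwargs` are hypothetical stand-ins, not the executor's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LookupLocationStub:
    """Stand-in for qdrant_client.models.LookupLocation, limited to the two fields used in the diff."""

    collection: str
    vector: str | None = None


def build_query_kwargs(
    offset: int,
    score_threshold: float | None,
    using: str | None,
    lookup_from: tuple[str, str | None] | None,
) -> dict:
    """Mirror how the new RECOMMEND fields are turned into query_points keyword arguments."""
    lookup = None
    if lookup_from is not None:
        lookup = LookupLocationStub(collection=lookup_from[0], vector=lookup_from[1])
    return {
        "offset": offset or None,  # 0 is falsy, so an absent OFFSET is sent as None
        "score_threshold": score_threshold,
        "using": using,
        "lookup_from": lookup,
    }
```

One consequence of `offset or None` is that an explicit `OFFSET 0` and an omitted `OFFSET` are indistinguishable at the client call, which matches the `test_recommend_offset_zero_passes_none` test in this commit.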

src/qql/lexer.py

Lines changed: 10 additions & 0 deletions
@@ -33,6 +33,11 @@ class TokenKind(Enum):
     SIMILAR = auto()
     TO = auto()
     LIMIT = auto()
+    OFFSET = auto()
+    SCORE = auto()
+    THRESHOLD = auto()
+    LOOKUP = auto()
+    VECTOR = auto()
     DELETE = auto()
     FROM = auto()
     WHERE = auto()
@@ -103,6 +108,11 @@ class TokenKind(Enum):
     "SIMILAR": TokenKind.SIMILAR,
     "TO": TokenKind.TO,
     "LIMIT": TokenKind.LIMIT,
+    "OFFSET": TokenKind.OFFSET,
+    "SCORE": TokenKind.SCORE,
+    "THRESHOLD": TokenKind.THRESHOLD,
+    "LOOKUP": TokenKind.LOOKUP,
+    "VECTOR": TokenKind.VECTOR,
     "DELETE": TokenKind.DELETE,
     "FROM": TokenKind.FROM,
     "WHERE": TokenKind.WHERE,
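
The lexer change is two parallel additions: an enum member per keyword and a matching entry in the keyword table. A minimal sketch of how such a table is typically consulted is shown below. `Kind`, `KEYWORDS`, and `classify` are hypothetical names, and the case-insensitive match is an assumption that the diff neither confirms nor rules out.

```python
from enum import Enum, auto


class Kind(Enum):
    """Hypothetical, trimmed stand-in for TokenKind."""

    LIMIT = auto()
    OFFSET = auto()
    SCORE = auto()
    THRESHOLD = auto()
    IDENTIFIER = auto()


# Keyword table in the same shape as the one extended by the diff.
KEYWORDS = {
    "LIMIT": Kind.LIMIT,
    "OFFSET": Kind.OFFSET,
    "SCORE": Kind.SCORE,
    "THRESHOLD": Kind.THRESHOLD,
}


def classify(word: str) -> Kind:
    """Map a bare word to its keyword kind, falling back to IDENTIFIER."""
    # Assumption: keywords are matched case-insensitively.
    return KEYWORDS.get(word.upper(), Kind.IDENTIFIER)
```

Because words such as `score` and `offset` now lex as keywords rather than identifiers, payload fields with those names would stop parsing in filters unless the parser accepts keywords as field names, which is what the `_parse_field_path` change in this commit addresses.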

src/qql/parser.py

Lines changed: 60 additions & 6 deletions
@@ -297,21 +297,58 @@ def _parse_recommend(self) -> RecommendStmt:
             self._advance()
             strategy = self._expect(TokenKind.STRING).value

+        lookup_from: tuple[str, str | None] | None = None
+        if self._peek().kind == TokenKind.LOOKUP:
+            self._advance()
+            self._expect(TokenKind.FROM)
+            lookup_collection = self._parse_identifier()
+            lookup_vector: str | None = None
+            if self._peek().kind == TokenKind.VECTOR:
+                self._advance()
+                lookup_vector = self._expect(TokenKind.STRING).value
+            lookup_from = (lookup_collection, lookup_vector)
+
+        using: str | None = None
+        if self._peek().kind == TokenKind.USING:
+            self._advance()
+            using = self._expect(TokenKind.STRING).value
+
         self._expect(TokenKind.LIMIT)
         limit = int(self._expect(TokenKind.INTEGER).value)

+        offset: int = 0
+        if self._peek().kind == TokenKind.OFFSET:
+            self._advance()
+            offset = int(self._expect(TokenKind.INTEGER).value)
+
+        score_threshold: float | None = None
+        if self._peek().kind == TokenKind.SCORE:
+            self._advance()
+            self._expect(TokenKind.THRESHOLD)
+            score_threshold = float(self._expect(TokenKind.FLOAT).value)
+
         query_filter: FilterExpr | None = None
         if self._peek().kind == TokenKind.WHERE:
             self._advance()
             query_filter = self._parse_filter_expr()

+        with_clause: SearchWith | None = None
+        if self._peek().kind == TokenKind.WITH:
+            self._advance()
+            with_clause = self._parse_with_clause()
+
         return RecommendStmt(
             collection=collection,
             positive_ids=positive_ids,
             negative_ids=negative_ids,
             limit=limit,
             strategy=strategy,
             query_filter=query_filter,
+            offset=offset,
+            score_threshold=score_threshold,
+            with_clause=with_clause,
+            lookup_from=lookup_from,
+            using=using,
         )

     def _parse_delete(self) -> DeleteStmt:
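
Each optional clause in `_parse_recommend` follows the same peek, advance, expect pattern, and the checks run in a fixed sequence, which is what makes the clause order mandatory. The sketch below reproduces that pattern for the `LIMIT [OFFSET] [SCORE THRESHOLD]` tail only. It works on pre-split words instead of real tokens, and `parse_tail` is a hypothetical function, not the project's parser.

```python
from __future__ import annotations


def parse_tail(tokens: list[str]) -> dict:
    """Parse `LIMIT n [OFFSET n] [SCORE THRESHOLD f]` from a list of word tokens."""
    pos = 0

    def at(word: str) -> bool:
        # Peek: is the current token the given keyword?
        return pos < len(tokens) and tokens[pos].upper() == word

    def expect(word: str | None = None) -> str:
        # Consume the current token, optionally requiring a specific keyword.
        nonlocal pos
        if pos >= len(tokens) or (word is not None and tokens[pos].upper() != word):
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {word or 'a value'}, got {found!r}")
        pos += 1
        return tokens[pos - 1]

    expect("LIMIT")  # mandatory clause
    result = {"limit": int(expect()), "offset": 0, "score_threshold": None}

    if at("OFFSET"):  # optional, checked only at this point in the sequence
        expect("OFFSET")
        result["offset"] = int(expect())

    if at("SCORE"):  # optional two-word keyword
        expect("SCORE")
        expect("THRESHOLD")
        result["score_threshold"] = float(expect())

    if pos < len(tokens):  # anything left over is out of order or unknown
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, `parse_tail("LIMIT 10 OFFSET 5 SCORE THRESHOLD 0.5".split())` succeeds, while placing `OFFSET` after `SCORE THRESHOLD` raises `SyntaxError` because the `OFFSET` check has already been passed.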
@@ -456,12 +493,29 @@ def _parse_predicate(self) -> FilterExpr:
     def _parse_field_path(self) -> str:
         """Dot-notation paths are already single IDENTIFIER tokens from the lexer."""
         tok = self._peek()
-        if tok.kind != TokenKind.IDENTIFIER:
-            raise QQLSyntaxError(
-                f"Expected a field name, got '{tok.value}'", tok.pos
-            )
-        self._advance()
-        return tok.value
+        if tok.kind == TokenKind.IDENTIFIER:
+            self._advance()
+            return tok.value
+        # Allow bare keywords to serve as field names (e.g. score, limit),
+        # but not filter operator keywords or literal tokens.
+        if tok.kind not in {
+            TokenKind.AND, TokenKind.OR, TokenKind.NOT,
+            TokenKind.IN, TokenKind.BETWEEN, TokenKind.IS,
+            TokenKind.NULL, TokenKind.EMPTY, TokenKind.MATCH,
+            TokenKind.ANY, TokenKind.PHRASE,
+            TokenKind.STRING, TokenKind.INTEGER, TokenKind.FLOAT,
+            TokenKind.LPAREN, TokenKind.RPAREN,
+            TokenKind.LBRACE, TokenKind.RBRACE,
+            TokenKind.LBRACKET, TokenKind.RBRACKET,
+            TokenKind.COMMA, TokenKind.COLON, TokenKind.EQUALS,
+            TokenKind.NOT_EQUALS, TokenKind.GT, TokenKind.GTE,
+            TokenKind.LT, TokenKind.LTE, TokenKind.EOF,
+        }:
+            self._advance()
+            return tok.value
+        raise QQLSyntaxError(
+            f"Expected a field name, got '{tok.value}'", tok.pos
+        )

     def _parse_literal(self) -> str | int | float:
         """STRING | INTEGER | FLOAT"""
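
The rewritten `_parse_field_path` accepts a keyword token as a field name unless it belongs to a deny-list of filter operators, literals, and punctuation. A minimal sketch of that fallback on plain words is given below; the word sets are trimmed and `field_name` is a hypothetical function, so this illustrates the idea rather than the parser's actual token handling.

```python
# Words that can never be a field name: filter operator keywords from the deny-list.
FILTER_OPERATORS = {
    "AND", "OR", "NOT", "IN", "BETWEEN", "IS",
    "NULL", "EMPTY", "MATCH", "ANY", "PHRASE",
}
# Illustrative subset of the remaining keywords the lexer recognises.
OTHER_KEYWORDS = {"SCORE", "LIMIT", "OFFSET", "THRESHOLD", "VECTOR"}


def field_name(word: str) -> str:
    """Return the word as a field name, or raise if it is a filter operator."""
    upper = word.upper()
    if upper not in FILTER_OPERATORS | OTHER_KEYWORDS:
        return word  # ordinary identifier
    if upper not in FILTER_OPERATORS:
        return word  # bare keyword tolerated as a field name
    raise SyntaxError(f"Expected a field name, got '{word}'")
```

With this rule, a filter such as `WHERE score > 0.5` keeps working even though `SCORE` became a keyword in this commit, while `WHERE and = 1` is still rejected.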

tests/test_executor.py

Lines changed: 105 additions & 0 deletions
@@ -516,6 +516,111 @@ def test_recommend_nonexistent_collection_raises(self, executor, mock_client):
         with pytest.raises(QQLRuntimeError, match="does not exist"):
             executor.execute(node)

+    def test_recommend_forwards_offset(self, executor, mock_client, mocker):
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.points = []
+        mock_client.query_points.return_value = mock_response
+
+        node = RecommendStmt(
+            collection="notes", positive_ids=("a",), limit=5, offset=10
+        )
+        executor.execute(node)
+        assert mock_client.query_points.call_args.kwargs["offset"] == 10
+
+    def test_recommend_forwards_score_threshold(self, executor, mock_client, mocker):
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.points = []
+        mock_client.query_points.return_value = mock_response
+
+        node = RecommendStmt(
+            collection="notes", positive_ids=("a",), limit=5, score_threshold=0.5
+        )
+        executor.execute(node)
+        assert mock_client.query_points.call_args.kwargs["score_threshold"] == pytest.approx(0.5)
+
+    def test_recommend_forwards_using(self, executor, mock_client, mocker):
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.points = []
+        mock_client.query_points.return_value = mock_response
+
+        node = RecommendStmt(
+            collection="notes", positive_ids=("a",), limit=5, using="sparse"
+        )
+        executor.execute(node)
+        assert mock_client.query_points.call_args.kwargs["using"] == "sparse"
+
+    def test_recommend_forwards_lookup_from(self, executor, mock_client, mocker):
+        from qdrant_client.models import LookupLocation
+
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.points = []
+        mock_client.query_points.return_value = mock_response
+
+        node = RecommendStmt(
+            collection="notes",
+            positive_ids=("a",),
+            limit=5,
+            lookup_from=("source", "dense"),
+        )
+        executor.execute(node)
+        lookup = mock_client.query_points.call_args.kwargs["lookup_from"]
+        assert isinstance(lookup, LookupLocation)
+        assert lookup.collection == "source"
+        assert lookup.vector == "dense"
+
+    def test_recommend_forwards_lookup_from_without_vector(self, executor, mock_client, mocker):
+        from qdrant_client.models import LookupLocation
+
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.points = []
+        mock_client.query_points.return_value = mock_response
+
+        node = RecommendStmt(
+            collection="notes",
+            positive_ids=("a",),
+            limit=5,
+            lookup_from=("source", None),
+        )
+        executor.execute(node)
+        lookup = mock_client.query_points.call_args.kwargs["lookup_from"]
+        assert isinstance(lookup, LookupLocation)
+        assert lookup.collection == "source"
+        assert lookup.vector is None
+
+    def test_recommend_forwards_search_params(self, executor, mock_client, mocker):
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.points = []
+        mock_client.query_points.return_value = mock_response
+
+        node = RecommendStmt(
+            collection="notes",
+            positive_ids=("a",),
+            limit=5,
+            with_clause=SearchWith(exact=True, hnsw_ef=128),
+        )
+        executor.execute(node)
+        search_params = mock_client.query_points.call_args.kwargs["search_params"]
+        assert search_params.exact is True
+        assert search_params.hnsw_ef == 128
+
+    def test_recommend_offset_zero_passes_none(self, executor, mock_client, mocker):
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.points = []
+        mock_client.query_points.return_value = mock_response
+
+        node = RecommendStmt(
+            collection="notes", positive_ids=("a",), limit=5, offset=0
+        )
+        executor.execute(node)
+        assert mock_client.query_points.call_args.kwargs["offset"] is None
+

 class TestDelete:
     def test_delete_calls_qdrant_delete(self, executor, mock_client):
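
All of the new executor tests share one pattern: stub the client, run the statement, then inspect `call_args.kwargs` on the mocked `query_points`. The standalone sketch below shows that pattern with only the standard library. `run_recommend` is a hypothetical stand-in for the executor's RECOMMEND path, and the tests in the commit use pytest fixtures (`executor`, `mock_client`, `mocker`) instead.

```python
from unittest.mock import MagicMock


def run_recommend(client, collection: str, limit: int, offset: int = 0):
    """Hypothetical stand-in for the executor: forwards arguments to query_points."""
    return client.query_points(
        collection_name=collection,
        limit=limit,
        offset=offset or None,  # same zero-to-None mapping as the executor diff
    )


# Stub the client so no Qdrant server is needed.
client = MagicMock()
client.query_points.return_value.points = []

run_recommend(client, "notes", limit=5, offset=10)

# call_args holds the most recent call; .kwargs is its keyword-argument dict.
forwarded = client.query_points.call_args.kwargs
```

Asserting on `forwarded` checks what was sent to the client rather than what came back, which is why the tests can use an empty `points` list as the response.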
