120 changes: 118 additions & 2 deletions README.md
@@ -17,6 +17,12 @@ qql> SEARCH notes SIMILAR TO 'vector databases' LIMIT 5 USING HYBRID
Score │ ID │ Payload
────────┼──────────────────────────────────────┼──────────────────────────────────────
0.9102 │ 3f2e1a4b-8c91-4d0e-b123-abc123def456 │ {'text': 'Qdrant is a ...', 'author': 'alice', 'year': 2024}

qql> SEARCH notes SIMILAR TO 'vector databases' LIMIT 5 USING HYBRID RERANK
✓ Found 1 result(s) (hybrid, reranked)
Score │ ID │ Payload
────────┼──────────────────────────────────────┼──────────────────────────────────────
5.3714 │ 3f2e1a4b-8c91-4d0e-b123-abc123def456 │ {'text': 'Qdrant is a ...', 'author': 'alice', 'year': 2024}
```

---
@@ -32,6 +38,7 @@ qql> SEARCH notes SIMILAR TO 'vector databases' LIMIT 5 USING HYBRID
- [SEARCH — find similar points](#search--find-similar-points)
- [WHERE Clause Filters](#where-clause-filters)
- [Hybrid Search (USING HYBRID)](#hybrid-search-using-hybrid)
- [Cross-Encoder Reranking (RERANK)](#cross-encoder-reranking-rerank)
- [SHOW COLLECTIONS — list collections](#show-collections--list-collections)
- [CREATE COLLECTION — create a collection](#create-collection--create-a-collection)
- [DROP COLLECTION — delete a collection](#drop-collection--delete-a-collection)
@@ -244,6 +251,7 @@ SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING MODEL '<model
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING MODEL '<model>'] WHERE <filter>
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> USING HYBRID [DENSE MODEL '<model>'] [SPARSE MODEL '<model>'] [WHERE <filter>]
SEARCH <collection_name> SIMILAR TO '<query_text>' LIMIT <n> [USING ...] [WHERE <filter>] RERANK [MODEL '<reranker_model>']
```

**Examples:**
@@ -508,6 +516,95 @@ Both can be overridden independently with `DENSE MODEL` and `SPARSE MODEL`.

---

### Cross-Encoder Reranking (RERANK)

Appending `RERANK` to any SEARCH statement activates a **second-pass relevance scoring** step using a [cross-encoder](https://www.sbert.net/examples/applications/cross-encoder/README.html) model. Unlike bi-encoders (which encode query and document independently), a cross-encoder processes the **(query, document)** pair jointly, producing a more accurate relevance score at the cost of extra compute.

#### How it works internally

1. Qdrant executes the normal dense or hybrid search, but fetches `LIMIT × 4` candidates instead of just `LIMIT` (for example, 20 candidates for `LIMIT 5`), giving the reranker enough material to work with.
2. Each candidate's `payload["text"]` is paired with the original query text. A candidate with no `text` field is scored against an empty string.
3. The cross-encoder scores all (query, document) pairs in one batch.
4. Results are sorted **descending by cross-encoder score** and sliced to `LIMIT`.
5. The `score` column in the output reflects the cross-encoder relevance score (raw logits — higher is more relevant).
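
The five steps above can be sketched in plain Python. This is a minimal illustration, not the executor's actual code: `rerank_candidates`, `toy_overlap_score`, and `FETCH_MULTIPLIER` are hypothetical names, and the word-overlap scorer merely stands in for a real cross-encoder.

```python
from typing import Callable

FETCH_MULTIPLIER = 4  # mirrors the "LIMIT x 4" over-fetch described above


def rerank_candidates(
    query: str,
    candidates: list[dict],
    limit: int,
    score_fn: Callable[[str, list[str]], list[float]],
) -> list[dict]:
    """Re-score candidates with score_fn and keep the top `limit`."""
    # Step 2: pair the query with each candidate's payload["text"] ("" if missing).
    texts = [(c.get("payload") or {}).get("text", "") for c in candidates]
    # Step 3: score every (query, document) pair in one batch.
    scores = score_fn(query, texts)
    # Steps 4-5: replace the score, sort descending, slice to the limit.
    rescored = [{**c, "score": float(s)} for c, s in zip(candidates, scores)]
    return sorted(rescored, key=lambda c: c["score"], reverse=True)[:limit]


def toy_overlap_score(query: str, docs: list[str]) -> list[float]:
    """Hypothetical stand-in scorer: counts words shared with the query."""
    q = set(query.lower().split())
    return [float(len(q & set(d.lower().split()))) for d in docs]


limit = 2
fetch_limit = limit * FETCH_MULTIPLIER  # Step 1: request 8 candidates, return 2
candidates = [
    {"id": "a", "score": 0.91, "payload": {"text": "cooking with cast iron"}},
    {"id": "b", "score": 0.88, "payload": {"text": "vector databases store embeddings"}},
    {"id": "c", "score": 0.80, "payload": {"text": "vector databases and hybrid search"}},
    {"id": "d", "score": 0.75, "payload": {}},
]
top = rerank_candidates("hybrid vector databases", candidates, limit, toy_overlap_score)
print([c["id"] for c in top])
```

Note that the candidate ranked first by vector similarity (`a`) drops out once the second-pass scorer disagrees, which is the behaviour reranking is meant to provide.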

#### Syntax

```
SEARCH <name> SIMILAR TO '<query>' LIMIT <n> RERANK
SEARCH <name> SIMILAR TO '<query>' LIMIT <n> RERANK MODEL '<cross_encoder_model>'
```

`RERANK` must come **after** any `USING` and `WHERE` clauses:

```
SEARCH ... LIMIT n [USING ...] [WHERE ...] RERANK [MODEL '...']
```
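
The ordering rule can be illustrated with a small sketch of how a trailing clause might be parsed. This is an assumption-laden illustration rather than QQL's parser: `RerankClause` and `parse_rerank_tail` are hypothetical names, and `shlex` is used only as a convenient stand-in tokenizer for quoted strings.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RerankClause:
    enabled: bool = False
    model: Optional[str] = None


def parse_rerank_tail(tokens: list[str]) -> RerankClause:
    """Parse an optional trailing RERANK [MODEL '<name>'] from leftover tokens."""
    if not tokens:
        return RerankClause()
    if tokens[0].upper() != "RERANK":
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    if len(tokens) == 1:
        return RerankClause(enabled=True)
    if len(tokens) == 3 and tokens[1].upper() == "MODEL":
        return RerankClause(enabled=True, model=tokens[2])
    # Anything else after RERANK (e.g. a WHERE clause) is out of order.
    raise SyntaxError("expected end of statement or MODEL '<name>' after RERANK")


print(parse_rerank_tail(shlex.split("RERANK")))
print(parse_rerank_tail(shlex.split("RERANK MODEL 'cross-encoder/ms-marco-MiniLM-L-6-v2'")))
```

Under this sketch, a tail such as `RERANK WHERE year > 2020` raises `SyntaxError`, because nothing except `MODEL '<name>'` may follow `RERANK`.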

#### Examples

Dense search + rerank (default cross-encoder):
```sql
SEARCH articles SIMILAR TO 'machine learning for healthcare' LIMIT 5 RERANK
```

Hybrid search + rerank (dense and sparse retrieval, then cross-encoder scoring):
```sql
SEARCH articles SIMILAR TO 'attention mechanism in transformers' LIMIT 10 USING HYBRID RERANK
```

Dense search + WHERE filter + rerank:
```sql
SEARCH articles SIMILAR TO 'deep learning' LIMIT 10 WHERE year > 2020 RERANK
```

Custom cross-encoder model:
```sql
SEARCH articles SIMILAR TO 'semantic search' LIMIT 5
RERANK MODEL 'cross-encoder/ms-marco-MiniLM-L-6-v2'
```

All clauses combined:
```sql
SEARCH articles SIMILAR TO 'neural IR' LIMIT 10
USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5'
WHERE year >= 2020
RERANK MODEL 'cross-encoder/ms-marco-MiniLM-L-6-v2'
```

#### Default cross-encoder model

```
cross-encoder/ms-marco-MiniLM-L-6-v2
```

- A lightweight but effective passage reranker fine-tuned on MS MARCO.
- Downloaded on first use and cached locally by Fastembed.
- No additional packages needed — `TextCrossEncoder` is included in the `fastembed` package.

#### Commonly available cross-encoder models (Fastembed)

| Model | Notes |
|---|---|
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Default. Fast and accurate for passage reranking |
| `cross-encoder/ms-marco-MiniLM-L-12-v2` | Larger, higher quality, slower |
| `BAAI/bge-reranker-base` | BGE reranker, strong general-purpose performance |
| `BAAI/bge-reranker-large` | Highest quality BGE reranker, slower |

#### When to use RERANK

| Situation | Recommendation |
|---|---|
| High-precision retrieval (legal, medical, research) | Add `RERANK` |
| Small LIMIT (top-3 or top-5 results) | Effective: the reranker sharpens precision at the top of the list |
| Low latency required | Skip `RERANK` (adds roughly 100–500 ms per batch, depending on model and hardware) |
| Large collections with keyword-heavy queries | `USING HYBRID RERANK` for best coverage + precision |
| General-purpose semantic search | Optional; `RERANK` improves quality at a modest latency cost |

> **Note on scores:** After reranking, the `score` column shows the cross-encoder's raw logit (can be any real number, unbounded). Do not compare reranked scores to non-reranked cosine similarity scores — they are on different scales.
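
If a bounded value is easier to display, one common option is to pass the logit through a sigmoid. The sketch below is an illustration only: QQL itself reports the raw logit, `logit_to_probability` is a hypothetical helper, and it assumes the model emits a single logit per pair. Because the sigmoid is monotonic, the ranking order is unchanged.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Map a raw logit to the (0, 1) range with a numerically stable sigmoid."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative inputs, exp(logit) cannot overflow.
    z = math.exp(logit)
    return z / (1.0 + z)


reranked_logits = [5.3714, 0.0, -2.5]
print([round(logit_to_probability(x), 4) for x in reranked_logits])
```

The mapped values are still not comparable to cosine similarity; they only make the reranked scores easier to read side by side.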

---

### SHOW COLLECTIONS — list collections

Lists all collections in the connected Qdrant instance.
@@ -670,6 +767,25 @@ SEARCH docs SIMILAR TO 'hello' LIMIT 5
| `prithivida/Splade_PP_en_v1` | SPLADE++ — strong keyword + semantic overlap |
| `Qdrant/Unicoil` | UniCOIL sparse encoder |

### Cross-encoder reranking (RERANK default)

```
cross-encoder/ms-marco-MiniLM-L-6-v2
```

- A passage reranker fine-tuned on MS MARCO.
- No new dependencies — `TextCrossEncoder` is included in the `fastembed` package.
- Override with `RERANK MODEL '<model_name>'`.

### Commonly available cross-encoder models (Fastembed)

| Model | Notes |
|---|---|
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Default. Fast passage reranker |
| `cross-encoder/ms-marco-MiniLM-L-12-v2` | Larger, higher quality |
| `BAAI/bge-reranker-base` | Strong general-purpose reranker |
| `BAAI/bge-reranker-large` | Highest quality, slower |

> Models are downloaded automatically on first use and cached by Fastembed. Loading a new model for the first time takes a few seconds.

### Model consistency rule
@@ -847,7 +963,7 @@ qql/
│ ├── lexer.py # Tokenizer: string → List[Token]
│ ├── ast_nodes.py # Frozen dataclasses for each statement and filter type
│ ├── parser.py # Recursive descent parser: tokens → AST node
-│ ├── embedder.py # Embedder (dense) + SparseEmbedder (BM25) with per-model cache
+│ ├── embedder.py # Embedder (dense) + SparseEmbedder (BM25) + CrossEncoderEmbedder (rerank)
│ └── executor.py # AST node → Qdrant client call + filter + hybrid search
└── tests/
├── test_lexer.py # Tokenizer unit tests (keywords, operators, dot-paths, hybrid tokens)
@@ -865,7 +981,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
pytest tests/ -v
```

-Expected output: **169 tests passing**.
+Expected output: **193 tests passing**.

---

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "qql-cli"
-version = "1.0.0"
+version = "1.1.0"
description = "A SQL-like query language CLI wrapper for Qdrant vector database"
readme = "README.md"
license = { file = "LICENSE" }
2 changes: 2 additions & 0 deletions src/qql/ast_nodes.py
@@ -146,6 +146,8 @@ class SearchStmt:
hybrid: bool = False # if True, use prefetch+RRF hybrid search
sparse_model: str | None = None # sparse model for hybrid; None → SparseEmbedder.DEFAULT_MODEL
query_filter: FilterExpr | None = None # optional WHERE clause; default keeps existing tests valid
rerank: bool = False # if True, apply cross-encoder reranking post-Qdrant
rerank_model: str | None = None # cross-encoder model; None → CrossEncoderEmbedder.DEFAULT_MODEL


@dataclass(frozen=True)
1 change: 1 addition & 0 deletions src/qql/cli.py
@@ -42,6 +42,7 @@
Optional: [yellow]USING MODEL[/yellow] '<model>'
Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '<model>'] [SPARSE MODEL '<model>']
Optional: [yellow]WHERE[/yellow] <filter> (e.g. WHERE year > 2020 AND status = 'ok')
Optional: [yellow]RERANK[/yellow] [MODEL '<model>'] rerank results with a cross-encoder

[yellow]DELETE FROM[/yellow] <name> [yellow]WHERE id =[/yellow] '<id>'
Delete a point by its ID.
31 changes: 31 additions & 0 deletions src/qql/embedder.py
@@ -65,3 +65,34 @@ def query_embed(self, text: str) -> dict[str, list]:
"""Embed a query string (BM25 applies different IDF weighting at query time)."""
result = next(iter(self._model.query_embed(text))) # type: ignore[attr-defined]
return {"indices": result.indices.tolist(), "values": result.values.tolist()}


class CrossEncoderEmbedder:
    """Cross-encoder reranker using fastembed's TextCrossEncoder.

    Jointly encodes (query, document) pairs to produce relevance scores.
    Higher score = more relevant. No new package dependencies:
    TextCrossEncoder ships with the fastembed package bundled with
    qdrant-client[fastembed].
    """

    DEFAULT_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"

    # Class-level cache mirrors Embedder's pattern
    _cache: dict[str, object] = {}

    def __init__(self, model_name: str = DEFAULT_MODEL) -> None:
        self._model_name = model_name
        if model_name not in CrossEncoderEmbedder._cache:
            # Lazy import via the submodule path used in fastembed's reranking docs.
            from fastembed.rerank.cross_encoder import TextCrossEncoder

            CrossEncoderEmbedder._cache[model_name] = TextCrossEncoder(model_name)
        self._model = CrossEncoderEmbedder._cache[model_name]

    def rerank(self, query: str, documents: list[str]) -> list[float]:
        """Return a relevance score for each (query, document) pair.

        Scores are raw logits; higher means more relevant.
        The returned list is the same length as ``documents`` and in the same order.
        """
        return list(self._model.rerank(query, documents))  # type: ignore[attr-defined]
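
The class-level cache in `CrossEncoderEmbedder` matters because the executor constructs a new instance on every reranked search. The pattern can be sketched in isolation as follows; `CachedModel` and `fake_loader` are hypothetical stand-ins, with the loader taking the place of the model constructor so the example runs without downloading anything.

```python
from typing import Callable, ClassVar


class CachedModel:
    """Hypothetical stand-in showing the class-level cache pattern."""

    # Shared by all instances: model name -> loaded model object.
    _cache: ClassVar[dict[str, object]] = {}

    def __init__(self, model_name: str, loader: Callable[[str], object]) -> None:
        if model_name not in CachedModel._cache:
            # Only the first request for a given name pays the loading cost.
            CachedModel._cache[model_name] = loader(model_name)
        self._model = CachedModel._cache[model_name]


load_calls: list[str] = []


def fake_loader(name: str) -> object:
    """Record each load so reuse can be observed."""
    load_calls.append(name)
    return object()


first = CachedModel("model-a", fake_loader)
second = CachedModel("model-a", fake_loader)  # reuses the cached object
third = CachedModel("model-b", fake_loader)   # different name, new load
print(load_calls)
```

One trade-off of this design is that cached models stay in memory for the lifetime of the process, which is usually acceptable for a CLI session.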
46 changes: 43 additions & 3 deletions src/qql/executor.py
@@ -55,7 +55,9 @@
ShowCollectionsStmt,
)
from .config import QQLConfig
-from .embedder import Embedder, SparseEmbedder
+from .embedder import CrossEncoderEmbedder, Embedder, SparseEmbedder
from .exceptions import QQLRuntimeError

# Candidates fetched per requested result when RERANK is active.
_RERANK_FETCH_MULTIPLIER = 4


@@ -234,6 +236,10 @@ def _execute_search(self, node: SearchStmt) -> ExecutionResult:
self._build_qdrant_filter(node.query_filter)
)

# When reranking is requested, fetch more candidates so the reranker has
# enough material to reorder; only `node.limit` results are returned.
fetch_limit = node.limit * _RERANK_FETCH_MULTIPLIER if node.rerank else node.limit

# ── Hybrid SEARCH: prefetch dense+sparse, fuse with RRF ───────────
if node.hybrid:
dense_model = node.model or self._config.default_model
@@ -264,7 +270,7 @@ def _execute_search(self, node: SearchStmt) -> ExecutionResult:
),
],
query=FusionQuery(fusion=Fusion.RRF),
-limit=node.limit,
+limit=fetch_limit,
query_filter=qdrant_filter,
)
except UnexpectedResponse as e:
@@ -274,6 +280,15 @@ def _execute_search(self, node: SearchStmt) -> ExecutionResult:
{"id": str(h.id), "score": round(h.score, 4), "payload": h.payload}
for h in response.points
]

if node.rerank:
results = self._apply_reranking(node.query_text, results, node.limit, node.rerank_model)
return ExecutionResult(
success=True,
message=f"Found {len(results)} result(s) (hybrid, reranked)",
data=results,
)

return ExecutionResult(
success=True,
message=f"Found {len(results)} result(s) (hybrid)",
@@ -289,7 +304,7 @@ def _execute_search(self, node: SearchStmt) -> ExecutionResult:
response = self._client.query_points(
collection_name=node.collection,
query=vector,
-limit=node.limit,
+limit=fetch_limit,
query_filter=qdrant_filter,
)
except UnexpectedResponse as e:
@@ -299,12 +314,37 @@ def _execute_search(self, node: SearchStmt) -> ExecutionResult:
{"id": str(h.id), "score": round(h.score, 4), "payload": h.payload}
for h in response.points
]

if node.rerank:
results = self._apply_reranking(node.query_text, results, node.limit, node.rerank_model)
return ExecutionResult(
success=True,
message=f"Found {len(results)} result(s) (reranked)",
data=results,
)

return ExecutionResult(
success=True,
message=f"Found {len(results)} result(s)",
data=results,
)

    def _apply_reranking(
        self,
        query: str,
        results: list[dict],
        limit: int,
        rerank_model: str | None,
    ) -> list[dict]:
        """Re-score candidates with a cross-encoder and return top-``limit`` results."""
        model_name = rerank_model or CrossEncoderEmbedder.DEFAULT_MODEL
        reranker = CrossEncoderEmbedder(model_name)
        # A payload may be None or lack "text"; such candidates are scored against "".
        texts = [(r["payload"] or {}).get("text", "") for r in results]
        scores = reranker.rerank(query, texts)
        # Overwrite the similarity score with the cross-encoder logit.
        for r, s in zip(results, scores):
            r["score"] = round(float(s), 4)
        return sorted(results, key=lambda r: r["score"], reverse=True)[:limit]

def _execute_delete(self, node: DeleteStmt) -> ExecutionResult:
if not self._client.collection_exists(node.collection):
raise QQLRuntimeError(f"Collection '{node.collection}' does not exist")
2 changes: 2 additions & 0 deletions src/qql/lexer.py
@@ -15,6 +15,7 @@ class TokenKind(Enum):
HYBRID = auto()
DENSE = auto()
SPARSE = auto()
RERANK = auto()
CREATE = auto()
DROP = auto()
SHOW = auto()
@@ -75,6 +76,7 @@ class TokenKind(Enum):
"HYBRID": TokenKind.HYBRID,
"DENSE": TokenKind.DENSE,
"SPARSE": TokenKind.SPARSE,
"RERANK": TokenKind.RERANK,
"CREATE": TokenKind.CREATE,
"DROP": TokenKind.DROP,
"SHOW": TokenKind.SHOW,
10 changes: 10 additions & 0 deletions src/qql/parser.py
@@ -154,6 +154,14 @@ def _parse_search(self) -> SearchStmt:
if self._peek().kind == TokenKind.WHERE:
self._advance() # consume WHERE
query_filter = self._parse_filter_expr()
rerank: bool = False
rerank_model: str | None = None
if self._peek().kind == TokenKind.RERANK:
self._advance() # consume RERANK
rerank = True
if self._peek().kind == TokenKind.MODEL:
self._advance() # consume MODEL
rerank_model = self._expect(TokenKind.STRING).value
return SearchStmt(
collection=collection,
query_text=query_text,
@@ -162,6 +170,8 @@ def _parse_search(self) -> SearchStmt:
hybrid=hybrid,
sparse_model=sparse_model,
query_filter=query_filter,
rerank=rerank,
rerank_model=rerank_model,
)

def _parse_delete(self) -> DeleteStmt: