Merged
1,499 changes: 50 additions & 1,449 deletions README.md

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions docs/_config.yml
@@ -0,0 +1,10 @@
theme: minima
title: "QQL — Qdrant Query Language"
description: "SQL-like query language and CLI for Qdrant vector database — INSERT, SEARCH, hybrid search, reranking, quantization, and more."
url: "https://pavanjava.github.io/qql"
baseurl: "/qql"
repository: "pavanjava/qql"

# Disable Jekyll processing of the HTML file (it has its own styling)
include:
- index.html
216 changes: 216 additions & 0 deletions docs/collections.md
@@ -0,0 +1,216 @@
# Managing Collections

---

## SHOW COLLECTIONS — list collections

Lists all collections in the connected Qdrant instance.

```sql
SHOW COLLECTIONS
```

**Output:**
```
✓ 3 collection(s) found
┌──────────────────┐
│ Collection       │
├──────────────────┤
│ articles         │
│ notes            │
│ products         │
└──────────────────┘
```

---

## CREATE COLLECTION — create a collection

Creates a new, empty collection. Because collections are also created automatically on the first `INSERT`, this command is optional; use it to set up a collection (for example, with a specific model, hybrid mode, or quantization) before inserting data.

**Syntax:**
```
CREATE COLLECTION <collection_name>
CREATE COLLECTION <collection_name> HYBRID
CREATE COLLECTION <collection_name> USING MODEL '<model_name>'
CREATE COLLECTION <collection_name> USING HYBRID
CREATE COLLECTION <collection_name> USING HYBRID DENSE MODEL '<model>'
```

Any of the above forms can be followed by an optional `QUANTIZE` clause — see [Quantization](#quantization--quantize-clause) below.

**Examples:**

Dense-only collection (standard, uses default model dimensions):
```sql
CREATE COLLECTION research_papers
```

Dense-only collection pinned to a specific model (768-dimensional):
```sql
CREATE COLLECTION research_papers USING MODEL 'BAAI/bge-base-en-v1.5'
```

Hybrid collection (dense + sparse BM25, default models):
```sql
CREATE COLLECTION research_papers HYBRID
```

Hybrid collection with a custom dense model:
```sql
CREATE COLLECTION research_papers USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5'
```

When `USING MODEL` is omitted, the collection uses the **default embedding model's dimensions** (384 for `all-MiniLM-L6-v2`). If the collection already exists, the command succeeds with a message and does nothing.

---

## Quantization — QUANTIZE clause

Quantization reduces the memory footprint of vector collections and speeds up search at the cost of a small, controllable accuracy loss. QQL supports all three Qdrant quantization strategies via an optional `QUANTIZE` clause appended to `CREATE COLLECTION`.

**Three strategies:**

| Type | Compression | Accuracy Loss | Best For |
|---|---|---|---|
| `SCALAR` | 4× (float32 → int8) | < 1% | Most collections — best balance |
| `BINARY` | 32× (float32 → 1-bit) | Higher | High-dimensional vectors (768+), speed priority |
| `PRODUCT` | 4× (fixed in this version) | Variable | Memory-constrained deployments |

**Full syntax:**
```
CREATE COLLECTION <name> ... QUANTIZE SCALAR [QUANTILE <0.0–1.0>] [ALWAYS RAM]
CREATE COLLECTION <name> ... QUANTIZE BINARY [ALWAYS RAM]
CREATE COLLECTION <name> ... QUANTIZE PRODUCT [ALWAYS RAM]
```

- **`QUANTILE <float>`** — (scalar only) calibration quantile for the INT8 conversion; defaults to Qdrant's built-in default (0.99) when omitted.
- **`ALWAYS RAM`** — keep the **quantized** vectors in RAM at all times, regardless of the collection's `on_disk` setting. Improves search throughput at the cost of higher RAM usage for the compressed index. The original full-precision vectors are stored and managed independently of this flag. Supported by all three quantization types.
- **`QUANTIZE`** always appears **after** all other clauses (`HYBRID`, `USING MODEL`, etc.).
- For `PRODUCT`, the compression ratio is fixed at **4×** in this version.
- When used with `HYBRID` collections, quantization applies only to the **dense** vector.
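
To make the compression ratios above concrete, the following is a rough back-of-the-envelope sketch in Python. It counts only the raw vector components (no index or payload overhead), and the collection size, the helper name `vector_memory_bytes`, and the variable names are hypothetical values chosen for illustration, not part of QQL or Qdrant.

```python
def vector_memory_bytes(num_vectors: int, dim: int, bits_per_component: int) -> int:
    """Approximate storage for vector components only (no index overhead)."""
    return num_vectors * dim * bits_per_component // 8


# Hypothetical collection: 1M vectors at 768 dimensions
# (768 matches the BAAI/bge-base-en-v1.5 example above).
num_vectors, dim = 1_000_000, 768

float32_bytes = vector_memory_bytes(num_vectors, dim, 32)  # unquantized
scalar_bytes = vector_memory_bytes(num_vectors, dim, 8)    # SCALAR: float32 -> int8
binary_bytes = vector_memory_bytes(num_vectors, dim, 1)    # BINARY: float32 -> 1 bit
product_bytes = float32_bytes // 4                         # PRODUCT: 4x ratio per the text

for name, size in [
    ("float32", float32_bytes),
    ("SCALAR", scalar_bytes),
    ("BINARY", binary_bytes),
    ("PRODUCT", product_bytes),
]:
    print(f"{name:<8} {size / 2**20:10.1f} MiB  ({float32_bytes // size}x smaller)")
```

These figures cover the quantized copy only; as noted above, the original full-precision vectors are still stored separately.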

**Examples:**

Scalar quantization (recommended default):
```sql
CREATE COLLECTION research_papers QUANTIZE SCALAR
```

Scalar with explicit calibration and quantized vectors pinned to RAM:
```sql
CREATE COLLECTION research_papers QUANTIZE SCALAR QUANTILE 0.95 ALWAYS RAM
```

Binary quantization for large high-dimensional embeddings:
```sql
CREATE COLLECTION research_papers QUANTIZE BINARY
```

Product quantization for maximum memory savings:
```sql
CREATE COLLECTION research_papers QUANTIZE PRODUCT ALWAYS RAM
```

Combined with hybrid collection:
```sql
CREATE COLLECTION research_papers HYBRID QUANTIZE SCALAR
```

Combined with a pinned model:
```sql
CREATE COLLECTION research_papers USING MODEL 'BAAI/bge-base-en-v1.5' QUANTIZE SCALAR QUANTILE 0.99
```

**Valid combinations:**

| Base form | + QUANTIZE SCALAR | + QUANTIZE BINARY | + QUANTIZE PRODUCT |
|---|---|---|---|
| `CREATE COLLECTION name` | ✓ | ✓ | ✓ |
| `... HYBRID` | ✓ | ✓ | ✓ |
| `... USING MODEL 'x'` | ✓ | ✓ | ✓ |
| `... USING HYBRID` | ✓ | ✓ | ✓ |
| `... USING HYBRID DENSE MODEL 'x'` | ✓ | ✓ | ✓ |

> `INSERT` and `SEARCH` use exactly the same syntax on quantized collections as on non-quantized ones.

---

## CREATE INDEX — create a payload index

Creates a payload index on a collection field. Payload indexes speed up `WHERE` clause filtering by allowing Qdrant to efficiently match on indexed fields.

**Syntax:**
```
CREATE INDEX ON COLLECTION <collection_name> FOR <field_name> TYPE <schema_type>
```

**Supported schema types:**

| Type | Description |
|---|---|
| `keyword` | Exact string match (e.g. status, category) |
| `integer` | Whole numbers |
| `float` | Decimal numbers |
| `bool` | Boolean values |
| `text` | Full-text search (enables `MATCH` operators) |
| `geo` | Geospatial coordinates |
| `datetime` | Date/time values |

**Examples:**

```sql
CREATE INDEX ON COLLECTION articles FOR category TYPE keyword
CREATE INDEX ON COLLECTION articles FOR year TYPE integer
CREATE INDEX ON COLLECTION articles FOR title TYPE text
CREATE INDEX ON COLLECTION articles FOR meta.author TYPE keyword
```

**Rules:**
- The collection must already exist; otherwise the command raises an error.
- Index creation is idempotent: creating the same index twice succeeds silently.

---

## DROP COLLECTION — delete a collection

Permanently deletes a collection and **all points inside it**. This operation is irreversible.

```sql
DROP COLLECTION old_experiments
```

Raises an error if the collection does not exist.

---

## DELETE — remove points

Deletes one or more points from a collection by specific ID or by a `WHERE` filter.

**Syntax:**
```
DELETE FROM <collection_name> WHERE id = '<point_id>'
DELETE FROM <collection_name> WHERE id = <integer_id>
DELETE FROM <collection_name> WHERE <filter>
```

**Examples:**

```sql
-- Delete by UUID
DELETE FROM articles WHERE id = '3f2e1a4b-8c91-4d0e-b123-abc123def456'

-- Delete by integer ID
DELETE FROM articles WHERE id = 42

-- Delete all points matching a filter
DELETE FROM articles WHERE category = 'archived'

-- Delete with a compound filter
DELETE FROM articles WHERE year < 2020 AND status = 'draft'
```

**Notes:**
- If no points match the filter or ID, the operation succeeds silently with a count of 0.
- The collection itself must exist; deleting from a non-existent collection raises an error.
166 changes: 166 additions & 0 deletions docs/filters.md
@@ -0,0 +1,166 @@
# WHERE Clause Filters

The `WHERE` clause lets you filter on any payload field using SQL-style predicates. All standard comparison, range, membership, null-check, and full-text operators are supported.

`WHERE` works on `SEARCH`, `RECOMMEND`, and `DELETE` statements.

---

## Equality and inequality

```sql
-- Exact match
SEARCH articles SIMILAR TO 'ml' LIMIT 10 WHERE category = 'paper'

-- Not equal
SEARCH articles SIMILAR TO 'ml' LIMIT 10 WHERE status != 'draft'
```

---

## Range comparisons

```sql
SEARCH articles SIMILAR TO 'ai' LIMIT 5 WHERE score > 0.8
SEARCH articles SIMILAR TO 'ai' LIMIT 5 WHERE year < 2024
SEARCH articles SIMILAR TO 'ai' LIMIT 5 WHERE score >= 0.75
SEARCH articles SIMILAR TO 'ai' LIMIT 5 WHERE year <= 2023
```

---

## BETWEEN … AND

```sql
-- Inclusive range (equivalent to year >= 2018 AND year <= 2023)
SEARCH articles SIMILAR TO 'history of ai' LIMIT 10 WHERE year BETWEEN 2018 AND 2023
```
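
As a sketch of what the inclusive range means, the Python below builds a range condition shaped like Qdrant's JSON filter format (`key` plus `range` with `gte`/`lte`) and checks payloads against it. Whether QQL emits exactly this structure is an assumption, and the helper names `between_condition` and `in_range` are hypothetical.

```python
def between_condition(field: str, low: float, high: float) -> dict:
    """Build an inclusive range condition: low <= field <= high."""
    return {"key": field, "range": {"gte": low, "lte": high}}


def in_range(payload: dict, condition: dict) -> bool:
    """Check a payload dict against the condition (missing fields never match)."""
    value = payload.get(condition["key"])
    if value is None:
        return False
    bounds = condition["range"]
    return bounds["gte"] <= value <= bounds["lte"]


condition = between_condition("year", 2018, 2023)
print(condition)

# Both endpoints are included; values just outside are not.
for year in (2017, 2018, 2023, 2024):
    print(year, in_range({"year": year}, condition))
```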

---

## IN and NOT IN

```sql
SEARCH articles SIMILAR TO 'retrieval' LIMIT 10 WHERE status IN ('published', 'reviewed')
SEARCH articles SIMILAR TO 'retrieval' LIMIT 10 WHERE status NOT IN ('deleted', 'archived')
```

---

## IS NULL and IS NOT NULL

```sql
SEARCH articles SIMILAR TO 'peer review' LIMIT 5 WHERE reviewer IS NULL
SEARCH articles SIMILAR TO 'peer review' LIMIT 5 WHERE reviewer IS NOT NULL
```

---

## IS EMPTY and IS NOT EMPTY

```sql
SEARCH articles SIMILAR TO 'untagged' LIMIT 5 WHERE tags IS EMPTY
SEARCH articles SIMILAR TO 'categorized' LIMIT 5 WHERE tags IS NOT EMPTY
```

---

## Full-text MATCH

```sql
-- All terms must appear in the field (requires a Qdrant full-text index)
SEARCH articles SIMILAR TO 'search' LIMIT 10 WHERE title MATCH 'vector database'

-- Any term can match
SEARCH articles SIMILAR TO 'search' LIMIT 10 WHERE title MATCH ANY 'embedding retrieval'

-- Exact phrase must appear
SEARCH articles SIMILAR TO 'search' LIMIT 10 WHERE title MATCH PHRASE 'semantic search'
```

> To use `MATCH` operators efficiently, create a full-text index first:
> ```sql
> CREATE INDEX ON COLLECTION articles FOR title TYPE text
> ```

---

## AND, OR, NOT — logical operators

Operator precedence: `NOT` (highest) > `AND` > `OR` (lowest). Use parentheses to override.

```sql
-- AND: both conditions must be true
SEARCH articles SIMILAR TO 'nlp' LIMIT 10 WHERE category = 'paper' AND year >= 2020

-- OR: either condition can be true
SEARCH articles SIMILAR TO 'llm' LIMIT 10 WHERE source = 'arxiv' OR source = 'pubmed'

-- NOT: negate a condition
SEARCH articles SIMILAR TO 'benchmark' LIMIT 10 WHERE NOT status = 'draft'

-- Parentheses to group OR inside AND
SEARCH articles SIMILAR TO 'conference paper' LIMIT 10
WHERE (source = 'arxiv' OR source = 'ieee') AND year >= 2022

-- NOT on a parenthesized group
SEARCH articles SIMILAR TO 'x' LIMIT 5 WHERE NOT (status = 'draft' OR status = 'deleted')
```
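
Python's `not`, `and`, and `or` happen to share the precedence order stated above, so the effect of parentheses can be checked with a small sketch. The payload and variable names below are hypothetical, and this only illustrates the precedence rule; it is not how QQL evaluates filters.

```python
# Hypothetical payload used only for this illustration.
doc = {"source": "arxiv", "year": 2019, "status": "deleted"}

is_arxiv = doc["source"] == "arxiv"
is_ieee = doc["source"] == "ieee"
recent = doc["year"] >= 2022
is_draft = doc["status"] == "draft"
is_deleted = doc["status"] == "deleted"

# AND binds tighter than OR: parsed as is_arxiv OR (is_ieee AND recent).
ungrouped = is_arxiv or is_ieee and recent
# Parentheses force the OR to be evaluated first.
grouped = (is_arxiv or is_ieee) and recent

# NOT binds tightest: parsed as (NOT is_draft) OR is_deleted.
not_ungrouped = not is_draft or is_deleted
# Parentheses negate the whole group instead.
not_grouped = not (is_draft or is_deleted)

print(ungrouped, grouped)
print(not_ungrouped, not_grouped)
```

For this payload the ungrouped and grouped forms disagree in both cases, which is why the parenthesized examples above are not interchangeable with their unparenthesized counterparts.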

---

## Dot-notation for nested fields

```sql
SEARCH articles SIMILAR TO 'wikipedia' LIMIT 5 WHERE meta.source = 'web'
SEARCH cities SIMILAR TO 'large city' LIMIT 5 WHERE country.cities[].population > 1000000
```
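
One way to read these paths is sketched below in Python: each dot descends into a nested object, and a `[]` suffix fans out over the elements of a list. The `resolve` helper and the sample payloads are hypothetical, and treating a condition as satisfied when any element matches is an assumption about the semantics, not a description of QQL's implementation.

```python
def resolve(payload: dict, path: str) -> list:
    """Return every value reachable at `path`; a `[]` suffix fans out over list items."""
    current = [payload]
    for part in path.split("."):
        fan_out = part.endswith("[]")
        key = part[:-2] if fan_out else part
        next_values = []
        for node in current:
            if not isinstance(node, dict) or key not in node:
                continue  # path does not exist in this branch
            value = node[key]
            if fan_out and isinstance(value, list):
                next_values.extend(value)
            else:
                next_values.append(value)
        current = next_values
    return current


article = {"meta": {"source": "web"}}
place = {"country": {"cities": [{"population": 9_000_000}, {"population": 500_000}]}}

print(resolve(article, "meta.source"))
populations = resolve(place, "country.cities[].population")
print(populations)

# Assumed semantics: the condition holds if any element satisfies it.
print(any(p > 1_000_000 for p in populations))
```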

---

## WHERE also works in hybrid mode

```sql
SEARCH articles SIMILAR TO 'deep learning' LIMIT 10
USING HYBRID WHERE year BETWEEN 2020 AND 2024 AND status = 'published'
```

---

## WHERE in DELETE

```sql
-- Delete by filter
DELETE FROM articles WHERE category = 'archived'

-- Delete with compound filter
DELETE FROM articles WHERE year < 2020 AND status = 'draft'
```

---

## Full filter reference

| WHERE syntax | Description |
|---|---|
| `field = 'x'` | Exact match |
| `field != 'x'` | Not equal |
| `field > n` | Greater than |
| `field >= n` | Greater than or equal |
| `field < n` | Less than |
| `field <= n` | Less than or equal |
| `field BETWEEN a AND b` | Inclusive range |
| `field IN ('a', 'b')` | Value in list |
| `field NOT IN ('a', 'b')` | Value not in list |
| `field IS NULL` | Field absent or null |
| `field IS NOT NULL` | Field present and non-null |
| `field IS EMPTY` | Field is an empty list |
| `field IS NOT EMPTY` | Field is a non-empty list |
| `field MATCH 'text'` | All terms present (full-text) |
| `field MATCH ANY 'text'` | Any term present (full-text) |
| `field MATCH PHRASE 'text'` | Exact phrase present (full-text) |
| `A AND B` | Both conditions must hold |
| `A OR B` | At least one condition must hold |
| `NOT A` | Condition must not hold |
| `(A OR B) AND C` | Parentheses for grouping |
| `meta.source = 'x'` | Dot-notation nested field |