pavanjava · srimon12 · May 16, 2026
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@
 [![PyPI version](https://img.shields.io/pypi/v/qql-cli?color=blue&label=PyPI)](https://pypi.org/project/qql-cli/)
 [![Python 3.12+](https://img.shields.io/pypi/pyversions/qql-cli)](https://pypi.org/project/qql-cli/)
 [![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-500%20passing-brightgreen)](tests/)
+[![Tests](https://img.shields.io/badge/tests-549%20passing-brightgreen)](tests/)
 
 Write `INSERT`, `SELECT`, `SEARCH`, `SCROLL`, `RECOMMEND`, `UPDATE`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, grouped search (GROUP BY), cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
 
@@ -50,6 +50,18 @@ Your query string
 
 When you run `INSERT`, the `text` field is automatically converted into a dense vector using [Fastembed](https://github.com/qdrant/fastembed). In **hybrid mode** (`USING HYBRID`), a sparse BM25 vector is also generated alongside the dense vector, and searches use Qdrant's Reciprocal Rank Fusion (RRF) by default to merge the results of both retrieval methods. You can switch hybrid search to DBSF with `FUSION 'dbsf'`.
 
+QQL also exposes a **programmatic API** for use inside Python applications — no CLI required:
+
+```python
+from qql import Connection
+
+with Connection("http://localhost:6333") as conn:
+    conn.run_query("INSERT INTO COLLECTION notes VALUES {'text': 'Qdrant is fast'}")
+    result = conn.run_query("SEARCH notes SIMILAR TO 'vector database' LIMIT 5")
+    for hit in result.data:
+        print(hit["score"], hit["payload"])
+```
+
 ---
 
 ## Installation
@@ -86,7 +98,7 @@ Full documentation lives in the [`docs/`](docs/) folder and at **[pavanjava.gith
 | [WHERE Filters](docs/filters.md) | Full SQL-style filter operators |
 | [Collections & Quantization](docs/collections.md) | SHOW, CREATE, DROP, QUANTIZE (scalar/turbo/binary/product), CREATE INDEX, UPDATE VECTOR, UPDATE PAYLOAD |
 | [Scripts: EXECUTE / DUMP](docs/scripts.md) | Script files, collection backup/restore |
-| [Programmatic Usage](docs/programmatic.md) | Use QQL as a Python library |
+| [Programmatic Usage](docs/programmatic.md) | Use QQL as a Python library via `Connection` or `run_query()` |
 | [Reference: Models / Config / Errors](docs/reference.md) | Embedding models, config file, error reference |
 
 ---
@@ -166,7 +178,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
 pytest tests/ -v
 ```
 
-Expected: **500 tests passing**.
+Expected: **549 tests passing**.
 
 ---
 

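The efficiency argument for `Connection` over the legacy `run_query()` comes down to how many clients get built. The following is a minimal, self-contained sketch of that pattern, not QQL's implementation. `FakeClient`, `SketchConnection`, and `one_shot_query` are hypothetical stand-ins for `QdrantClient`, `Connection`, and `run_query()`, and `run_query` here only echoes the query instead of lexing, parsing, and executing it.

```python
class FakeClient:
    """Hypothetical stand-in for QdrantClient that counts how often it is built."""

    instances = 0

    def __init__(self, url: str) -> None:
        FakeClient.instances += 1
        self.url = url
        self.closed = False

    def close(self) -> None:
        self.closed = True


class SketchConnection:
    """Hypothetical stateful connection: one client, reused by every query."""

    def __init__(self, url: str = "http://localhost:6333") -> None:
        self.client = FakeClient(url)  # built once, at construction time

    def run_query(self, query: str) -> str:
        # A real implementation would lex, parse, and execute `query`
        # against self.client; this sketch only echoes it.
        return f"executed: {query}"

    def close(self) -> None:
        self.client.close()

    def __enter__(self) -> "SketchConnection":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Called on normal exit and when an exception propagates,
        # so the client is always closed; returning False re-raises.
        self.close()
        return False


def one_shot_query(query: str, url: str = "http://localhost:6333") -> str:
    """Hypothetical legacy-style helper: a fresh connection for every call."""
    with SketchConnection(url) as conn:
        return conn.run_query(query)


# Ten queries through one connection build a single client...
with SketchConnection() as conn:
    for i in range(10):
        conn.run_query(f"SEARCH notes SIMILAR TO 'q{i}' LIMIT 5")
print(FakeClient.instances)  # 1

# ...while ten one-shot calls build ten more.
for i in range(10):
    one_shot_query(f"SEARCH notes SIMILAR TO 'q{i}' LIMIT 5")
print(FakeClient.instances)  # 11
```

Because `one_shot_query` wraps each call in its own `with` block, every legacy-style call pays the client construction cost, while the stateful object pays it once.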
diff --git a/docs/programmatic.md b/docs/programmatic.md
@@ -4,98 +4,203 @@ QQL can be used as a Python library without the CLI.
 
 ---
 
-## `run_query()` — high-level API
+## `Connection` — Primary API
+
+`Connection` is the recommended way to use QQL programmatically. It creates a
+Qdrant client once and reuses it for every `run_query()` call. This is more
+efficient than the legacy `run_query()` function, which creates a new client on
+every invocation.
+
+### Basic usage
 
 ```python
-from qql import run_query
+from qql import Connection
+
+conn = Connection("http://localhost:6333")
 
 # Insert a document (dense-only)
-result = run_query(
-    "INSERT INTO COLLECTION notes VALUES {'text': 'hello world', 'author': 'alice', 'year': 2024}",
-    url="http://localhost:6333",
+result = conn.run_query(
+    "INSERT INTO COLLECTION notes VALUES {'text': 'hello world', 'author': 'alice', 'year': 2024}"
 )
 print(result.message)   # "Inserted 1 point [<id>]"
-print(result.data)      # {"id": 1001 or "<uuid>", "collection": "notes"}
+print(result.data)      # {"id": "<uuid>", "collection": "notes"}
 
-# Insert with hybrid vectors
-result = run_query(
-    "INSERT INTO COLLECTION notes VALUES {'text': 'hello world'} USING HYBRID",
-    url="http://localhost:6333",
-)
-print(result.message)   # "Inserted 1 point [<id>] (hybrid)"
-
-# Dense search with WHERE filter
-result = run_query(
-    "SEARCH notes SIMILAR TO 'hello' LIMIT 5 WHERE year >= 2023 AND author != 'bot'",
-    url="http://localhost:6333",
+# Search
+result = conn.run_query(
+    "SEARCH notes SIMILAR TO 'hello' LIMIT 5 WHERE year >= 2023"
 )
 for hit in result.data:
     print(hit["score"], hit["payload"])
 
-# Hybrid search with WHERE filter
-result = run_query(
-    "SEARCH notes SIMILAR TO 'hello' LIMIT 5 USING HYBRID WHERE year >= 2023",
-    url="http://localhost:6333",
-)
-for hit in result.data:
-    print(hit["score"], hit["payload"])
+conn.close()
+```
 
-# Scroll / pagination
-result = run_query(
-    "SCROLL FROM notes LIMIT 2",
-    url="http://localhost:6333",
-)
-for point in result.data["points"]:
-    print(point["id"], point["payload"])
-print(result.data["next_offset"])
+### Context manager (preferred)
 
-# Bulk insert (all records embedded and upserted in one call)
-result = run_query(
-    """INSERT BULK INTO COLLECTION notes VALUES [
-      {'id': 1, 'text': 'first document', 'year': 2023},
-      {'id': 2, 'text': 'second document', 'year': 2024}
-    ]""",
-    url="http://localhost:6333",
-)
-print(result.message)   # "Inserted 2 points"
+The context manager guarantees the HTTP connection pool is released even if an
+exception occurs:
 
-# Recommend similar points using known IDs as positive examples
-result = run_query(
-    "RECOMMEND FROM notes POSITIVE IDS (1, 2) NEGATIVE IDS (3) LIMIT 5",
-    url="http://localhost:6333",
-)
-for hit in result.data:
-    print(hit["score"], hit["payload"])
+```python
+from qql import Connection
 
-# Retrieve a point by ID
-result = run_query(
-    "SELECT * FROM notes WHERE id = 1",
-    url="http://localhost:6333",
-)
-print(result.data)      # {"id": "1", "payload": {...}}
+with Connection("http://localhost:6333") as conn:
+    # All queries share the same connection
+    conn.run_query(
+        "INSERT INTO COLLECTION notes VALUES {'text': 'hello world', 'year': 2024} USING HYBRID"
+    )
+    result = conn.run_query(
+        "SEARCH notes SIMILAR TO 'hello' LIMIT 5 USING HYBRID WHERE year >= 2023"
+    )
+    for hit in result.data:
+        print(hit["score"], hit["payload"])
+```
+
+### Qdrant Cloud
+
+```python
+from qql import Connection
+
+with Connection("https://<your-cluster>.qdrant.io", secret="<your-api-key>") as conn:
+    result = conn.run_query("SHOW COLLECTIONS")
+    print(result.data)
+```
+
+### Custom embedding model
 
-# Delete by filter
+```python
+from qql import Connection
+
+with Connection(
+    "http://localhost:6333",
+    default_model="BAAI/bge-base-en-v1.5",
+) as conn:
+    conn.run_query(
+        "INSERT INTO COLLECTION articles VALUES {'text': 'Attention is all you need'}"
+    )
+```
+
+### All statement examples
+
+```python
+from qql import Connection
+
+with Connection("http://localhost:6333") as conn:
+
+    # Hybrid insert
+    conn.run_query(
+        "INSERT INTO COLLECTION notes VALUES {'text': 'hello world'} USING HYBRID"
+    )
+
+    # Dense search with WHERE filter
+    result = conn.run_query(
+        "SEARCH notes SIMILAR TO 'hello' LIMIT 5 WHERE year >= 2023 AND author != 'bot'"
+    )
+    for hit in result.data:
+        print(hit["score"], hit["payload"])
+
+    # Hybrid search
+    result = conn.run_query(
+        "SEARCH notes SIMILAR TO 'hello' LIMIT 5 USING HYBRID WHERE year >= 2023"
+    )
+
+    # Scroll / pagination
+    result = conn.run_query("SCROLL FROM notes LIMIT 2")
+    for point in result.data["points"]:
+        print(point["id"], point["payload"])
+    next_cursor = result.data["next_offset"]   # str | int | None
+
+    # Continue pagination
+    if next_cursor is not None:
+        result = conn.run_query(f"SCROLL FROM notes AFTER '{next_cursor}' LIMIT 2")
+
+    # Bulk insert
+    result = conn.run_query(
+        """INSERT BULK INTO COLLECTION notes VALUES [
+          {'id': 1, 'text': 'first document', 'year': 2023},
+          {'id': 2, 'text': 'second document', 'year': 2024}
+        ]"""
+    )
+    print(result.message)   # "Inserted 2 points"
+
+    # Recommend similar points
+    result = conn.run_query(
+        "RECOMMEND FROM notes POSITIVE IDS (1, 2) NEGATIVE IDS (3) LIMIT 5"
+    )
+    for hit in result.data:
+        print(hit["score"], hit["payload"])
+
+    # Retrieve a point by ID
+    result = conn.run_query("SELECT * FROM notes WHERE id = 1")
+    print(result.data)      # {"id": "1", "payload": {...}}
+
+    # Delete by filter
+    conn.run_query("DELETE FROM notes WHERE year < 2023")
+
+    # Inspect collection diagnostics
+    result = conn.run_query("SHOW COLLECTION notes")
+    print(result.data["topology"])         # "dense" or "hybrid"
+    print(result.data["vectors"])          # {"": {...}} or {"dense": {...}}
+    print(result.data["payload_schema"])   # field index info, or None
+```
+
+### `Connection` parameters
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `url` | `str` | `"http://localhost:6333"` | Qdrant instance URL |
+| `secret` | `str \| None` | `None` | API key; `None` for unauthenticated |
+| `default_model` | `str \| None` | `None` → `sentence-transformers/all-MiniLM-L6-v2` | Dense embedding model used when no `USING MODEL` clause is given |
+
+### Power-user: `executor` property
+
+For low-level access to the pipeline, use `conn.executor` directly:
+
+```python
+from qql import Connection
+from qql.lexer import Lexer
+from qql.parser import Parser
+
+with Connection("http://localhost:6333") as conn:
+    tokens = Lexer().tokenize("SEARCH docs SIMILAR TO 'hello' LIMIT 5")
+    node = Parser(tokens).parse()
+    result = conn.executor.execute(node)
+```
+
+---
+
+## `run_query()` — Legacy one-shot API
+
+> **Note:** `run_query()` is kept for backward compatibility. It creates a new
+> `Connection` (and therefore a new `QdrantClient`) on every call. For
+> workloads that issue more than one query, use `Connection` instead.
+
+```python
+from qql import run_query
+
+# Insert a document
 result = run_query(
-    "DELETE FROM notes WHERE year < 2023",
+    "INSERT INTO COLLECTION notes VALUES {'text': 'hello world', 'author': 'alice', 'year': 2024}",
     url="http://localhost:6333",
 )
-print(result.message)   # "Deleted N point(s)"
+print(result.message)
 
-# Inspect collection diagnostics
+# Search
 result = run_query(
-    "SHOW COLLECTION notes",
+    "SEARCH notes SIMILAR TO 'hello' LIMIT 5 WHERE year >= 2023",
     url="http://localhost:6333",
 )
-print(result.data["topology"])         # "dense" or "hybrid"
-print(result.data["vectors"])          # {"": {...}} or {"dense": {...}, ...}
-print(result.data["payload_schema"])   # {"field": {"type": "keyword", ...}, ...} or None
+for hit in result.data:
+    print(hit["score"], hit["payload"])
 ```
 
+`run_query()` accepts the same `url`, `secret`, and `default_model` parameters
+as `Connection.__init__()`.
+
 ---
 
 ## Low-level pipeline API
 
-For more control, use the pipeline directly:
+For full control, use the Lexer → Parser → Executor pipeline directly:
 
 ```python
 from qdrant_client import QdrantClient
@@ -117,9 +222,12 @@ for hit in result.data:
     print(hit["score"], hit["payload"])
 ```
 
+This mirrors what `Connection` does internally, but leaves the client lifecycle
+and configuration entirely under your control.
+
 ---
 
-## ExecutionResult
+## `ExecutionResult`
 
 All operations return an `ExecutionResult`:
 

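The `SCROLL ... AFTER` example in `docs/programmatic.md` continues pagination for a single extra page. A loop that keeps following `next_offset` until it is `None` could look like this sketch. It assumes the documented result shape (`result.data["points"]` and `result.data["next_offset"]`) and reuses the doc's `AFTER '<cursor>'` syntax. `fake_run` and `FakeResult` are hypothetical stand-ins for `conn.run_query()` and `ExecutionResult`, modelled on Qdrant's scroll, where the offset is the ID of the first point on the next page.

```python
from typing import Any, Callable, Iterator


def scroll_all(
    run: Callable[[str], Any], collection: str, page_size: int = 100
) -> Iterator[dict]:
    """Yield every point in `collection`, following next_offset until it is None."""
    query = f"SCROLL FROM {collection} LIMIT {page_size}"
    while True:
        result = run(query)
        yield from result.data["points"]
        cursor = result.data["next_offset"]
        if cursor is None:
            return
        # Same AFTER syntax as the "Continue pagination" example.
        query = f"SCROLL FROM {collection} AFTER '{cursor}' LIMIT {page_size}"


class FakeResult:
    """Hypothetical stand-in for ExecutionResult; only `.data` is modelled."""

    def __init__(self, data: dict) -> None:
        self.data = data


IDS = [1, 2, 3, 4, 5]  # hypothetical point IDs in a "notes" collection


def fake_run(query: str) -> FakeResult:
    """Hypothetical stand-in for conn.run_query() that only understands the
    SCROLL strings built by scroll_all()."""
    limit = int(query.rsplit("LIMIT ", 1)[1])
    start = 0
    if " AFTER '" in query:
        cursor = int(query.split(" AFTER '", 1)[1].split("'", 1)[0])
        start = IDS.index(cursor)  # cursor = first ID of the next page
    page, rest = IDS[start:start + limit], IDS[start + limit:]
    return FakeResult({
        "points": [{"id": i, "payload": {}} for i in page],
        "next_offset": rest[0] if rest else None,
    })


print([p["id"] for p in scroll_all(fake_run, "notes", page_size=2)])  # [1, 2, 3, 4, 5]
```

In real use, `run` would be `conn.run_query`, so the whole walk shares one client.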
diff --git a/docs/reference.md b/docs/reference.md
@@ -133,9 +133,10 @@ qql/
 ├── pyproject.toml          # Package config; installs the `qql` CLI command
 ├── src/
 │   └── qql/
-│       ├── __init__.py     # Public API: run_query()
+│       ├── __init__.py     # Public API: Connection, run_query()
 │       ├── cli.py          # CLI entry point: connect, disconnect, execute, dump, REPL
 │       ├── config.py       # QQLConfig dataclass + ~/.qql/config.json I/O
+│       ├── connection.py   # Connection class — stateful programmatic API
 │       ├── exceptions.py   # QQLError, QQLSyntaxError, QQLRuntimeError
 │       ├── lexer.py        # Tokenizer: string → List[Token]
 │       ├── ast_nodes.py    # Frozen dataclasses for each statement and filter type
@@ -148,6 +149,7 @@ qql/
     ├── test_lexer.py       # Tokenizer unit tests
     ├── test_parser.py      # Parser unit tests
     ├── test_executor.py    # Executor unit tests (mocked Qdrant client)
+    ├── test_connection.py  # Connection class unit tests (mocked Qdrant client)
     ├── test_script.py      # Script runner unit tests
     └── test_dumper.py      # Dumper unit tests
 ```
@@ -162,7 +164,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
 pytest tests/ -v
 ```
 
-Expected output: **500 tests passing**.
+Expected output: **549 tests passing**.
 
 ---
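The test suite runs without Qdrant because the client is mocked. One way to unit-test a connection-style class under that constraint is to inject the client factory and hand it a `unittest.mock.MagicMock`. This is a hypothetical sketch, not the contents of `tests/test_connection.py` (which may patch `QdrantClient` instead); `SketchConnection` and its `client_factory` parameter are illustrative names.

```python
import unittest
from unittest.mock import MagicMock


class SketchConnection:
    """Hypothetical connection that builds its client through an injectable factory."""

    def __init__(self, url: str, client_factory) -> None:
        self.client = client_factory(url)

    def run_query(self, collection: str):
        # Stand-in for real query execution: a single call on the client.
        return self.client.get_collection(collection)

    def close(self) -> None:
        self.client.close()

    def __enter__(self) -> "SketchConnection":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # never swallow the exception


class SketchConnectionTest(unittest.TestCase):
    def test_client_created_once_and_reused(self):
        factory = MagicMock()
        with SketchConnection("http://localhost:6333", factory) as conn:
            conn.run_query("notes")
            conn.run_query("notes")
        # One client for the whole block, used for both queries.
        factory.assert_called_once_with("http://localhost:6333")
        self.assertEqual(factory.return_value.get_collection.call_count, 2)

    def test_close_runs_even_on_error(self):
        factory = MagicMock()
        with self.assertRaises(RuntimeError):
            with SketchConnection("http://localhost:6333", factory):
                raise RuntimeError("boom")
        factory.return_value.close.assert_called_once()
```

pytest collects `unittest.TestCase` subclasses, so a file like this runs under `pytest tests/ -v` alongside the rest of the suite.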