diff --git a/README.md b/README.md index bbb9151..a54f25a 100644 --- a/README.md +++ b/README.md @@ -35,6 +35,7 @@ qql> SEARCH notes SIMILAR TO 'vector databases' LIMIT 5 USING HYBRID RERANK - [The QQL Shell](#the-qql-shell) - [All QQL Operations](#all-qql-operations) - [INSERT — add a point](#insert--add-a-point) + - [INSERT BULK — batch insert](#insert-bulk--batch-insert-multiple-points) - [SEARCH — find similar points](#search--find-similar-points) - [Query-Time Search Params (`EXACT`, `WITH`)](#query-time-search-params-exact-with) - [WHERE Clause Filters](#where-clause-filters) @@ -44,6 +45,9 @@ qql> SEARCH notes SIMILAR TO 'vector databases' LIMIT 5 USING HYBRID RERANK - [CREATE COLLECTION — create a collection](#create-collection--create-a-collection) - [DROP COLLECTION — delete a collection](#drop-collection--delete-a-collection) - [DELETE — remove a point](#delete--remove-a-point) +- [Script Files](#script-files) + - [EXECUTE — run a script file](#execute--run-a-qql-script-file) + - [DUMP COLLECTION — export to script](#dump-collection--export-collection-to-a-qql-script-file) - [Embedding Models](#embedding-models) - [Value Types in Dictionaries](#value-types-in-dictionaries) - [Configuration File](#configuration-file) @@ -838,6 +842,158 @@ To find a point's ID, run a SEARCH first and copy the ID from the results table. --- +## Script Files + +QQL supports reading from and writing to `.qql` script files, making it easy to automate bulk operations, seed databases, and back up collections. + +--- + +### EXECUTE — run a .qql script file + +Execute a file containing multiple QQL statements in sequence. Each statement is parsed and executed in order. `--` comments are stripped before parsing. 
+ +**CLI usage:** +```bash +qql execute /path/to/script.qql + +# Stop on first error instead of continuing through all statements +qql execute /path/to/script.qql --stop-on-error +``` + +**In-shell usage (inside the QQL REPL):** +``` +qql> EXECUTE /path/to/script.qql +qql> \e /path/to/script.qql +``` + +**Script format:** + +```sql +-- This is a comment — the entire line is ignored +-- ============================================================ +-- QQL Script — populate articles collection +-- ============================================================ + +-- Step 1: create the collection +CREATE COLLECTION articles + +-- Step 2: bulk insert records +INSERT BULK INTO COLLECTION articles VALUES [ + {'text': 'Neural networks learn representations', 'year': 2023}, + {'text': 'Attention mechanisms in transformers', 'year': 2024} +] + +-- Step 3: verify +SHOW COLLECTIONS +``` + +**Rules:** +- `--` to end-of-line is a comment and is ignored (inline or full-line) +- Statements can span multiple lines (e.g. `INSERT BULK ... VALUES [...]`) +- Blank lines between statements are ignored +- By default all statements run even if one fails; use `--stop-on-error` to halt early + +**Example output:** +``` +Executing: /path/to/script.qql + +[1/3] CREATE COLLECTION articles + ✓ Collection 'articles' created (384-dimensional vectors, cosine distance) +[2/3] INSERT BULK INTO COLLECTION articles VALUES [ … + ✓ Inserted 2 points +[3/3] SHOW COLLECTIONS + ✓ 1 collection(s) found + +Done. 3/3 statement(s) succeeded. +``` + +--- + +### DUMP COLLECTION — export collection to a .qql script file + +Export every point in a collection to a `.qql` script file. The generated file is valid QQL — it can be re-imported with `qql execute` to restore or migrate the collection. Points are written in batches of 50 as `INSERT BULK` statements. 
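The batching described above can be sketched in isolation. This is a minimal illustration, assuming the points are already in memory as a list of payload dicts; the names `iter_batches` and `BATCH_SIZE` are hypothetical, and the real dumper pages through the collection with the Qdrant client's `scroll` call rather than slicing a list.

```python
import math
from typing import Iterator

BATCH_SIZE = 50  # mirrors the batch size stated above (hypothetical constant name)


def iter_batches(items: list[dict], size: int = BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 120 points split into batches of 50, 50 and 20.
points = [{"text": f"doc {i}"} for i in range(120)]
batches = list(iter_batches(points))

# The batch count reported in the dump summary is a ceiling division,
# floored at 1 so an empty collection still reports one batch.
total_batches = max(1, math.ceil(len(points) / BATCH_SIZE))
```

Each batch would then become one `INSERT BULK` statement in the generated file.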
+ +**CLI usage:** +```bash +qql dump +``` + +**In-shell usage (inside the QQL REPL):** +``` +qql> DUMP COLLECTION +``` + +**Example:** +```bash +qql dump medical_records /tmp/medical_records.qql +``` + +``` +Dumping: 'medical_records' → /tmp/medical_records.qql + + Collection type : hybrid (dense + sparse) + Points : 41 + Batches : 1 (50 points/batch) + + [1/1] wrote 41 point(s) + +Done. 41 point(s) written. +``` + +**Generated file structure:** +```sql +-- ============================================================ +-- QQL Dump — collection: medical_records +-- Generated : 2026-04-19 14:32:11 +-- Points : 41 +-- Type : hybrid (dense + sparse) +-- Note : Re-importing re-embeds all text using the +-- configured model (see: qql connect). +-- ============================================================ + +CREATE COLLECTION medical_records HYBRID + +-- Batch 1 / 1 (records 1–41) +INSERT BULK INTO COLLECTION medical_records VALUES [ + { + 'text': 'Alzheimers disease is characterized by...', + 'title': 'Alzheimers Disease Overview', + 'department': 'neurology', + 'year': 2023, + 'peer_reviewed': true + }, + ... +] USING HYBRID + +-- ============================================================ +-- End of dump +-- Written : 41 +-- Skipped : 0 (no 'text' field) +-- ============================================================ +``` + +**Round-trip workflow — backup and restore:** +```bash +# 1. Dump the collection +qql dump medical_records backup.qql + +# 2. Drop it +qql> DROP COLLECTION medical_records + +# 3. Restore from the dump +qql execute backup.qql +``` + +**Rules and notes:** +- Points without a `'text'` payload field are **skipped** (counted in the footer comment). +- Hybrid collections produce `CREATE COLLECTION HYBRID` and `INSERT BULK ... USING HYBRID` statements. +- Dense collections produce plain `CREATE COLLECTION ` and `INSERT BULK` statements. 
+- All payload value types are preserved: strings, integers, floats, booleans (`true`/`false`), `null`, lists, and nested dicts. +- Re-importing re-embeds all text using your currently configured model — use the same model as the original collection to preserve semantic accuracy. +- Parent directories of the output path are created automatically. + +--- + ## Embedding Models QQL uses [Fastembed](https://github.com/qdrant/fastembed) to convert text into vectors locally — no external API call is needed. diff --git a/src/qql/cli.py b/src/qql/cli.py index 3b63d8e..0f7f996 100644 --- a/src/qql/cli.py +++ b/src/qql/cli.py @@ -56,6 +56,15 @@ [yellow]DELETE FROM[/yellow] [yellow]WHERE id =[/yellow] '' Delete a point by its ID. +Script files (in-shell): + [yellow]EXECUTE[/yellow] or [yellow]\\e[/yellow] + Run a .qql script file. Statements are executed in order. + Lines starting with [yellow]--[/yellow] are treated as comments and ignored. + + [yellow]DUMP[/yellow] or [yellow]DUMP COLLECTION[/yellow] + Export all points in a collection to a .qql script file. + The file can be re-imported with EXECUTE. + Keyboard shortcuts: ← → arrows move cursor within the current line ↑ ↓ arrows scroll through command history @@ -119,6 +128,109 @@ def disconnect() -> None: console.print("Disconnected. Config removed.") +# ── execute ──────────────────────────────────────────────────────────────────── + +@main.command() +@click.argument("file", type=click.Path(exists=True, readable=True)) +@click.option( + "--stop-on-error", + is_flag=True, + default=False, + help="Halt execution on the first statement error (default: continue all).", +) +def execute(file: str, stop_on_error: bool) -> None: + """Execute a .qql script file against the connected Qdrant instance. + + Lines beginning with -- are treated as comments and skipped. + Each QQL statement is executed in order and its result is printed. 
+ """ + from qdrant_client import QdrantClient + + cfg = load_config() + if cfg is None: + err_console.print( + "[bold red]Not connected.[/bold red] " + "Run: [bold]qql connect --url [/bold]" + ) + sys.exit(1) + + try: + client = QdrantClient(url=cfg.url, api_key=cfg.secret) + client.get_collections() + except Exception as e: + err_console.print(f"[bold red]Connection failed:[/bold red] {e}") + sys.exit(1) + + from .executor import Executor + from .script import run_script + + executor = Executor(client, cfg) + console.print(f"[bold cyan]Executing:[/bold cyan] {file}\n") + + ok, fail = run_script(file, executor, console, err_console, stop_on_error) + total = ok + fail + + if fail == 0: + console.print( + f"\n[bold green]Done.[/bold green] " + f"{total}/{total} statement(s) succeeded." + ) + else: + console.print( + f"\n[bold yellow]Done.[/bold yellow] " + f"{ok}/{total} succeeded, [bold red]{fail} failed[/bold red]." + ) + sys.exit(1) + + +# ── dump ─────────────────────────────────────────────────────────────────────── + +@main.command() +@click.argument("collection") +@click.argument("output", type=click.Path()) +def dump(collection: str, output: str) -> None: + """Dump a collection to a .qql script file. + + OUTPUT is the path for the generated .qql file. 
+ The file contains CREATE COLLECTION + INSERT BULK statements and can be + re-imported with: qql execute + """ + from qdrant_client import QdrantClient + + cfg = load_config() + if cfg is None: + err_console.print( + "[bold red]Not connected.[/bold red] " + "Run: [bold]qql connect --url [/bold]" + ) + sys.exit(1) + + try: + client = QdrantClient(url=cfg.url, api_key=cfg.secret) + client.get_collections() + except Exception as e: + err_console.print(f"[bold red]Connection failed:[/bold red] {e}") + sys.exit(1) + + from .dumper import dump_collection + + console.print( + f"[bold cyan]Dumping:[/bold cyan] '{collection}' → {output}\n" + ) + written, skipped = dump_collection(collection, output, client, console, err_console) + + if written == 0 and skipped == 0: + # collection not found — error already printed by dump_collection + sys.exit(1) + + console.print( + f"\n[bold green]Done.[/bold green] " + f"{written} point(s) written" + + (f", [yellow]{skipped} skipped[/yellow] (no 'text' field)" if skipped else "") + + f"." + ) + + # ── REPL ─────────────────────────────────────────────────────────────────────── def _launch_repl(cfg: QQLConfig) -> None: @@ -161,6 +273,62 @@ def _launch_repl(cfg: QQLConfig) -> None: console.print(HELP_TEXT) continue + # ── EXECUTE / \e — run a .qql script file ────────── + if low.startswith("execute ") or low.startswith("\\e "): + script_path = query.split(None, 1)[1].strip() + from .script import run_script + ok, fail = run_script(script_path, executor, console, err_console) + total = ok + fail + if fail == 0: + console.print( + f"[bold green]Done.[/bold green] " + f"{total}/{total} statement(s) succeeded." + ) + else: + console.print( + f"[bold yellow]Done.[/bold yellow] " + f"{ok}/{total} succeeded, [bold red]{fail} failed[/bold red]." 
+ ) + continue + + # ── DUMP [COLLECTION] — export collection to .qql ── + # Accepts both: + # DUMP COLLECTION + # DUMP + if low.startswith("dump "): + parts = query.split(None, 3) # up to 4 tokens + if len(parts) >= 2 and parts[1].lower() == "collection": + # DUMP COLLECTION + if len(parts) < 4: + err_console.print( + "[bold red]Usage:[/bold red] DUMP COLLECTION " + ) + continue + coll_name, out_path = parts[2], parts[3] + else: + # DUMP + if len(parts) < 3: + err_console.print( + "[bold red]Usage:[/bold red] DUMP " + ) + continue + coll_name, out_path = parts[1], parts[2] + from .dumper import dump_collection + console.print( + f"[bold cyan]Dumping:[/bold cyan] '{coll_name}' → {out_path}\n" + ) + written, skipped = dump_collection( + coll_name, out_path, client, console, err_console + ) + # Mirror the CLI command: (0, 0) means the error was already printed. + if written > 0 or skipped > 0: + console.print( + f"[bold green]Done.[/bold green] " + f"{written} point(s) written" + (f", [yellow]{skipped} skipped[/yellow] (no 'text' field)" if skipped else "") + "." + ) + continue + _run_and_print(executor, query) diff --git a/src/qql/dumper.py b/src/qql/dumper.py new file mode 100644 index 0000000..e2b590b --- /dev/null +++ b/src/qql/dumper.py @@ -0,0 +1,210 @@ +"""QQL collection dumper — exports a Qdrant collection to a .qql script file. + +The generated file contains: + 1. A header comment with metadata + 2. CREATE COLLECTION [HYBRID] + 3. One INSERT BULK statement per batch of _DUMP_BATCH_SIZE points + 4. A footer comment with totals + +The file is valid QQL and can be re-executed with ``qql execute ``. +Points that lack a ``'text'`` payload field are skipped (with a warning +comment written into the file). 
+""" +from __future__ import annotations + +import math +from datetime import datetime +from pathlib import Path +from typing import Any + +from qdrant_client import QdrantClient +from rich.console import Console + +_DUMP_BATCH_SIZE = 50 + + +# ── Value serializer ────────────────────────────────────────────────────────── + + +def _serialize_value(v: Any) -> str: + """Recursively convert a Python payload value to valid QQL syntax.""" + if v is None: + return "null" + if v is True: + return "true" + if v is False: + return "false" + if isinstance(v, int): + return str(v) + if isinstance(v, float): + return repr(v) + if isinstance(v, str): + escaped = v.replace("\\", "\\\\").replace("'", "\\'") + return f"'{escaped}'" + if isinstance(v, list): + items = ", ".join(_serialize_value(i) for i in v) + return f"[{items}]" + if isinstance(v, dict): + return _serialize_dict(v, indent=4) + # Fallback: stringify + return f"'{v}'" + + +def _serialize_dict(d: dict[str, Any], indent: int = 4) -> str: + """Serialize a dict to a multi-line QQL ``{...}`` block.""" + pad = " " * indent + lines = ["{"] + items = list(d.items()) + for i, (key, value) in enumerate(items): + comma = "," if i < len(items) - 1 else "" + lines.append(f"{pad}'{key}': {_serialize_value(value)}{comma}") + lines.append("}") + return "\n".join(lines) + + +# ── Collection type detection ───────────────────────────────────────────────── + + +def _is_hybrid(collection: str, client: QdrantClient) -> bool: + """Return True if the collection uses named vectors (dense + sparse).""" + info = client.get_collection(collection) + vectors = info.config.params.vectors # type: ignore[union-attr] + return isinstance(vectors, dict) + + +# ── Main entry point ────────────────────────────────────────────────────────── + + +def dump_collection( + collection: str, + output_path: str, + client: QdrantClient, + console: Console, + err_console: Console, +) -> tuple[int, int]: + """Export every point in *collection* to a .qql script at 
*output_path*. + + Returns ``(points_written, points_skipped)`` counts. + Points without a ``'text'`` key are skipped and counted in *points_skipped*. + """ + if not client.collection_exists(collection): + err_console.print( + f"[bold red]Error:[/bold red] Collection '{collection}' does not exist." + ) + return 0, 0 + + hybrid = _is_hybrid(collection, client) + col_type = "hybrid (dense + sparse)" if hybrid else "dense" + using_clause = " USING HYBRID" if hybrid else "" + + # ── First pass: count total points for the header ───────────────────── + count_info = client.count(collection_name=collection, exact=True) + total_points = count_info.count + total_batches = max(1, math.ceil(total_points / _DUMP_BATCH_SIZE)) + + console.print( + f" Collection type : [cyan]{col_type}[/cyan]\n" + f" Points : [cyan]{total_points}[/cyan]\n" + f" Batches : [cyan]{total_batches}[/cyan] " + f"([dim]{_DUMP_BATCH_SIZE} points/batch[/dim])\n" + ) + + out = Path(output_path) + out.parent.mkdir(parents=True, exist_ok=True) + + written = 0 + skipped = 0 + batch_num = 0 + + with out.open("w", encoding="utf-8") as f: + # ── Header comment ──────────────────────────────────────────────── + ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + f.write( + f"-- ============================================================\n" + f"-- QQL Dump — collection: {collection}\n" + f"-- Generated : {ts}\n" + f"-- Points : {total_points}\n" + f"-- Type : {col_type}\n" + f"-- Note : Re-importing re-embeds all text using the\n" + f"-- configured model (see: qql connect).\n" + f"-- ============================================================\n" + f"\n" + ) + + # ── CREATE statement ────────────────────────────────────────────── + hybrid_suffix = " HYBRID" if hybrid else "" + f.write(f"CREATE COLLECTION {collection}{hybrid_suffix}\n\n") + + # ── Paginate and write INSERT BULK batches ──────────────────────── + offset = None + while True: + records, next_offset = client.scroll( + collection_name=collection, + 
limit=_DUMP_BATCH_SIZE, + offset=offset, + with_payload=True, + with_vectors=False, + ) + + if not records: + break + + batch_num += 1 + batch_start = (batch_num - 1) * _DUMP_BATCH_SIZE + 1 + batch_end = batch_start + len(records) - 1 + + # Filter points that have a 'text' field + valid = [] + for rec in records: + payload = rec.payload or {} + if "text" not in payload: + skipped += 1 + continue + valid.append(payload) + + if valid: + f.write( + f"-- Batch {batch_num} / {total_batches}" + f" (records {batch_start}–{batch_end})\n" + ) + f.write( + f"INSERT BULK INTO COLLECTION {collection} VALUES [\n" + ) + for i, payload in enumerate(valid): + dict_str = _serialize_dict(payload, indent=4) + # Indent the entire dict block by 2 spaces + indented = "\n".join( + " " + line for line in dict_str.splitlines() + ) + comma = "," if i < len(valid) - 1 else "" + f.write(f"{indented}{comma}\n") + written += 1 + f.write(f"]{using_clause}\n\n") + else: + # All records in this batch were skipped + f.write( + f"-- Batch {batch_num} / {total_batches}" + f" (records {batch_start}–{batch_end})" + f" — all skipped (no 'text' field)\n\n" + ) + + console.print( + f" [dim][[{batch_num}/{total_batches}]][/dim] " + f"wrote {len(valid)} point(s)" + + (f", skipped {len(records) - len(valid)}" if len(records) != len(valid) else "") + ) + + if next_offset is None: + break + offset = next_offset + + # ── Footer comment ──────────────────────────────────────────────── + f.write( + f"-- ============================================================\n" + f"-- End of dump\n" + f"-- Written : {written}\n" + f"-- Skipped : {skipped} (no 'text' field)\n" + f"-- ============================================================\n" + ) + + return written, skipped diff --git a/src/qql/script.py b/src/qql/script.py new file mode 100644 index 0000000..9b138a1 --- /dev/null +++ b/src/qql/script.py @@ -0,0 +1,156 @@ +"""QQL script runner — executes .qql files containing multiple statements. + +Pipeline: + 1. 
strip_comments() — remove -- … to-end-of-line comments + 2. split_statements() — tokenize once, split on statement-starter + keywords at brace/bracket depth 0 + 3. run_script() — parse + execute each chunk, print progress +""" +from __future__ import annotations + +from pathlib import Path + +from rich.console import Console + +from .exceptions import QQLError +from .executor import Executor +from .lexer import Lexer, Token, TokenKind +from .parser import Parser + +# ── Token sets ──────────────────────────────────────────────────────────────── + +_STMT_STARTERS = { + TokenKind.INSERT, + TokenKind.CREATE, + TokenKind.DROP, + TokenKind.SHOW, + TokenKind.SEARCH, + TokenKind.DELETE, +} + +_DEPTH_OPEN = {TokenKind.LBRACE, TokenKind.LBRACKET, TokenKind.LPAREN} +_DEPTH_CLOSE = {TokenKind.RBRACE, TokenKind.RBRACKET, TokenKind.RPAREN} + +# ── Public helpers ──────────────────────────────────────────────────────────── + + +def strip_comments(text: str) -> str: + """Remove ``-- ...`` to-end-of-line comments from every line. + + The check is byte-level: ``--`` inside a string literal would also be + stripped, but that edge case does not occur in practice for QQL scripts. + """ + lines: list[str] = [] + for line in text.splitlines(): + idx = line.find("--") + if idx != -1: + line = line[:idx] + lines.append(line) + return "\n".join(lines) + + +def split_statements(tokens: list[Token]) -> list[list[Token]]: + """Split a flat token list into per-statement chunks. + + A new chunk begins whenever a statement-starter keyword (INSERT, CREATE, + DROP, SHOW, SEARCH, DELETE) is encountered at brace/bracket/paren depth 0. + The EOF sentinel is consumed and never included in any chunk. 
+ """ + chunks: list[list[Token]] = [] + current: list[Token] = [] + depth = 0 + + for tok in tokens: + if tok.kind == TokenKind.EOF: + break + if tok.kind in _DEPTH_OPEN: + depth += 1 + elif tok.kind in _DEPTH_CLOSE: + depth -= 1 + + # New statement starts when we see a starter at the top level + if tok.kind in _STMT_STARTERS and depth == 0 and current: + chunks.append(current) + current = [] + + current.append(tok) + + if current: + chunks.append(current) + + return chunks + + +def _stmt_label(chunk: list[Token], max_len: int = 70) -> str: + """Build a short human-readable label from a statement's token list.""" + parts: list[str] = [] + total = 0 + for tok in chunk: + word = tok.value if tok.kind != TokenKind.STRING else f"'{tok.value}'" + if total + len(word) + 1 > max_len: + parts.append("…") + break + parts.append(word) + total += len(word) + 1 + return " ".join(parts) + + +# ── Main entry point ────────────────────────────────────────────────────────── + + +def run_script( + path: str, + executor: Executor, + console: Console, + err_console: Console, + stop_on_error: bool = False, +) -> tuple[int, int]: + """Parse and execute every statement in *path*. + + Returns ``(succeeded, failed)`` counts. + Prints per-statement progress to *console* / *err_console*. + If *stop_on_error* is True, halts on the first failure. 
+ """ + try: + text = Path(path).read_text(encoding="utf-8") + except OSError as e: + err_console.print(f"[bold red]Cannot read file:[/bold red] {e}") + return 0, 1 + + cleaned = strip_comments(text) + tokens = Lexer().tokenize(cleaned) + chunks = split_statements(tokens) + + if not chunks: + console.print("[yellow]No statements found in script.[/yellow]") + return 0, 0 + + n = len(chunks) + succeeded = 0 + failed = 0 + eof_tok = Token(TokenKind.EOF, "", 0) + + for i, chunk in enumerate(chunks, 1): + label = _stmt_label(chunk) + console.print(f"[dim][[{i}/{n}]][/dim] {label}") + + try: + node = Parser(chunk + [eof_tok]).parse() + result = executor.execute(node) + except QQLError as e: + err_console.print(f" [bold red]✗[/bold red] {e}") + failed += 1 + if stop_on_error: + break + continue + except Exception as e: + err_console.print(f" [bold red]✗ Unexpected error:[/bold red] {e}") + failed += 1 + if stop_on_error: + break + continue + + console.print(f" [bold green]✓[/bold green] {result.message}") + succeeded += 1 + + return succeeded, failed diff --git a/tests/test_dumper.py b/tests/test_dumper.py new file mode 100644 index 0000000..feb556a --- /dev/null +++ b/tests/test_dumper.py @@ -0,0 +1,224 @@ +"""Tests for the QQL collection dumper (src/qql/dumper.py).""" +from __future__ import annotations + +import pytest +from rich.console import Console + +from qql.dumper import ( + _DUMP_BATCH_SIZE, + _is_hybrid, + _serialize_dict, + _serialize_value, + dump_collection, +) + + +# ── Helpers ─────────────────────────────────────────────────────────────────── + + +def null_console() -> Console: + return Console(quiet=True) + + +def _make_record(mocker, payload: dict): + """Create a mock Qdrant ScoredPoint / Record with the given payload.""" + rec = mocker.MagicMock() + rec.payload = payload + return rec + + +def _make_client(mocker, *, exists=True, hybrid=False, points=None, total=None): + """Build a mock QdrantClient for dump tests. 
+ + *points* is a list of payload dicts. scroll() returns them all in one + batch when len(points) <= _DUMP_BATCH_SIZE, else two batches. + """ + points = points or [] + client = mocker.MagicMock() + client.collection_exists.return_value = exists + + # get_collection — return hybrid or dense vector config + if hybrid: + client.get_collection.return_value.config.params.vectors = {"dense": object()} + else: + # non-dict → dense-only + client.get_collection.return_value.config.params.vectors = mocker.MagicMock( + spec=[] # not a dict + ) + + # count + cnt = mocker.MagicMock() + cnt.count = total if total is not None else len(points) + client.count.return_value = cnt + + # scroll — single-batch by default + records = [mocker.MagicMock(payload=p) for p in points] + client.scroll.return_value = (records, None) + + return client + + +# ── _serialize_value ────────────────────────────────────────────────────────── + + +class TestSerializeValue: + def test_string(self): + assert _serialize_value("hello world") == "'hello world'" + + def test_string_escapes_single_quote(self): + assert _serialize_value("it's") == r"'it\'s'" + + def test_string_escapes_backslash(self): + assert _serialize_value("a\\b") == "'a\\\\b'" + + def test_int(self): + assert _serialize_value(42) == "42" + + def test_negative_int(self): + assert _serialize_value(-7) == "-7" + + def test_float(self): + result = _serialize_value(3.14) + assert "3.14" in result + + def test_bool_true(self): + assert _serialize_value(True) == "true" + + def test_bool_false(self): + assert _serialize_value(False) == "false" + + def test_none(self): + assert _serialize_value(None) == "null" + + def test_list(self): + assert _serialize_value([1, 2, 3]) == "[1, 2, 3]" + + def test_nested_list_of_strings(self): + result = _serialize_value(["a", "b"]) + assert result == "['a', 'b']" + + def test_dict_produces_braces(self): + result = _serialize_value({"key": "val"}) + assert "{" in result and "}" in result + assert "'key'" in 
result + assert "'val'" in result + + +# ── _is_hybrid ──────────────────────────────────────────────────────────────── + + +class TestIsHybrid: + def test_dict_vectors_is_hybrid(self, mocker): + client = mocker.MagicMock() + client.get_collection.return_value.config.params.vectors = {"dense": object()} + assert _is_hybrid("col", client) is True + + def test_scalar_vectors_is_not_hybrid(self, mocker): + client = mocker.MagicMock() + client.get_collection.return_value.config.params.vectors = mocker.MagicMock( + spec=[] + ) + assert _is_hybrid("col", client) is False + + +# ── dump_collection ─────────────────────────────────────────────────────────── + + +class TestDumpCollection: + def test_creates_output_file(self, tmp_path, mocker): + out = str(tmp_path / "dump.qql") + client = _make_client(mocker, points=[{"text": "hello"}]) + dump_collection("col", out, client, null_console(), null_console()) + assert (tmp_path / "dump.qql").exists() + + def test_writes_create_statement_dense(self, tmp_path, mocker): + out = str(tmp_path / "dump.qql") + client = _make_client(mocker, points=[{"text": "hello"}]) + dump_collection("my_col", out, client, null_console(), null_console()) + content = (tmp_path / "dump.qql").read_text() + assert "CREATE COLLECTION my_col\n" in content + assert "HYBRID" not in content.split("CREATE")[1].split("\n")[0] + + def test_writes_create_statement_hybrid(self, tmp_path, mocker): + out = str(tmp_path / "dump.qql") + client = _make_client(mocker, hybrid=True, points=[{"text": "hello"}]) + dump_collection("col", out, client, null_console(), null_console()) + content = (tmp_path / "dump.qql").read_text() + assert "CREATE COLLECTION col HYBRID" in content + + def test_hybrid_insert_bulk_has_using_hybrid(self, tmp_path, mocker): + out = str(tmp_path / "dump.qql") + client = _make_client(mocker, hybrid=True, points=[{"text": "hello"}]) + dump_collection("col", out, client, null_console(), null_console()) + content = (tmp_path / "dump.qql").read_text() + 
assert "] USING HYBRID" in content + + def test_dense_insert_bulk_has_no_using_clause(self, tmp_path, mocker): + out = str(tmp_path / "dump.qql") + client = _make_client(mocker, points=[{"text": "hello"}]) + dump_collection("col", out, client, null_console(), null_console()) + content = (tmp_path / "dump.qql").read_text() + assert "USING HYBRID" not in content + + def test_skips_points_without_text_field(self, tmp_path, mocker): + out = str(tmp_path / "dump.qql") + points = [{"text": "ok"}, {"author": "no_text_here"}, {"text": "also ok"}] + client = _make_client(mocker, points=points) + written, skipped = dump_collection("col", out, client, null_console(), null_console()) + assert written == 2 + assert skipped == 1 + content = (tmp_path / "dump.qql").read_text() + assert "no_text_here" not in content + + def test_returns_zero_when_collection_missing(self, tmp_path, mocker): + out = str(tmp_path / "dump.qql") + client = _make_client(mocker, exists=False) + written, skipped = dump_collection("missing", out, client, null_console(), null_console()) + assert written == 0 + assert skipped == 0 + + def test_payload_values_serialized_correctly(self, tmp_path, mocker): + out = str(tmp_path / "dump.qql") + payload = {"text": "hello", "year": 2024, "active": True, "score": 0.9} + client = _make_client(mocker, points=[payload]) + dump_collection("col", out, client, null_console(), null_console()) + content = (tmp_path / "dump.qql").read_text() + assert "'year': 2024" in content + assert "'active': true" in content + assert "'score':" in content + + def test_batches_multiple_scroll_pages(self, tmp_path, mocker): + """When scroll returns two pages, two INSERT BULK blocks should be written.""" + out = str(tmp_path / "dump.qql") + client = mocker.MagicMock() + client.collection_exists.return_value = True + client.get_collection.return_value.config.params.vectors = mocker.MagicMock(spec=[]) + cnt = mocker.MagicMock() + cnt.count = _DUMP_BATCH_SIZE + 1 + client.count.return_value = 
cnt
+
+        batch1 = [mocker.MagicMock(payload={"text": f"doc {i}"}) for i in range(_DUMP_BATCH_SIZE)]
+        batch2 = [mocker.MagicMock(payload={"text": "last doc"})]
+        # First scroll call returns batch1 with a non-None offset; second returns batch2 + None
+        client.scroll.side_effect = [
+            (batch1, "some_offset"),
+            (batch2, None),
+        ]
+
+        written, skipped = dump_collection("col", out, client, null_console(), null_console())
+        content = (tmp_path / "dump.qql").read_text()
+        assert written == _DUMP_BATCH_SIZE + 1
+        assert content.count("INSERT BULK") == 2
+
+    def test_header_contains_collection_name(self, tmp_path, mocker):
+        out = str(tmp_path / "dump.qql")
+        client = _make_client(mocker, points=[{"text": "x"}])
+        dump_collection("medical_records", out, client, null_console(), null_console())
+        content = (tmp_path / "dump.qql").read_text()
+        assert "medical_records" in content.split("QQL Dump")[1]
+
+    def test_output_file_created_in_nested_directory(self, tmp_path, mocker):
+        out = str(tmp_path / "sub" / "dir" / "dump.qql")
+        client = _make_client(mocker, points=[{"text": "x"}])
+        dump_collection("col", out, client, null_console(), null_console())
+        assert (tmp_path / "sub" / "dir" / "dump.qql").exists()
diff --git a/tests/test_script.py b/tests/test_script.py
new file mode 100644
index 0000000..9b90330
--- /dev/null
+++ b/tests/test_script.py
@@ -0,0 +1,183 @@
+"""Tests for the QQL script runner (src/qql/script.py)."""
+from __future__ import annotations
+
+import pytest
+from rich.console import Console
+
+from qql.ast_nodes import CreateCollectionStmt, InsertBulkStmt
+from qql.exceptions import QQLRuntimeError
+from qql.executor import ExecutionResult
+from qql.lexer import Lexer
+from qql.script import run_script, split_statements, strip_comments
+
+
+# ── Helpers ───────────────────────────────────────────────────────────────────
+
+def tokenize(text: str):
+    return Lexer().tokenize(text)
+
+
+def null_console() -> Console:
+    """A Console that writes to /dev/null — suppresses output in tests."""
+    return Console(quiet=True)
+
+
+# ── strip_comments ────────────────────────────────────────────────────────────
+
+class TestStripComments:
+    def test_removes_full_line_comment(self):
+        result = strip_comments("-- this is a comment\nCREATE COLLECTION x")
+        assert "-- this" not in result
+        assert "CREATE" in result
+
+    def test_removes_inline_comment(self):
+        result = strip_comments("CREATE COLLECTION x -- inline note")
+        assert "-- inline" not in result
+        assert "CREATE COLLECTION x" in result
+
+    def test_preserves_non_comment_lines(self):
+        text = "CREATE COLLECTION x\nSHOW COLLECTIONS"
+        assert strip_comments(text) == text
+
+    def test_empty_string(self):
+        assert strip_comments("") == ""
+
+    def test_only_comments(self):
+        result = strip_comments("-- line 1\n-- line 2")
+        assert "line" not in result
+
+    def test_comment_at_start_of_line(self):
+        result = strip_comments(" -- leading spaces then comment\nDROP COLLECTION x")
+        assert "DROP" in result
+        assert "leading" not in result
+
+
+# ── split_statements ──────────────────────────────────────────────────────────
+
+class TestSplitStatements:
+    def test_single_statement(self):
+        tokens = tokenize("CREATE COLLECTION x")
+        chunks = split_statements(tokens)
+        assert len(chunks) == 1
+
+    def test_two_statements(self):
+        tokens = tokenize("CREATE COLLECTION x\nSHOW COLLECTIONS")
+        chunks = split_statements(tokens)
+        assert len(chunks) == 2
+
+    def test_three_statements(self):
+        tokens = tokenize(
+            "CREATE COLLECTION x\n"
+            "INSERT INTO COLLECTION x VALUES {'text': 'hi'}\n"
+            "SHOW COLLECTIONS"
+        )
+        chunks = split_statements(tokens)
+        assert len(chunks) == 3
+
+    def test_bulk_insert_not_split_inside_brackets(self):
+        """INSERT keyword inside a VALUES [...] array must NOT start a new chunk."""
+        tokens = tokenize(
+            "INSERT BULK INTO COLLECTION x VALUES [\n"
+            " {'text': 'a'},\n"
+            " {'text': 'b'}\n"
+            "]\n"
+            "SHOW COLLECTIONS"
+        )
+        chunks = split_statements(tokens)
+        # There should be exactly 2 chunks: INSERT BULK and SHOW COLLECTIONS
+        assert len(chunks) == 2
+
+    def test_empty_input(self):
+        tokens = tokenize("")
+        chunks = split_statements(tokens)
+        assert chunks == []
+
+    def test_first_chunk_starts_with_create(self):
+        tokens = tokenize("CREATE COLLECTION x\nDROP COLLECTION x")
+        chunks = split_statements(tokens)
+        from qql.lexer import TokenKind
+        assert chunks[0][0].kind == TokenKind.CREATE
+        assert chunks[1][0].kind == TokenKind.DROP
+
+
+# ── run_script ────────────────────────────────────────────────────────────────
+
+class TestRunScript:
+    @pytest.fixture
+    def script_file(self, tmp_path):
+        """Factory: write content to a temp .qql file and return its path."""
+        def _make(content: str) -> str:
+            p = tmp_path / "test.qql"
+            p.write_text(content)
+            return str(p)
+        return _make
+
+    @pytest.fixture
+    def mock_executor(self, mocker):
+        ex = mocker.MagicMock()
+        ex.execute.return_value = ExecutionResult(success=True, message="ok")
+        return ex
+
+    def test_executes_all_statements(self, script_file, mock_executor):
+        path = script_file(
+            "CREATE COLLECTION x\n"
+            "SHOW COLLECTIONS\n"
+        )
+        ok, fail = run_script(path, mock_executor, null_console(), null_console())
+        assert mock_executor.execute.call_count == 2
+        assert ok == 2
+        assert fail == 0
+
+    def test_continues_on_error_by_default(self, script_file, mock_executor):
+        mock_executor.execute.side_effect = [
+            ExecutionResult(success=True, message="ok"),
+            QQLRuntimeError("boom"),
+            ExecutionResult(success=True, message="ok"),
+        ]
+        path = script_file(
+            "CREATE COLLECTION x\n"
+            "DROP COLLECTION missing\n"
+            "SHOW COLLECTIONS\n"
+        )
+        ok, fail = run_script(path, mock_executor, null_console(), null_console())
+        assert ok == 2
+        assert fail == 1
+        assert mock_executor.execute.call_count == 3
+
+    def test_stops_on_error_when_flag_set(self, script_file, mock_executor):
+        mock_executor.execute.side_effect = [
+            QQLRuntimeError("fail fast"),
+            ExecutionResult(success=True, message="ok"),
+        ]
+        path = script_file(
+            "CREATE COLLECTION x\n"
+            "SHOW COLLECTIONS\n"
+        )
+        ok, fail = run_script(
+            path, mock_executor, null_console(), null_console(), stop_on_error=True
+        )
+        assert fail == 1
+        assert mock_executor.execute.call_count == 1  # stopped after first
+
+    def test_empty_script_returns_zero_counts(self, script_file, mock_executor):
+        path = script_file("-- only comments\n\n")
+        ok, fail = run_script(path, mock_executor, null_console(), null_console())
+        assert ok == 0
+        assert fail == 0
+        mock_executor.execute.assert_not_called()
+
+    def test_comments_are_stripped(self, script_file, mock_executor):
+        path = script_file(
+            "-- header comment\n"
+            "CREATE COLLECTION x -- inline comment\n"
+        )
+        ok, fail = run_script(path, mock_executor, null_console(), null_console())
+        assert ok == 1
+        assert fail == 0
+
+    def test_nonexistent_file_returns_failure(self, mock_executor):
+        ok, fail = run_script(
+            "/no/such/file.qql", mock_executor, null_console(), null_console()
+        )
+        assert ok == 0
+        assert fail == 1