diff --git a/.github/RFC_TEMPLATE.md b/.github/RFC_TEMPLATE.md
new file mode 100644
index 0000000..08a7ba2
--- /dev/null
+++ b/.github/RFC_TEMPLATE.md
@@ -0,0 +1,83 @@
+# RFC:
+
+- **Status:** Draft / Proposed / Accepted / Rejected / Implemented
+- **Author:** @github-username
+- **Target:** Python / Go / Both
+- **Created:** YYYY-MM-DD
+
+## Summary
+
+Describe the proposed change in one paragraph.
+
+## Motivation
+
+What problem does this solve?
+
+- Current pain:
+- Who is affected:
+- Why QQL should expose this:
+
+## Proposed Syntax
+
+```sql
+-- Minimal example
+NEW SYNTAX ...
+
+-- Example with optional clauses
+NEW SYNTAX ... WHERE ... WITH { ... }
+```
+
+## Qdrant Mapping
+
+| QQL syntax | Qdrant API/model |
+|---|---|
+| `...` | `...` |
+
+If Qdrant does not directly support the behavior, explain why QQL should still add it.
+
+## Output
+
+Human-readable output:
+
+```text
+...
+```
+
+JSON output:
+
+```json
+{
+  "success": true,
+  "message": "...",
+  "data": {}
+}
+```
+
+## Compatibility
+
+- Does this break existing QQL scripts?
+- Does this affect JSON output contracts?
+- Should Python and Go match?
+- Can one implementation ship first?
+
+## Implementation Plan
+
+1. Lexer/parser/AST changes
+2. Executor changes
+3. Tests
+4. Documentation
+
+## Alternatives
+
+List simpler or competing designs and why they were not chosen.
+
+## Open Questions
+
+- Question 1
+- Question 2
+
+## References
+
+- Qdrant docs:
+- Related issues:
+- Related PRs:
diff --git a/ROADMAP.md b/ROADMAP.md
new file mode 100644
index 0000000..4485b89
--- /dev/null
+++ b/ROADMAP.md
@@ -0,0 +1,92 @@
+# QQL Roadmap
+
+> **Status:** Draft for maintainer and community discussion
+> **Scope:** Public direction for the Python `qql-cli` project, with companion notes for the Go implementation where parity matters
+> **Principle:** Keep QQL small, readable, and close to Qdrant's real API surface
+
+QQL is a SQL-like language and CLI for common Qdrant workflows.
The near-term goal is not to cover every Qdrant feature. The goal is to make everyday vector database work easier to read, script, test, and share.
+
+## Current Position
+
+The Python implementation already supports the core workflow:
+
+| Area | Status |
+|---|---|
+| Collection create/drop/list | Supported |
+| Payload indexes | Supported |
+| Insert and bulk insert | Supported |
+| Dense search | Supported |
+| Hybrid dense+sparse search | Supported |
+| Sparse-only search | Supported |
+| WHERE filters | Supported |
+| Recommend by example IDs | Supported |
+| Query-time search params | Supported |
+| Reranking | Supported in Python |
+| Delete by ID or filter | Supported |
+| Script execution and dump/restore | Supported |
+| Programmatic Python API | Supported through `run_query()` |
+
+The Go implementation is developed separately. It should aim for language and behavior parity where practical, but this Python repository should not block on Go work before improving its own CLI and documentation.
+
+## Near-Term Priorities
+
+These are the best candidates for small, useful contributions. Each one should have tests and documentation before being considered complete.
+
+| Priority | Feature | Why it matters | Suggested syntax |
+|---|---|---|---|
+| P0 | Get point by ID | Basic inspection is currently missing | `GET FROM WHERE id = ''` |
+| P0 | Scroll points | Needed for real datasets and exports | `SCROLL LIMIT 100` |
+| P0 | Search pagination | Needed for browsing result sets | `SEARCH ...
LIMIT 10 OFFSET 20` |
+| P1 | Count points | Useful for validation and scripts | `COUNT WHERE ` |
+| P1 | Describe collection | Improves introspection and debugging | `DESCRIBE COLLECTION ` |
+| P1 | Update payload | Avoids full reinsert for metadata changes | `UPDATE SET {...} WHERE id = ''` |
+| P1 | Delete payload keys | Removes fields without deleting points | `DELETE PAYLOAD field FROM WHERE id = ''` |
+
+## Later Ideas
+
+These are worth exploring, but they should not distract from the smaller parity gaps above.
+
+| Area | Possible work |
+|---|---|
+| Retrieval quality | MMR, score boosting, named vector search, batch search |
+| Collection configuration | Distance selection, HNSW config, quantization, on-disk payload |
+| Developer experience | Connection profiles, clearer JSON contracts, better error messages |
+| Ecosystem | Syntax highlighting, examples, tutorials, migration guides |
+| Operations | Collection aliases, snapshots, backup/restore workflows |
+
+## Contribution Process
+
+Use an RFC when a change affects syntax, CLI behavior, or JSON output. Small documentation fixes, tests, and bug fixes do not need an RFC.
+
+Good roadmap issues should include:
+
+- the Qdrant API being exposed
+- the proposed QQL syntax
+- expected human-readable output
+- expected JSON output
+- Python tests required
+- Go parity notes, if relevant
+
+## Documentation Goals
+
+The documentation should stay practical:
+
+- README: quick start and common usage
+- `docs/syntax/`: compact syntax reference
+- `docs/COMPATIBILITY.md`: checked feature matrix
+- `docs/CONTRIBUTING.md`: contributor workflow
+- `docs/RFCS/`: proposed and accepted syntax decisions
+- `docs/TUTORIALS/`: runnable examples as they are added
+- `docs/MIGRATING/`: focused migration notes
+
+## Success Criteria
+
+QQL is moving in the right direction when:
+
+- users can inspect, insert, search, recommend, update, count, and export without dropping to raw SDK calls for common cases
+- syntax changes are discussed before implementation
+- docs describe what is implemented today, not only what is planned
+- Python and Go differences are visible and intentional
+- contributors can find small, well-scoped issues
+
+This roadmap is intentionally modest. It should be revised as maintainers and contributors agree on scope.
diff --git a/SHARED.md b/SHARED.md
new file mode 100644
index 0000000..1b17ded
--- /dev/null
+++ b/SHARED.md
@@ -0,0 +1,50 @@
+# Cross-Repository Documentation Notes
+
+QQL has a Python implementation (`qql-cli`) and a companion Go implementation (`qql-go`). Some documentation is useful to keep conceptually aligned across both projects, but each repository remains responsible for its own implementation details.
+
+This file is a coordination guide, not a hard synchronization system.
+
+## Shared in Spirit
+
+These documents should use the same terminology and avoid contradicting each other across implementations:
+
+| File | Purpose |
+|---|---|
+| `ROADMAP.md` | Project direction and priority areas |
+| `.github/RFC_TEMPLATE.md` | Template for syntax and behavior proposals |
+| `.github/ISSUE_LABELS.md` | Suggested issue label taxonomy |
+| `docs/CONTRIBUTING.md` | Contributor workflow |
+| `docs/SYNTAX_GUIDELINES.md` | How to add or change QQL syntax |
+| `docs/COMPATIBILITY.md` | Feature matrix across Qdrant, Python, and Go |
+| `docs/RFCS/README.md` | RFC process overview |
+
+## Repository-Specific
+
+These files should normally stay different:
+
+| Python `qql` | Go `qql-go` | Why |
+|---|---|---|
+| `README.md` | `README.md` | Different install, command, and release details |
+| `pyproject.toml` | `go.mod` | Different package managers |
+| `src/qql/` | Go source tree | Different implementations |
+| `tests/` | Go tests | Different test frameworks |
+| release notes | release notes | Different version history |
+
+## Update Guidance
+
+When a change affects the QQL language rather than one implementation:
+
+1. Update the local documentation.
+2. Note whether the behavior is Python-only, Go-only, or shared.
+3. If the companion implementation is affected, open or link a tracking issue there.
+4. Avoid blocking one implementation's documentation on the other unless the feature requires true lockstep behavior.
+
+## Long-Term Options
+
+If cross-repo drift becomes painful, consider one of these later:
+
+- a small `qql-spec` repository for syntax and compatibility docs
+- a CI check that compares selected docs between repos
+- release notes that explicitly call out Python and Go parity gaps
+
+For now, keep the process lightweight and accurate.
diff --git a/docs/COMPATIBILITY.md b/docs/COMPATIBILITY.md
new file mode 100644
index 0000000..acb7e76
--- /dev/null
+++ b/docs/COMPATIBILITY.md
@@ -0,0 +1,138 @@
+# QQL / Qdrant Compatibility Matrix
+
+> Tracks the current Python `qql-cli` surface and known companion status for `qql-go`.
+> Last checked: 2026-04-28.
+
+This document should describe implemented behavior conservatively. If a feature is planned but not implemented, keep it marked as missing until tests exist.
+
+## Legend
+
+| Symbol | Meaning |
+|---|---|
+| Supported | Implemented and covered by normal usage/tests |
+| Partial | Implemented with known limits |
+| Missing | Not currently exposed by QQL |
+| Planned | Roadmap or RFC candidate |
+| Unknown | Needs verification in that implementation |
+
+## Collection Management
+
+| Feature | Qdrant API | Python `qql-cli` | Go `qql-go` | Notes |
+|---|---|---|---|---|
+| Create collection | `create_collection` | Supported | Supported | Dense collection |
+| Create hybrid collection | `create_collection` with sparse vectors | Supported | Supported | `CREATE COLLECTION ...
HYBRID` |
+| Create with custom distance | Vector params | Missing | Missing | Currently cosine-only in Python |
+| Create with custom HNSW | `hnsw_config` | Missing | Missing | Roadmap candidate |
+| Create with quantization | `quantization_config` | Missing | Missing | Roadmap candidate |
+| Create with on-disk payload | `on_disk_payload` | Missing | Missing | Roadmap candidate |
+| Create with multivectors | `multivector_config` | Missing | Missing | Advanced roadmap candidate |
+| Drop collection | `delete_collection` | Supported | Supported | `DROP COLLECTION` |
+| List collections | `get_collections` | Supported | Supported | `SHOW COLLECTIONS` |
+| Collection info | `get_collection` | Missing | Missing | Proposed as `DESCRIBE COLLECTION` |
+| Collection aliases | Alias APIs | Missing | Missing | Later idea |
+| Collection snapshots | Snapshot APIs | Missing | Missing | Later idea |
+
+## Points / Documents
+
+| Feature | Qdrant API | Python `qql-cli` | Go `qql-go` | Notes |
+|---|---|---|---|---|
+| Insert point | `upsert` | Supported | Supported | Requires a `text` field for embedding |
+| Insert bulk | `upsert` | Supported | Supported | `INSERT BULK` |
+| Explicit point ID on insert | `upsert` | Supported | Supported | Integer or UUID string |
+| Get point by ID | `retrieve` | Missing | Missing | Near-term roadmap candidate |
+| Update payload | `set_payload` | Missing | Missing | Near-term roadmap candidate |
+| Delete point by ID | `delete` | Supported | Supported | `DELETE ...
WHERE id = ...` |
+| Delete points by filter | `delete` with filter selector | Supported | Supported | Python parser/executor support non-ID filters |
+| Delete payload keys | `delete_payload` | Missing | Missing | Near-term roadmap candidate |
+| Count points | `count` | Missing | Missing | Near-term roadmap candidate |
+| Scroll points | `scroll` | Missing | Missing | Near-term roadmap candidate |
+
+## Search
+
+| Feature | Qdrant API | Python `qql-cli` | Go `qql-go` | Notes |
+|---|---|---|---|---|
+| Dense search | `query_points` | Supported | Supported | Default mode |
+| Hybrid search | `query_points` + RRF | Supported | Supported | `USING HYBRID` |
+| Sparse-only search | `query_points` sparse vector | Supported | Supported | `USING SPARSE` |
+| Exact search | `SearchParams.exact` | Supported | Supported | `EXACT` or `WITH { exact: true }` |
+| HNSW ef tuning | `SearchParams.hnsw_ef` | Supported | Supported | `WITH { hnsw_ef: N }` |
+| ACORN filtered search | `SearchParams.acorn` | Supported | Supported | Depends on Qdrant support |
+| Search with filters | `Filter` | Supported | Supported | `WHERE` clause |
+| Search pagination | `offset` | Missing | Missing | Near-term roadmap candidate |
+| Batch search | Batch/query APIs | Missing | Missing | Later idea |
+| MMR diversity | Query diversity controls | Missing | Missing | Later idea |
+| Score boosting | Formula/rescore APIs | Missing | Missing | Later idea |
+| Multivector search | Multivector query | Missing | Missing | Later idea |
+| Rerank | Cross-encoder / inference | Supported | Partial | Python uses local Fastembed cross-encoder; Go behavior should be checked against `qql-go` docs |
+| Relevance feedback | Feedback query | Missing | Missing | Later idea |
+
+## Recommend
+
+| Feature | Qdrant API | Python `qql-cli` | Go `qql-go` | Notes |
+|---|---|---|---|---|
+| Recommend by examples | Recommend query | Supported | Supported | `RECOMMEND FROM` |
+| Positive/negative IDs | Recommend input |
Supported | Supported | |
+| Strategy selection | `RecommendStrategy` | Supported | Supported | `average_vector`, `best_score`, `sum_scores` |
+| Cross-collection lookup | `lookup_from` | Supported | Supported | |
+| Named vector usage | `using` | Supported | Supported | |
+| Offset | `offset` | Supported | Supported | |
+| Score threshold | `score_threshold` | Supported | Supported | |
+| Filtered recommend | `Filter` | Supported | Supported | `WHERE` clause |
+
+## Payload Indexes
+
+| Feature | Qdrant API | Python `qql-cli` | Go `qql-go` | Notes |
+|---|---|---|---|---|
+| Keyword index | `create_payload_index` | Supported | Supported | |
+| Integer index | `create_payload_index` | Supported | Supported | Python syntax uses `TYPE integer` |
+| Float index | `create_payload_index` | Supported | Supported | |
+| Bool index | `create_payload_index` | Supported | Supported | |
+| Text index | `create_payload_index` | Supported | Partial | Go support should be verified |
+| Geo index | `create_payload_index` | Supported | Missing | Python maps `TYPE geo` |
+| Datetime index | `create_payload_index` | Supported | Missing | Python maps `TYPE datetime` |
+
+## Filtering
+
+| Feature | Qdrant model | Python `qql-cli` | Go `qql-go` | Notes |
+|---|---|---|---|---|
+| Equality | `MatchValue` | Supported | Supported | `=` |
+| Inequality | `must_not` + `MatchValue` | Supported | Supported | `!=` |
+| Range | `Range` | Supported | Supported | `>`, `<`, `>=`, `<=` |
+| Between | `Range` | Supported | Supported | Inclusive |
+| In list | `MatchAny` | Supported | Supported | `IN (...)` |
+| Not in list | `MatchExcept` | Supported | Supported | `NOT IN (...)` |
+| Is null | `IsNull` | Supported | Supported | |
+| Is empty | `IsEmpty` | Supported | Supported | |
+| Full-text match | `MatchText` | Supported | Supported | `MATCH` |
+| Match any term | `MatchTextAny` | Supported | Supported | `MATCH ANY` |
+| Match phrase | `MatchPhrase` | Supported | Supported | `MATCH PHRASE` |
+|
Logical operators | `must`, `should`, `must_not` | Supported | Supported | `AND`, `OR`, `NOT` |
+| Nested fields | Payload key paths | Supported | Supported | Dot notation |
+| Nested array access | Payload key paths | Partial | Partial | Keep examples conservative until integration-tested |
+
+## Version Notes
+
+| Implementation | Current version in this repo/docs | Notes |
+|---|---|---|
+| Python `qql-cli` | `1.4.0` | Source of truth for this repository |
+| Go `qql-go` | `0.1.x` | Companion implementation; verify exact behavior in the Go repo before release claims |
+
+## Known Gaps
+
+| Gap | Impact | Suggested next step |
+|---|---|---|
+| No `GET` statement | Hard to inspect one point from the CLI | Add RFC or issue |
+| No `SCROLL` statement | Hard to page/export large collections through QQL syntax | Add RFC or issue |
+| No `COUNT` statement | Hard to validate scripts and filters | Add RFC or issue |
+| No `DESCRIBE COLLECTION` | Users must drop to SDK/Qdrant UI for collection metadata | Add RFC or issue |
+| No payload update syntax | Metadata updates require SDK calls or full reinsert | Add RFC or issue |
+| Limited custom collection configuration | Advanced users need SDK for distance/HNSW/quantization | Define minimal syntax before implementing |
+
+## Maintenance Rule
+
+When changing QQL behavior:
+
+1. Update this matrix in the same PR.
+2. Link or mention tests that prove the status.
+3. Mark companion implementation status as `Unknown` rather than guessing.
+4. Avoid future-tense claims unless there is an accepted RFC or linked issue.
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
new file mode 100644
index 0000000..bf23bc2
--- /dev/null
+++ b/docs/CONTRIBUTING.md
@@ -0,0 +1,171 @@
+# Contributing to QQL
+
+Thanks for helping improve QQL. This project is small enough that contributions should stay focused: fix one behavior, add one feature, or improve one document at a time.
+
+## Project Scope
+
+This repository contains the Python implementation, `qql-cli`.
+
+There is also a companion Go implementation, `qql-go`. Parity is useful and should be tracked, but a Python contribution does not need to implement the Go side in the same PR unless the maintainers explicitly ask for it.
+
+## Getting Started
+
+1. Read the root `README.md` for current user-facing behavior.
+2. Check `ROADMAP.md` for likely priorities.
+3. For syntax or behavior changes, check whether an issue or RFC already exists.
+4. Keep the first PR small.
+
+## Development Setup
+
+```bash
+git clone https://github.com/pavanjava/qql.git
+cd qql
+uv sync
+uv run pytest tests/ -v
+```
+
+If you do not use `uv`, install the package in editable mode with your preferred Python workflow and run `pytest`.
+
+## Reporting Bugs
+
+Before opening a bug report:
+
+1. Check existing issues.
+2. Test against the latest available version.
+3. Reduce the problem to the smallest `.qql` statement or script you can.
+
+Include:
+
+- QQL version
+- Python version
+- Qdrant version and deployment type
+- the exact QQL command or script
+- expected output
+- actual output or error
+- whether the issue appears in human output, JSON output, or both
+
+## Requesting Features
+
+Feature requests usually fall into two groups:
+
+| Type | Meaning |
+|---|---|
+| Qdrant parity | Qdrant supports the operation but QQL does not expose it yet |
+| QQL enhancement | A QQL-specific convenience or workflow |
+
+Open a normal issue for small Qdrant parity gaps. Use an RFC for syntax changes, CLI surface changes, JSON contract changes, or anything likely to affect both Python and Go.
+
+## Pull Requests
+
+Use this checklist:
+
+- keep the change scoped
+- add or update tests for code changes
+- update docs when syntax or behavior changes
+- avoid unrelated formatting/refactors
+- explain how you verified the change
+
+Suggested PR body:
+
+```markdown
+## Summary
+One sentence describing the change.
+
+## Changes
+- What changed
+- What was intentionally left out
+
+## Testing
+- Command(s) run
+
+## Notes
+Any compatibility, migration, or Go parity notes.
+```
+
+## When an RFC Is Required
+
+Use an RFC for:
+
+- a new statement type, such as `COUNT` or `UPDATE`
+- changes to existing syntax
+- new CLI commands or flags
+- changes to JSON output shape
+- breaking changes
+- new inference or embedding modes
+
+An RFC is not required for:
+
+- bug fixes
+- documentation-only changes
+- tests
+- internal refactors with no user-visible behavior change
+- examples or tutorials
+
+## Coding Standards
+
+### Python
+
+- Follow the existing style.
+- Keep parser, AST, executor, and tests in sync.
+- Prefer clear error messages over generic exceptions.
+- Keep AST dataclasses immutable unless there is a strong reason not to.
+- Do not add abstractions for one-off code.
+
+### Go Parity Notes
+
+When a Python change affects shared QQL syntax, add a short note in the PR about Go impact:
+
+- no Go impact
+- Go should eventually match
+- Go behavior is intentionally different
+- Go status unknown
+
+This is enough for the Python PR unless maintainers request a coordinated change.
+
+## Parser / Lexer Changes
+
+For new syntax, update the full pipeline:
+
+1. Lexer token or keyword
+2. AST node
+3. Parser rule
+4. Executor behavior
+5. Unit tests
+6. Documentation
+7. Compatibility matrix
+
+See `docs/SYNTAX_GUIDELINES.md` for a longer walkthrough.
+
+## Testing
+
+Run:
+
+```bash
+uv run pytest tests/ -v
+```
+
+Integration tests, if added, should clearly document their Qdrant setup requirements.
+
+## Documentation
+
+Keep docs accurate before making them broad.
+
+| File | Purpose |
+|---|---|
+| `README.md` | Main user entry point |
+| `ROADMAP.md` | Priorities and planned direction |
+| `docs/syntax/` | Syntax reference |
+| `docs/COMPATIBILITY.md` | Feature support matrix |
+| `docs/RFCS/` | Design proposals |
+| `docs/TUTORIALS/` | Runnable workflows |
+| `docs/MIGRATING/` | Migration notes |
+
+Do not link to missing pages as if they already exist. Mark planned docs as planned until they are written.
+
+## Release Process
+
+Maintainers handle versioning and releases. Contributors usually do not need to bump versions.
+
+## Questions
+
+Use GitHub Discussions for open-ended questions and issues for scoped bugs or feature requests.
diff --git a/docs/MIGRATING/README.md b/docs/MIGRATING/README.md
new file mode 100644
index 0000000..1dc3d0a
--- /dev/null
+++ b/docs/MIGRATING/README.md
@@ -0,0 +1,71 @@
+# Migrating to QQL
+
+This directory is for practical migration notes from raw Qdrant SDK calls or other vector database workflows into QQL.
+
+Keep migration guides honest: QQL is useful for readable CLI/script workflows, but it is not a replacement for every SDK feature.
+
+## Planned Guides
+
+| Guide | Purpose | Status |
+|---|---|---|
+| `python-sdk-to-qql.md` | Map common `qdrant-client` operations to QQL | Planned |
+| `rest-api-to-qql.md` | Convert common curl examples to QQL | Planned |
+| `go-sdk-to-qql.md` | Map common Go client operations to QQL | Planned |
+| `sql-to-qql.md` | Explain where SQL instincts transfer and where they do not | Planned |
+
+## Quick Examples
+
+### Insert
+
+Python SDK:
+
+```python
+client.upsert(
+    collection_name="articles",
+    points=[PointStruct(id=1, vector=[...], payload={"text": "Hello"})],
+)
+```
+
+QQL:
+
+```sql
+INSERT INTO COLLECTION articles VALUES {'id': 1, 'text': 'Hello'}
+```
+
+### Search With Filter
+
+Python SDK:
+
+```python
+client.query_points(
+    collection_name="articles",
+    query=[...],
+    query_filter=Filter(...),
+    limit=5,
+)
+```
+
+QQL:
+
+```sql
+SEARCH articles SIMILAR TO 'machine learning' LIMIT 5 WHERE category = 'ml'
+```
+
+## Migration Checklist
+
+- Map collections and payload fields.
+- Decide whether each collection should be dense-only or hybrid.
+- Create payload indexes for fields used in filters.
+- Convert a small sample first.
+- Compare search results before migrating a full workflow.
+- Keep raw SDK code where QQL does not yet expose the needed Qdrant feature.
+
+## Adding a Guide
+
+Each guide should include:
+
+- what is being migrated from
+- side-by-side examples
+- limitations
+- performance or indexing notes
+- tested setup
diff --git a/docs/RFCS/README.md b/docs/RFCS/README.md
new file mode 100644
index 0000000..de7a952
--- /dev/null
+++ b/docs/RFCS/README.md
@@ -0,0 +1,66 @@
+# RFCs
+
+RFCs are lightweight design notes for changes that affect QQL syntax, CLI behavior, JSON output, or cross-implementation compatibility.
+
+Use an RFC to slow down decisions that would be hard to reverse. Do not use RFCs for routine bug fixes or documentation cleanup.
+
+## When to Write an RFC
+
+Write an RFC for:
+
+- a new statement type, such as `GET`, `COUNT`, `SCROLL`, or `UPDATE`
+- changes to existing statement syntax
+- new CLI commands or flags
+- JSON output contract changes
+- breaking changes
+- new embedding or inference modes
+
+No RFC is needed for:
+
+- bug fixes
+- tests
+- documentation updates
+- examples and tutorials
+- internal refactors with no user-visible behavior change
+
+## Process
+
+1. Copy `.github/RFC_TEMPLATE.md`.
+2. Create `docs/RFCS/NNNN-short-title.md`.
+3. Mark the status as `Draft`.
+4. Open a PR when ready for discussion.
+5. Update the RFC as decisions are made.
+6. Link implementation PRs after merge.
+
+## Statuses
+
+| Status | Meaning |
+|---|---|
+| Draft | The author is still shaping the proposal |
+| Proposed | Ready for maintainer/community review |
+| Accepted | Approved for implementation |
+| Rejected | Declined, with reason documented |
+| Implemented | Merged into at least one implementation |
+
+## RFC Index
+
+No RFCs have been accepted yet.
+
+When RFCs are added, keep the index grouped by status:
+
+| RFC | Title | Status | Implementation |
+|---|---|---|---|
+| `0001-example.md` | Example title | Proposed | Not implemented |
+
+## Guidance
+
+Good RFCs are specific:
+
+- show exact syntax
+- map syntax to Qdrant APIs
+- define human-readable and JSON output
+- list known limitations
+- state whether Python and Go should match
+- explain alternatives considered
+
+Keep one RFC focused on one behavior. Large bundles are harder to review and easier to stall.
diff --git a/docs/SYNTAX_GUIDELINES.md b/docs/SYNTAX_GUIDELINES.md
new file mode 100644
index 0000000..60bbfc5
--- /dev/null
+++ b/docs/SYNTAX_GUIDELINES.md
@@ -0,0 +1,133 @@
+# Syntax Guidelines
+
+Use this guide when adding or changing QQL syntax.
+
+The important rule: syntax changes must update the whole path from lexer to docs. A statement is not complete just because the parser accepts it.
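To make the lexer-to-AST part of that path concrete, here is a minimal sketch for a hypothetical `COUNT` statement that takes a collection name only. It is an illustration under stated assumptions, not the code in `src/qql/`: the names `CountStmt`, `tokenize`, and `parse_count` are invented for this example, keyword matching is assumed to be case-insensitive, and the optional `WHERE` clause is left out.

```python
import re
from dataclasses import dataclass

# Assumption: collection names are plain identifiers (letters, digits, underscores).
_WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class CountStmt:
    """Illustrative immutable AST node; not the project's real class."""

    collection: str


def tokenize(source: str) -> list[str]:
    """Split on whitespace and reject anything that is not a plain word."""
    tokens = source.split()
    for tok in tokens:
        if not _WORD.fullmatch(tok):
            raise SyntaxError(f"unexpected token: {tok!r}")
    return tokens


def parse_count(source: str) -> CountStmt:
    """Parse `COUNT name` and fail loudly on missing or extra tokens."""
    tokens = tokenize(source)
    if not tokens or tokens[0].upper() != "COUNT":
        raise SyntaxError("expected COUNT")
    if len(tokens) < 2:
        raise SyntaxError("expected a collection name after COUNT")
    if len(tokens) > 2:
        # Extra tokens are an error rather than being silently dropped.
        raise SyntaxError(f"unexpected token after collection name: {tokens[2]!r}")
    return CountStmt(collection=tokens[1])
```

A parser like this is only one stage. Under the rule above, the statement would still need executor behavior, tests, and documentation before it counts as done.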
+
+## Before Coding
+
+Write down:
+
+- the exact syntax
+- the Qdrant API it maps to
+- whether it needs embeddings
+- whether it changes JSON output
+- whether Go should eventually match
+- known limitations
+
+If the change adds a statement, changes existing syntax, or changes JSON output, write an RFC first.
+
+## Implementation Checklist
+
+For Python `qql-cli`, most syntax changes touch:
+
+| Stage | Typical file |
+|---|---|
+| Lexer | `src/qql/lexer.py` |
+| AST | `src/qql/ast_nodes.py` |
+| Parser | `src/qql/parser.py` |
+| Executor | `src/qql/executor.py` |
+| Tests | `tests/test_lexer.py`, `tests/test_parser.py`, `tests/test_executor.py` |
+| Docs | `docs/syntax/README.md`, `docs/COMPATIBILITY.md`, maybe `README.md` |
+
+For Go parity, use the equivalent lexer/parser/AST/executor/test locations in `qql-go`. A Python PR can include a Go parity note without implementing the Go side.
+
+## Example: Adding `COUNT`
+
+Proposed syntax:
+
+```sql
+COUNT
+COUNT WHERE
+```
+
+Qdrant mapping:
+
+| QQL | Qdrant |
+|---|---|
+| `COUNT ` | `client.count(collection_name=...)` |
+| `WHERE ` | `count_filter` |
+
+Expected output:
+
+```text
+42 point(s) in 'articles'
+```
+
+Expected JSON shape:
+
+```json
+{
+  "success": true,
+  "message": "42 point(s) in 'articles'",
+  "data": {
+    "count": 42
+  }
+}
+```
+
+## Parser Rules
+
+Keep grammar small and predictable:
+
+- prefer one obvious clause order
+- avoid aliases unless there is a compatibility reason
+- reuse existing filter parsing where possible
+- reject unsupported syntax with clear errors
+- do not silently ignore extra tokens
+
+For `COUNT`, a simple grammar is enough:
+
+```text
+count_stmt := "COUNT" identifier [where_clause]
+```
+
+## Executor Rules
+
+Executor code should:
+
+- check collection existence when needed
+- convert QQL filters through the existing filter builder
+- call the closest Qdrant API directly
+- return a stable `ExecutionResult`
+- wrap Qdrant errors with a QQL-specific message
+
+Do
not add planner or optimizer layers unless the feature genuinely needs them.
+
+## Tests
+
+At minimum, add tests for:
+
+- lexer recognizes new keywords
+- parser builds the right AST
+- invalid syntax fails clearly
+- executor calls the expected Qdrant client method
+- JSON/human output stays stable if the CLI formats the result
+
+If the feature maps to Qdrant behavior that is hard to mock, add a small integration test separately and document the setup.
+
+## Documentation
+
+Update:
+
+- `docs/COMPATIBILITY.md`
+- `docs/syntax/README.md`
+- root `README.md` only when the feature is stable and important for most users
+
+Dedicated syntax pages are welcome, but only link to them after they exist.
+
+Each syntax page should include:
+
+- syntax
+- 2-3 examples
+- output shape
+- limitations
+- version/support notes
+
+## Common Pitfalls
+
+- Adding parser support without executor support.
+- Updating Python docs while forgetting Go parity notes.
+- Documenting planned syntax as implemented.
+- Returning a JSON shape that differs from similar statements.
+- Adding broad syntax when a smaller first version would solve the use case.
diff --git a/docs/TUTORIALS/README.md b/docs/TUTORIALS/README.md
new file mode 100644
index 0000000..667d300
--- /dev/null
+++ b/docs/TUTORIALS/README.md
@@ -0,0 +1,68 @@
+# Tutorials
+
+This directory is for runnable, end-to-end QQL workflows.
+
+At the moment it is an index and template. Add tutorial files only after the commands have been tested against a real Qdrant instance.
+
+## Good First Tutorials
+
+| Tutorial | Goal | Status |
+|---|---|---|
+| `01-quick-start.md` | Create a collection, insert data, search, clean up | Planned |
+| `02-filtered-search.md` | Use `WHERE` filters and payload indexes | Planned |
+| `03-hybrid-search.md` | Compare dense, sparse, and hybrid search | Planned |
+| `04-reranking.md` | Show when `RERANK` improves precision | Planned |
+| `05-dump-restore.md` | Export and restore a small collection | Planned |
+
+## Tutorial Rules
+
+- Keep one tutorial focused on one workflow.
+- Include setup and cleanup.
+- Use small sample data.
+- Show expected output when it helps.
+- Avoid domain-heavy examples until the basics are covered.
+- Do not depend on private services or credentials.
+
+## Template
+
+Use this structure for new tutorial files:
+
+### Title
+
+Short, task-oriented name.
+
+> Time: 5-10 minutes
+> Requires: Qdrant running locally
+
+### Goal
+
+What the user will accomplish.
+
+### Setup
+
+```bash
+docker run -p 6333:6333 qdrant/qdrant
+qql connect --url http://localhost:6333
+```
+
+### Steps
+
+```sql
+CREATE COLLECTION demo
+INSERT INTO COLLECTION demo VALUES {'text': 'hello vector search'}
+SEARCH demo SIMILAR TO 'hello' LIMIT 3
+```
+
+### Expected Result
+
+Describe the important output, not every character.
+
+### Cleanup
+
+```sql
+DROP COLLECTION demo
+```
+
+### Next
+
+Link to related docs or tutorials.
diff --git a/docs/syntax/CREATE_COLLECTION.md b/docs/syntax/CREATE_COLLECTION.md
new file mode 100644
index 0000000..3f7d6f7
--- /dev/null
+++ b/docs/syntax/CREATE_COLLECTION.md
@@ -0,0 +1,34 @@
+# CREATE COLLECTION
+
+Create a Qdrant collection sized for QQL's embedding model.
+
+## Syntax
+
+```sql
+CREATE COLLECTION
+CREATE COLLECTION USING MODEL ''
+CREATE COLLECTION HYBRID
+CREATE COLLECTION USING HYBRID
+CREATE COLLECTION USING HYBRID DENSE MODEL ''
+```
+
+## Examples
+
+```sql
+CREATE COLLECTION articles
+CREATE COLLECTION articles USING MODEL 'BAAI/bge-base-en-v1.5'
+CREATE COLLECTION articles HYBRID
+```
+
+## Behavior
+
+- Dense collections store one dense vector per point.
+- Hybrid collections store a named dense vector and a named sparse vector.
+- The dense vector size is inferred from the configured or requested embedding model.
+- Distance is currently cosine.
+
+## Limitations
+
+- Custom distance, HNSW config, quantization, and on-disk payload options are not exposed yet.
+- Multivector collections are not exposed yet.
+
diff --git a/docs/syntax/CREATE_INDEX.md b/docs/syntax/CREATE_INDEX.md
new file mode 100644
index 0000000..db01519
--- /dev/null
+++ b/docs/syntax/CREATE_INDEX.md
@@ -0,0 +1,41 @@
+# CREATE INDEX
+
+Create a Qdrant payload index for fields used in filters.
+
+## Syntax
+
+```sql
+CREATE INDEX ON COLLECTION FOR TYPE
+```
+
+Supported Python index types:
+
+| Type | Use for |
+|---|---|
+| `keyword` | exact string/category filters |
+| `integer` | integer range/equality filters |
+| `float` | floating-point range/equality filters |
+| `bool` | boolean filters |
+| `text` | full-text match filters |
+| `geo` | geo payload fields |
+| `datetime` | datetime payload fields |
+
+## Examples
+
+```sql
+CREATE INDEX ON COLLECTION articles FOR category TYPE keyword
+CREATE INDEX ON COLLECTION articles FOR year TYPE integer
+CREATE INDEX ON COLLECTION articles FOR score TYPE float
+```
+
+## Behavior
+
+- The collection must already exist.
+- Dot notation is accepted for nested payload fields.
+- Indexes improve filtered search performance in Qdrant.
+
+## Limitations
+
+- QQL does not currently expose advanced text index configuration.
+- Companion implementation support should be checked before claiming cross-language parity for every index type.
+
diff --git a/docs/syntax/DELETE.md b/docs/syntax/DELETE.md
new file mode 100644
index 0000000..1e8290a
--- /dev/null
+++ b/docs/syntax/DELETE.md
@@ -0,0 +1,33 @@
+# DELETE
+
+Delete points by ID or by filter.
+
+## Syntax
+
+```sql
+DELETE FROM <collection> WHERE id = '<uuid>'
+DELETE FROM <collection> WHERE id = <integer>
+DELETE FROM <collection> WHERE <filter>
+```
+
+## Examples
+
+```sql
+DELETE FROM articles WHERE id = 1
+```
+
+```sql
+DELETE FROM articles WHERE status = 'archived'
+```
+
+## Behavior
+
+- `WHERE id = ...` deletes a single point by ID.
+- Any other supported filter deletes all matching points.
+- The collection must exist.
+
+## Limitations
+
+- `DELETE PAYLOAD` is not implemented yet.
+- There is no dry-run syntax yet. Use filters carefully.
+
diff --git a/docs/syntax/DROP_COLLECTION.md b/docs/syntax/DROP_COLLECTION.md
new file mode 100644
index 0000000..6c28488
--- /dev/null
+++ b/docs/syntax/DROP_COLLECTION.md
@@ -0,0 +1,26 @@
+# DROP COLLECTION
+
+Delete a Qdrant collection.
+
+## Syntax
+
+```sql
+DROP COLLECTION <collection>
+```
+
+## Example
+
+```sql
+DROP COLLECTION articles
+```
+
+## Behavior
+
+- The collection must exist.
+- The operation deletes the collection and its points.
+
+## Limitations
+
+- There is no confirmation prompt inside QQL syntax.
+- Collection snapshots and restore workflows are not exposed through QQL yet.
+
diff --git a/docs/syntax/INSERT.md b/docs/syntax/INSERT.md
new file mode 100644
index 0000000..00da259
--- /dev/null
+++ b/docs/syntax/INSERT.md
@@ -0,0 +1,40 @@
+# INSERT
+
+Insert one point into a collection.
+
+## Syntax
+
+```sql
+INSERT INTO COLLECTION <collection> VALUES {}
+INSERT INTO COLLECTION <collection> VALUES {} USING MODEL '<model>'
+INSERT INTO COLLECTION <collection> VALUES {} USING HYBRID
+INSERT INTO COLLECTION <collection> VALUES {} USING HYBRID DENSE MODEL '<model>' SPARSE MODEL '<model>'
+```
+
+## Examples
+
+```sql
+INSERT INTO COLLECTION articles VALUES {'text': 'Qdrant is a vector database'}
+```
+
+```sql
+INSERT INTO COLLECTION articles VALUES {
+  'id': 1,
+  'text': 'Hybrid search combines dense and sparse retrieval',
+  'category': 'search'
+} USING HYBRID
+```
+
+## Behavior
+
+- `text` is required and is embedded automatically.
+- If `id` is omitted, QQL generates a UUID.
+- Explicit IDs may be unsigned integers or UUID strings.
+- If the collection does not exist, QQL can auto-create it using the selected embedding mode.
+- Hybrid inserts write both dense and sparse vectors.
+
+## Limitations
+
+- QQL does not accept precomputed vectors in `INSERT`.
+- Updating only payload fields is not exposed yet; use the planned `UPDATE` syntax once implemented.
+
diff --git a/docs/syntax/INSERT_BULK.md b/docs/syntax/INSERT_BULK.md
new file mode 100644
index 0000000..d72337b
--- /dev/null
+++ b/docs/syntax/INSERT_BULK.md
@@ -0,0 +1,34 @@
+# INSERT BULK
+
+Insert multiple points in one statement.
+
+## Syntax
+
+```sql
+INSERT BULK INTO COLLECTION <collection> VALUES [{}, ...]
+INSERT BULK INTO COLLECTION <collection> VALUES [{}, ...] USING MODEL '<model>'
+INSERT BULK INTO COLLECTION <collection> VALUES [{}, ...] USING HYBRID
+INSERT BULK INTO COLLECTION <collection> VALUES [{}, ...] USING HYBRID DENSE MODEL '<model>' SPARSE MODEL '<model>'
+```
+
+## Example
+
+```sql
+INSERT BULK INTO COLLECTION articles VALUES [
+  {'id': 1, 'text': 'Dense vectors capture semantic similarity', 'category': 'search'},
+  {'id': 2, 'text': 'Sparse vectors help with keyword matching', 'category': 'search'}
+] USING HYBRID
+```
+
+## Behavior
+
+- Each item must be a dictionary.
+- Each item must contain `text`.
+- Explicit IDs follow the same rules as `INSERT`.
+- The selected model and hybrid mode apply to all items.
+
+## Limitations
+
+- Very large imports should be split into manageable script files.
+- QQL currently embeds text client-side before upsert.
+
diff --git a/docs/syntax/README.md b/docs/syntax/README.md
new file mode 100644
index 0000000..ba313d1
--- /dev/null
+++ b/docs/syntax/README.md
@@ -0,0 +1,91 @@
+# QQL Syntax Reference
+
+This page is a compact index of the QQL language surface. It lists implemented syntax and planned syntax separately so the docs do not imply pages or features exist before they do.
+
+For a complete narrative walkthrough, see the root `README.md`. For focused syntax details, use the pages linked below.
+
+## Implemented Statements
+
+These statements are parsed by QQL and executed against Qdrant.
+
+| Statement | Description | Python `qql-cli` |
+|---|---|:---:|
+| [`CREATE COLLECTION`](CREATE_COLLECTION.md) | Create dense or hybrid collections | Supported |
+| [`CREATE INDEX`](CREATE_INDEX.md) | Create a payload index | Supported |
+| [`DROP COLLECTION`](DROP_COLLECTION.md) | Delete a collection | Supported |
+| [`SHOW COLLECTIONS`](SHOW_COLLECTIONS.md) | List collections | Supported |
+| [`INSERT`](INSERT.md) | Insert one point | Supported |
+| [`INSERT BULK`](INSERT_BULK.md) | Insert multiple points | Supported |
+| [`SEARCH`](SEARCH.md) | Dense, hybrid, sparse, filtered, and reranked search | Supported |
+| [`RECOMMEND`](RECOMMEND.md) | Recommend by example IDs | Supported |
+| [`DELETE`](DELETE.md) | Delete by ID or filter | Supported |
+
+## Planned Statements
+
+These are roadmap items, not current syntax.
+
+| Statement | Purpose |
+|---|---|
+| `GET FROM <collection> WHERE id = '<id>'` | Retrieve a point by ID |
+| `SCROLL <collection> LIMIT <n>` | Iterate through points |
+| `COUNT <collection> WHERE <filter>` | Count points |
+| `DESCRIBE COLLECTION <collection>` | Show collection configuration and statistics |
+| `UPDATE <collection> SET {...} WHERE id = '<id>'` | Update payload fields |
+| `DELETE PAYLOAD FROM <collection> WHERE id = '<id>'` | Remove payload keys |
+
+## CLI / REPL Commands
+
+These commands are handled by the CLI or REPL instead of the language parser.
+
+| Command | Description | Python `qql-cli` |
+|---|---|:---:|
+| `qql connect --url <url>` | Save connection settings | Supported |
+| `qql disconnect` | Remove saved connection settings | Supported |
+| `qql execute <file>` | Run a `.qql` script file | Supported |
+| `DUMP COLLECTION <collection> TO '<file>'` | Export a collection to QQL statements | Supported |
+
+## Clauses and Modifiers
+
+| Clause | Used in | Description |
+|---|---|---|
+| `WHERE` | `SEARCH`, `RECOMMEND`, `DELETE` | Payload filtering |
+| `USING MODEL '<model>'` | `CREATE COLLECTION`, `INSERT`, `SEARCH` | Pin dense embedding model |
+| `USING HYBRID` | `CREATE COLLECTION`, `INSERT`, `SEARCH` | Use dense+sparse vectors |
+| `DENSE MODEL '<model>'` | Hybrid `CREATE COLLECTION`, `INSERT`, `SEARCH` | Pin dense model |
+| `SPARSE MODEL '<model>'` | Hybrid/sparse `INSERT`, `SEARCH` | Pin sparse model |
+| `USING SPARSE` | `SEARCH` | Search sparse vector only |
+| `RERANK` | `SEARCH` | Apply cross-encoder reranking |
+| `EXACT` | `SEARCH`, `RECOMMEND` | Use exact search |
+| `WITH { hnsw_ef, exact, acorn }` | `SEARCH`, `RECOMMEND` | Query-time search params |
+| `LIMIT <n>` | `SEARCH`, `RECOMMEND` | Max results |
+| `OFFSET <n>` | `RECOMMEND` | Skip initial results |
+| `SCORE THRESHOLD <value>` | `RECOMMEND` | Filter low-scoring recommendations |
+| `STRATEGY '<strategy>'` | `RECOMMEND` | Recommendation strategy |
+| `LOOKUP FROM <collection>` | `RECOMMEND` | Use examples from another collection |
+
+## Filter Operators
+
+| Operator | Example |
+|---|---|
+| `=` | `status = 'active'` |
+| `!=` | `status != 'draft'` |
+| `>` / `>=` | `year >= 2020` |
+| `<` / `<=` | `score < 0.8` |
+| `BETWEEN ... AND` | `year BETWEEN 2020 AND 2024` |
+| `IN (...)` | `status IN ('a', 'b')` |
+| `NOT IN (...)` | `status NOT IN ('x', 'y')` |
+| `IS NULL` / `IS NOT NULL` | `reviewer IS NOT NULL` |
+| `IS EMPTY` / `IS NOT EMPTY` | `tags IS NOT EMPTY` |
+| `MATCH` | `title MATCH 'vector database'` |
+| `MATCH ANY` | `title MATCH ANY 'embedding retrieval'` |
+| `MATCH PHRASE` | `title MATCH PHRASE 'semantic search'` |
+| `AND` / `OR` / `NOT` | `status = 'active' AND NOT archived = true` |
+
+## Adding Syntax Docs
+
+When adding a dedicated page for a statement:
+
+1. Create `docs/syntax/STATEMENT_NAME.md`.
+2. Include syntax, examples, output shape, and limitations.
+3. Link it from this index only after the page exists.
+4. Update `docs/COMPATIBILITY.md` if implementation status changes.
diff --git a/docs/syntax/RECOMMEND.md b/docs/syntax/RECOMMEND.md
new file mode 100644
index 0000000..3827205
--- /dev/null
+++ b/docs/syntax/RECOMMEND.md
@@ -0,0 +1,47 @@
+# RECOMMEND
+
+Find points similar to existing example point IDs.
+
+## Syntax
+
+```sql
+RECOMMEND FROM <collection> POSITIVE IDS (<id>, ...) LIMIT <n>
+RECOMMEND FROM <collection> POSITIVE IDS (<id>, ...) NEGATIVE IDS (<id>, ...) LIMIT <n>
+RECOMMEND FROM <collection> POSITIVE IDS (<id>, ...) STRATEGY '<strategy>' LIMIT <n>
+RECOMMEND FROM <collection> POSITIVE IDS (<id>, ...) LOOKUP FROM <collection> LIMIT <n>
+RECOMMEND FROM <collection> POSITIVE IDS (<id>, ...) LIMIT <n> WHERE <filter>
+RECOMMEND FROM <collection> POSITIVE IDS (<id>, ...) LIMIT <n> WITH { exact: true, hnsw_ef: 128 }
+```
+
+## Examples
+
+```sql
+RECOMMEND FROM articles POSITIVE IDS (1) LIMIT 5
+```
+
+```sql
+RECOMMEND FROM articles
+POSITIVE IDS (1, 2)
+NEGATIVE IDS (3)
+STRATEGY 'best_score'
+LIMIT 10
+WHERE category = 'search'
+```
+
+## Options
+
+| Clause | Purpose |
+|---|---|
+| `NEGATIVE IDS` | Push results away from examples |
+| `STRATEGY` | Select Qdrant recommendation strategy |
+| `LOOKUP FROM` | Use examples from another collection |
+| `USING '<vector>'` | Choose target named vector |
+| `OFFSET` | Skip initial recommendations |
+| `SCORE THRESHOLD` | Exclude low-scoring results |
+| `WITH` | Pass search params |
+
+## Limitations
+
+- Example IDs must already exist.
+- The supported strategies are the strategies exposed by the Qdrant client.
+
diff --git a/docs/syntax/SEARCH.md b/docs/syntax/SEARCH.md
new file mode 100644
index 0000000..2cb28e6
--- /dev/null
+++ b/docs/syntax/SEARCH.md
@@ -0,0 +1,53 @@
+# SEARCH
+
+Search a collection using dense, sparse, or hybrid retrieval.
+
+## Syntax
+
+```sql
+SEARCH <collection> SIMILAR TO '<text>' LIMIT <n>
+SEARCH <collection> SIMILAR TO '<text>' LIMIT <n> WHERE <filter>
+SEARCH <collection> SIMILAR TO '<text>' LIMIT <n> USING MODEL '<model>'
+SEARCH <collection> SIMILAR TO '<text>' LIMIT <n> USING HYBRID
+SEARCH <collection> SIMILAR TO '<text>' LIMIT <n> USING SPARSE
+SEARCH <collection> SIMILAR TO '<text>' LIMIT <n> RERANK
+SEARCH <collection> SIMILAR TO '<text>' LIMIT <n> WITH { hnsw_ef: 128, exact: true, acorn: true }
+```
+
+## Examples
+
+```sql
+SEARCH articles SIMILAR TO 'vector search' LIMIT 5
+```
+
+```sql
+SEARCH articles SIMILAR TO 'keyword heavy query' LIMIT 10 USING HYBRID WHERE category = 'search'
+```
+
+```sql
+SEARCH articles SIMILAR TO 'high precision result' LIMIT 5 RERANK
+```
+
+## Filters
+
+`WHERE` supports equality, inequality, ranges, `BETWEEN`, `IN`, `NOT IN`, null/empty checks, text match operators, and `AND`/`OR`/`NOT`.
+
+```sql
+SEARCH articles SIMILAR TO 'retrieval' LIMIT 10
+WHERE year >= 2023 AND title MATCH ANY 'hybrid sparse'
+```
+
+## Behavior
+
+- Dense search is the default.
+- `USING HYBRID` searches dense and sparse vectors and fuses results.
+- `USING SPARSE` searches only the sparse vector.
+- `RERANK` applies a cross-encoder reranking pass to retrieved candidates.
+- `EXACT` is shorthand for exact search.
+
+## Limitations
+
+- `SEARCH ... OFFSET` is not implemented yet.
+- Batch search is not implemented yet.
+- Reranking adds latency and expects useful text in the result payload.
+
diff --git a/docs/syntax/SHOW_COLLECTIONS.md b/docs/syntax/SHOW_COLLECTIONS.md
new file mode 100644
index 0000000..7957942
--- /dev/null
+++ b/docs/syntax/SHOW_COLLECTIONS.md
@@ -0,0 +1,27 @@
+# SHOW COLLECTIONS
+
+List available Qdrant collections.
+
+## Syntax
+
+```sql
+SHOW COLLECTIONS
+```
+
+## Example
+
+```sql
+SHOW COLLECTIONS
+```
+
+## Output
+
+Human-readable output lists collection names.
+
+Programmatic execution returns a list of names in `ExecutionResult.data`.
+
+## Limitations
+
+- This statement lists names only.
+- Use the planned `DESCRIBE COLLECTION` statement, once implemented, for vector config, point counts, and index details.
+
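The `SHOW_COLLECTIONS.md` page says programmatic execution returns a list of names in `ExecutionResult.data`. The sketch below illustrates how a caller might consume such a result. It does not import the real `qql` package: the `ExecutionResult` class here is a hypothetical stand-in, and its `success` and `message` fields are an assumption borrowed from the JSON output shape in the RFC template, not a confirmed part of the library. Only the `data` attribute is named in the docs above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ExecutionResult:
    """Hypothetical stand-in for QQL's result object.

    Only `.data` is named in the docs; `success` and `message` are assumed
    from the JSON output shape shown in the RFC template.
    """
    success: bool
    message: str = ""
    data: Any = None


def collection_names(result: ExecutionResult) -> list[str]:
    """Return collection names from a SHOW COLLECTIONS result, or [] on failure."""
    if not result.success or result.data is None:
        return []
    return [str(name) for name in result.data]


# Stand-in for the value a real SHOW COLLECTIONS call might return.
fake_result = ExecutionResult(success=True, message="2 collections", data=["articles", "demo"])
names = collection_names(fake_result)
print("articles" in names)
```

Because the statement lists names only, a membership check like this is about as far as a caller can go until the planned `DESCRIBE COLLECTION` statement exists.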