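Fix PR #28 review gaps: sparse-only grouped search, GROUP_SIZE and UPDATE SET VECTOR validation, CLI help and docs, 15 new tests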

pavanjava · pavanjava · commit 301bad55d7cc · 2026-05-15T15:52:51.000+05:30
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@
 [![PyPI version](https://img.shields.io/pypi/v/qql-cli?color=blue&label=PyPI)](https://pypi.org/project/qql-cli/)
 [![Python 3.12+](https://img.shields.io/pypi/pyversions/qql-cli)](https://pypi.org/project/qql-cli/)
 [![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-485%20passing-brightgreen)](tests/)
+[![Tests](https://img.shields.io/badge/tests-500%20passing-brightgreen)](tests/)
 
 Write `INSERT`, `SELECT`, `SEARCH`, `SCROLL`, `RECOMMEND`, `UPDATE`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, grouped search (GROUP BY), cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
 
@@ -158,7 +158,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
 pytest tests/ -v
 ```
 
-Expected: **485 tests passing**.
+Expected: **500 tests passing**.
 
 ---
 
diff --git a/docs/reference.md b/docs/reference.md
@@ -162,7 +162,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
 pytest tests/ -v
 ```
 
-Expected output: **485 tests passing**.
+Expected output: **500 tests passing**.
 
 ---
 
@@ -188,6 +188,8 @@ Expected output: **485 tests passing**.
 | `Expected a vector list [...] after point ID in UPDATE SET VECTOR` | UPDATE SET VECTOR missing the `[...]` float array | Add the vector array: `UPDATE ... SET VECTOR WHERE id = '...' [0.1, 0.2, ...]` |
 | `Qdrant error during UPDATE VECTOR: ...` | Point does not exist, or vector dimensions mismatch | Verify the point ID exists and the vector length matches the collection's dimensions |
 | `Qdrant error during UPDATE PAYLOAD: ...` | Qdrant rejected the payload update | Check field values and collection state |
+| `Vector elements must be numeric; got invalid value: ...` | A non-numeric value (such as a string or null) was present in the vector array for `UPDATE SET VECTOR` | Ensure all vector elements are floats: `UPDATE … [0.1, 0.2, …, 0.N]` |
+| `GROUP_SIZE must be a positive integer, got N` | `GROUP_SIZE 0` or a negative value was specified | Use a positive integer: `GROUP_SIZE 3` |
 | `Qdrant error during SCROLL: ...` | Qdrant rejected scroll request | Verify collection state, filter, and cursor (`AFTER`) value |
 | `Unknown index type '...'` | Invalid schema type in CREATE INDEX | Use one of: `keyword`, `integer`, `float`, `bool`, `text`, `geo`, `datetime` |
 | `Qdrant error during CREATE INDEX: ...` | Qdrant rejected the index creation | Check field name and collection state |
diff --git a/docs/search.md b/docs/search.md
@@ -362,7 +362,7 @@ SEARCH <collection> SIMILAR TO '<query>' LIMIT <n> USING HYBRID GROUP BY <field>
 
 - **`LIMIT <n>`** — maximum number of **groups** to return.
 - **`GROUP_SIZE <m>`** — maximum number of points per group (default: **3**).
-- **`GROUP BY <field>`** — the payload field whose values define the groups. Dot-notation is supported (e.g. `meta.author`). The field should be indexed as `keyword` or `integer` for best performance.
+- **`GROUP BY <field>`** — the payload field whose values define the groups. **Must be a string (keyword) or number (integer) field** — this is enforced by Qdrant. Dot-notation is supported (e.g. `meta.author`). Array-valued fields are allowed: a point with multiple values for the field can appear in multiple groups. The field should be indexed as `keyword` or `integer` for best performance (see [CREATE INDEX](collections.md)).
 - `WHERE` filters, `USING HYBRID`, and `USING MODEL` are all compatible with GROUP BY.
 - **`GROUP BY` and `RERANK` cannot be combined** in the same statement — this raises a syntax error.
 
diff --git a/src/qql/cli.py b/src/qql/cli.py
@@ -71,6 +71,9 @@
       Optional: [yellow]RERANK[/yellow] [MODEL '<model>']   rerank results with a cross-encoder
       Optional: [yellow]EXACT[/yellow]   bypass HNSW and perform exact search
       Optional: [yellow]WITH[/yellow] { hnsw_ef: <int>, exact: <bool>, acorn: <bool> }   search parameters
+      Optional: [yellow]GROUP BY[/yellow] <field> [[yellow]GROUP_SIZE[/yellow] <n>]
+                  Group results by a payload field value (default GROUP_SIZE: 3).
+                  Field must be keyword or integer type. RERANK and GROUP BY cannot be combined.
 
   [yellow]RECOMMEND FROM[/yellow] <name> [yellow]POSITIVE IDS[/yellow] (<id>, ...)
       Find points similar to known examples.
@@ -82,6 +85,15 @@
   [yellow]DELETE FROM[/yellow] <name> [yellow]WHERE id =[/yellow] '<id>'
       Delete a point by its ID.
 
+  [yellow]UPDATE[/yellow] <name> [yellow]SET VECTOR WHERE id =[/yellow] '<id>'|<int> [<vector>]
+      Replace the dense vector for a single point by ID.
+      The point must already exist. Vector is a float array: [0.1, 0.2, ..., 0.N]
+
+  [yellow]UPDATE[/yellow] <name> [yellow]SET PAYLOAD WHERE id =[/yellow] '<id>'|<int> {<payload>}
+  [yellow]UPDATE[/yellow] <name> [yellow]SET PAYLOAD WHERE[/yellow] <filter> {<payload>}
+      Merge new key/value pairs into a point's payload (additive; existing fields preserved).
+      Supports all WHERE filter operators. Filter-based updates affect all matching points.
+
 Script files (in-shell):
   [yellow]EXECUTE[/yellow] <path>   or   [yellow]\\e[/yellow] <path>
       Run a .qql script file. Statements are executed in order.
@@ -458,6 +470,32 @@ def _run_and_print(executor: Executor, query: str) -> None:
         console.print(_format_collection_diagnostics(result.data))
         return
 
+    # Pretty-print grouped search results (GROUP BY)
+    if (
+        isinstance(result.data, list)
+        and result.data
+        and isinstance(result.data[0], dict)
+        and "group_id" in result.data[0]
+    ):
+        for group in result.data:
+            console.print(f"\n[bold cyan]Group: {group['group_id']}[/bold cyan]")
+            hits = group.get("hits", [])
+            if hits:
+                tbl = Table(show_header=True, header_style="bold")
+                tbl.add_column("Score", style="green", no_wrap=True, justify="right")
+                tbl.add_column("ID")
+                tbl.add_column("Payload")
+                for hit in hits:
+                    tbl.add_row(
+                        str(hit["score"]),
+                        str(hit["id"]),
+                        str(hit.get("payload", {})),
+                    )
+                console.print(tbl)
+            else:
+                console.print("  (no hits)")
+        return
+
     # Pretty-print search results
     if isinstance(result.data, list) and result.data and isinstance(result.data[0], dict) and "score" in result.data[0]:
         table = Table(show_header=True, header_style="bold cyan")
diff --git a/src/qql/executor.py b/src/qql/executor.py
@@ -983,6 +983,23 @@ def _execute_search_groups(
                     query_filter=qdrant_filter,
                 )
                 label = "hybrid, grouped"
+            elif node.sparse_only:
+                sparse_model_name = node.sparse_model or SparseEmbedder.DEFAULT_MODEL
+                sparse_obj = SparseEmbedder(sparse_model_name).query_embed(node.query_text)
+                sparse_vector = SparseVector(
+                    indices=sparse_obj["indices"],
+                    values=sparse_obj["values"],
+                )
+                response = self._client.query_points_groups(
+                    collection_name=node.collection,
+                    group_by=node.group_by,
+                    query=sparse_vector,
+                    using="sparse",
+                    limit=node.limit,
+                    group_size=node.group_size,
+                    query_filter=qdrant_filter,
+                )
+                label = "sparse, grouped"
             else:
                 model_name = node.model or self._config.default_model
                 vector = Embedder(model_name).embed(node.query_text)
diff --git a/src/qql/parser.py b/src/qql/parser.py
@@ -439,7 +439,13 @@ def _parse_search(self) -> SearchStmt:
                 )
             if self._peek().kind == TokenKind.GROUP_SIZE:
                 self._advance()  # consume GROUP_SIZE
+                gs_tok = self._peek()
                 group_size = int(self._expect(TokenKind.INTEGER).value)
+                if group_size <= 0:
+                    raise QQLSyntaxError(
+                        f"GROUP_SIZE must be a positive integer, got {group_size}",
+                        gs_tok.pos,
+                    )
         return SearchStmt(
             collection=collection,
             query_text=query_text,
@@ -566,10 +572,17 @@ def _parse_update(self) -> UpdateVectorStmt | UpdatePayloadStmt:
                     "Expected a vector list [...] after point ID in UPDATE SET VECTOR",
                     self._peek().pos,
                 )
+            try:
+                coerced = tuple(float(v) for v in vector_val)
+            except (ValueError, TypeError) as exc:
+                raise QQLSyntaxError(
+                    f"Vector elements must be numeric; got invalid value: {exc}",
+                    self._peek().pos,
+                ) from exc
             return UpdateVectorStmt(
                 collection=collection,
                 point_id=point_id,
-                vector=tuple(float(v) for v in vector_val),
+                vector=coerced,
             )
 
         if self._peek().kind == TokenKind.PAYLOAD:
diff --git a/tests/test_executor.py b/tests/test_executor.py
@@ -2299,3 +2299,173 @@ def test_update_payload_passes_wait_true(self, executor, mock_client):
         executor.execute(node)
         kwargs = mock_client.set_payload.call_args.kwargs
         assert kwargs.get("wait") is True
+
+
+# ── PR #28 review gap fixes ───────────────────────────────────────────────────
+
+class TestSearchGroupBySparse:
+    """Gap 1 & 6 — sparse-only grouped search must use the sparse path."""
+
+    def test_sparse_only_grouped_calls_query_points_groups(self, executor, mock_client, mocker):
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.groups = []
+        mock_client.query_points_groups.return_value = mock_response
+
+        mock_sparse = mocker.MagicMock()
+        mock_sparse.query_embed.return_value = {"indices": [0, 1], "values": [0.5, 0.5]}
+        mocker.patch("qql.executor.SparseEmbedder", return_value=mock_sparse)
+
+        node = SearchStmt(
+            collection="articles", query_text="q", limit=5, model=None,
+            sparse_only=True, group_by="category", group_size=3,
+        )
+        executor.execute(node)
+        mock_client.query_points_groups.assert_called_once()
+        kwargs = mock_client.query_points_groups.call_args.kwargs
+        assert kwargs.get("using") == "sparse"
+        # Must NOT fall back to the ungrouped search path.
+        # The mock_embedder fixture patches Embedder, so the dense embedder is not
+        # checked directly; query_points not being called confirms the grouped call.
+        mock_client.query_points.assert_not_called()
+
+    def test_sparse_only_grouped_label_in_message(self, executor, mock_client, mocker):
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.groups = []
+        mock_client.query_points_groups.return_value = mock_response
+
+        mock_sparse = mocker.MagicMock()
+        mock_sparse.query_embed.return_value = {"indices": [0], "values": [1.0]}
+        mocker.patch("qql.executor.SparseEmbedder", return_value=mock_sparse)
+
+        node = SearchStmt(
+            collection="articles", query_text="q", limit=5, model=None,
+            sparse_only=True, group_by="tag", group_size=2,
+        )
+        result = executor.execute(node)
+        assert "sparse" in result.message
+        assert "grouped" in result.message
+
+
+class TestSearchGroupByAdvanced:
+    """Gaps 7 & 8 — fusion and search params forwarding in grouped search."""
+
+    def test_grouped_hybrid_fusion_dbsf(self, executor, mock_client, mocker):
+        from qdrant_client.models import Fusion
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.groups = []
+        mock_client.query_points_groups.return_value = mock_response
+
+        mock_sparse = mocker.MagicMock()
+        mock_sparse.query_embed.return_value = {"indices": [0], "values": [1.0]}
+        mocker.patch("qql.executor.SparseEmbedder", return_value=mock_sparse)
+
+        node = SearchStmt(
+            collection="articles", query_text="q", limit=3, model=None,
+            hybrid=True, fusion="dbsf", group_by="category", group_size=2,
+        )
+        executor.execute(node)
+        kwargs = mock_client.query_points_groups.call_args.kwargs
+        fusion_query = kwargs.get("query")
+        assert fusion_query is not None
+        assert fusion_query.fusion == Fusion.DBSF
+
+    def test_grouped_search_params_with_clause_forwarded(self, executor, mock_client, mocker):
+        mock_client.collection_exists.return_value = True
+        mock_response = mocker.MagicMock()
+        mock_response.groups = []
+        mock_client.query_points_groups.return_value = mock_response
+
+        node = SearchStmt(
+            collection="articles", query_text="q", limit=5, model=None,
+            with_clause=SearchWith(exact=True), group_by="category",
+        )
+        executor.execute(node)
+        kwargs = mock_client.query_points_groups.call_args.kwargs
+        assert kwargs.get("search_params") is not None
+
+
+class TestUpdateVectorVectorShape:
+    """Gaps 12 & 13 — verify exact vector shape sent to Qdrant for named/unnamed collections."""
+
+    def test_update_vector_unnamed_collection_sends_plain_list(self, executor, mock_client):
+        from qql.ast_nodes import UpdateVectorStmt
+        mock_client.collection_exists.return_value = True
+        # Unnamed collection: get_collection returns non-dict vectors.
+        # (A plain list stands in for the vectors config; no extra mock object is needed.)
+        info = mock_client.get_collection.return_value
+        info.config.params.vectors = [None]  # list → not a dict → unnamed
+
+        node = UpdateVectorStmt(collection="articles", point_id=1, vector=(0.1, 0.2, 0.3))
+        executor.execute(node)
+        kwargs = mock_client.update_vectors.call_args.kwargs
+        pv = kwargs["points"][0]
+        assert isinstance(pv.vector, list)
+        assert pv.vector == [0.1, 0.2, 0.3]
+
+    def test_update_vector_named_collection_sends_dict(self, executor, mock_client):
+        from qql.ast_nodes import UpdateVectorStmt
+        mock_client.collection_exists.return_value = True
+        # Named collection: get_collection returns dict vectors
+        info = mock_client.get_collection.return_value
+        info.config.params.vectors = {"dense": object(), "sparse": object()}  # dict → named
+
+        node = UpdateVectorStmt(collection="articles", point_id="id-1", vector=(0.5, 0.6))
+        executor.execute(node)
+        kwargs = mock_client.update_vectors.call_args.kwargs
+        pv = kwargs["points"][0]
+        assert isinstance(pv.vector, dict)
+        assert "dense" in pv.vector
+        assert pv.vector["dense"] == [0.5, 0.6]
+
+    def test_update_vector_exact_values_preserved(self, executor, mock_client):
+        from qql.ast_nodes import UpdateVectorStmt
+        mock_client.collection_exists.return_value = True
+        info = mock_client.get_collection.return_value
+        info.config.params.vectors = [None]  # unnamed
+
+        vec = (0.11, 0.22, 0.33, 0.44)
+        node = UpdateVectorStmt(collection="articles", point_id=99, vector=vec)
+        executor.execute(node)
+        kwargs = mock_client.update_vectors.call_args.kwargs
+        assert kwargs["points"][0].vector == list(vec)
+
+
+class TestUpdatePayloadMessages:
+    """Gap 17: assert specific message text for both update-payload branches."""
+
+    def test_filter_based_update_message_contains_filter_based(self, executor, mock_client):
+        from qql.ast_nodes import UpdatePayloadStmt, CompareExpr
+        mock_client.collection_exists.return_value = True
+        node = UpdatePayloadStmt(
+            collection="articles",
+            payload={"status": "done"},
+            query_filter=CompareExpr(field="year", op="<", value=2020),
+        )
+        result = executor.execute(node)
+        assert "filter-based" in result.message
+
+    def test_id_based_update_message_contains_point_id(self, executor, mock_client):
+        from qql.ast_nodes import UpdatePayloadStmt
+        mock_client.collection_exists.return_value = True
+        node = UpdatePayloadStmt(
+            collection="articles", point_id="abc-999", payload={"tag": "ai"}
+        )
+        result = executor.execute(node)
+        assert "abc-999" in result.message
+
+    def test_filter_based_set_payload_receives_filter_object(self, executor, mock_client):
+        from qql.ast_nodes import UpdatePayloadStmt, CompareExpr
+        from qdrant_client.models import Filter
+        mock_client.collection_exists.return_value = True
+        node = UpdatePayloadStmt(
+            collection="articles",
+            payload={"x": 1},
+            query_filter=CompareExpr(field="cat", op="=", value="tech"),
+        )
+        executor.execute(node)
+        kwargs = mock_client.set_payload.call_args.kwargs
+        # SDK verified: PointsSelector accepts rest.Filter — must receive Filter, not a list
+        assert isinstance(kwargs["points"], Filter)
diff --git a/tests/test_parser.py b/tests/test_parser.py
@@ -1368,3 +1368,38 @@ def test_update_payload_dotted_filter_field(self):
         assert isinstance(node, UpdatePayloadStmt)
         assert node.query_filter is not None
         assert node.payload == {"reviewed": True}
+
+
+# ── PR #28 review gap fixes ───────────────────────────────────────────────────
+
+class TestSearchGroupByValidation:
+    """Parser-level validation added for PR #28 gap 2 (GROUP_SIZE must be positive)."""
+
+    def test_group_size_zero_raises(self):
+        with pytest.raises(QQLSyntaxError, match="GROUP_SIZE must be a positive integer"):
+            parse("SEARCH articles SIMILAR TO 'q' LIMIT 5 GROUP BY category GROUP_SIZE 0")
+
+    def test_group_size_negative_raises(self):
+        with pytest.raises(QQLSyntaxError, match="GROUP_SIZE must be a positive integer"):
+            parse("SEARCH articles SIMILAR TO 'q' LIMIT 5 GROUP BY category GROUP_SIZE -1")
+
+
+class TestUpdateVectorValidation:
+    """PR #28 gap 11 — non-numeric vector elements should raise QQLSyntaxError."""
+
+    def test_non_numeric_string_element_raises(self):
+        with pytest.raises(QQLSyntaxError, match="Vector elements must be numeric"):
+            parse("UPDATE articles SET VECTOR WHERE id = 1 ['abc', 0.2, 0.3]")
+
+    def test_none_element_raises(self):
+        # null parsed as Python None → TypeError → QQLSyntaxError
+        with pytest.raises(QQLSyntaxError, match="Vector elements must be numeric"):
+            parse("UPDATE articles SET VECTOR WHERE id = 1 [null, 0.2]")
+
+
+class TestUpdateSetInvalidTargetMessage:
+    """PR #28 gap 16 — explicit error message for bad SET target."""
+
+    def test_invalid_set_target_message(self):
+        with pytest.raises(QQLSyntaxError, match="Expected VECTOR or PAYLOAD after SET"):
+            parse("UPDATE articles SET FOOBAR WHERE id = 1 [0.1]")
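
Review note: the two parser checks added in this commit can be exercised in isolation with the minimal sketch below. It is not the project's code: `QQLSyntaxError` here is a hypothetical stand-in for the real class, and `coerce_vector` and `check_group_size` are illustrative names that only mirror the `try`/`except` around `float(v)` in `_parse_update` and the `group_size <= 0` check in `_parse_search`.

```python
class QQLSyntaxError(Exception):
    """Hypothetical stand-in for the project's QQLSyntaxError."""

    def __init__(self, message: str, pos: int = 0) -> None:
        super().__init__(message)
        self.pos = pos


def coerce_vector(values, pos: int = 0) -> tuple:
    """Coerce parsed vector elements to floats (illustrative helper name)."""
    try:
        return tuple(float(v) for v in values)
    except (ValueError, TypeError) as exc:
        # ValueError: non-numeric string such as 'abc'; TypeError: None
        raise QQLSyntaxError(
            f"Vector elements must be numeric; got invalid value: {exc}", pos
        ) from exc


def check_group_size(group_size: int, pos: int = 0) -> int:
    """Reject zero or negative GROUP_SIZE values (illustrative helper name)."""
    if group_size <= 0:
        raise QQLSyntaxError(
            f"GROUP_SIZE must be a positive integer, got {group_size}", pos
        )
    return group_size


if __name__ == "__main__":
    print(coerce_vector([0.1, 2, 0.3]))   # ints are accepted and become floats
    print(check_group_size(3))
    for bad in (["abc", 0.2], [None, 0.2]):
        try:
            coerce_vector(bad)
        except QQLSyntaxError as err:
            print("rejected:", err)
```

One thing worth checking against the real parser: Python's `float()` accepts `True`/`False` and numeric strings such as `'0.5'` without raising, so a check built only on `float(v)` would let those values through if the parser hands them over as Python `bool` or `str` objects.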