Skip to content

feat: enable MMR diversity for hybrid search#39

Open
srimon12 wants to merge 2 commits into
pavanjava:mainfrom
srimon12:feature/hybrid-mmr-support
Open

feat: enable MMR diversity for hybrid search#39
srimon12 wants to merge 2 commits into
pavanjava:mainfrom
srimon12:feature/hybrid-mmr-support

Conversation

@srimon12
Copy link
Copy Markdown
Collaborator

@srimon12 srimon12 commented May 21, 2026

MMR (Maximal Marginal Relevance) was artificially blocked for hybrid search in QQL, even though the Qdrant SDK supports it: Prefetch.query accepts NearestQuery, which carries an MMR field.

  • Remove the hybrid guard from _validate_search_mmr_usage
  • Wire _build_dense_query() into the dense prefetch of both flat and GROUP BY hybrid paths, so MMR params produce a NearestQuery(mmr=...) instead of a raw vector
  • Keep the sparse-only and recommend guards (MMR is a dense-space concept and RecommendInput has no MMR field in the Qdrant API)
  • Replace test_hybrid_search_with_mmr_raises with a new test that asserts NearestQuery + MMR is passed in the dense prefetch

Summary by CodeRabbit

  • New Features

    • Hybrid search now supports MMR (Maximal Marginal Relevance). Users can configure mmr_diversity and mmr_candidates for hybrid (grouped and non-grouped) searches.
  • Tests

    • Updated tests to assert MMR settings are forwarded into hybrid search execution, replacing prior tests that expected an error.

Review Change Stack

MMR (Maximal Marginal Relevance) was artificially blocked for hybrid
search in QQL, even though the Qdrant SDK supports it: Prefetch.query
accepts NearestQuery, which carries an MMR field.

- Remove the hybrid guard from _validate_search_mmr_usage
- Wire _build_dense_query() into the dense prefetch of both flat and
  GROUP BY hybrid paths, so MMR params produce a NearestQuery(mmr=...)
  instead of a raw vector
- Keep the sparse-only and recommend guards (MMR is a dense-space
  concept and RecommendInput has no MMR field in the Qdrant API)
- Replace test_hybrid_search_with_mmr_raises with a new test that
  asserts NearestQuery + MMR is passed in the dense prefetch
@srimon12 srimon12 requested a review from pavanjava May 21, 2026 07:56
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: aa7b302a-833a-4a8d-9cb9-72aaa0ee6987

📥 Commits

Reviewing files that changed from the base of the PR and between 7022578 and 0140d97.

📒 Files selected for processing (1)
  • tests/test_executor.py

📝 Walkthrough

Walkthrough

Hybrid searches now support Maximum Marginal Relevance (MMR) by updating validation logic to allow MMR configuration, applying MMR through constructed dense queries in both grouped and non-grouped paths, and verifying the behavior with updated test coverage.

Changes

Hybrid MMR Support

Layer / File(s) Summary
MMR validation for hybrid searches
src/qql/executor.py
_validate_search_mmr_usage now early-returns when the with_clause does not request MMR, removing the prior blanket hybrid-specific MMR rejection and preserving only the sparse-only MMR rejection.
Hybrid search MMR prefetch construction
src/qql/executor.py
Both non-grouped hybrid SEARCH and grouped hybrid SEARCH ... GROUP BY now pass _build_dense_query(dense_vector, node.with_clause) to the dense Prefetch, enabling MMR behavior in dense candidate retrieval instead of always passing the raw vector.
Hybrid search MMR test coverage
tests/test_executor.py
Tests replaced the old error expectation with assertions that the executor forwards MMR into the hybrid search prefetch by constructing a NearestQuery with mmr fields matching the provided SearchWith settings in both grouped and non-grouped flows.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • pavanjava

Poem

🐰 I nibble through vectors, neat and spry,
NearestQuery blooms, MMR held high,
Hybrid hops from error to grace,
Prefetch carries relevance in place,
A rabbit cheers: relevance, not shy!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: enable MMR diversity for hybrid search' accurately and concisely summarizes the main objective of the PR, which is to enable MMR (Maximal Marginal Relevance) support for hybrid search operations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_executor.py (1)

1245-1277: ⚡ Quick win

Add grouped-hybrid MMR coverage for the second changed path.

Line 1245 validates flat hybrid prefetch MMR, but the grouped hybrid branch (Line 1636 in src/qql/executor.py) is also newly wired through _build_dense_query and should get a direct regression assertion (query_points_groups + group_by + hybrid=True + NearestQuery.mmr checks).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_executor.py` around lines 1245 - 1277, Add a new test that mirrors
test_hybrid_search_with_mmr_uses_nearest_query_in_prefetch but exercises the
grouped-hybrid path: create a SearchStmt with hybrid=True and a group_by value
(and with_clause=SearchWith(mmr_diversity=..., mmr_candidates=...)), patch
Embedder/SparseEmbedder and make mock_client.collection_exists/get_collection/
query_points_groups return a MagicMock response, call executor.execute(node),
then inspect mock_client.query_points_groups.call_args.kwargs["prefetch"] (not
query_points) to get the dense query and assert isinstance(..., NearestQuery)
and that dense_query.mmr.diversity and dense_query.mmr.candidates_limit match
the provided values; this ensures the _build_dense_query/grouped hybrid branch
is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_executor.py`:
- Around line 1245-1277: Add a new test that mirrors
test_hybrid_search_with_mmr_uses_nearest_query_in_prefetch but exercises the
grouped-hybrid path: create a SearchStmt with hybrid=True and a group_by value
(and with_clause=SearchWith(mmr_diversity=..., mmr_candidates=...)), patch
Embedder/SparseEmbedder and make mock_client.collection_exists/get_collection/
query_points_groups return a MagicMock response, call executor.execute(node),
then inspect mock_client.query_points_groups.call_args.kwargs["prefetch"] (not
query_points) to get the dense query and assert isinstance(..., NearestQuery)
and that dense_query.mmr.diversity and dense_query.mmr.candidates_limit match
the provided values; this ensures the _build_dense_query/grouped hybrid branch
is covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 827162fb-8be5-4b15-b436-b08655272932

📥 Commits

Reviewing files that changed from the base of the PR and between f2dda99 and 9ed565d.

📒 Files selected for processing (2)
  • src/qql/executor.py
  • tests/test_executor.py

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_executor.py (1)

1302-1305: ⚡ Quick win

Make dense-prefetch assertion order-independent.

This test currently assumes the dense prefetch is always at index 0. That makes it brittle to harmless ordering changes. Select the dense prefetch by using before asserting NearestQuery and MMR fields.

Suggested test adjustment
-        prefetch = mock_client.query_points_groups.call_args.kwargs["prefetch"]
-        assert prefetch is not None
-        dense_query = prefetch[0].query
+        prefetch = mock_client.query_points_groups.call_args.kwargs["prefetch"]
+        assert prefetch is not None
+        dense_prefetch = next(p for p in prefetch if p.using == "dense")
+        dense_query = dense_prefetch.query
         assert isinstance(dense_query, NearestQuery)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_executor.py` around lines 1302 - 1305, The test assumes the dense
prefetch is at index 0; change it to find the prefetch item by its using
attribute instead of relying on order: retrieve prefetch =
mock_client.query_points_groups.call_args.kwargs["prefetch"], then locate the
item where item.using == "dense" (or the actual using string used in the diff)
and assign that to dense_prefetch; assert isinstance(dense_prefetch.query,
NearestQuery) and check the MMR-related fields on dense_prefetch/query
accordingly so the assertions are order-independent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_executor.py`:
- Around line 1302-1305: The test assumes the dense prefetch is at index 0;
change it to find the prefetch item by its using attribute instead of relying on
order: retrieve prefetch =
mock_client.query_points_groups.call_args.kwargs["prefetch"], then locate the
item where item.using == "dense" (or the actual using string used in the diff)
and assign that to dense_prefetch; assert isinstance(dense_prefetch.query,
NearestQuery) and check the MMR-related fields on dense_prefetch/query
accordingly so the assertions are order-independent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c40dec0e-90aa-495b-bae2-163151851b72

📥 Commits

Reviewing files that changed from the base of the PR and between 9ed565d and 7022578.

📒 Files selected for processing (1)
  • tests/test_executor.py

@srimon12 srimon12 force-pushed the feature/hybrid-mmr-support branch from 7022578 to 0140d97 Compare May 21, 2026 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant