feat: extend MongoDB index-check prompt with six more review patterns#21
Open
mtrunkat wants to merge 1 commit into
Open
feat: extend MongoDB index-check prompt with six more review patterns#21mtrunkat wants to merge 1 commit into
mtrunkat wants to merge 1 commit into
Conversation
Adds patterns sourced from the org's ~1,100 MongoDB inline review comments: legacy `fields` option, `$lookup` before `$match`/`$sort`/`$limit`, `$lookup` joined via `$expr`, `$group` for uniques (use `distinct`), `updateMany` on sharded collections (use `bulkWrite`), and `projection` without TS return-type narrowing. Rubric and concrete examples updated to match.
There was a problem hiding this comment.
Pull request overview
This PR updates the MongoDB index-check review prompt with additional patterns intended to catch common query/index efficiency issues in PR diffs.
Changes:
- Adds six new MongoDB review patterns covering legacy projections, aggregation ordering,
$lookup,distinct, sharded updates, and projected TypeScript return types. - Extends concrete examples and severity rubric to include the new patterns.
Comments suppressed due to low confidence (3)
mongodb-query-index-check/prompts/review.md:95
- The severity rubric repeats the over-broad version of the
updateManyrule. Since sharded writes are only broadcast when the filter cannot target by full shard key, this bullet should sayupdateManywith an incomplete shard-key filter rather than allupdateManycalls on sharded collections.
- `updateMany` on a sharded collection — broadcasts to every shard; should be `bulkWrite` with per-op shard-key filters.
mongodb-query-index-check/prompts/review.md:94
- This severity bullet also states that a plain
$exprequality join necessarily defeats the foreign index, which is not true for all supported MongoDB versions and shapes. Align it with the narrowed rule above so the prompt does not classify indexed$lookuppipeline joins as high severity solely because they use$expr.
- `$lookup` uses an `$expr` pipeline for a plain equality join, defeating the equality index on the foreign collection.
mongodb-query-index-check/prompts/review.md:78
- This example presents the
$exprform as necessarily preventing use of{ actorId: 1 }, but$lookuppipeline$exprequality comparisons can still use a suitable foreign index in supported cases. Either choose an example that actually cannot use the index (for example due to multikey/field-to-field comparison/unsupported expression), or avoid saying this shape always collscans.
- **🟠 high** — `$lookup: { from: 'actorBadges', let: { actorId: '$_id' }, pipeline: [{ $match: { $expr: { $eq: ['$actorId', '$$actorId'] } } }] }`. The `$expr` form prevents the planner from using the `{ actorId: 1 }` index on `actorBadges`. Switch to `{ from: 'actorBadges', localField: '_id', foreignField: 'actorId', as: '…' }`.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 10. **Read without `projection`.** `find()`, `findOne()`, and `findOneAnd*()` calls that omit a `projection` (either in the options object or via a chained `.project()`) return every field of every matched document — wasted IO, network bandwidth, and BSON parse time. Documents in `Users`, `Act2Runs`, etc. are easily 10–100 KB, and most callers only need a handful of fields. Flag any added/modified read that lacks a projection; suggest the smallest projection covering the fields actually consumed downstream (and just `{ _id: 1 }` for existence checks). Skip writes (`updateOne`, `deleteOne`, `bulkWrite`) and `aggregate` (which shapes via `$project` and is often a full transformation anyway). 🟡 medium. | ||
| 11. **Legacy `fields` option in place of `projection`.** Pre-3.x driver / Meteor-era syntax. It silently no-ops on the current MongoDB Node driver (the full document is returned), so the caller sees full docs while reviewers think the projection is in place. Flag any read where the options object contains a `fields:` key and suggest renaming to `projection:`. 🟢 low. | ||
| 12. **`$lookup` runs before `$match` / `$sort` / `$limit`.** `$lookup` is the most expensive aggregation stage — it iterates the foreign collection once per input doc. The right order is to narrow the input set first (`$match` on indexed fields → `$sort` → `$limit`) and only then `$lookup` on the top-N candidates. Flag pipelines where `$lookup` precedes a later `$match` / `$sort` / `$limit` that could move up. 🟠 high. | ||
| 13. **`$lookup` joined via `$expr` instead of `localField` / `foreignField`.** A `$lookup` whose nested `pipeline` uses `$expr: { $eq: ['$x', '$$x'] }` to do a plain equality join forces per-document evaluation on the foreign collection — the planner can't use an equality index, so it collscans the foreign collection once per input doc. When the join is plain equality, use the top-level `localField` / `foreignField` form so the equality index applies. 🟠 high. |
| 12. **`$lookup` runs before `$match` / `$sort` / `$limit`.** `$lookup` is the most expensive aggregation stage — it iterates the foreign collection once per input doc. The right order is to narrow the input set first (`$match` on indexed fields → `$sort` → `$limit`) and only then `$lookup` on the top-N candidates. Flag pipelines where `$lookup` precedes a later `$match` / `$sort` / `$limit` that could move up. 🟠 high. | ||
| 13. **`$lookup` joined via `$expr` instead of `localField` / `foreignField`.** A `$lookup` whose nested `pipeline` uses `$expr: { $eq: ['$x', '$$x'] }` to do a plain equality join forces per-document evaluation on the foreign collection — the planner can't use an equality index, so it collscans the foreign collection once per input doc. When the join is plain equality, use the top-level `localField` / `foreignField` form so the equality index applies. 🟠 high. | ||
| 14. **`$group: { _id: '$field' }` with no accumulator (uniques).** When the only goal is the distinct set of values, use `Collection.distinct('field', filter)` instead — it's an index-aware server command, doesn't materialise a group, and returns a plain array. Flag any added `$group` whose only field is `_id` (no `$sum`, `$push`, `$addToSet`, …) and suggest `distinct`. 🟡 medium. | ||
| 15. **`updateMany` on a sharded collection.** Without the full shard key in the filter, the shard router can't target a single `updateMany` — it broadcasts to every shard. Even with the shard key, batched writes lose atomicity vs. per-op routing. Prefer `bulkWrite` (typically `ordered: false`) so each op carries its own shard-key-complete filter and writes parallelise. Flag added/modified `updateMany` calls on collections wrapped by `ShardAwareCollection` (use the same detection as pattern #3). 🟠 high. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Sourced from a curated digest of ~1,100 MongoDB-related inline review comments across the org. Picked patterns with a distinctive syntactic signal so Claude can flag them with low false-positive risk:
fields:option — silently no-ops on the current driver. 🟢 low.$lookupbefore$match/$sort/$limit— narrow the input set first. 🟠 high.$lookupjoined via$expr— defeats equality index on the foreign collection. 🟠 high.$group: { _id: '$field' }for uniques — useCollection.distinct(). 🟡 medium.updateManyon sharded collection — broadcasts to every shard; usebulkWrite. 🟠 high.projectionwithout TS narrowing — un-projected fields areundefinedat runtime. 🟡 medium.Severity rubric and the concrete-examples block are updated to match. No code changes — prompt-only.
Generated by Claude Code