-
Notifications
You must be signed in to change notification settings - Fork 0
Elim dups and improve output #104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
lzehrung
wants to merge
15
commits into
main
Choose a base branch
from
elim-dups
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
4588086
Plan duplicate elimination work
lzehrung 44f740b
Share chunk token and range helpers
lzehrung 9280694
Share compact symbol projection
lzehrung 6255cb5
Share symbol render model
lzehrung 89b7a35
Consolidate dependency tool wrappers
lzehrung 2dc93db
Share CSS-like language definition pieces
lzehrung 5b8e7ba
Share smaller CLI and query helpers
lzehrung 8811200
Share duplicate test helpers
lzehrung b030468
Refine duplicate grouping and remaining test helpers
lzehrung 7f2fcbe
Mark duplicate refactor plan complete
lzehrung 918c660
Use static canned query SQL
lzehrung 6b55db6
Surface duplicate opportunities in inspect
lzehrung 47a5098
Address PR duplicate cleanup review comments
lzehrung f05a222
Address follow-up PR review comments
lzehrung 3470d05
Fix omitted variant count for coalesced duplicate groups
claude File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
184 changes: 184 additions & 0 deletions
184
docs/superpowers/plans/2026-05-23-eliminate-duplicate-findings.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,184 @@ | ||
| # Duplicate Findings Refactor Plan | ||
|
|
||
| ## Baseline Scan | ||
|
|
||
| Generated from the repo-local CLI on branch `elim-dups` after rebuilding `dist`. | ||
|
|
||
| Commands: | ||
|
|
||
| ```bash | ||
| node ./dist/cli.js duplicates --root . ./src --min-confidence medium --limit 100 --include-same-file | ||
| node ./dist/cli.js duplicates --root . . --min-confidence high --limit 200 --include-same-file | ||
| ``` | ||
|
|
||
| Results: | ||
|
|
||
| - `src` scan: 100 returned groups, 1234 omitted groups, 1669 omitted raw suggestions. | ||
| - whole-repo scan: 200 returned groups, 2097 omitted groups, 2995 omitted raw suggestions. | ||
| - Whole-repo top results are dominated by test helpers and setup snippets. | ||
| - Product-code top results are concentrated in `src/graphs`, `src/cli`, `src/impact`, `src/chunking`, `src/mcp`, and language definitions. | ||
|
|
||
| Post-refactor comparison: | ||
|
|
||
| - `src` scan: 100 returned groups, 869 omitted groups, 1611 omitted raw suggestions. | ||
| - whole-repo scan: 200 returned groups, 899 omitted groups, 2904 omitted raw suggestions. | ||
| - The previous `src/graphs/symbol-render.ts`, `src/cli/graph.ts`, chunk tokenizer, AST range, CSS/LESS, and first-pass test helper findings dropped out of the top product-code results. | ||
| - The analyzer no longer ranks the large C/C++ query chunk against tiny language snippets as a top medium/high finding. | ||
|
|
||
| The grouped output is usable for triage. Remaining caveats: | ||
|
|
||
| - Some chunk findings are sub-ranges of a larger duplicate and should be handled through the larger refactor. | ||
| - Some renamed findings compare very different-sized chunks; those are analyzer noise unless a human can identify a clear shared behavior. | ||
| - Repeated declarative language-definition shapes are not automatically bad duplication. | ||
|
|
||
| ## Refactor Checklist | ||
|
|
||
| ### Product Code | ||
|
|
||
| - [x] Refactor symbol graph renderers. | ||
| - Findings: `src/graphs/symbol-render.ts:84-149` and `src/graphs/symbol-render.ts:151-218`. | ||
| - Extract shared file-node, symbol-node, and graph-edge collection. | ||
| - Keep Mermaid and DOT formatting separate so escaping and syntax remain explicit. | ||
| - Add or update renderer tests if output ordering or formatting can change. | ||
|
|
||
| - [x] Refactor compact graph symbol projection. | ||
| - Findings: `src/cli/graph.ts:121-146` and `src/cli/graph.ts:169-194`. | ||
| - Extract shared file index, symbol index, symbol array, and sorted symbol edge construction. | ||
| - Keep the difference between full compact graph output and symbols-only output visible at the call site. | ||
|
|
||
| - [x] Share AST range conversion. | ||
| - Findings: `src/impact/suggestions.ts:372-385` and `src/util/ast.ts:32-41`. | ||
| - Reuse `toRange` from `src/util/ast.ts` or move the non-null conversion into a shared helper. | ||
| - Preserve the existing null-node behavior used by current callers. | ||
|
|
||
| - [x] Share default token counting. | ||
| - Findings: `src/chunking/chunkFile.ts:28-31` and `src/chunking/chunkTextFile.ts:21-24`. | ||
| - Introduce a small shared tokenizer helper in `src/chunking`. | ||
| - Keep public chunking options unchanged. | ||
|
|
||
| - [x] Consolidate dependency and reverse-dependency wrappers where useful. | ||
| - Findings: `src/agent-tools.ts:366-441`, `src/mcp/server.ts:228-246`, and `src/mcp/tools.ts:88-114`. | ||
| - Prefer a small result-mapping helper over forcing identical public response shapes. | ||
| - Verify both CLI/agent tools and MCP tools still expose `dependencies` and `reverseDependencies` separately. | ||
|
|
||
| - [x] Revisit CSS and Less language definitions. | ||
| - Findings: `src/languages/definitions/css.ts:9-17` and `src/languages/definitions/less.ts:9-17`. | ||
| - Extract shared CSS-family structure/query pieces only if it keeps each language definition readable. | ||
| - Consider whether Vue/Svelte stylesheet definitions can reuse the same helper without hiding language-specific behavior. | ||
|
|
||
| - [x] Evaluate JS fallback type duplication. | ||
| - Findings: `packages/codegraph-js-fallback/js-fallback.d.ts` and `src/jsFallback.ts`. | ||
| - Prefer generating or importing a single declaration source if package boundaries allow it. | ||
| - Leave as-is if the duplication is required to keep the fallback package self-contained. | ||
| - Decision: leave as-is because the fallback package publishes a self-contained `js-fallback.d.ts`. | ||
|
|
||
| - [x] Review smaller wrapper candidates opportunistically. | ||
| - `src/cli/artifact.ts`, `src/cli/explain.ts`, and `src/cli/search.ts` command context interfaces. | ||
| - `src/cli/projectFile.ts` and `src/session.ts` file-input resolution. | ||
| - `src/cli/options.ts` positive and non-negative integer parsers. | ||
| - `src/sqlite/canned-query.ts` direct dependencies and dependents queries. | ||
| - Decision: extracted the common agent CLI context and canned-query edge loader; left session file resolution and option parsing as-is because their existing helpers already separate concerns. | ||
|
|
||
| ### Tests | ||
|
|
||
| - [x] Add a shared temporary directory helper for tests. | ||
| - Findings: repeated `mkTmpDir` helpers across dynamic resolution, fast graph edge cases, node modules, resolution precedence, robust fast graph, TS paths workspace, cache, and parsed-cache tests. | ||
| - Put it near existing test helpers and migrate only obvious identical helpers first. | ||
|
|
||
| - [x] Add shared edge-normalization helpers for graph tests. | ||
| - Findings: `tests/fast-graph.test.ts`, `tests/monorepo-fast-graph.test.ts`, and related fast graph tests. | ||
| - Replace duplicated `normEdge`, `toKey`, and slash normalization only where it improves readability. | ||
|
|
||
| - [x] Consolidate repeated SQLite/test database setup blocks. | ||
| - Findings: repeated chunks in `tests/sqlite.test.ts`, `tests/sql-artifact-graph.test.ts`, and `tests/sql-review-context.test.ts`. | ||
| - Extract helpers that describe domain intent, not just line-for-line setup. | ||
|
|
||
| - [x] Leave intentional fixture repetition alone. | ||
| - Repeated sample snippets are often test data, not production maintenance debt. | ||
| - Do not refactor setup that would make an individual test harder to read. | ||
|
|
||
| ### Analyzer Follow-Ups | ||
|
|
||
| - [x] Consider a length-ratio guard for high-confidence renamed groups. | ||
| - Example noise: large C/C++ query chunks paired with tiny language-definition snippets. | ||
| - The detector already reports `lengthRatio`; ranking or confidence can use it more aggressively. | ||
|
|
||
| - [x] Consider collapsing adjacent same-file chunk findings under a larger group. | ||
| - Example: multiple `src/cli/graph.ts` chunk findings are one underlying helper extraction. | ||
| - Keep raw variants available through `--raw-pairs`. | ||
|
|
||
| ## Follow-Up Scan: 2026-05-23 | ||
|
|
||
| Generated after the first duplicate cleanup pass on branch `elim-dups`. | ||
|
|
||
| Commands: | ||
|
|
||
| ```bash | ||
| node ./dist/cli.js duplicates --root . ./src --min-confidence medium --limit 120 --include-same-file | ||
| node ./dist/cli.js duplicates --root . . --min-confidence high --limit 120 --include-same-file | ||
| node ./dist/cli.js inspect --root . ./src --limit 8 | ||
| node ./dist/cli.js review --base main --head HEAD --summary | ||
| node ./dist/cli.js doctor | ||
| ``` | ||
|
|
||
| Results: | ||
|
|
||
| - `src` scan: 120 returned groups, 849 omitted groups, 1611 omitted raw suggestions. | ||
| - whole-repo scan: 120 returned groups, 973 omitted groups, 2898 omitted raw suggestions. | ||
| - `doctor` reports native runtime availability and artifact health only; it does not surface duplicate cleanup opportunities. | ||
| - `review --summary` reports diff risk and candidate tests only; it does not surface duplicate cleanup opportunities. | ||
| - `inspect` reports hotspots, unresolved imports, cycles, and recommended commands only; it is the best current home for an at-a-glance duplicate opportunity section. | ||
|
|
||
| ### Product Output Follow-Ups | ||
|
|
||
| - [x] Coalesce repeated grouped duplicate findings with the same primary ranges. | ||
| - Finding: `src/indexer/imports/languageSpecific.ts:249-258 applyJavaStatementOverride` vs `src/indexer/imports/languageSpecific.ts:260-269 applyKotlinStatementOverride` appeared multiple times with the same primary pair. | ||
| - Preserve raw evidence counts through `rawPairCount` and bounded `variants`. | ||
| - Keep `--raw-pairs` as the explicit escape hatch for low-level pair inspection. | ||
|
|
||
| - [x] Surface bounded duplicate opportunities in `inspect` output. | ||
| - Include high-signal fields only: confidence, clone type, score, files/ranges, token counts, and raw pair count. | ||
| - Keep the summary bounded by `--limit`. | ||
| - Add a `duplicates` follow-up command to `recommendedCommands` so agents can drill into full grouped JSON. | ||
| - Leave `doctor` focused on package/runtime/artifact health. | ||
|
|
||
| ### Remaining Source Cleanup Candidates | ||
|
|
||
| - [x] Share C/C++ language-definition scaffolding where it stays readable. | ||
| - Findings: `src/languages/definitions/c.ts:14-147` and `src/languages/definitions/cpp.ts:14-165`. | ||
| - Preserve C-specific macro support and C++ namespace/class/alias/using/lambda behavior. | ||
|
|
||
| - [x] Share package export target selection between node-module and workspace resolution. | ||
| - Findings: `src/util/resolution/node.ts:22-34 tryResolveRelative` and `src/util/workspace.ts:298-307 pickExportTarget`. | ||
| - Prefer a small resolver helper over duplicating `exports` target precedence. | ||
|
|
||
| - [x] Share Java/Kotlin import override plumbing. | ||
| - Findings: `src/indexer/imports/languageSpecific.ts:249-258` and `src/indexer/imports/languageSpecific.ts:260-269`. | ||
| - Keep Java and Kotlin parsing differences explicit at the call site. | ||
|
|
||
| - [x] Share range line-counting for coverage suggestions. | ||
| - Findings: `src/impact/report-suggestions.ts:820-827` and `src/impact/report-suggestions.ts:829-836`. | ||
| - Preserve zero-count behavior when no coverage is available. | ||
|
|
||
| - [x] Share file-like agent handle parsing. | ||
| - Findings: `src/agent/handles.ts:54-60` and `src/agent/handles.ts:82-88`. | ||
| - Keep public handle prefixes and return types unchanged. | ||
|
|
||
| - [x] Share discovery glob relative-path matching. | ||
| - Findings: `src/cli/context.ts:25-35` and `src/util/projectFiles.ts:278-288`. | ||
| - Keep CLI include/ignore globs relative to active scan roots and config globs relative to project roots. | ||
|
|
||
| Follow-up verification: | ||
|
|
||
| - [x] `src` scan reports `0` repeated displayed primary pairs in the first 80 medium-or-higher groups. | ||
| - [x] `inspect --root . ./src --limit 5` emits compact high-confidence duplicate opportunities and a `duplicates` follow-up command. | ||
| - [x] Focused duplicate, inspect, C/C++, resolution, references, impact-suggestion, and agent-search tests pass. | ||
| - [x] Full test suite passes. | ||
|
|
||
| ## Verification Plan | ||
|
|
||
| - [x] Run `npm run build`. | ||
| - [x] Run `npm run lint`. | ||
| - [x] Run focused tests for touched areas. | ||
| - [x] Run `npm test` before pushing a completed refactor batch. | ||
| - [x] Re-run the duplicate analyzer and compare top findings against this baseline. |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.