
Multi-SWE-Bench: Evaluation Changes #47

Draft
Vladislav0Art wants to merge 67 commits into main from vartiukhov/evaluation

Conversation

@Vladislav0Art
Collaborator

No description provided.

dragoi75 and others added 2 commits April 16, 2026 23:00
…unused import optimization)

Optimizations left enabled:
1. Removal of imports of classes from the same package.
2. Removal of unused imports.

Both kinds of imports are removed when references are updated in a file that uses a renamed class/method.
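The two import cleanups that stay enabled can be modeled without PSI. The sketch below is a hypothetical illustration, not code from the PR (`ImportLine` and `cleanImports` are invented names). An import is kept only if its class is referenced in the file and lives outside the file's own package. Wildcard and static imports are ignored for brevity.

```kotlin
// Hypothetical, PSI-free model of a single-type import such as `import java.util.List;`.
data class ImportLine(val fqn: String) {
    val packageName: String get() = fqn.substringBeforeLast('.', "")
    val simpleName: String get() = fqn.substringAfterLast('.')
}

// Drops imports of same-package classes (redundant) and imports whose simple
// name is never referenced in the file (unused). Everything else is kept.
fun cleanImports(
    filePackage: String,
    imports: List<ImportLine>,
    referencedSimpleNames: Set<String>,
): List<ImportLine> = imports.filter { imp ->
    imp.packageName != filePackage && imp.simpleName in referencedSimpleNames
}
```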
@github-actions

github-actions Bot commented Apr 16, 2026

Qodana Community for JVM

19 new problems were found

| Inspection name | Severity | Problems |
|---|---|---|
| Unresolved reference in KDoc | 🔶 Warning | 6 |
| Unused import directive | 🔶 Warning | 4 |
| Inconsistent comment for Java parameter | 🔶 Warning | 3 |
| Unused symbol | 🔶 Warning | 2 |
| Unnecessary type argument | ◽️ Notice | 2 |
| 'when' that can be simplified by introducing an argument | ◽️ Notice | 1 |
| Class member can have 'private' visibility | ◽️ Notice | 1 |

💡 Qodana analysis was run in the pull request mode: only the changed files were checked

View the detailed Qodana report

To be able to view the detailed Qodana report, you can either:

To get `*.log` files or any other Qodana artifacts, run the action with the `upload-result` option set to `true`
so that the action uploads the files as job artifacts:

```yaml
- name: 'Qodana Scan'
  uses: JetBrains/qodana-action@v2025.1.1
  with:
    upload-result: true
```
Contact Qodana team

Contact us at qodana-support@jetbrains.com

Vladislav0Art and others added 27 commits April 16, 2026 23:42
…leFilters` method in `RenameVariableTransformation`
…and CLI support via `RewriteProblemStatementStarter`
… and integrate with CLI starters

- Introduced `BenchmarkInstanceIO` for JSON parsing and transformation of benchmark records.
- Updated `RewriteProblemStatementStarter` and `TransformTextsStarter` to process `{title, body}` pairs and `resolved_issues`.
- Added `TextBlock` data class to support coherent updates across textual fields.
- Enhanced Gradle tasks `rewriteProblemStatement` and `transformMetamorphicTexts` with improved input/output handling.
…n-deterministic behavior during project close and renaming operations
… prevent inconsistent state during subsequent operations
… inconsistent behavior across transformations
…eordering

- Skip non-physical, compiled, and anonymous classes to avoid unintended modifications.
- Fix method filtering to exclude non-physical and compiled methods.
- Pre-validate method copyability to prevent partial class modifications during reordering.
- Add error handling for unexpected exceptions to improve reliability.
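The pre-validation step follows an all-or-nothing pattern: check every method first, and mutate only if all checks pass, so a failure never leaves a class half-reordered. This is a minimal sketch under stated assumptions. `MethodStub`, `ClassStub`, and `reorderMethods` are hypothetical stand-ins for the PSI classes and the transformation.

```kotlin
// Hypothetical stand-ins for a PSI method and its containing class.
data class MethodStub(val name: String, val copyable: Boolean)

class ClassStub(methods: List<MethodStub>) {
    val methods: MutableList<MethodStub> = methods.toMutableList()
}

// Reorders methods according to `newOrder` (a permutation of indices).
// Returns false and leaves the class untouched if any check fails.
fun reorderMethods(cls: ClassStub, newOrder: List<Int>): Boolean {
    // Phase 1: validate everything up front; nothing has been modified yet.
    if (cls.methods.any { !it.copyable }) return false
    if (newOrder.sorted() != cls.methods.indices.toList()) return false
    // Phase 2: apply the permutation in one step.
    val reordered = newOrder.map { cls.methods[it] }
    cls.methods.clear()
    cls.methods.addAll(reordered)
    return true
}
```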
In multi-module projects with same-simple-name classes (e.g. fastjson
v1-compat `com.alibaba.fastjson.JSON` alongside v2
`com.alibaba.fastjson2.JSON`), the strict signature search in
`MethodReferencesSearch` drops call sites for which PSI cannot bind the
overload unambiguously. `RenameProcessor.findUsages()` never sees them, so
they survive the rename with the old method name and break compilation
(e.g. in fastjson2 PR-82, `JSONReader.java:922` and `TypeUtils.java:187`
were left as `JsonMapper.toJSONString(...)` after the family rename).

Add a post-rename safety net inside `tryRenameMethodFamily`: walk every
Java file in project scope and patch call sites whose `referenceName`
matches the old name AND that either resolve into the family or that
PSI failed to resolve while their qualifier still points at the family's
containing class. Sites PSI resolves to a different method are left
alone. The result is logged as `Post-rename safety net: patched N missed
call site(s)`, so healthy runs report N=0.
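The safety net's patch decision can be summarized as a small classifier. This is a PSI-free sketch. `CallSite`, its string-typed fields, `PatchBranch`, and `classify` are all hypothetical, and real code would inspect `PsiMethodCallExpression` nodes instead.

```kotlin
// Hypothetical, PSI-free view of one call site found by the project walk.
data class CallSite(
    val referenceName: String,
    val resolvedTo: String?,     // id of the resolved method, or null if PSI failed to resolve
    val qualifierClass: String?, // FQN of the qualifier's class, if known
)

enum class PatchBranch { RESOLVED_TO_FAMILY, QUALIFIER_FALLBACK }

// Returns the branch under which the site should be patched, or null to leave it alone.
fun classify(
    site: CallSite,
    oldName: String,
    familyIds: Set<String>,
    containingClass: String,
): PatchBranch? {
    if (site.referenceName != oldName) return null
    return when {
        site.resolvedTo != null && site.resolvedTo in familyIds -> PatchBranch.RESOLVED_TO_FAMILY
        site.resolvedTo == null && site.qualifierClass == containingClass -> PatchBranch.QUALIFIER_FALLBACK
        else -> null // resolves to a different method (or is unrelated): not ours to patch
    }
}
```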
Issue 1 (diagnostics): the post-rename safety net's reported patch
counts were surprisingly large (e.g. 681 for `parseObject`, 529 for
`toJSONString`). The numbers are real misses by `MethodReferencesSearch`
strict-signature search, not over-matching — but the log made it
hard to verify. Now log the count split into `resolved-to-family` vs
`qualifier-fallback` buckets and print up to 10 sample
`path:line (branch)` sites so the user can spot-check.
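A sketch of the diagnostic summary, assuming patched sites are first collected as plain records. `PatchedSite`, `SiteSummary`, and `summarize` are hypothetical names, and the branch labels reuse the bucket names above.

```kotlin
// Hypothetical record of one patched call site.
data class PatchedSite(val path: String, val line: Int, val branch: String)

data class SiteSummary(val byBranch: Map<String, Int>, val samples: List<String>)

// Counts patched sites per branch and keeps up to `maxSamples` "path:line (branch)" samples.
fun summarize(patched: List<PatchedSite>, maxSamples: Int = 10): SiteSummary = SiteSummary(
    byBranch = patched.groupingBy { it.branch }.eachCount(),
    samples = patched.take(maxSamples).map { "${it.path}:${it.line} (${it.branch})" },
)
```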

Issue 2 (wildcard import stripping): IntelliJ's `RenameProcessor`
invokes `JavaCodeStyleManager.shortenClassReferences()` on every file
whose references it rewrites, which can also strip unrelated
`import static X.*;` lines on those files (observed: rename of
`JSON` → `JsonCodec` removed `import static junit.framework.TestCase.*;`
from test files, breaking `assertNull(...)` and failing test compile).
There is no documented IntelliJ toggle (`IDEABKL-3561`) and the
existing code-style settings only prevent CREATION of new wildcards.

Add `WildcardImportExpander`: a one-shot project-wide pass that runs
before any transformation. For each Java file in project scope, replace
`import static X.*;` and `import pkg.*;` with explicit single imports
for the symbols actually used in the file. Each remaining import then
points at a name PSI sees as referenced, so the optimizer cannot drop
it. Wired into `TransformationService.applyTransformations` at the
top.
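The basic expansion step can be sketched as a filter over the names a wildcard makes visible. The sketch assumes the visible names per on-demand target are precomputed in `exportedNames`. `expandWildcard` is a hypothetical helper and not the `WildcardImportExpander` API.

```kotlin
// Hypothetical: `exportedNames` maps an on-demand target (package FQN, or class FQN for
// static imports) to the simple names it makes visible; `usedNames` are the file's
// unqualified references. Returns explicit import lines, sorted for stable output.
fun expandWildcard(
    target: String,
    isStatic: Boolean,
    exportedNames: Map<String, Set<String>>,
    usedNames: Set<String>,
): List<String> {
    val visible = exportedNames[target].orEmpty()
    val keyword = if (isStatic) "import static" else "import"
    return usedNames.filter { it in visible }.sorted().map { "$keyword $target.$it;" }
}
```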
Prevent accidental modifications by adding a check to exclude method reference identifiers located within annotation arguments during post-rename patching.
The first version used `(resolved as PsiMember).containingClass == targetClass`
to attribute references to wildcards. This is wrong because `import static X.*;`
also covers inherited members: `TestCase` exposes `assertNull`, which is declared
on its superclass `Assert`, and PSI's resolver returns the declaring class.
Two failure modes on fastjson2 PR-82:

- Empty replacements: `Issue1344.java` only uses `assertNull(String)`. The
  resolver returned a member on `junit.framework.Assert`, equality rejected
  it, the wildcard was deleted with no single-import replacement.
- Multi-wildcard drops: files with both `org.junit.jupiter.api.Assertions.*`
  and `junit.framework.TestCase.*` lost names like `assertEquals(int,int)`
  from one of the two expansions because the resolver picked one origin.

Replace the equality check with a positive query against the target class's
visible (inherited) members via `findMethodsByName(checkBases = true)` /
`findFieldByName(checkBases = true)` / `findInnerClassByName(checkBases =
true)`. Walk the file once, split unqualified refs into resolved /
unresolved name sets, then per wildcard intersect with what the target
class exposes. The same name can land in multiple wildcards' expansions —
correct, since both originals exposed it.

Conservative keep: if a wildcard's `usedNames` is empty AND any unresolved
reference in the file matches a name the target class would expose, leave
the wildcard untouched. Better to retain a working wildcard than delete a
load-bearing one. Stats now report `expanded N; kept M as conservative`.
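The corrected attribution rule and the conservative keep can be sketched with a toy class model. Here `ClassModel.superFqn` plays the role of `checkBases = true`, and `usedNames` is read as "resolved names the target class exposes"; that reading is an interpretation. All names (`ClassModel`, `visibleNames`, `decide`, `WildcardDecision`) are hypothetical.

```kotlin
// Hypothetical class model: declared static member names plus an optional superclass.
data class ClassModel(val fqn: String, val declared: Set<String>, val superFqn: String? = null)

// Names visible through `import static fqn.*`, including inherited ones.
fun visibleNames(fqn: String, classes: Map<String, ClassModel>): Set<String> {
    val result = mutableSetOf<String>()
    val seen = mutableSetOf<String>() // guards against cycles in malformed input
    var current = classes[fqn]
    while (current != null && seen.add(current.fqn)) {
        result += current.declared
        current = current.superFqn?.let { classes[it] }
    }
    return result
}

sealed class WildcardDecision
data class ExpandTo(val names: Set<String>) : WildcardDecision()
object KeepConservative : WildcardDecision()

fun decide(
    target: String,
    classes: Map<String, ClassModel>,
    resolvedNames: Set<String>,
    unresolvedNames: Set<String>,
): WildcardDecision {
    val visible = visibleNames(target, classes)
    val used = resolvedNames intersect visible
    val maybeLoadBearing = (unresolvedNames intersect visible).isNotEmpty()
    // Keep the wildcard if nothing was attributed but an unresolved ref might need it.
    return if (used.isEmpty() && maybeLoadBearing) KeepConservative else ExpandTo(used)
}
```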
The previous version captured `List<PsiImportStaticStatement>` once, then
iterated and rewrote each in its own `WriteCommandAction`. After the first
WriteCommandAction (`importList.add` × N + `wildcard.delete()`), the OTHER
captured `PsiImportStaticStatement` siblings became invalidated. The next
iteration's `wildcard.resolveTargetClass()` then threw
`PsiInvalidElementAccessException: containing file is null` and the entire
transformation pipeline aborted before even creating the memory file —
observed on fastjson2 PR-82 right after the
"[TransformationService] Pre-expanding wildcard imports project-wide..."
log line.

Structural fix — never hold PSI element references across mutations:

- The captured plan now stores `staticTargetFqns: List<String>` and
  `regularPackageFqns: List<String>` instead of element references.
- Inside each rewrite's `WriteCommandAction` the wildcard is re-located by
  scanning the live `importList.importStaticStatements` /
  `importStatements` and matching by `importReference.qualifiedName` +
  `isOnDemand` + `isValid`. If the wildcard is gone, we silently skip.
- The target `PsiClass` is re-resolved fresh inside the WriteCommandAction
  via `JavaPsiFacade.findClass(fqn, allScope)`.
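The "store strings, re-locate before each write" rule can be sketched with a plain mutable list standing in for the PSI import list. `ImportEntry`, `ImportList`, and `applyPlan` are hypothetical. The point is that the plan holds only FQNs, each wildcard is looked up in the live list immediately before it is rewritten, and one that is already gone is skipped.

```kotlin
// Hypothetical stand-in for an import statement in the live import list.
data class ImportEntry(val qualifiedName: String, val isOnDemand: Boolean)

class ImportList(initial: List<ImportEntry>) {
    val entries: MutableList<ImportEntry> = initial.toMutableList()
}

// The plan holds only FQN strings, never element handles captured before mutation.
// Returns the number of wildcards actually rewritten.
fun applyPlan(
    list: ImportList,
    wildcardFqns: List<String>,
    expansions: Map<String, List<String>>,
): Int {
    var rewritten = 0
    for (fqn in wildcardFqns) {
        // Re-locate the wildcard in the live list right before mutating it.
        val live = list.entries.firstOrNull { it.isOnDemand && it.qualifiedName == fqn } ?: continue
        list.entries.remove(live)
        expansions[fqn].orEmpty().forEach { list.entries.add(ImportEntry("$fqn.$it", isOnDemand = false)) }
        rewritten++
    }
    return rewritten
}
```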

Defense in depth — the expander is best-effort and never aborts the
pipeline:

- New `safeReadAction(fallback) { ... }` helper wraps every read action,
  rethrowing only `ProcessCanceledException` / `InterruptedException`.
- `expandAll` and `expandInFile` wrap per-file / per-wildcard work in
  `try/catch (Throwable)`, logging WARN with file path + FQN and bumping
  `filesFailed` / `wildcardsFailed` counters.
- The visitor blocks in `summarizeStaticRefs` and `collectClassUses` now
  swallow per-node throwables.
- `Stats` extended with `filesFailed` and `wildcardsFailed`; final log
  line: "Pre-processed N files; expanded M; kept K conservative;
  failed F file(s) / W wildcard(s)".
- Introduced `TimeoutException` handling on move operations with a 3-minute limit.
- Added logging to warn about timed-out suggestions before proceeding to the next.
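One way to bound each move with a time limit on the JVM is `Future.get(timeout, unit)`. The sketch assumes moves can run on a worker thread and respond to interruption, which may not hold for IntelliJ refactorings that must run on the EDT. `runMovesWithTimeout` is hypothetical, and failures other than a timeout propagate as `ExecutionException`.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Runs each named move with a per-move limit (3 minutes by default). On timeout it
// cancels the task, warns, and continues with the next move. Returns completed names.
fun runMovesWithTimeout(
    moves: List<Pair<String, () -> Unit>>,
    timeoutMillis: Long = TimeUnit.MINUTES.toMillis(3),
): List<String> {
    val executor = Executors.newCachedThreadPool() // a stuck task does not block the next one
    val completed = mutableListOf<String>()
    try {
        for ((name, move) in moves) {
            val future = executor.submit(Callable { move() })
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS)
                completed += name
            } catch (e: TimeoutException) {
                future.cancel(true) // interrupts the worker; the move must honour interruption
                System.err.println("WARN: move '$name' timed out after $timeoutMillis ms; skipping")
            }
        }
    } finally {
        executor.shutdownNow()
    }
    return completed
}
```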
…leTransformation` to avoid additional modifications in base+test_patch run
…ansformation` with exponential backoff and robustness enhancements
…by explicitly attaching transitive overriders to the rename processor

`RenameProcessor`'s implicit override expansion via
`RenameJavaMethodProcessor.prepareRenaming` + `OverridingMethodsSearch`
silently dropped sibling overloads' overriders when multiple same-name
overloads were renamed to the same target name through a single
processor — only the seed overload's overrider got renamed, the others
kept the old name. The post-rename safety net `verifyAndPatchMissedCallSites`
only inspected `PsiMethodCallExpression` nodes, never `PsiMethod`
declarations, so missed override definitions were invisible to it.

Reproducer: an abstract base `A` with `write(JsonValue)` + `write(String)`
and `B extends A` overriding both. After renaming `A.write` to
`A.writeTo`, `B.write(JsonValue)` followed but `B.write(String)` was
left dangling with the old name.

In `tryRenameMethodFamily`, after the existing overload-sibling
`addElement` loop, enumerate transitive overriders via
`OverridingMethodsSearch.search(method, checkDeep = true)` for every
family method and attach each (skipping family members already added
and overriders in libraries) as a first-class rename target. Adding
them explicitly forces the same rename path that already works for the
seed; when implicit expansion would have caught them anyway, it is a
no-op via `myAllRenames` dedup, so the change is backward compatible.
…sformation` with exponential backoff

Mirror the robust LLM call path already used in
`RenameVariableTransformation`: each file now issues a single batched
LLM request listing every overload family (chunked at LLM_BATCH_SIZE=20
to keep prompt size manageable since each entry embeds a method body),
with up to LLM_MAX_ATTEMPTS=3 retries per batch and exponential backoff
(4s → 8s, capped at 12s) on transient failures.
`ProcessCanceledException` is rethrown per IntelliJ contract; permanent
failures return an empty list so affected families are skipped (no
rename, no memory write) without failing the whole transformation, and
other batches and other files keep progressing.
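The retry policy can be sketched as a generic wrapper. In this sketch `BatchCancelled` stands in for `ProcessCanceledException`, `sleep` is injectable for testing, and `withRetries` is a hypothetical name. With three attempts the waits are 4 s and then 8 s, so the 12 s cap only comes into play with more attempts.

```kotlin
// Stand-in for ProcessCanceledException (assumption for a self-contained sketch).
class BatchCancelled : RuntimeException()

// Calls `call` up to `maxAttempts` times with exponential backoff (doubling, capped).
// Cancellation is rethrown immediately; permanent failure yields an empty list.
fun <T> withRetries(
    maxAttempts: Int = 3,
    initialDelayMs: Long = 4_000,
    maxDelayMs: Long = 12_000,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    call: () -> List<T>,
): List<T> {
    var delayMs = initialDelayMs
    for (attempt in 1..maxAttempts) {
        try {
            return call()
        } catch (e: BatchCancelled) {
            throw e // never retried
        } catch (e: Exception) {
            if (attempt == maxAttempts) break // out of attempts: give up on this batch
            sleep(delayMs)
            delayMs = minOf(delayMs * 2, maxDelayMs)
        }
    }
    return emptyList()
}
```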

Wire format: replace `MethodNameSuggestions(suggestions)` with
`MethodFamilyRenaming(familyKey, suggestions)` +
`MethodRenameSuggestions(renamings)`. Each overload family carries a
precomputed `familyKey` of the form
`<classFqn>#<methodName>[<static|instance>]` so the model can echo it
back per entry — the `[static]/[instance]` tag prevents same-name
static/instance siblings in the same class from colliding in the batch.
`generateRenamesForFamilies` now matches results back via
`familyKey == family.familyKey` and pipes them through the unchanged
`buildSuggestionList` helper.
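The `familyKey` format, batching, and key-based matching can be sketched as follows. Only the key format, `LLM_BATCH_SIZE = 20`, and the `(familyKey, suggestions)` shape come from the description. `MethodFamily`, `batches`, `matchBack`, and the `List<String>` type of `suggestions` are assumptions.

```kotlin
// Hypothetical family descriptor; the key format follows the description.
data class MethodFamily(val classFqn: String, val methodName: String, val isStatic: Boolean) {
    val familyKey: String
        get() = "$classFqn#$methodName[${if (isStatic) "static" else "instance"}]"
}

// Assumed shape of one entry in the model's response.
data class MethodFamilyRenaming(val familyKey: String, val suggestions: List<String>)

const val LLM_BATCH_SIZE = 20

// Splits families into prompt-sized batches.
fun batches(families: List<MethodFamily>): List<List<MethodFamily>> = families.chunked(LLM_BATCH_SIZE)

// Matches response entries back to families by key; families without an entry are skipped.
fun matchBack(families: List<MethodFamily>, response: List<MethodFamilyRenaming>): Map<MethodFamily, List<String>> {
    val byKey = response.associateBy { it.familyKey }
    return families.mapNotNull { f -> byKey[f.familyKey]?.let { f to it.suggestions } }.toMap()
}
```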

`extractRenamesFromMemoryForFamilies`, `tryRenameMethodFamily`, the
overrider-attachment / verify-and-patch / rich-logging code, and the
memory key format are untouched.
…oting discovery reads to smart-mode

`RenameProcessor.run()` + `commitAllDocuments()` + `saveAllDocuments()`
drops the IDE back into dumb mode while the stub index is recomputed,
so the next file's transformation can land mid-reindex. Any read action
that resolves a super-class hierarchy then hits the stub index and
throws. Reproduced on Multi-SWE-Bench:

- `RenameClassTransformation` — `psiClass.allFields` walks `MemberCache`
  → `getSupers()` → `findSpecialSuperClass()` → `JavaFullClassNameIndex`.
- `RenameMethodTransformation` — `method.findSuperMethods()` builds a
  hierarchical signature → `getSuperTypes()` →
  `findClass(java.lang.Object)`.

Per the `IndexNotReadyException` Javadoc, promote the topmost read
action to smart mode. Add a `withSmartReadAction(project) { ... }`
companion helper on `IntelliJAwareTransformation` (mirrors
`withReadAction` but uses suspending `smartReadAction(project)`, so the
block waits for index readiness before running) and use it at the
discovery sites:

- `RenameClassTransformation.apply` — the `findAllValidClasses(...)`
  wrap (also covers the per-class `ReferencesSearch.search(cls)` and
  `fileIndex.isInTestSourceContent(...)` calls inside).
- `RenameClassTransformation.generateNewClassNames` — the PSI-context
  extraction (covers `psiClass.allFields` — the exact line that threw).
- `RenameMethodTransformation.apply` — the
  `findAllValidMethodFamilies(...)` wrap (covers `findSuperMethods()`,
  `psiClass.supers`, and `ReferencesSearch.search(method)` inside the
  override-filter walk).

`tryRenameClassAndUsages` and `tryRenameMethodFamily` keep using plain
`withReadAction { ... }` — they run after the discovery walk (the
natural index-settling point) and adding `runBlocking` inside their
existing `invokeAndWait` envelopes risks deadlocks. Cancellation
contract preserved: `smartReadAction` honours `ProcessCanceledException`
propagation.

Labels

None yet

Projects

None yet

2 participants