
feat: add patchsense-crs semantic patch validator to registry #171

Open
aaronsrhodes wants to merge 1 commit into ossf:main from aaronsrhodes:feat/add-patchsense-crs

@aaronsrhodes

Summary

  • Adds registry/patchsense-crs.yaml — a semantic patch correctness validator that acts as a post-generation filter in an OSS-CRS ensemble
  • Updates CHANGELOG.md with the new entry

What patchsense-crs does

PatchSense addresses the 37–46% semantic error rate in AI-generated patches that pass functional tests, documented in the AIxCC SoK paper (arxiv.org/abs/2602.07666).

It runs as a bug-fixing-type CRS (no vulnerability discovery; pure validation). It:

  1. Fetches patches from the exchange directory
  2. Fetches bug-candidate SARIFs to extract CWE and vulnerability context
  3. Classifies each patch as root-cause-fix or symptom-suppression by combining a fine-tuned Qwen 2.5 Coder 32B model with structural diff analysis
  4. Re-submits only confirmed root-cause fixes; discards symptom suppressions
  5. Emits a SARIF assessment report for each evaluated patch
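The five steps above can be sketched as a single filtering pass. This is a minimal illustration, not the code in validator.py: the `*.diff` file layout, the prebuilt per-patch `contexts` mapping, the separate submit and report directories, and every function name below are assumptions. `toy_structural_classifier` is a deliberately crude, hypothetical stand-in for the fine-tuned model plus structural diff analysis.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Label strings mirror the two classes named in the PR description.
ROOT_CAUSE = "root-cause-fix"
SYMPTOM = "symptom-suppression"

# Hypothetical classifier signature: (diff text, SARIF-derived context) -> label.
Classifier = Callable[[str, dict], str]


def toy_structural_classifier(diff: str, context: dict) -> str:
    """Hypothetical stand-in for the model: treats a diff whose added lines are
    all guards or early returns as symptom suppression."""
    added = [line[1:].strip() for line in diff.splitlines()
             if line.startswith("+") and not line.startswith("+++")]
    guard_only = bool(added) and all(
        line.startswith(("if", "return", "}")) for line in added)
    return SYMPTOM if guard_only else ROOT_CAUSE


def assessment_sarif(patch_name: str, label: str, context: dict) -> dict:
    """Build a minimal SARIF 2.1.0 log recording one patch verdict (step 5)."""
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "patchsense-crs"}},
            "results": [{
                "ruleId": label,
                "level": "note" if label == ROOT_CAUSE else "warning",
                "message": {"text": f"{patch_name}: classified as {label}"},
                "properties": {"cwe": context.get("cwe")},
            }],
        }],
    }


def filter_patches(exchange_dir: Path, submit_dir: Path, report_dir: Path,
                   contexts: Dict[str, dict], classify: Classifier) -> List[str]:
    """Read each *.diff (step 1), classify it (step 3), forward only root-cause
    fixes (step 4), and write one SARIF report per patch (step 5)."""
    submit_dir.mkdir(parents=True, exist_ok=True)
    report_dir.mkdir(parents=True, exist_ok=True)
    accepted = []
    for patch in sorted(exchange_dir.glob("*.diff")):
        context = contexts.get(patch.stem, {})        # step 2: CWE/vuln context
        label = classify(patch.read_text(), context)
        if label == ROOT_CAUSE:
            shutil.copy(patch, submit_dir / patch.name)
            accepted.append(patch.name)
        report = assessment_sarif(patch.name, label, context)
        (report_dir / f"{patch.stem}.sarif").write_text(json.dumps(report, indent=2))
    return accepted


def demo():
    """Run the filter on two toy patches inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        exchange = root / "exchange"
        exchange.mkdir()
        (exchange / "p1.diff").write_text(
            "--- a/x.c\n+++ b/x.c\n@@ -10,0 +11 @@\n+    if (n > cap) return -1;\n")
        (exchange / "p2.diff").write_text(
            "--- a/x.c\n+++ b/x.c\n@@ -3 +3 @@\n-    buf = malloc(n);\n+    buf = malloc(n + 1);\n")
        accepted = filter_patches(exchange, root / "submit", root / "reports",
                                  {"p2": {"cwe": "CWE-122"}}, toy_structural_classifier)
        p1_report = json.loads((root / "reports" / "p1.sarif").read_text())
        return accepted, p1_report["runs"][0]["results"][0]["ruleId"]


if __name__ == "__main__":
    print(demo())
```

Running it prints the accepted list (only `p2.diff`) together with the label recorded for the guard-only `p1.diff`. Keeping the classifier as a pluggable callable means the toy heuristic could be swapped for a model call without touching the file handling.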

Performance (66 held-out AIxCC test cases)

| Metric | Value |
|---|---|
| Precision (root-cause-fix) | 96.0% (24/25) |
| False positive rate | 3.0% (1/33) |
| Recall | 72.7% (24/33) |
| F1 score | 0.828 |
| Accuracy | 84.8% (56/66) |
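The ratios in this table pin down the full confusion matrix, treating root-cause-fix as the positive class: 24/25 precision gives 24 true positives and 1 false positive; the 1/33 false positive rate implies 33 symptom-suppression cases, so 32 true negatives; the 24/33 recall implies 33 root-cause fixes, so 9 false negatives. The short check below uses those derived counts to reproduce every figure.

```python
# Confusion-matrix counts derived from the ratios in the table above
# (positive class = root-cause-fix); 24 + 1 + 9 + 32 = 66 test cases.
TP, FP, FN, TN = 24, 1, 9, 32


def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "false_positive_rate": fp / (fp + tn),
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


for name, value in metrics(TP, FP, FN, TN).items():
    print(f"{name}: {value:.3f}")
```

Equivalently, F1 = 2TP / (2TP + FP + FN) = 48/58 ≈ 0.828, which shows that the nine rejected root-cause fixes (false negatives) weigh on F1 far more than the single accepted symptom suppression.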

Compared with the AIxCC finalists, the 96.0% precision exceeds every finalist team except Shellphish, which reached 100% through extreme selectivity (submitting only 11 of 28 found vulnerabilities).

Source repo

https://github.com/aaronsrhodes/patchsense-crs

Includes: oss-crs/crs.yaml, Dockerfiles (base/builder/validator), validator.py, sarif_parser.py, 24 passing tests, an example compose config, and a LiteLLM routing config for the local model.
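Step 2 depends on pulling CWE and location context out of bug-candidate SARIFs. The repo's sarif_parser.py is not reproduced here; what follows is a hypothetical sketch of such an extractor for SARIF 2.1.0. It assumes CWE identifiers appear either in a rule's `properties.tags` (as some analyzers emit them, e.g. CodeQL's `external/cwe/cwe-787`) or in the result's `ruleId`; OSS-CRS bug-candidate SARIFs may encode this differently.

```python
import json
import re
from typing import List, Set

# Assumed encoding: "cwe-NNN" anywhere in a tag or rule id, any case.
CWE_RE = re.compile(r"cwe-(\d+)", re.IGNORECASE)


def cwes_in(text: str) -> Set[str]:
    """Normalize every CWE mention in text to the form 'CWE-NNN'."""
    return {f"CWE-{num}" for num in CWE_RE.findall(text)}


def extract_cwe_context(sarif_text: str) -> List[dict]:
    """Return rule id, CWE ids, first location, and message for each result."""
    log = json.loads(sarif_text)
    findings = []
    for run in log.get("runs", []):
        # Map rule id -> CWE ids found in the rule's "tags" property bag.
        rule_cwes = {}
        for rule in run.get("tool", {}).get("driver", {}).get("rules", []):
            tags = rule.get("properties", {}).get("tags", [])
            rule_cwes[rule.get("id")] = set().union(*(cwes_in(t) for t in tags))
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "")
            cwes = rule_cwes.get(rule_id, set()) | cwes_in(rule_id)
            loc = (result.get("locations") or [{}])[0].get("physicalLocation", {})
            findings.append({
                "rule_id": rule_id,
                "cwes": sorted(cwes),
                "file": loc.get("artifactLocation", {}).get("uri"),
                "line": loc.get("region", {}).get("startLine"),
                "message": result.get("message", {}).get("text", ""),
            })
    return findings
```

Missing fields fall back to `None` or empty values rather than raising, since bug-candidate reports from different finders are unlikely to populate the same optional SARIF properties.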

Supported targets

  • Language: c, c++, jvm
  • Sanitizer: address, undefined
  • Mode: full, delta
  • Architecture: x86_64
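A harness could check a target against this support matrix before dispatching it to the CRS. A minimal sketch, assuming the target is described as a plain dict; the field names are illustrative and are not the crs.yaml schema.

```python
from typing import List

# Support matrix as listed in this PR; the dict layout itself is hypothetical.
SUPPORTED = {
    "language": {"c", "c++", "jvm"},
    "sanitizer": {"address", "undefined"},
    "mode": {"full", "delta"},
    "architecture": {"x86_64"},
}


def unsupported_fields(target: dict) -> List[str]:
    """Return the fields of a target that fall outside the support matrix."""
    return [field for field, allowed in SUPPORTED.items()
            if target.get(field) not in allowed]
```

An empty result means the target is in scope; anything else names exactly which dimension (for example, a `memory` sanitizer) would need adding.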

Test plan

🤖 Generated with Claude Code

PatchSense is a post-generation patch correctness filter. It receives
patches from the OSS-CRS exchange directory, classifies each as
root-cause-fix vs. symptom-suppression using a fine-tuned Qwen 2.5
Coder 32B model, and re-submits only confirmed root-cause fixes.

Performance on 66 held-out AIxCC test cases:
- Precision (root-cause-fix): 96.0% (24/25), exceeding the 95% target
- False positive rate: 3.0% (1/33)
- F1 score: 0.828

Addresses the documented 37–46% semantic error rate in AI-generated
patches that pass functional tests (SoK paper arxiv.org/abs/2602.07666).

Supported: c, c++, jvm | address + undefined sanitizers | full + delta mode

Signed-off-by: Aaron Rhodes <aaronr@jfrog.com>

@azchin
Collaborator

azchin commented Apr 20, 2026

Hello, thanks for the contributed CRS!

May I ask for some example invocations of your CRS, along with the input data (e.g. I believe your CRS accepts SARIF reports)? I would like to familiarize myself with your CRS's workflow and use case since it seems outside the existing scope of strictly bug-finding and bug-fixing. If it's appropriate, I'd like to make OSS-CRS more flexible with the types of CRSs we can run.
