Exploit Pattern RAG

SCOUT's Exploit Pattern RAG is the AEG-first knowledge layer for curated exploit pattern reuse, not a raw public-PoC retrieval system.

SCOUT does not retrieve raw public PoCs for copy-based exploitation. Instead, it treats public PoC metadata as a high-value seed for deriving normalized exploit patterns, then adapts those patterns to evidence recovered from the target firmware.

Current architecture

Curated, retrievable cards live under:

data/exploit_references/patterns/<pattern-id>/
  exploit.json
  pattern.md
  poc_sample.py

The runtime loader/retriever/contamination guard lives in src/aiedge/exploit_rag/. exploit_autopoc consumes the package and injects only the top-ranked pattern context into the lab-only PoC prompt.
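
As a rough illustration of the layout above, the following sketch enumerates card directories and parses each exploit.json. It is not the loader in src/aiedge/exploit_rag/: the function name list_pattern_cards is hypothetical, and treating exploit.json and pattern.md as the required files is an assumption.

```python
import json
from pathlib import Path

# Assumption: a card is "complete" when these two files exist.
# The real loader may apply different or stricter rules.
REQUIRED_FILES = ("exploit.json", "pattern.md")


def list_pattern_cards(patterns_root: Path) -> dict[str, dict]:
    """Map pattern-id -> parsed exploit.json for complete card directories."""
    cards: dict[str, dict] = {}
    if not patterns_root.is_dir():
        return cards
    for card_dir in sorted(p for p in patterns_root.iterdir() if p.is_dir()):
        # Skip directories that are missing any required file.
        if all((card_dir / name).is_file() for name in REQUIRED_FILES):
            raw = (card_dir / "exploit.json").read_text(encoding="utf-8")
            cards[card_dir.name] = json.loads(raw)
    return cards
```

Calling list_pattern_cards(Path("data/exploit_references/patterns")) from the repository root would return one entry per complete card directory, keyed by the directory name.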

The current public corpus is intentionally small and curated. Candidate ingestion expands the upstream pool, but AutoPoC still retrieves only promoted pattern cards.

Each promoted card may also carry validation_evidence entries. These record whether a pattern has vulnerable/control pair evidence (synthetic_pair or real_firmware_pair) and keep SCOUT from treating metadata-only pattern reuse as an AEG platform proof.
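
One way to read this evidence field is as a three-level classification. The sketch below assumes each validation_evidence entry is a dict with a kind key (mirroring the --kind CLI flag); the key name, the evidence_level helper, and the metadata_only label are illustrative assumptions, not SCOUT's schema.

```python
def evidence_level(card: dict) -> str:
    """Return the strongest pair evidence a card carries.

    Assumes entries look like {"kind": "synthetic_pair", ...}; the "kind"
    key is a guess based on the CLI flag, not a confirmed schema.
    """
    entries = card.get("validation_evidence") or []
    kinds = {e.get("kind") for e in entries if isinstance(e, dict)}
    if "real_firmware_pair" in kinds:
        return "real_firmware_pair"
    if "synthetic_pair" in kinds:
        return "synthetic_pair"
    # No vulnerable/control pair: pattern reuse rests on metadata alone.
    return "metadata_only"
```

A card that falls into the metadata_only bucket can still be retrieved, but it should not be cited as proof of AEG capability.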

PoC-in-GitHub seed ingestion

PoC-in-GitHub is valuable because it maps CVEs to public proof-of-concept repositories at scale. SCOUT uses it as an upstream metadata source:

PoC-in-GitHub CVE JSON
  -> unreviewed candidate seed
  -> draft pattern card (`scripts/draft_exploit_pattern_card.py`)
  -> human reviewer / curated extractor
  -> normalized retrievable exploit pattern card
  -> AutoPoC retrieval

The importer deliberately does not clone repositories, execute PoC code, or make raw PoC source retrievable. Candidate JSON can be converted into a non-retrievable draft card, but a human reviewer must promote it into a curated pattern card before AutoPoC can use it.

Seed firmware-relevant candidates with:

# Use the curated firmware/network-appliance CVE seed list.
python scripts/import_poc_in_github_candidates.py --dry-run

# Import one explicit CVE into data/exploit_references/candidates/poc_in_github/.
python scripts/import_poc_in_github_candidates.py --cve CVE-2024-1781

Default seed list:

data/exploit_references/firmware_seed_cves.json

Candidate output:

data/exploit_references/candidates/poc_in_github/cve-*.json

Draft a review artifact from a candidate:

python scripts/draft_exploit_pattern_card.py data/exploit_references/candidates/poc_in_github/cve-2024-1781.json

Draft output:

data/exploit_references/drafts/<pattern-id>/
  exploit.json   # promotion.status=draft_requires_human_review
  pattern.md     # reviewer checklist, no raw PoC source
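
The draft status above suggests a simple retrieval gate. The following is a minimal sketch, assuming exploit.json nests the status under a promotion object and that retrievable cards live directly under a patterns/ directory; is_retrievable is a hypothetical helper, and treating a card with no promotion block as retrievable is an assumption.

```python
from pathlib import Path

# Status value taken from the draft output described above.
DRAFT_STATUS = "draft_requires_human_review"


def is_retrievable(card_dir: Path, card: dict) -> bool:
    """Allow retrieval only for non-draft cards stored under patterns/."""
    in_patterns = card_dir.parent.name == "patterns"
    status = (card.get("promotion") or {}).get("status")
    return in_patterns and status != DRAFT_STATUS
```

Checking both the directory and the status means a draft that is copied into patterns/ by mistake is still refused.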

Promotion contract

A public PoC candidate can become a retrievable SCOUT AEG pattern only after the reviewer extracts target-independent structure:

family, entry channel, bridge channel, trigger model, and sink
source-to-sink reasoning and preconditions
non-destructive verification tactics
adaptation rules and forbidden reuse constraints

Do not promote target-specific endpoints, credentials, target hosts, payload literals, or vendor-specific magic constants as reusable tactics.
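
A reviewer could back this rule with a quick heuristic scan of the card text before promotion. The sketch below is illustrative only: the find_target_specific helper and its three regular expressions are assumptions, they will produce false positives (a dotted version string looks like an IPv4 address), and they do not replace human review.

```python
import re

# Heuristic patterns for literals that usually indicate target-specific content.
TARGET_SPECIFIC = {
    "ipv4_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE),
    "credential_pair": re.compile(r"\b(?:password|passwd|pwd)\s*[=:]\s*\S+", re.IGNORECASE),
}


def find_target_specific(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for suspicious literals in card text."""
    hits: list[tuple[str, str]] = []
    for label, pattern in TARGET_SPECIFIC.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits
```

An empty result does not prove that a card is target-independent; it only means none of these obvious literals were found.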

Check the current evidence state with:

python scripts/check_exploit_pattern_evidence.py

Use stricter release checks when appropriate:

# Require every curated card to have vulnerable/control pair evidence.
python scripts/check_exploit_pattern_evidence.py --require-all

# Require at least one real firmware known-vulnerable/patched pattern.
python scripts/check_exploit_pattern_evidence.py --require-real-firmware-pair

Record new pair evidence only after both the known-vulnerable and patched/control sides have completed SCOUT run directories:

# Dry-run: validate the known-vulnerable run passes and the patched/control run fails closed.
python scripts/record_pattern_pair_evidence.py cgi_param_cmd_injection \
  --kind real_firmware_pair \
  --vulnerable-run-dir aiedge-runs/<known-vulnerable-run> \
  --control-run-dir aiedge-runs/<patched-control-run> \
  --artifact docs/pov/<stable-pair-evidence>.json \
  --vulnerable-firmware-sha256 <sha256> \
  --control-firmware-sha256 <sha256> \
  --cve CVE-YYYY-NNNN

# Apply only after the dry-run evidence JSON is reviewed.
python scripts/record_pattern_pair_evidence.py cgi_param_cmd_injection \
  --kind real_firmware_pair \
  --vulnerable-run-dir aiedge-runs/<known-vulnerable-run> \
  --control-run-dir aiedge-runs/<patched-control-run> \
  --evidence-id <stable-pair-id> \
  --artifact docs/pov/<stable-pair-evidence>.json \
  --vulnerable-firmware-sha256 <sha256> \
  --control-firmware-sha256 <sha256> \
  --cve CVE-YYYY-NNNN \
  --apply

The recorder refuses to count missing control artifacts as evidence and also rejects controls that fail only an FPR/non-dynamic check. At least one dynamic proof check (autopoc_runner_pass, poc_validation_reproducible, or verified_chain_pass) must fail on the patched/control side. For real_firmware_pair, it additionally requires a stable artifact reference, both firmware SHA-256 values, and either a CVE or target-family label.
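
The recorder rules above can be sketched as two small predicates. This is an illustration, not the code in scripts/record_pattern_pair_evidence.py: it assumes control-side results are available as a mapping from check name to boolean, and the function names are hypothetical. Only the three dynamic check names come from the description above.

```python
from typing import Mapping, Optional

DYNAMIC_CHECKS = (
    "autopoc_runner_pass",
    "poc_validation_reproducible",
    "verified_chain_pass",
)


def control_fails_closed(control_checks: Optional[Mapping[str, bool]]) -> bool:
    """True only if at least one dynamic proof check explicitly failed."""
    if not control_checks:
        # Missing control artifacts never count as evidence.
        return False
    # A failed FPR or other non-dynamic check alone is not enough.
    return any(control_checks.get(name) is False for name in DYNAMIC_CHECKS)


def real_pair_metadata_ok(artifact, vulnerable_sha256, control_sha256,
                          cve=None, target_family=None) -> bool:
    """Extra requirements for real_firmware_pair evidence."""
    return bool(artifact and vulnerable_sha256 and control_sha256
                and (cve or target_family))
```

The sketch uses `is False` rather than a plain truthiness test, so a dynamic check that is absent from the control run is treated as missing data, not as a failure.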

As of this update, the original generic cards retain synthetic vulnerable/control pair evidence through scripts/run_aeg_synthetic_pair.py, and netgear_passwordrecovered_auth_bypass carries the first real known-vulnerable/patched firmware pair evidence for Netgear R7000 CVE-2017-5521. Release-level AEG claims should continue to cite the stable pair artifact in docs/pov/netgear-r7000-cve-2017-5521_real_pair.json and rerun python scripts/check_exploit_pattern_evidence.py --require-real-firmware-pair.

For platform-level readiness, run the integrated fail-closed audit instead of checking card counters alone:

./scout aeg-readiness --out docs/pov/aeg_platform_readiness.json

That audit ties the pattern-card aggregate to the stable real-firmware pair report, checks SHA/pattern-family binding, and verifies the vulnerable/pass vs patched/dynamic-fail-closed separation captured by the committed evidence.

E2E validation before platform claims

Pattern-card and retriever tests are necessary but insufficient. A SCOUT AEG claim requires a completed lab run that passes the dynamic/FP gate in docs/aeg_e2e_validation.md: AutoPoC runner pass, reproducible poc_validation, verified_chain isolation, run-level FPR ceiling, and no high/critical FP verdict for the AEG finding.
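
The gate conditions listed above amount to a conjunction, which the sketch below makes explicit. The RunSummary fields are hypothetical stand-ins for whatever the run directory records, and the FPR ceiling is left as a caller-supplied parameter because its value is defined in docs/aeg_e2e_validation.md, not here.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical summary of a completed lab run (not SCOUT's run schema)."""
    autopoc_runner_pass: bool
    poc_validation_reproducible: bool
    verified_chain_isolated: bool
    run_fpr: float
    aeg_finding_fp_verdict: str  # assumed labels such as "none", "high", "critical"


def passes_e2e_gate(run: RunSummary, fpr_ceiling: float) -> bool:
    """Every condition must hold; any single failure blocks the AEG claim."""
    return (
        run.autopoc_runner_pass
        and run.poc_validation_reproducible
        and run.verified_chain_isolated
        and run.run_fpr <= fpr_ceiling
        and run.aeg_finding_fp_verdict not in {"high", "critical"}
    )
```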

Safety boundary

Allowed:

fetch PoC-in-GitHub JSON metadata
record candidate repo metadata and CVE context
derive target-independent exploit structure during a separate curation step

Forbidden in the SCOUT ingestion/retrieval path:

cloning public PoC repositories automatically
executing public PoC code
placing raw PoC source in the AutoPoC prompt
copying reference endpoints, credentials, payload literals, or target hosts into generated PoCs
