Fix red team status tracking, cache key mismatch, and evaluation error handling #18

Open
slister1001 wants to merge 2 commits into main from fix/redteam-bugbash-status-scoring-cache

@slister1001
Owner

Fixes three bugs discovered during the red team SDK bug bash:

Bug 1 - Run status stuck at in_progress: _determine_run_status() now treats leftover pending and running entries as failed instead of in_progress. By the time this method runs the scan is finished, so pending entries (from skipped risk categories or Foundry execution failures) indicate failure, not ongoing work. Affected ~10 scans in the bug bash.
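
A minimal sketch of the new rule, for illustration only. The function name `determine_run_status`, the flat list of status strings, and the `"completed"` value are assumptions; the real `_determine_run_status()` works on the SDK's own tracking structure, which is not shown here.

```python
from typing import Iterable

# Hypothetical status strings; the SDK may use different values.
UNFINISHED = {"pending", "running"}


def determine_run_status(statuses: Iterable[str]) -> str:
    """Summarize per-task statuses after the scan has finished.

    Because the scan is over when this runs, anything still marked
    'pending' or 'running' never completed and is counted as a failure.
    """
    for status in statuses:
        if status == "failed" or status in UNFINISHED:
            return "failed"
    return "completed"
```

Under these assumptions, a run with one leftover `pending` entry reports `failed` rather than staying at `in_progress`.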

Bug 2 - ungrounded_attributes silently skipped: _execute_attacks_with_foundry() now uses get_attack_objective_from_risk_category() to build the cache lookup key, matching the caching logic in _get_attack_objectives(). Previously, objectives were cached under 'isa' but looked up under 'ungrounded_attributes', so the lookup missed and the category appeared to have 0 objectives even though the API returned 100.
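
The mismatch can be reproduced with a small stand-in. Everything below is hypothetical: `objective_key` only imitates the role of get_attack_objective_from_risk_category(), whose real signature is not shown in this PR, and the alias table contains only the one mapping described above.

```python
from typing import Dict, List

# Assumed alias table: the only mapping known from this PR is that
# 'ungrounded_attributes' objectives are cached under 'isa'.
_OBJECTIVE_ALIASES = {"ungrounded_attributes": "isa"}

_cache: Dict[str, List[str]] = {}


def objective_key(risk_category: str) -> str:
    """Hypothetical stand-in for the SDK's key-building helper."""
    return _OBJECTIVE_ALIASES.get(risk_category, risk_category)


def store_objectives(risk_category: str, objectives: List[str]) -> None:
    # Writer side: the key goes through the helper.
    _cache[objective_key(risk_category)] = list(objectives)


def lookup_before_fix(risk_category: str) -> List[str]:
    # Reader side before the fix: raw category name, so the alias is missed.
    return _cache.get(risk_category, [])


def lookup_after_fix(risk_category: str) -> List[str]:
    # Reader side after the fix: same helper as the writer.
    return _cache.get(objective_key(risk_category), [])
```

Routing both the write and the read through one helper keeps the two call sites from drifting apart again if another alias is added later.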

Bug 3 - ServiceInvocationException inflating ASR: RAIServiceScorer now detects when the RAI evaluation service returns an error response (properties.outcome == 'error') and raises RuntimeError, causing PyRIT to treat the score as UNDETERMINED. Previously, the passed=False carried by error responses was treated as attack success, inflating the protected_material ASR from 0% to 50%.
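
A rough sketch of the error check, not the actual RAIServiceScorer code. The dict shape (a top-level `passed` flag next to a `properties` mapping) and the function name `extract_passed` are assumptions made for this example; only `properties.outcome == 'error'` comes from the description above.

```python
from typing import Any, Dict


def extract_passed(result: Dict[str, Any]) -> bool:
    """Return the evaluation verdict, refusing to read it from an error response.

    The response layout here is assumed for illustration only.
    """
    properties = result.get("properties") or {}
    if properties.get("outcome") == "error":
        # The service failed to evaluate, so 'passed' carries no meaning.
        # Raising lets the caller mark the score as undetermined instead
        # of counting the attack as successful.
        raise RuntimeError("RAI evaluation service returned an error response")
    return bool(result.get("passed"))
```

Raising instead of returning a default keeps a failed evaluation out of the ASR numerator entirely, which is the behavior this fix relies on.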

…r handling

Bug 1 - Status tracking: _determine_run_status now treats 'pending' and
'running' entries as 'failed' instead of 'in_progress'. By the time this
method runs the scan is finished, so leftover 'pending' entries (from
skipped risk categories or Foundry execution failures) indicate failure,
not ongoing work.

Bug 2 - Cache key mismatch: _execute_attacks_with_foundry now uses
get_attack_objective_from_risk_category() to build the cache lookup key,
matching the caching logic in _get_attack_objectives. Previously,
ungrounded_attributes objectives were cached under 'isa' but looked up
under 'ungrounded_attributes', causing them to be silently skipped.

Bug 3 - Evaluation error handling: RAIServiceScorer now detects when the
RAI evaluation service returns an error response (properties.outcome ==
'error', e.g. ServiceInvocationException) and raises RuntimeError. This
causes PyRIT to treat the score as UNDETERMINED instead of using the
erroneous passed=False to incorrectly mark the attack as successful,
which was inflating ASR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>