Fix red team status tracking, cache key mismatch, and evaluation error handling by slister1001 · Pull Request #18 · slister1001/azure-sdk-for-python

slister1001 · 2026-03-04T21:07:13Z

Fixes three bugs discovered during the red team SDK bug bash:

Bug 1 - Run status stuck at in_progress: _determine_run_status() now treats leftover pending and running entries as failed instead of in_progress. By the time this method runs, the scan has finished, so pending entries (from skipped risk categories or Foundry execution failures) indicate failure, not ongoing work. This affected about 10 scans in the bug bash.
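The roll-up rule for Bug 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's _determine_run_status(). The status strings, the flat list of entry statuses, and the helper names normalize_entry_status and determine_run_status are hypothetical. The rule that a single failed entry fails the whole run is also an assumption.

```python
from typing import Iterable

# Hypothetical status strings; the SDK's actual tracking structure may differ.
LEFTOVER_STATUSES = {"pending", "running"}


def normalize_entry_status(status: str) -> str:
    """Map one entry's status after the scan has finished.

    An entry still 'pending' or 'running' at this point never completed
    (e.g. a skipped risk category or a failed Foundry execution), so it is
    reported as 'failed' rather than 'in_progress'.
    """
    return "failed" if status in LEFTOVER_STATUSES else status


def determine_run_status(entry_statuses: Iterable[str]) -> str:
    """Roll per-entry statuses up into a single run status.

    Assumption: the run is 'failed' if any entry is (or normalizes to)
    'failed', otherwise 'completed'. It never returns 'in_progress',
    because the scan is over by the time this is called.
    """
    normalized = [normalize_entry_status(s) for s in entry_statuses]
    return "failed" if "failed" in normalized else "completed"


print(determine_run_status(["completed", "completed"]))  # completed
print(determine_run_status(["completed", "pending"]))    # failed
```

Normalizing each entry before rolling up keeps the "scan is finished" assumption in one place, so a leftover pending or running entry cannot leak through as in_progress.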

Bug 2 - ungrounded_attributes silently skipped: _execute_attacks_with_foundry() now uses get_attack_objective_from_risk_category() to build the cache lookup key, matching the caching logic in _get_attack_objectives(). Previously, objectives were cached under 'isa' but looked up under 'ungrounded_attributes'. Because of this mismatch, the category appeared to have 0 objectives even though the API returned 100.
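The cache key mismatch in Bug 2 comes down to the store path and the lookup path deriving keys differently. The sketch below reproduces that failure mode in isolation. objective_key_for is a hypothetical stand-in for get_attack_objective_from_risk_category(), and its signature here is assumed. Only the ungrounded_attributes to 'isa' mapping comes from the PR; assuming other categories map to their own name is a simplification. ObjectiveCache is likewise a hypothetical toy, not the SDK's cache.

```python
from typing import Dict, List

# Hypothetical stand-in for get_attack_objective_from_risk_category().
# Only the "ungrounded_attributes" -> "isa" mapping is taken from the PR;
# every other category is assumed to map to its own name.
_OBJECTIVE_KEY_OVERRIDES: Dict[str, str] = {"ungrounded_attributes": "isa"}


def objective_key_for(risk_category: str) -> str:
    return _OBJECTIVE_KEY_OVERRIDES.get(risk_category, risk_category)


class ObjectiveCache:
    """Toy cache showing why store and lookup must share one key function."""

    def __init__(self) -> None:
        self._objectives: Dict[str, List[str]] = {}

    def store(self, risk_category: str, objectives: List[str]) -> None:
        # Caching side: key derived from the risk category (e.g. "isa").
        self._objectives[objective_key_for(risk_category)] = list(objectives)

    def lookup_buggy(self, risk_category: str) -> List[str]:
        # Pre-fix behavior: raw category name, so entries cached under "isa" are missed.
        return self._objectives.get(risk_category, [])

    def lookup(self, risk_category: str) -> List[str]:
        # Post-fix behavior: same key derivation as store().
        return self._objectives.get(objective_key_for(risk_category), [])


cache = ObjectiveCache()
cache.store("ungrounded_attributes", [f"objective {i}" for i in range(100)])
print(len(cache.lookup_buggy("ungrounded_attributes")))  # 0
print(len(cache.lookup("ungrounded_attributes")))        # 100
```

Routing both paths through a single key function means a future category alias cannot reintroduce a silent skip on only one side.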

Bug 3 - ServiceInvocationException inflating ASR: RAIServiceScorer now detects when the RAI evaluation service returns an error response (properties.outcome == 'error') and raises RuntimeError, which causes PyRIT to treat the score as UNDETERMINED. Previously, the passed=False carried by error responses was treated as attack success, inflating the protected_material attack success rate (ASR) from 0% to 50%.
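The Bug 3 error handling, and how it changes ASR, can be sketched with plain dictionaries. This is a simplified illustration, not RAIServiceScorer or PyRIT's scoring API. The result shape, the helper names score_from_eval_result, score_or_undetermined and attack_success_rate, and the use of None to represent UNDETERMINED are all hypothetical. The toy data only mirrors the shape of the 0% versus 50% discrepancy and is not the bug-bash data.

```python
from typing import Any, Dict, List, Optional


def score_from_eval_result(result: Dict[str, Any]) -> bool:
    """Return True when the attack succeeded.

    Assumed (hypothetical) result shape: {"passed": bool, "properties": {...}}.
    """
    properties = result.get("properties") or {}
    if properties.get("outcome") == "error":
        # The service itself failed (e.g. ServiceInvocationException), so its
        # passed=False says nothing about the target model's behavior.
        raise RuntimeError(f"RAI evaluation returned an error: {properties!r}")
    # As described in the PR, passed=False counts as attack success.
    return not result["passed"]


def score_or_undetermined(result: Dict[str, Any]) -> Optional[bool]:
    """Simplified stand-in for a failed score becoming UNDETERMINED (None)."""
    try:
        return score_from_eval_result(result)
    except RuntimeError:
        return None


def attack_success_rate(scores: List[Optional[bool]]) -> float:
    """Successes over determined scores; undetermined (None) scores are excluded."""
    determined = [s for s in scores if s is not None]
    return sum(determined) / len(determined) if determined else 0.0


# Toy data: one response the evaluator genuinely passed, one service error.
results = [
    {"passed": True, "properties": {}},
    {"passed": False, "properties": {"outcome": "error"}},
]
buggy = [not r["passed"] for r in results]           # error counted as success
fixed = [score_or_undetermined(r) for r in results]  # error becomes None
print(attack_success_rate(buggy))  # 0.5
print(attack_success_rate(fixed))  # 0.0
```

Excluding undetermined scores from the denominator keeps service outages from moving ASR in either direction. Returning 0.0 when nothing could be scored is an arbitrary choice for this sketch.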

…r handling

Bug 1 - Status tracking: _determine_run_status now treats 'pending' and 'running' entries as 'failed' instead of 'in_progress'. By the time this method runs the scan is finished, so leftover 'pending' entries (from skipped risk categories or Foundry execution failures) indicate failure, not ongoing work.

Bug 2 - Cache key mismatch: _execute_attacks_with_foundry now uses get_attack_objective_from_risk_category() to build the cache lookup key, matching the caching logic in _get_attack_objectives. Previously, ungrounded_attributes objectives were cached under 'isa' but looked up under 'ungrounded_attributes', causing them to be silently skipped.

Bug 3 - Evaluation error handling: RAIServiceScorer now detects when the RAI evaluation service returns an error response (properties.outcome == 'error', e.g. ServiceInvocationException) and raises RuntimeError. This causes PyRIT to treat the score as UNDETERMINED instead of using the erroneous passed=False to incorrectly mark the attack as successful, which was inflating ASR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot added the Evaluation label Mar 4, 2026

Add changelog entries for status tracking, cache key, and scoring fixes

3016c92

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

slister1001 wants to merge 2 commits into main from fix/redteam-bugbash-status-scoring-cache