[BREAKING] FEAT add TAP to content harms scenario #1378
hannahwestra25 wants to merge 29 commits into Azure:main
Pull request overview
This PR adds Tree of Attacks with Pruning (TAP) to the content harms scenario and aligns scenario naming. It renames the classes PsychosocialScenario → Psychosocial and LeakageScenario → Leakage, converts PsychosocialStrategy and LeakageStrategy members from UPPER_CASE to PascalCase, removes SINGLE_TURN/MULTI_TURN as tags from the psychosocial strategy, and keeps PsychosocialScenario and LeakageScenario only as deprecated backward-compatibility aliases. The PR also updates the notebook outputs to reflect the new scenario names and fixes a documentation bug that broke images.
Changes:
- Add `TreeOfAttacksWithPruningAttack` (TAP) to `ContentHarms` alongside `ManyShotJailbreakAttack` in a new `_get_multi_turn_attacks` method, replacing the deprecated `MultiPromptSendingAttack`
- Rename scenario classes (`LeakageScenario` → `Leakage`, `PsychosocialScenario` → `Psychosocial`) and strategy enum members to PascalCase, with deprecated aliases for backward compatibility
- `Scenario.__init__` now derives `name` from `type(self).__name__` when not explicitly provided, removing the requirement for subclasses to pass `name=`
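The name-defaulting behavior in the last item can be illustrated with a minimal sketch. The `Scenario` class here is a hypothetical stand-in that shows only the naming logic; the real constructor takes other parameters and may handle the fallback differently.

```python
from typing import Optional


class Scenario:
    """Hypothetical stand-in for the base class; only the naming logic is shown."""

    def __init__(self, *, name: Optional[str] = None) -> None:
        # Fall back to the concrete subclass name when no name is passed.
        self.name = name if name is not None else type(self).__name__


class Leakage(Scenario):
    """Subclass that no longer needs to pass name= to super().__init__()."""


print(Leakage().name)
print(Leakage(name="custom").name)
```

Because `type(self)` resolves to the most derived class, a subclass picks up its own name without overriding `__init__`.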
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `pyrit/scenario/scenarios/airt/content_harms.py` | Adds TAP attack; refactors `_get_strategy_attacks` into `_get_single_turn_attacks` + `_get_multi_turn_attacks`; deprecates `objectives_by_harm` |
| `pyrit/scenario/scenarios/airt/psychosocial.py` | Renames class to `Psychosocial`, renames strategy enum members to PascalCase, removes `SINGLE_TURN`/`MULTI_TURN`, adds deprecated alias |
| `pyrit/scenario/scenarios/airt/leakage.py` | Renames class to `Leakage`, renames strategy enum members to PascalCase, adds deprecated alias |
| `pyrit/scenario/core/scenario_strategy.py` | Adds `_DeprecatedEnumMeta` metaclass for deprecated member redirect support |
| `pyrit/scenario/core/scenario.py` | Makes `name` parameter optional, defaulting to class name |
| `pyrit/scenario/core/atomic_attack.py` | Removes duplicate `self._objective_scorer = objective_scorer` assignment |
| `pyrit/scenario/scenarios/airt/__init__.py` | Exports new `Leakage` and `Psychosocial` classes |
| `pyrit/scenario/scenarios/airt/scam.py` | Removes explicit `name=` from `super().__init__()` call |
| `pyrit/scenario/scenarios/airt/cyber.py` | Removes explicit `name=` from `super().__init__()` call |
| `pyrit/scenario/scenarios/airt/jailbreak.py` | Removes explicit `name=` from `super().__init__()` call |
| `pyrit/scenario/scenarios/foundry/red_team_agent.py` | Removes explicit `name=` from `super().__init__()` call |
| `pyrit/scenario/scenarios/garak/encoding.py` | Removes explicit `name=` from `super().__init__()` call |
| `tests/unit/scenarios/test_content_harms.py` | Adds tests for new `_get_single_turn_attacks`, `_get_multi_turn_attacks`, and `_get_strategy_attacks` |
| `tests/unit/scenarios/test_psychosocial_harms.py` | Updates tests to use `Psychosocial` and PascalCase strategy enum members |
| `tests/unit/scenarios/test_leakage_scenario.py` | Updates tests to use `Leakage` and PascalCase strategy enum members |
| `doc/code/scenarios/0_scenarios.ipynb` | Updates notebook output to show new scenario names/counts |
| `Makefile` | Adds `cp -r assets doc/_build/assets` to fix broken website images |
| `tests/integration/targets/test_notebooks_targets.py` | Removes skipped file entry for `4_non_llm_targets.ipynb` |
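The table mentions a `_DeprecatedEnumMeta` metaclass that redirects deprecated member names. Its implementation is not shown here, so the following is only a sketch of one way such a redirect could work. The names `DeprecatedNamesMeta`, `DemoStrategy`, `_RENAMED_MEMBERS`, and the old name `IMMINENT_CRISIS` are all assumptions made for illustration.

```python
import warnings
from enum import Enum, EnumMeta

# Hypothetical rename table: old UPPER_CASE name -> new PascalCase name.
_RENAMED_MEMBERS = {"IMMINENT_CRISIS": "ImminentCrisis"}


class DeprecatedNamesMeta(EnumMeta):
    """Illustrative metaclass; not the PR's _DeprecatedEnumMeta."""

    def __getattr__(cls, name):
        # Python calls this only after normal class attribute lookup fails.
        new_name = _RENAMED_MEMBERS.get(name)
        if new_name is not None:
            warnings.warn(
                f"{cls.__name__}.{name} is deprecated; use {cls.__name__}.{new_name}",
                DeprecationWarning,
                stacklevel=2,
            )
            return cls[new_name]
        # Older Python versions define EnumMeta.__getattr__; defer to it if present.
        parent_getattr = getattr(super(), "__getattr__", None)
        if parent_getattr is not None:
            return parent_getattr(name)
        raise AttributeError(name)


class DemoStrategy(Enum, metaclass=DeprecatedNamesMeta):
    ImminentCrisis = "imminent_crisis"
    LicensedTherapist = "licensed_therapist"
```

With this sketch, `DemoStrategy.IMMINENT_CRISIS` returns the same object as `DemoStrategy.ImminentCrisis` and emits a `DeprecationWarning`, so existing callers keep working while being told about the new name.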
Comments suppressed due to low confidence (3)

pyrit/scenario/scenarios/airt/psychosocial.py:437
- The error message in `_get_atomic_attacks_async` references the old class name `PsychosocialHarmsScenario`, but the class has been renamed to `Psychosocial`. This stale name could confuse users trying to diagnose the error.

pyrit/scenario/scenarios/airt/leakage.py:100
- The `Leakage` class docstring at line 99 still references the old class name `LeakageScenario`: "The LeakageScenario class contains different attack variations...". This should be updated to say "The Leakage class" since the class has been renamed.

pyrit/scenario/scenarios/airt/psychosocial.py:90
- The `PsychosocialStrategy` class docstring still references tags `single_turn` and `multi_turn` (lines 85-87) which no longer exist on any strategy member, and also still uses the old class name `PsychosocialHarmsStrategy` in the docstring description (line 80). Additionally, the `get_strategy_class()` docstring at line 185 also refers to the old name `PsychosocialHarmsStrategy`. These references should be updated to reflect the new naming and strategy structure.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (3)

pyrit/scenario/scenarios/airt/psychosocial.py:90
- The `PsychosocialStrategy` class docstring (lines 84-87) still references `single_turn`, `multi_turn` tags and the old behavior where tags correspond to different attack strategies. These members (`SINGLE_TURN`, `MULTI_TURN`) have been removed and the new `ImminentCrisis`/`LicensedTherapist` members have empty tag sets. The docstring is now inaccurate and should be updated to describe the new strategy structure.

pyrit/scenario/scenarios/airt/psychosocial.py:437
- The error message in `_get_atomic_attacks_async` at line 436 still references the old class name `PsychosocialHarmsScenario`, but the class has been renamed to `Psychosocial`. The error message should be updated to say `Psychosocial` instead.

pyrit/scenario/scenarios/airt/leakage.py:97
- The `Leakage` class docstring (visible in the context just below the added class definition) still refers to "The LeakageScenario class" instead of "The Leakage class". This is a stale reference to the old class name that should be updated to match the renamed class.
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)

pyrit/scenario/scenarios/airt/psychosocial.py:110
- When `PsychosocialStrategy.ALL` is used, it expands to both `ImminentCrisis` and `LicensedTherapist`, creating two separate entries in `_scenario_composites`. However, `_extract_harm_category_filter()` (called from `_resolve_seed_groups`) returns only the first matching harm category filter it encounters (e.g., `"imminent_crisis"`), meaning `LicensedTherapist` seeds are never processed. As a result, running the scenario with `ALL` is functionally equivalent to running it with `ImminentCrisis` only: the `LicensedTherapist` subharm is silently skipped. The method should be updated to collect all distinct harm categories and generate attacks for each.

pyrit/scenario/scenarios/airt/psychosocial.py:96
- In `PsychosocialStrategy`, the enum members `ImminentCrisis` and `LicensedTherapist` are defined using `set[str]()` to create empty tag sets. While functionally equivalent to `set()`, this is unusual syntax and could be confusing. The standard way to pass an empty set is simply `set()`, which is also how the ALL strategy tag logic processes it. Consider using `set()` for clarity.
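The equivalence claimed in the last comment can be checked directly. This is a small sketch that assumes Python 3.9 or later, where built-in collection types accept type subscripts at runtime.

```python
# Requires Python 3.9+, where built-in collections can be subscripted.
plain = set()
parameterized = set[str]()

# Calling the subscripted alias builds an ordinary, empty set.
print(type(parameterized) is set)
print(parameterized == plain)
```

The subscript is not enforced when the set is used, which is why the plain `set()` spelling is behaviorally the same.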
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)

pyrit/scenario/scenarios/airt/psychosocial.py:96
- The `PsychosocialStrategy` docstring still references the old `single_turn` and `multi_turn` tags (lines 84–87) which no longer exist as enum members or tags on `ImminentCrisis`/`LicensedTherapist`. These descriptions are now misleading. The docstring should be updated to reflect the new strategy model where `ImminentCrisis` and `LicensedTherapist` are the concrete strategies, each running both single-turn and multi-turn attacks unconditionally.

pyrit/scenario/scenarios/airt/psychosocial.py:436
- The error message at this line still references the old class name `PsychosocialHarmsScenario`. This class no longer exists; the correct name is `Psychosocial`. The error message should be updated to reference `Psychosocial` instead.

pyrit/scenario/scenarios/airt/psychosocial.py:450
- When `PsychosocialStrategy.ALL` is used (the default), it expands via `normalize_strategies` to `{ImminentCrisis, LicensedTherapist}`, creating two `ScenarioCompositeStrategy` instances. However, `_get_atomic_attacks_async` only calls `_resolve_seed_groups()` once, which in turn calls `_extract_harm_category_filter()`, and that method returns the first filter found (`"imminent_crisis"`) without combining filters for both strategies.

  This means when the default ALL strategy is used, only seeds tagged with "imminent_crisis" are included, silently ignoring "licensed_therapist" seeds. Previously ALL would include all psychosocial seeds.

  The `_get_atomic_attacks_async` method should loop over the composites (or call `_resolve_seed_groups` per composite) to handle each strategy's seed groups separately, similar to how `ContentHarms._get_atomic_attacks_async` iterates over `seed_groups_by_harm`.
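The difference between returning the first filter and resolving seeds per strategy can be sketched with plain data. This is a toy model rather than PyRIT code: `SEEDS`, `first_filter_only`, and `seeds_by_strategy` are hypothetical names, and the seed records are invented for illustration.

```python
# Hypothetical seed records as (harm_category, prompt) pairs.
SEEDS = [
    ("imminent_crisis", "seed A"),
    ("licensed_therapist", "seed B"),
    ("imminent_crisis", "seed C"),
]


def first_filter_only(strategies):
    """Mirrors the reported behavior: only the first strategy's category is used."""
    for strategy in strategies:
        return [prompt for category, prompt in SEEDS if category == strategy]
    return []


def seeds_by_strategy(strategies):
    """Resolves seeds once per strategy so that no category is dropped."""
    return {
        strategy: [prompt for category, prompt in SEEDS if category == strategy]
        for strategy in strategies
    }


expanded_all = ["imminent_crisis", "licensed_therapist"]
print(first_filter_only(expanded_all))
print(seeds_by_strategy(expanded_all))
```

Keying the result by strategy resembles the per-harm iteration the comment attributes to `ContentHarms`, and it makes an empty category visible instead of silently missing.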
    nb_directory_path = pathlib.Path(path.DOCS_CODE_PATH, "targets").resolve()

    skipped_files = [
        "4_non_llm_targets.ipynb",  # requires Azure SQL Storage IO for Azure Storage Account (see #4001)
how are you running this?
Description
Tests and Documentation
added / updated tests