[BREAKING] FEAT add TAP to content harms scenario#1378

Open
hannahwestra25 wants to merge 29 commits into Azure:main from hannahwestra25:hawestra/add_tap_to_content_harms

Conversation

hannahwestra25 (Contributor) commented Feb 19, 2026

Description

  • Add TAP to the content harms scenario
  • Align scenario naming
    • Rename the scenario strategies to use consistent casing (breaking)
  • Remove single_turn / multi_turn as tags from the psychosocial strategy
    • This is breaking
  • Fix broken images on the website

Tests and Documentation

Added/updated tests.

@hannahwestra25 hannahwestra25 changed the title FEAT add TAP to content harms scenario [BREAKING] FEAT add TAP to content harms scenario Feb 20, 2026
Copilot AI review requested due to automatic review settings March 3, 2026 21:32
Contributor

Copilot AI left a comment

Pull request overview

This PR adds Tree of Attacks with Pruning (TAP) to the content harms scenario and aligns scenario naming: it renames the classes PsychosocialScenario to Psychosocial and LeakageScenario to Leakage, converts PsychosocialStrategy and LeakageStrategy members from UPPER_CASE to PascalCase, removes SINGLE_TURN/MULTI_TURN as tags from the psychosocial strategy, and provides deprecated backward-compatibility aliases for the old names (PsychosocialScenario and LeakageScenario remain only as deprecated aliases, no longer as the primary class names). The PR also updates the notebook outputs to reflect the new scenario names and fixes broken images on the documentation website.
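The overview does not show how the deprecated class aliases are implemented. One common pattern, sketched here under that assumption, is a thin subclass that emits a DeprecationWarning when constructed. Psychosocial below is a hypothetical stand-in, not PyRIT's class.

```python
import warnings


class Psychosocial:
    """Hypothetical stand-in for the renamed scenario class."""

    def __init__(self, **kwargs) -> None:
        self.options = kwargs


class PsychosocialScenario(Psychosocial):
    """Deprecated alias kept so old imports keep working (sketch)."""

    def __init__(self, **kwargs) -> None:
        # stacklevel=2 points the warning at the caller's construction site.
        warnings.warn(
            "PsychosocialScenario is deprecated; use Psychosocial instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(**kwargs)
```

A plain assignment (PsychosocialScenario = Psychosocial) would also keep old imports working but emits no warning. Note that with the subclass approach, a name derived from type(self).__name__ would be "PsychosocialScenario" rather than "Psychosocial".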

Changes:

  • Add TreeOfAttacksWithPruningAttack (TAP) to ContentHarms alongside ManyShotJailbreakAttack in a new _get_multi_turn_attacks method, replacing the deprecated MultiPromptSendingAttack
  • Rename scenario classes (LeakageScenario to Leakage, PsychosocialScenario to Psychosocial) and strategy enum members to PascalCase, with deprecated aliases for backward compatibility
  • Scenario.__init__ now derives name from type(self).__name__ when not explicitly provided, removing the requirement for subclasses to pass name=
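A minimal sketch of the name default described in the last bullet, assuming a keyword-only name parameter. The real Scenario.__init__ takes other arguments that are omitted here, and both classes are simplified stand-ins.

```python
from typing import Optional


class Scenario:
    """Simplified stand-in for the scenario base class."""

    def __init__(self, *, name: Optional[str] = None) -> None:
        # Fall back to the concrete subclass's name when none is passed,
        # so subclasses no longer need to call super().__init__(name=...).
        self.name = name if name is not None else type(self).__name__


class Leakage(Scenario):
    def __init__(self) -> None:
        super().__init__()  # no explicit name= needed


print(Leakage().name)                # Leakage
print(Scenario(name="custom").name)  # custom
```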

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pyrit/scenario/scenarios/airt/content_harms.py | Adds TAP attack; refactors _get_strategy_attacks into _get_single_turn_attacks + _get_multi_turn_attacks; deprecates objectives_by_harm |
| pyrit/scenario/scenarios/airt/psychosocial.py | Renames class to Psychosocial, renames strategy enum members to PascalCase, removes SINGLE_TURN/MULTI_TURN, adds deprecated alias |
| pyrit/scenario/scenarios/airt/leakage.py | Renames class to Leakage, renames strategy enum members to PascalCase, adds deprecated alias |
| pyrit/scenario/core/scenario_strategy.py | Adds _DeprecatedEnumMeta metaclass for deprecated member redirect support |
| pyrit/scenario/core/scenario.py | Makes name parameter optional, defaulting to class name |
| pyrit/scenario/core/atomic_attack.py | Removes duplicate self._objective_scorer = objective_scorer assignment |
| pyrit/scenario/scenarios/airt/__init__.py | Exports new Leakage and Psychosocial classes |
| pyrit/scenario/scenarios/airt/scam.py | Removes explicit name= from super().__init__() call |
| pyrit/scenario/scenarios/airt/cyber.py | Removes explicit name= from super().__init__() call |
| pyrit/scenario/scenarios/airt/jailbreak.py | Removes explicit name= from super().__init__() call |
| pyrit/scenario/scenarios/foundry/red_team_agent.py | Removes explicit name= from super().__init__() call |
| pyrit/scenario/scenarios/garak/encoding.py | Removes explicit name= from super().__init__() call |
| tests/unit/scenarios/test_content_harms.py | Adds tests for new _get_single_turn_attacks, _get_multi_turn_attacks, and _get_strategy_attacks |
| tests/unit/scenarios/test_psychosocial_harms.py | Updates tests to use Psychosocial and PascalCase strategy enum members |
| tests/unit/scenarios/test_leakage_scenario.py | Updates tests to use Leakage and PascalCase strategy enum members |
| doc/code/scenarios/0_scenarios.ipynb | Updates notebook output to show new scenario names/counts |
| Makefile | Adds cp -r assets doc/_build/assets to fix broken website images |
| tests/integration/targets/test_notebooks_targets.py | Removes skipped file entry for 4_non_llm_targets.ipynb |
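The table only says that _DeprecatedEnumMeta adds "deprecated member redirect support". A minimal sketch of how such a metaclass could work, assuming the old names were the UPPER_CASE spellings (IMMINENT_CRISIS and LICENSED_THERAPIST are assumed here) and using plain string values instead of the PR's tag sets. Python only calls a metaclass __getattr__ when normal attribute lookup fails, so real members are unaffected.

```python
import warnings
from enum import Enum, EnumMeta


class _DeprecatedEnumMeta(EnumMeta):
    """Redirects deprecated member names to their replacements (sketch)."""

    def __getattr__(cls, name):
        # Reached only for names that are not real members or attributes.
        aliases = cls.__dict__.get("__deprecated_members__", {})
        if name in aliases:
            new_name = aliases[name]
            warnings.warn(
                f"{cls.__name__}.{name} is deprecated; use {cls.__name__}.{new_name}.",
                DeprecationWarning,
                stacklevel=2,
            )
            return cls[new_name]
        raise AttributeError(name)


class PsychosocialStrategy(Enum, metaclass=_DeprecatedEnumMeta):
    # Simplified values for illustration only.
    ImminentCrisis = "imminent_crisis"
    LicensedTherapist = "licensed_therapist"

    # Assumed old name -> new name. A dunder name is not turned into a member
    # (a _sunder_ name would be rejected by Enum as reserved).
    __deprecated_members__ = {
        "IMMINENT_CRISIS": "ImminentCrisis",
        "LICENSED_THERAPIST": "LicensedTherapist",
    }
```

Accessing PsychosocialStrategy.IMMINENT_CRISIS then returns PsychosocialStrategy.ImminentCrisis and emits a DeprecationWarning, while unknown names still raise AttributeError.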
Comments suppressed due to low confidence (3)

pyrit/scenario/scenarios/airt/psychosocial.py:437

  • The error message in _get_atomic_attacks_async references the old class name PsychosocialHarmsScenario, but the class has been renamed to Psychosocial. This stale name could confuse users trying to diagnose the error.

pyrit/scenario/scenarios/airt/leakage.py:100

  • The Leakage class docstring at line 99 still references the old class name LeakageScenario: "The LeakageScenario class contains different attack variations...". It should say "The Leakage class" since the class has been renamed.

pyrit/scenario/scenarios/airt/psychosocial.py:90

  • The PsychosocialStrategy class docstring still references the tags single_turn and multi_turn (lines 85-87), which no longer exist on any strategy member, and still uses the old class name PsychosocialHarmsStrategy in its description (line 80). The get_strategy_class() docstring at line 185 also refers to the old name PsychosocialHarmsStrategy. These references should be updated to reflect the new naming and strategy structure.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 21:41
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (3)

pyrit/scenario/scenarios/airt/psychosocial.py:90

  • The PsychosocialStrategy class docstring (lines 84-87) still references the single_turn and multi_turn tags and the old behavior where tags correspond to different attack strategies. These members (SINGLE_TURN, MULTI_TURN) have been removed, and the new ImminentCrisis / LicensedTherapist members have empty tag sets. The docstring is now inaccurate and should be updated to describe the new strategy structure.

pyrit/scenario/scenarios/airt/psychosocial.py:437

  • The error message in _get_atomic_attacks_async at line 436 still references the old class name PsychosocialHarmsScenario, but the class has been renamed to Psychosocial. The error message should say Psychosocial instead.

pyrit/scenario/scenarios/airt/leakage.py:97

  • The Leakage class docstring (just below the added class definition) still refers to "The LeakageScenario class" instead of "The Leakage class". This stale reference to the old class name should be updated to match the renamed class.

hannahwestra25 and others added 2 commits March 3, 2026 16:59
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 22:00
hannahwestra25 and others added 3 commits March 3, 2026 17:01
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

pyrit/scenario/scenarios/airt/psychosocial.py:110

  • When PsychosocialStrategy.ALL is used, it expands to both ImminentCrisis and LicensedTherapist, creating two separate entries in _scenario_composites. However, _extract_harm_category_filter() (called from _resolve_seed_groups) returns only the first matching harm category filter it encounters (e.g., "imminent_crisis"), so LicensedTherapist seeds are never processed. As a result, running the scenario with ALL is functionally equivalent to running it with ImminentCrisis only: the LicensedTherapist subharm is silently skipped. The method should be updated to collect all distinct harm categories and generate attacks for each.

pyrit/scenario/scenarios/airt/psychosocial.py:96

  • In PsychosocialStrategy, the enum members ImminentCrisis and LicensedTherapist use set[str]() to create empty tag sets. While functionally equivalent to set(), this syntax is unusual and could be confusing. The standard way to pass an empty set is simply set(), which is also how the ALL strategy tag logic processes it. Consider using set() for clarity.
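A toy reproduction of the reported ALL bug: a first-match extractor (mirroring how the review describes _extract_harm_category_filter) versus one that collects every distinct harm category. Composite, the seed tuples, and all helper functions here are hypothetical simplifications, not PyRIT APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Composite:
    """Hypothetical stand-in for ScenarioCompositeStrategy (one harm per composite)."""

    harm_category: str


def extract_first_filter(composites):
    # Mirrors the reported behavior: stops at the first harm category found.
    for composite in composites:
        if composite.harm_category:
            return composite.harm_category
    return None


def extract_all_filters(composites):
    # Collects every distinct harm category, preserving first-seen order.
    return list(dict.fromkeys(c.harm_category for c in composites if c.harm_category))


def select_seeds(seeds, categories):
    # Keep only seeds whose harm category was requested.
    wanted = set(categories)
    return [prompt for category, prompt in seeds if category in wanted]


# ALL expands to both subharms, so two composites are produced.
composites = [Composite("imminent_crisis"), Composite("licensed_therapist")]
seeds = [("imminent_crisis", "seed A"), ("licensed_therapist", "seed B")]

print(select_seeds(seeds, [extract_first_filter(composites)]))  # ['seed A']
print(select_seeds(seeds, extract_all_filters(composites)))     # ['seed A', 'seed B']
```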

Copilot AI review requested due to automatic review settings March 3, 2026 23:27
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (3)

pyrit/scenario/scenarios/airt/psychosocial.py:96

  • The PsychosocialStrategy docstring still references the old single_turn and multi_turn tags (lines 84–87), which no longer exist as enum members or as tags on ImminentCrisis / LicensedTherapist. These descriptions are now misleading. The docstring should be updated to reflect the new strategy model, where ImminentCrisis and LicensedTherapist are the concrete strategies and each runs both single-turn and multi-turn attacks unconditionally.

pyrit/scenario/scenarios/airt/psychosocial.py:436

  • The error message at this line still references the old class name PsychosocialHarmsScenario. That class no longer exists; the correct name is Psychosocial, and the error message should reference it instead.

pyrit/scenario/scenarios/airt/psychosocial.py:450

  • When PsychosocialStrategy.ALL is used (the default), normalize_strategies expands it to {ImminentCrisis, LicensedTherapist}, creating two ScenarioCompositeStrategy instances. However, _get_atomic_attacks_async calls _resolve_seed_groups() only once, which in turn calls _extract_harm_category_filter(), and that method returns the first filter found ("imminent_crisis") without combining the filters for both strategies.

    This means that with the default ALL strategy, only seeds tagged "imminent_crisis" are included and "licensed_therapist" seeds are silently ignored. Previously, ALL included all psychosocial seeds.

    The _get_atomic_attacks_async method should loop over the composites (or call _resolve_seed_groups per composite) to handle each strategy's seed groups separately, similar to how ContentHarms._get_atomic_attacks_async iterates over seed_groups_by_harm.
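A sketch of the suggested fix, with hypothetical names: group seeds by harm once, then iterate over every selected strategy and emit one batch per harm, in the spirit of the seed_groups_by_harm iteration the review mentions. Batches are plain (harm, prompts) tuples standing in for PyRIT's atomic attacks.

```python
from collections import defaultdict


def group_seeds_by_harm(seeds):
    """Bucket (harm_category, prompt) pairs by harm category."""
    groups = defaultdict(list)
    for category, prompt in seeds:
        groups[category].append(prompt)
    return dict(groups)


def build_batches_per_strategy(selected_harms, seeds):
    """Return one (harm, prompts) batch for every selected strategy.

    Loops over all selected strategies instead of resolving a single
    filter once, so no strategy's seeds are silently dropped.
    """
    seed_groups_by_harm = group_seeds_by_harm(seeds)
    batches = []
    for harm in selected_harms:
        prompts = seed_groups_by_harm.get(harm, [])
        if prompts:  # skip strategies with no matching seeds
            batches.append((harm, prompts))
    return batches


seeds = [
    ("imminent_crisis", "seed A"),
    ("licensed_therapist", "seed B"),
    ("imminent_crisis", "seed C"),
]
# What ALL expands to, per the review.
selected = ["imminent_crisis", "licensed_therapist"]
for harm, prompts in build_batches_per_strategy(selected, seeds):
    print(harm, prompts)
```

Keeping one batch per strategy, rather than merging all seeds into one list, preserves which subharm each result came from.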

tests/integration/targets/test_notebooks_targets.py

nb_directory_path = pathlib.Path(path.DOCS_CODE_PATH, "targets").resolve()

skipped_files = [
"4_non_llm_targets.ipynb", # requires Azure SQL Storage IO for Azure Storage Account (see #4001)
Contributor

how are you running this?


Labels

None yet

Projects

None yet

4 participants