FEAT Add SALAD-Bench dataset loader #1425

Merged
romanlutz merged 7 commits into Azure:main from romanlutz:romanlutz/add-salad-bench-dataset
Mar 2, 2026

@romanlutz
Contributor

Add remote dataset loader for SALAD-Bench (walledai/SaladBench), a hierarchical safety benchmark with ~30k prompts organized into 6 domains, 16 tasks, and 65+ categories (ACL 2024).

Copilot AI review requested due to automatic review settings March 1, 2026 14:31
@romanlutz romanlutz force-pushed the romanlutz/add-salad-bench-dataset branch from 95af585 to 99ab63b Compare March 1, 2026 14:32
Contributor

Copilot AI left a comment

Pull request overview

Adds a new remote seed dataset loader for the SALAD-Bench HuggingFace dataset, making it available through PyRIT’s automatic SeedDatasetProvider discovery and documenting it in the dataset-loading guide.

Changes:

  • Added _SaladBenchDataset remote loader that fetches SALAD-Bench from HuggingFace and converts rows into SeedPrompts.
  • Registered the loader for auto-discovery via pyrit.datasets.seed_datasets.remote.__init__.
  • Added unit tests and updated the “Loading Built-in Datasets” notebook to show the new dataset name.
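As a rough illustration of the row-to-prompt conversion summarized above, the following is a minimal sketch and not the merged loader. `PromptSketch` is a hypothetical stand-in for PyRIT's `SeedPrompt`, and the row keys `prompt` and `categories` are assumptions about the dataset schema rather than confirmed column names.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSketch:
    """Hypothetical stand-in for a seed prompt; this is NOT PyRIT's SeedPrompt."""

    value: str
    dataset_name: str
    harm_categories: list[str] = field(default_factory=list)


def rows_to_prompts(rows, dataset_name="salad_bench"):
    """Convert dataset rows (dicts) into PromptSketch objects.

    Assumes each row has a "prompt" string and a "categories" list;
    both key names are illustrative assumptions.
    """
    prompts = []
    for row in rows:
        text = (row.get("prompt") or "").strip()
        if not text:
            # Skip rows that carry no usable prompt text.
            continue
        prompts.append(
            PromptSketch(
                value=text,
                dataset_name=dataset_name,
                harm_categories=list(row.get("categories") or []),
            )
        )
    return prompts


# Made-up rows standing in for what a HuggingFace fetch might return.
sample_rows = [
    {"prompt": "Example prompt A", "categories": ["domain-1", "task-1"]},
    {"prompt": "   ", "categories": []},
]
prompts = rows_to_prompts(sample_rows)
```

Keeping the conversion a pure function over plain dicts makes it straightforward to unit test without network access, which is presumably why the PR's tests mock the fetch step.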

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pyrit/datasets/seed_datasets/remote/salad_bench_dataset.py | New HuggingFace-backed loader that maps SALAD-Bench entries into SeedDataset/SeedPrompt. |
| pyrit/datasets/seed_datasets/remote/__init__.py | Imports/exports _SaladBenchDataset so it’s registered and discoverable. |
| tests/unit/datasets/test_salad_bench_dataset.py | Unit tests validating dataset fetching and config passthrough behavior. |
| doc/code/datasets/1_loading_datasets.ipynb | Documentation notebook updated to reflect the new dataset in the available list (but currently includes executed outputs/metadata). |

Comments suppressed due to low confidence (1)

pyrit/datasets/seed_datasets/remote/salad_bench_dataset.py:74

  • The authors list formatting is inconsistent with other remote dataset loaders and is hard to read (and likely exceeds the repo’s 120-char line length). Please format the authors list across multiple lines (one author per line) like other dataset loaders for readability and consistent styling.
            dataset_name=self.hf_dataset_name,
            config=self.config,

@romanlutz romanlutz force-pushed the romanlutz/add-salad-bench-dataset branch 2 times, most recently from 7db0e9c to bbc66cf Compare March 2, 2026 13:08
Copilot AI review requested due to automatic review settings March 2, 2026 13:08
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

doc/code/datasets/1_loading_datasets.ipynb:242

  • The notebook metadata was updated to a different local Python version. To avoid unnecessary diffs across environments, consider reverting/normalizing kernel metadata (or stripping it) in committed docs notebooks.
   "version": "3.13.5"
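One way to act on this suggestion is to drop the interpreter version from the notebook JSON before committing. The sketch below assumes the standard nbformat 4 layout, where the version lives under `metadata.language_info.version`; the helper name `normalize_notebook_metadata` is hypothetical and not part of the repository's tooling.

```python
import json


def normalize_notebook_metadata(nb: dict) -> dict:
    """Return a copy of the notebook dict without the local Python version."""
    # Round-trip through JSON to get a deep copy of plain notebook data.
    cleaned = json.loads(json.dumps(nb))
    language_info = cleaned.get("metadata", {}).get("language_info", {})
    # The version string differs per machine, so it only creates diff noise.
    language_info.pop("version", None)
    return cleaned


# Minimal in-memory notebook standing in for 1_loading_datasets.ipynb.
notebook = {
    "cells": [],
    "metadata": {"language_info": {"name": "python", "version": "3.13.5"}},
    "nbformat": 4,
    "nbformat_minor": 5,
}
cleaned = normalize_notebook_metadata(notebook)
```

In practice the same function could be applied to the result of `json.load` on the `.ipynb` file and written back with `json.dump`, for instance from a pre-commit hook.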

Copilot AI review requested due to automatic review settings March 2, 2026 13:53
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

doc/code/datasets/1_loading_datasets.ipynb:200

  • This notebook output includes a DeprecationWarning with a user-specific temporary file path (C:\\Users\\...\\AppData\\Local\\Temp\\...). Please clear/sanitize this output (and ideally avoid emitting the warning in the example) so docs are reproducible and don't embed local filesystem paths.
      "C:\\Users\\romanlutz\\AppData\\Local\\Temp\\ipykernel_40808\\4021500943.py:10: DeprecationWarning: is_objective parameter is deprecated since 0.13.0. Use seed_type='objective' instead.\n",
      "  memory.get_seeds(harm_categories=[\"illegal\"], is_objective=True)\n"
     ]
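The warning text itself points at the code-level fix (pass `seed_type='objective'` instead of `is_objective=True`). For the committed output, a minimal sketch of sanitizing the notebook is shown here. It assumes the nbformat 4 layout in which warnings appear as `stream` outputs named `stderr`; the helper `drop_stderr_outputs` is a hypothetical name, not an existing PyRIT utility.

```python
import copy


def drop_stderr_outputs(nb: dict) -> dict:
    """Return a copy of the notebook with stderr stream outputs removed."""
    cleaned = copy.deepcopy(nb)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # Keep everything except stderr streams, which is where warnings
        # (and their machine-specific file paths) end up.
        cell["outputs"] = [
            out
            for out in cell.get("outputs", [])
            if not (out.get("output_type") == "stream" and out.get("name") == "stderr")
        ]
    return cleaned


# Tiny in-memory notebook with one stderr and one stdout output.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "source": ["print('ok')"],
            "outputs": [
                {
                    "output_type": "stream",
                    "name": "stderr",
                    "text": ["DeprecationWarning: is_objective parameter is deprecated\n"],
                },
                {"output_type": "stream", "name": "stdout", "text": ["ok\n"]},
            ],
        },
        {"cell_type": "markdown", "source": ["# Title"]},
    ]
}
sanitized = drop_stderr_outputs(nb)
```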

Copilot AI review requested due to automatic review settings March 2, 2026 14:11
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

romanlutz and others added 7 commits March 2, 2026 13:51
Add remote dataset loader for SALAD-Bench (walledai/SaladBench), a hierarchical
safety benchmark with ~30k prompts organized into 6 domains, 16 tasks, and 65+
categories (ACL 2024).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HF dataset identifier is now a class constant HF_DATASET_NAME
instead of a constructor parameter, consistent with other loaders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wrapping in Jinja2 raw tags preserves original dataset text that
may contain {{ }} or {% %} syntax. Also precomputes loop constants.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
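Two of the commit messages above describe design changes: the HuggingFace identifier becoming a class constant, and prompt text being wrapped in Jinja2 raw tags. A minimal sketch of both ideas follows. The class name `SaladBenchLoaderSketch`, its method names, and the `"example_config"` value are hypothetical and do not reproduce the merged `_SaladBenchDataset`; only `HF_DATASET_NAME`, the `walledai/SaladBench` identifier, and the raw-tag approach come from the PR text.

```python
class SaladBenchLoaderSketch:
    """Illustrative only; not the merged _SaladBenchDataset implementation."""

    # Class constant rather than a constructor parameter, as the commit describes.
    HF_DATASET_NAME = "walledai/SaladBench"

    # Jinja2 raw-block delimiters, defined once instead of inside the loop.
    RAW_OPEN = "{% raw %}"
    RAW_CLOSE = "{% endraw %}"

    def __init__(self, config: str):
        # The config name is passed through; valid values are not listed here.
        self.config = config

    def wrap_raw(self, text: str) -> str:
        """Wrap text so a Jinja2 renderer treats {{ }} and {% %} literally."""
        return f"{self.RAW_OPEN}{text}{self.RAW_CLOSE}"

    def convert(self, texts):
        """Wrap every prompt text, binding the method lookup outside the loop."""
        wrap = self.wrap_raw
        return [wrap(t) for t in texts]


loader = SaladBenchLoaderSketch(config="example_config")  # hypothetical config name
wrapped = loader.convert(["Say {{ secret }}", "plain text"])
```

One caveat with this approach: a prompt that itself contains the literal `{% endraw %}` would end the raw block early, so a production loader might need extra handling for that case.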
Copilot AI review requested due to automatic review settings March 2, 2026 21:53
@romanlutz romanlutz force-pushed the romanlutz/add-salad-bench-dataset branch from 10d21b9 to 29490c0 Compare March 2, 2026 21:53
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

@romanlutz romanlutz merged commit 1f6fb87 into Azure:main Mar 2, 2026
41 checks passed
@romanlutz romanlutz deleted the romanlutz/add-salad-bench-dataset branch March 2, 2026 23:10

3 participants