FEAT Add SALAD-Bench dataset loader #1425
Merged
romanlutz merged 7 commits into Azure:main on Mar 2, 2026
Force-pushed from 95af585 to 99ab63b
Pull request overview
Adds a new remote seed dataset loader for the SALAD-Bench HuggingFace dataset, making it available through PyRIT’s automatic SeedDatasetProvider discovery and documenting it in the dataset-loading guide.
Changes:
- Added `_SaladBenchDataset` remote loader that fetches SALAD-Bench from HuggingFace and converts rows into `SeedPrompt`s.
- Registered the loader for auto-discovery via `pyrit.datasets.seed_datasets.remote.__init__`.
- Added unit tests and updated the “Loading Built-in Datasets” notebook to show the new dataset name.
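The row-to-prompt conversion described in the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the loader's actual code: `PromptRecord` is a stand-in dataclass rather than PyRIT's `SeedPrompt`, and the row keys `question` and `categories` are hypothetical field names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PromptRecord:
    """Stand-in for a seed prompt; NOT PyRIT's actual SeedPrompt class."""

    value: str
    dataset_name: str
    harm_categories: list[str] = field(default_factory=list)


def rows_to_prompts(rows, dataset_name):
    """Convert raw dataset rows into PromptRecord objects, skipping blank prompts."""
    prompts = []
    for row in rows:
        # "question" and "categories" are assumed keys, used only for illustration.
        text = (row.get("question") or "").strip()
        if not text:
            continue
        categories = [c for c in (row.get("categories") or []) if c]
        prompts.append(
            PromptRecord(value=text, dataset_name=dataset_name, harm_categories=categories)
        )
    return prompts


sample_rows = [
    {"question": "Example prompt A", "categories": ["domain-1", "task-1"]},
    {"question": "   ", "categories": []},  # blank prompt, skipped
]
prompts = rows_to_prompts(sample_rows, "salad_bench")
```

In the real loader the rows would come from the HuggingFace dataset rather than an in-memory list; the sketch only shows the mapping step.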
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `pyrit/datasets/seed_datasets/remote/salad_bench_dataset.py` | New HuggingFace-backed loader that maps SALAD-Bench entries into SeedDataset/SeedPrompt. |
| `pyrit/datasets/seed_datasets/remote/__init__.py` | Imports/exports `_SaladBenchDataset` so it’s registered and discoverable. |
| `tests/unit/datasets/test_salad_bench_dataset.py` | Unit tests validating dataset fetching and config passthrough behavior. |
| `doc/code/datasets/1_loading_datasets.ipynb` | Documentation notebook updated to reflect the new dataset in the available list (but currently includes executed outputs/metadata). |
Comments suppressed due to low confidence (1)
pyrit/datasets/seed_datasets/remote/salad_bench_dataset.py:74
- The `authors` list formatting is inconsistent with other remote dataset loaders and is hard to read (and likely exceeds the repo’s 120-char line length). Please format the authors list across multiple lines (one author per line), as the other dataset loaders do, for readability and consistent styling.
dataset_name=self.hf_dataset_name,
config=self.config,
Force-pushed from 7db0e9c to bbc66cf
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
doc/code/datasets/1_loading_datasets.ipynb:242
- The notebook metadata was updated to a different local Python version. To avoid unnecessary diffs across environments, consider reverting/normalizing kernel metadata (or stripping it) in committed docs notebooks.
"version": "3.13.5"
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
doc/code/datasets/1_loading_datasets.ipynb:200
- This notebook output includes a DeprecationWarning with a user-specific temporary file path (`C:\\Users\\...\\AppData\\Local\\Temp\\...`). Please clear/sanitize this output (and ideally avoid emitting the warning in the example) so docs are reproducible and don't embed local filesystem paths.
"C:\\Users\\romanlutz\\AppData\\Local\\Temp\\ipykernel_40808\\4021500943.py:10: DeprecationWarning: is_objective parameter is deprecated since 0.13.0. Use seed_type='objective' instead.\n",
" memory.get_seeds(harm_categories=[\"illegal\"], is_objective=True)\n"
]
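One way to act on the notebook feedback (cleared outputs here, normalized kernel metadata in the earlier review) is a small clean-up pass over the notebook JSON before committing. The sketch below is an assumption about how that could be done with the standard library only; it is not the repository's tooling, and the sample notebook dictionary is made up for the example.

```python
import copy


def sanitize_notebook(notebook: dict) -> dict:
    """Return a copy with code-cell outputs cleared and the interpreter version dropped."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []  # removes warnings and any embedded local paths
            cell["execution_count"] = None
    # Drop the environment-specific Python version, if present.
    cleaned.get("metadata", {}).get("language_info", {}).pop("version", None)
    return cleaned


# Made-up notebook structure, shaped like the .ipynb JSON format.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "source": ["print('hello')"],
            "outputs": [
                {"output_type": "stream", "name": "stderr", "text": ["DeprecationWarning: ...\n"]}
            ],
        },
        {"cell_type": "markdown", "source": ["# Title"]},
    ],
    "metadata": {"language_info": {"name": "python", "version": "3.13.5"}},
}
cleaned = sanitize_notebook(notebook)
```

Working on a deep copy keeps the original dictionary untouched, so the same function can be used in a check that compares the committed notebook against its sanitized form.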
Add remote dataset loader for SALAD-Bench (walledai/SaladBench), a hierarchical safety benchmark with ~30k prompts organized into 6 domains, 16 tasks, and 65+ categories (ACL 2024).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HF dataset identifier is now a class constant HF_DATASET_NAME instead of a constructor parameter, consistent with other loaders.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
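The class-constant change can be illustrated with a minimal sketch. `SaladBenchLoaderSketch`, its `fetch_kwargs` helper, and the `"example_config"` default are hypothetical names invented for this example; only `HF_DATASET_NAME`, the `walledai/SaladBench` identifier, and the `config` passthrough come from the pull request text.

```python
class SaladBenchLoaderSketch:
    """Hypothetical loader shape; NOT the actual _SaladBenchDataset class."""

    # Fixed per loader class, so callers cannot point it at a different dataset.
    HF_DATASET_NAME = "walledai/SaladBench"

    def __init__(self, *, config: str = "example_config"):
        # "example_config" is a placeholder default, not a real SALAD-Bench config name.
        self.config = config

    def fetch_kwargs(self) -> dict:
        """Arguments a fetch helper might receive (hypothetical helper)."""
        return {"dataset_name": self.HF_DATASET_NAME, "config": self.config}


loader = SaladBenchLoaderSketch(config="custom_config")
kwargs = loader.fetch_kwargs()
```

Keeping the identifier on the class means only the per-instance options, such as the config, stay in the constructor signature.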
Wrapping in Jinja2 raw tags preserves original dataset text that
may contain {{ }} or {% %} syntax. Also precomputes loop constants.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
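The raw-tag wrapping mentioned in this commit message can be sketched as below. This is an assumed illustration rather than the loader's code: `wrap_in_raw` is a hypothetical helper name, and the rendering step runs only if the third-party `jinja2` package is installed.

```python
def wrap_in_raw(text: str) -> str:
    """Wrap text in Jinja2 raw tags so template syntax inside it is kept literally."""
    return "{% raw %}" + text + "{% endraw %}"


# Dataset text that happens to contain Jinja2-style delimiters.
question = "Fill in {{ name }} and {% if x %}this{% endif %}"
wrapped = wrap_in_raw(question)

try:
    from jinja2 import Template  # third-party; optional for this sketch

    # Inside a raw block, Jinja2 emits the content verbatim instead of evaluating it.
    rendered = Template(wrapped).render()
except ImportError:
    rendered = None  # jinja2 not installed; the wrapping step alone is shown
```

One caveat of this simple approach: text that itself contains a literal `{% endraw %}` would close the raw block early and would need separate handling.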
Force-pushed from 10d21b9 to 29490c0
ValbuenaVC approved these changes on Mar 2, 2026