FEAT Add BeaverTails dataset loader #1424
Merged
romanlutz merged 17 commits into Azure:main on Mar 4, 2026
Force-pushed from 7b635d9 to b652d70
Pull request overview
Adds a new remote seed dataset loader for the BeaverTails HuggingFace dataset, making it discoverable via SeedDatasetProvider and documenting its availability.
Changes:
- Introduces `_BeaverTailsDataset` remote loader with optional `unsafe_only` filtering (default: unsafe only).
- Registers the loader in the remote datasets module and adds unit tests for filtering behavior.
- Updates the “Loading Built-in Datasets” notebook output to include the new dataset name.
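The `unsafe_only` behavior described in the list above can be pictured with a minimal sketch. It is not the PR's implementation: the `ROWS` sample, the `select_prompts` helper, and the `prompt` / `is_safe` field names are assumptions made for illustration.

```python
from typing import Iterable, Iterator

# Hypothetical rows shaped like BeaverTails records; the field names
# "prompt" and "is_safe" are assumptions for this sketch.
ROWS = [
    {"prompt": "How do I bake bread?", "is_safe": True},
    {"prompt": "Example harmful question", "is_safe": False},
    {"prompt": "Another harmful question", "is_safe": False},
]


def select_prompts(rows: Iterable[dict], unsafe_only: bool = True) -> Iterator[str]:
    """Yield prompt text, skipping safe rows when unsafe_only is set."""
    for row in rows:
        if unsafe_only and row["is_safe"]:
            continue
        yield row["prompt"]


unsafe_prompts = list(select_prompts(ROWS))                   # default: unsafe only
all_prompts = list(select_prompts(ROWS, unsafe_only=False))   # every row
```

Defaulting to unsafe-only keeps the common red-teaming case short, while `unsafe_only=False` remains available for callers who want the full set.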
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `pyrit/datasets/seed_datasets/remote/beaver_tails_dataset.py` | New HuggingFace-backed loader that converts BeaverTails rows into SeedPrompts (unsafe-only by default). |
| `pyrit/datasets/seed_datasets/remote/__init__.py` | Imports/exports the new loader so it’s auto-registered/discoverable. |
| `tests/unit/datasets/test_beaver_tails_dataset.py` | Adds unit tests covering unsafe-only vs all-entries behavior and dataset naming. |
| `doc/code/datasets/1_loading_datasets.ipynb` | Notebook updated to reflect the new dataset in the available list (but now includes executed outputs/metadata). |
Force-pushed from 9741ae3 to 1fd2ef7
Add remote dataset loader for BeaverTails (PKU-Alignment/BeaverTails), containing 330k+ QA pairs annotated across 14 harm categories for safety alignment research. Filters to unsafe entries by default. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HF dataset identifier is now a class constant HF_DATASET_NAME instead of a constructor parameter, consistent with other loaders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
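As a rough illustration of the commit above, the identifier can live on the class rather than in `__init__`. `ExampleRemoteLoader` and its `describe` method are hypothetical names for this sketch, not the PyRIT class; only the `HF_DATASET_NAME` constant and the dataset identifier come from the PR text.

```python
class ExampleRemoteLoader:
    """Hypothetical loader used only to illustrate the pattern."""

    # The identifier is fixed on the class, so callers cannot point the
    # loader at a different dataset through the constructor.
    HF_DATASET_NAME = "PKU-Alignment/BeaverTails"

    def __init__(self, *, unsafe_only: bool = True) -> None:
        self.unsafe_only = unsafe_only

    def describe(self) -> str:
        return f"{self.HF_DATASET_NAME} (unsafe_only={self.unsafe_only})"


loader = ExampleRemoteLoader()
summary = loader.describe()
```

Keeping the name as a constant leaves the constructor signature limited to behavior options such as `unsafe_only`.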
For a 330k-row dataset, this avoids hundreds of thousands of redundant string/list allocations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 8a9dccb to a91052f
…tails-dataset # Conflicts: # pyproject.toml
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
pyrit/prompt_converter/braille_converter.py:133
- In `_get_braile`, `is_number` is still reset to `False` after processing any character that isn’t in `numberPunctuations` (line 132). Since digits aren’t in `numberPunctuations`, this resets the number-mode after every digit, causing the Braille number indicator (`characterUnicodes['num']`) to be emitted before each digit instead of once per digit sequence. Consider resetting number-mode only when leaving a numeric run (e.g., when `is_number` is True and the current `char` is neither a digit nor an allowed number punctuation).
    is_number = False
    for char in text:
        if char in escapeCharacters:
            output += char
        elif char.isupper():
            if char.lower() in characterUnicodes:
                output += characterUnicodes["caps"]
                output += characterUnicodes[char.lower()]
        elif char in characterUnicodes:
            if char.isdigit() and not is_number:
                is_number = True
                output += characterUnicodes["num"]
            output += characterUnicodes[char]
        if is_number and char not in numberPunctuations:
            is_number = False
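One way to read the suggestion in this comment is sketched below. It is not the fix that landed in PyRIT: the function name `to_braille_like` and the tiny lookup tables are stand-ins (ASCII placeholders instead of Braille code points), and escape-character handling is omitted for brevity.

```python
# Tiny stand-in tables; the real converter maps to Braille code points.
CHARACTER_UNICODES = {
    "num": "#", "caps": "^",
    "a": "a", "b": "b", " ": " ",
    "1": "1", "2": "2", "3": "3",
    ".": ".", ",": ",",
}
NUMBER_PUNCTUATIONS = {".", ","}


def to_braille_like(text: str) -> str:
    output = ""
    is_number = False
    for char in text:
        if char.isupper():
            if char.lower() in CHARACTER_UNICODES:
                output += CHARACTER_UNICODES["caps"]
                output += CHARACTER_UNICODES[char.lower()]
        elif char in CHARACTER_UNICODES:
            if char.isdigit() and not is_number:
                # Emit the number indicator once, at the start of a run.
                is_number = True
                output += CHARACTER_UNICODES["num"]
            output += CHARACTER_UNICODES[char]
        # Leave number mode only when the numeric run actually ends:
        # the character is neither a digit nor number punctuation.
        if is_number and not char.isdigit() and char not in NUMBER_PUNCTUATIONS:
            is_number = False
    return output


run_result = to_braille_like("12 a3")
decimal_result = to_braille_like("1.2")
```

With the extra `not char.isdigit()` check, the placeholder indicator `#` appears once per digit run rather than before every digit.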
Replaces isoformat().replace('+00:00', 'Z') with strftime('%Y-%m-%dT%H:%M:%SZ')
for second-resolution timestamps without microsecond noise.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
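The difference between the two formatting calls named in this commit can be seen with a fixed instant. The `moment` value below is made up for illustration; both calls are standard `datetime` methods.

```python
from datetime import datetime, timezone

# A fixed UTC instant with a non-zero microsecond component.
moment = datetime(2026, 3, 4, 12, 30, 45, 123456, tzinfo=timezone.utc)

# Older style: isoformat() keeps microseconds whenever they are non-zero.
with_micro = moment.isoformat().replace("+00:00", "Z")

# Newer style: strftime with a literal "Z" always stops at whole seconds.
second_resolution = moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Note that the literal `Z` in the format string is only accurate when the value is already in UTC, since `strftime` does not convert time zones.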
varunj-msft (Contributor) reviewed on Mar 3, 2026
Loader looks clean. Worth splitting the braille and markdown printer fixes into their own PRs? The braille one may need a follow-up on the reset condition.
Merge latest main into the branch. Revert unrelated changes to braille_converter.py and markdown_printer.py that don't belong in the BeaverTails dataset PR. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wrap SeedPrompt construction in try/except TemplateSyntaxError to gracefully skip prompts that contain Jinja2 syntax (e.g. endraw) which would crash the template parser. Add test for this case. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
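The skip-on-error pattern described in this commit might look roughly like the sketch below. To keep it dependency-free, `TemplateSyntaxError` here is a local stand-in for `jinja2.TemplateSyntaxError`, and `build_prompt` / `build_all` are hypothetical helpers, not PyRIT functions.

```python
import logging

logger = logging.getLogger(__name__)


class TemplateSyntaxError(Exception):
    """Stand-in for jinja2.TemplateSyntaxError so the sketch needs no install."""


def build_prompt(text: str) -> dict:
    """Hypothetical constructor that rejects a stray Jinja2 block tag."""
    if "{% endraw %}" in text and "{% raw %}" not in text:
        raise TemplateSyntaxError("unexpected 'endraw'")
    return {"value": text}


def build_all(texts: list[str]) -> list[dict]:
    prompts = []
    for text in texts:
        try:
            prompts.append(build_prompt(text))
        except TemplateSyntaxError as exc:
            # Skip the one bad entry instead of failing the whole load.
            logger.warning("Skipping prompt with template syntax: %s", exc)
    return prompts


prompts = build_all(["plain question", "broken {% endraw %} text", "another"])
```

Catching only the template error keeps unrelated failures visible, while a single malformed row no longer aborts loading of the remaining rows.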
…om/romanlutz/PyRIT into romanlutz/add-beaver-tails-dataset
rlundeen2 approved these changes on Mar 4, 2026