Pull request overview
Adds a new remote seed-dataset provider for the HuggingFace declare-lab/HarmfulQA dataset so it can be fetched and used as SeedPrompt entries within PyRIT’s dataset discovery/registration system.
Changes:
- Introduced the _HarmfulQADataset remote loader, which fetches HarmfulQA from HuggingFace and converts rows into SeedPrompts.
- Exported the new loader from pyrit.datasets.seed_datasets.remote to trigger auto-registration.
- Added unit tests validating basic fetch + conversion behavior and dataset_name.
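As a rough illustration of the row-to-prompt conversion described in the list above, here is a minimal sketch. It does not use PyRIT's real classes: SeedPromptSketch and rows_to_seed_prompts are hypothetical stand-ins, the "harmful_qa" dataset name is a placeholder, and the row keys (question, topic, subtopic) are assumed from the dataset's description rather than taken from the PR's code.

```python
from dataclasses import dataclass, field


@dataclass
class SeedPromptSketch:
    """Hypothetical stand-in for PyRIT's SeedPrompt; not the real class."""

    value: str
    dataset_name: str
    harm_categories: list = field(default_factory=list)


def rows_to_seed_prompts(rows, dataset_name="harmful_qa"):
    """Convert raw dataset rows into prompt objects.

    Assumes each row is a dict with "question", "topic", and "subtopic"
    keys (an assumption, not confirmed by the PR). Rows with an empty or
    missing question are skipped.
    """
    prompts = []
    for row in rows:
        question = (row.get("question") or "").strip()
        if not question:
            continue
        # Use topic and subtopic as category labels when they are present.
        categories = [c for c in (row.get("topic"), row.get("subtopic")) if c]
        prompts.append(
            SeedPromptSketch(
                value=question,
                dataset_name=dataset_name,
                harm_categories=categories,
            )
        )
    return prompts
```

In the actual loader the rows would come from HuggingFace; the sketch takes an in-memory list so it can be run without network access.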
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pyrit/datasets/seed_datasets/remote/harmful_qa_dataset.py | New remote dataset loader implementation for HarmfulQA -> SeedDataset/SeedPrompt conversion. |
| pyrit/datasets/seed_datasets/remote/__init__.py | Re-export/import the new loader so it’s discoverable/registered alongside other remote loaders. |
| tests/unit/datasets/test_harmful_qa_dataset.py | Unit tests for fetching/conversion and dataset_name behavior. |
Add remote dataset loader for HarmfulQA (declare-lab/HarmfulQA), containing ~2k harmful questions organized by academic topic and subtopic for testing LLM susceptibility to harm-inducing question-answering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HF dataset identifier is now a class constant HF_DATASET_NAME instead of a constructor parameter, consistent with other loaders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
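The change from constructor parameter to class constant can be pictured with a small sketch. Both class names below are hypothetical and the base class is a simplification, not PyRIT's actual loader hierarchy; only HF_DATASET_NAME and the declare-lab/HarmfulQA identifier come from this PR.

```python
class RemoteLoaderSketch:
    """Hypothetical base class; PyRIT's real base class differs."""

    # Subclasses override this instead of accepting it in __init__.
    HF_DATASET_NAME: str = ""

    def describe(self) -> str:
        # Reads the identifier from the class, so every instance agrees.
        return f"loads {self.HF_DATASET_NAME}"


class HarmfulQALoaderSketch(RemoteLoaderSketch):
    """Hypothetical loader pinned to a single HuggingFace dataset."""

    HF_DATASET_NAME = "declare-lab/HarmfulQA"
```

With the identifier fixed on the class, callers cannot point the loader at a different dataset by accident, and the value can be read without constructing an instance.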
…-qa-dataset # Conflicts: # doc/code/datasets/1_loading_datasets.ipynb
pyproject.toml
@@ -297,7 +297,7 @@ notice-rgx = "Copyright \\(c\\) Microsoft Corporation\\.\\s*\\n.*Licensed under
# Ignore D and DOC rules everywhere except for the pyrit/ directory
"!pyrit/**.py" = ["D", "DOC"]
# Ignore copyright check only in doc/ directory
The per-file-ignores comment says this rule is “Ignore copyright check only in doc/ directory”, but the ignore list now also includes E402 and E501. Please update the comment to reflect the broader scope (or remove E402/E501 if they weren’t intended to be ignored).
# Ignore copyright check only in doc/ directory
# Ignore D and DOC rules everywhere except for the pyrit/ directory
# Ignore copyright, import placement, line-length, and type-checking checks in doc/ directory
…-qa-dataset # Conflicts: # pyproject.toml
Reviewed the PR: the code follows established patterns, tests pass, and all seeds are fetched from HuggingFace successfully. Looks good to me, ready to merge.