Conversation
Pull request overview
Adds a new remote seed dataset loader for the HuggingFace lmsys/toxic-chat dataset so it can be discovered/loaded through PyRIT’s dataset provider framework, with accompanying unit tests and docs update.
Changes:
- Introduces a `_ToxicChatDataset` remote loader that converts ToxicChat rows into `SeedPrompt`s and skips prompts that fail template parsing.
- Adds unit tests for the ToxicChat loader.
- Updates dataset-loading documentation output to include `toxic_chat` in the available dataset list.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py` | New HuggingFace-backed loader that builds `SeedDataset`/`SeedPrompt` entries from ToxicChat rows and skips problematic entries. |
| `tests/unit/datasets/test_toxic_chat_dataset.py` | New unit tests validating basic loading behavior and custom HF config plumbing. |
| `pyrit/datasets/seed_datasets/remote/__init__.py` | Exposes/imports the new loader for discovery/registration. |
| `doc/code/datasets/1_loading_datasets.ipynb` | Updates the displayed list of available datasets to include `toxic_chat`. |
Comments suppressed due to low confidence (1)
tests/unit/datasets/test_toxic_chat_dataset.py:78
- For consistency with other dataset loader tests (e.g., DarkBench), this test should assert that `split` (and optionally `cache`) are forwarded to `_fetch_from_huggingface` in addition to `dataset_name` and `config`; otherwise regressions in argument plumbing won't be caught.
```python
with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
    dataset = await loader.fetch_dataset()

assert len(dataset.seeds) == 2
mock_fetch.assert_called_once()
call_kwargs = mock_fetch.call_args.kwargs
assert call_kwargs["dataset_name"] == "custom/toxic-chat"
assert call_kwargs["config"] == "custom_config"
```
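The fuller assertion set the review asks for can be sketched with `unittest.mock` alone; the direct mock call below stands in for the forwarding that `fetch_dataset()` would do in the real test, and the keyword names follow the snippet above:

```python
from unittest.mock import MagicMock

# Stand-in for the patched _fetch_from_huggingface; in the real test this call
# happens inside fetch_dataset().
mock_fetch = MagicMock()
mock_fetch(dataset_name="custom/toxic-chat", config="custom_config", split="test", cache=True)

call_kwargs = mock_fetch.call_args.kwargs
assert call_kwargs["dataset_name"] == "custom/toxic-chat"
assert call_kwargs["config"] == "custom_config"
assert call_kwargs["split"] == "test"  # the forwarding the review wants covered
assert call_kwargs["cache"] is True    # optional, per the review
```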
Force-pushed from cb93f7f to c63374c.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
tests/unit/datasets/test_toxic_chat_dataset.py:78
- This test configures `split="test"` but doesn't assert that `fetch_dataset()` passes `split` through to `_fetch_from_huggingface` (unlike similar tests for other datasets). Adding an assertion for `call_kwargs["split"]` (and optionally `cache`) would improve coverage of the custom-config behavior.
```python
loader = _ToxicChatDataset(
    dataset_name="custom/toxic-chat",
    config="custom_config",
    split="test",
)
with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
    dataset = await loader.fetch_dataset()

assert len(dataset.seeds) == 2
mock_fetch.assert_called_once()
call_kwargs = mock_fetch.call_args.kwargs
assert call_kwargs["dataset_name"] == "custom/toxic-chat"
assert call_kwargs["config"] == "custom_config"
```
Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real user-chatbot conversations from Chatbot Arena annotated for toxicity and jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from c63374c to 79e2bf9.
The HF dataset identifier is now a class constant HF_DATASET_NAME instead of a constructor parameter, consistent with other loaders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
doc/code/datasets/1_loading_datasets.ipynb:242
- The notebook metadata `language_info.version` changed from 3.11.14 to 3.13.5, which is typically just an artifact of the author's local environment. To reduce churn in docs, consider keeping this stable (or stripping environment-specific metadata) when updating notebooks.

```json
"version": "3.13.5"
```
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Raw wrapping preserves original dataset text including Jinja2-like syntax (HTML, template tags), eliminating the need to catch and skip TemplateSyntaxError entries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py:112
- Remote dataset loaders in this repo typically wrap prompt text in Jinja2 `{% raw %}...{% endraw %}` before constructing a `SeedPrompt` so that `SeedPrompt.__post_init__` doesn't render/alter prompt content and so template-like text doesn't trigger parsing errors. Here `value=item["user_input"]` plus catching `TemplateSyntaxError` means some prompts will be silently dropped and others may still be mutated by Jinja rendering. Consider wrapping `user_input` in raw tags (and only skipping if it still fails), and include the conv_id in the debug log for traceability.
```python
        description=description,
        source=source_url,
        authors=authors,
        groups=groups,
        metadata={
            "toxicity": str(item.get("toxicity", "")),
            "jailbreaking": str(item.get("jailbreaking", "")),
            "human_annotation": str(item.get("human_annotation", "")),
        },
    )
    for item in data
]
logger.info(f"Successfully loaded {len(seed_prompts)} prompts from ToxicChat dataset")
return SeedDataset(seeds=seed_prompts, dataset_name=self.dataset_name)
```
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
```python
async def test_fetch_dataset_preserves_jinja2_content(self):
    """Test that entries with Jinja2-like content are preserved via raw wrapping."""
    data_with_html = [
        {
```
The tests don’t exercise the advertised behavior of skipping truly Jinja2-incompatible inputs (e.g., user_input containing {% endraw %} that can break parsing even when wrapped). After adding skip/exception handling in the loader, add a unit test asserting such an entry is skipped while the rest of the dataset still loads.
```python
seed_prompts = [
    SeedPrompt(
        value=f"{{% raw %}}{item['user_input']}{{% endraw %}}",
        data_type="text",
        dataset_name=self.dataset_name,
```
fetch_dataset builds SeedPrompt objects in a list comprehension, but SeedPrompt.__post_init__ can raise when Jinja2 cannot parse the template (e.g., if user_input contains {% endraw %} or otherwise breaks the outer {% raw %}...{% endraw %} wrapper). In that case, the entire dataset load will fail rather than “gracefully skipping” bad entries as described. Consider switching to an explicit loop that wraps SeedPrompt(...) construction in a try/except (catch ValueError / Jinja2 TemplateSyntaxError), logs a warning with conv_id, and continues.
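A minimal sketch of the explicit-loop approach, using stand-in names (`_build_prompt` simulates `SeedPrompt` construction; the real loader would catch `ValueError` / `jinja2.TemplateSyntaxError` raised from `__post_init__`):

```python
import logging

logger = logging.getLogger(__name__)


def _build_prompt(user_input: str) -> str:
    # Stand-in for SeedPrompt construction: simulate a parse failure when the
    # input would break the outer {% raw %}...{% endraw %} wrapper.
    if "{% endraw %}" in user_input:
        raise ValueError("cannot parse as Jinja2 template")
    return f"{{% raw %}}{user_input}{{% endraw %}}"


def load_prompts(data: list[dict]) -> list[str]:
    """Explicit loop: skip unparseable entries with a traceable warning, keep the rest."""
    seed_prompts = []
    for item in data:
        user_input = item["user_input"]  # KeyError propagates: schema errors are not swallowed
        try:
            seed_prompts.append(_build_prompt(user_input))
        except ValueError:
            logger.warning(
                "Skipping conv_id=%s: template parse failure",
                item.get("conv_id", "unknown"),
                exc_info=True,
            )
    return seed_prompts
```

The key points are that only the template-parse failure is caught, the conv_id appears in the log, and the rest of the dataset still loads.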
…hat-dataset # Conflicts: # doc/code/datasets/1_loading_datasets.ipynb
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
pyproject.toml:300
- The comment says this per-file ignore is "only" for the copyright check, but the rule list now also ignores E402/E501/TCH under `doc/**`. Update the comment to reflect the broader scope (or narrow the ignores if that wasn't intended).
```toml
# Ignore copyright check only in doc/ directory
"doc/**" = ["CPY001", "E402", "E501", "TCH"]
```
```python
mock_fetch.assert_called_once()
call_kwargs = mock_fetch.call_args.kwargs
assert call_kwargs["dataset_name"] == "lmsys/toxic-chat"
assert call_kwargs["config"] == "custom_config"
```
This test verifies dataset_name and config are passed through to _fetch_from_huggingface, but it doesn't assert split (and optionally cache) are forwarded. Adding those assertions would better match the coverage pattern used in other dataset loader tests (e.g., SALAD-Bench) and prevent regressions.
Suggested change:

```python
assert call_kwargs["config"] == "custom_config"
assert call_kwargs["split"] == "test"
```
…hat-dataset # Conflicts: # pyproject.toml
…hat-dataset # Conflicts: # pyrit/datasets/seed_datasets/remote/__init__.py
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py:104
- `Seed.metadata` supports integer values, and other remote dataset loaders often preserve numeric metadata as ints (e.g., grades in `vlsu_multimodal_dataset`). Converting `toxicity`/`jailbreaking` to `str` here loses type information and can make downstream consumers re-parse these fields. Prefer keeping these fields as `int` when present (and only fall back to empty string / None when missing).
```python
metadata={
    "toxicity": str(item.get("toxicity", "")),
    "jailbreaking": str(item.get("jailbreaking", "")),
    "human_annotation": str(item.get("human_annotation", "")),
},
```
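A sketch of the int-preserving alternative the review suggests (field names come from the dataset rows quoted in this PR; the `None` fallback for missing numeric fields is an assumption):

```python
def build_metadata(item: dict) -> dict:
    # Keep numeric annotation fields as ints when present instead of
    # stringifying them; downstream consumers then need no re-parsing.
    return {
        "toxicity": item.get("toxicity"),          # int, or None when missing
        "jailbreaking": item.get("jailbreaking"),  # int, or None when missing
        "human_annotation": str(item.get("human_annotation", "")),
    }
```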
```python
class _ToxicChatDataset(_RemoteDatasetLoader):
    """
    Loader for the ToxicChat dataset from HuggingFace.
```
This loader likely won’t be discoverable via the normal dataset auto-registration path. pyrit.datasets imports pyrit.datasets.seed_datasets.remote, and remote/__init__.py explicitly imports each concrete loader to trigger SeedDatasetProvider.__init_subclass__ registration—but _ToxicChatDataset isn’t added there, so it won’t appear in SeedDatasetProvider.get_all_dataset_names() unless users import this module directly. Add an import (and __all__ entry) for this loader in pyrit/datasets/seed_datasets/remote/__init__.py and consider adding a unit test asserting "toxic_chat" is registered after importing pyrit.datasets.
```python
seed_prompts = [
    SeedPrompt(
        value=f"{{% raw %}}{item['user_input']}{{% endraw %}}",
        data_type="text",
        dataset_name=self.dataset_name,
        description=description,
        source=source_url,
        authors=authors,
        groups=groups,
        metadata={
            "toxicity": str(item.get("toxicity", "")),
            "jailbreaking": str(item.get("jailbreaking", "")),
            "human_annotation": str(item.get("human_annotation", "")),
        },
    )
    for item in data
]
```
The PR description says the loader “gracefully skips entries with Jinja2-incompatible content,” but this implementation currently wraps every user_input in a {% raw %}...{% endraw %} block and never skips any records (and the tests assert all items are preserved). Either update the PR description to match the implemented behavior, or implement the described skipping (e.g., catch template parsing failures during SeedPrompt creation and continue).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
```python
class _ToxicChatDataset(_RemoteDatasetLoader):
    """
    Loader for the ToxicChat dataset from HuggingFace.

    ToxicChat contains approximately 10k real user-chatbot conversations from the Chatbot Arena,
    annotated for toxicity and jailbreaking attempts. It provides real-world examples of
    how users interact with LLMs in adversarial ways.

    References:
    - https://huggingface.co/datasets/lmsys/toxic-chat
    - https://arxiv.org/abs/2310.17389
    License: CC BY-NC 4.0

    Warning: This dataset contains toxic, offensive, and jailbreaking content from real user
    conversations. Consult your legal department before using these prompts for testing.
    """

    HF_DATASET_NAME: str = "lmsys/toxic-chat"
```
This new remote dataset provider won’t be auto-discoverable unless it’s imported in pyrit/datasets/seed_datasets/remote/__init__.py (that module imports concrete providers to trigger registration via SeedDatasetProvider.__init_subclass__). Add _ToxicChatDataset to the remote __init__.py imports/__all__ so SeedDatasetProvider.get_all_dataset_names() can find it.
```python
            )
        except Exception:
            conv_id = item.get("conv_id", "unknown")
            logger.debug(f"Skipping entry with conv_id={conv_id}: failed to parse as Jinja2 template")
```
except Exception: will also swallow non-template failures (e.g., missing user_input due to schema changes) and the log drops the underlying exception details. Prefer validating required keys up-front and/or catching specific Jinja2 parse exceptions, and include exception info in the log (e.g., exc_info=True) so unexpected errors don’t get silently skipped.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Catch jinja2.TemplateError instead of broad Exception so schema errors (e.g., missing user_input key) propagate instead of being silently swallowed
- Extract user_input key access before try block for clarity

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
```python
user_input = item["user_input"]
try:
    seed_prompts.append(
        SeedPrompt(
            value=f"{{% raw %}}{user_input}{{% endraw %}}",
            data_type="text",
            dataset_name=self.dataset_name,
```
SeedPrompt.__post_init__ calls render_template_value_silent(), which returns early (without rendering) when it detects Jinja2 control-structure tokens like {% for ... %} and the loop collection variable isn’t provided. Because this loader wraps every user_input in {% raw %}...{% endraw %}, a ToxicChat entry containing something like {% for x in items %} can cause render_template_value_silent() to return the raw-wrapped string, leaving the {% raw %} / {% endraw %} markers in the stored prompt value instead of the original text. Consider stripping the wrapper after SeedPrompt creation (only when it’s still present) or forcing a render path that always removes the wrapper, and add a regression test for a {% for ... %} example to prevent this from silently changing prompt content.
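One way to implement the suggested fix is to strip the wrapper whenever it survives the silent render. The constants and helper below are hypothetical, not PyRIT code, but capture the invariant: the stored value should never start and end with the wrapper tokens.

```python
RAW_OPEN = "{% raw %}"
RAW_CLOSE = "{% endraw %}"


def strip_surviving_wrapper(rendered: str) -> str:
    # If the silent render returned early (e.g., the input contained a
    # {% for ... %} token) and left the wrapper in place, remove it so the
    # stored prompt value matches the original text.
    if rendered.startswith(RAW_OPEN) and rendered.endswith(RAW_CLOSE):
        return rendered[len(RAW_OPEN):-len(RAW_CLOSE)]
    return rendered
```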
```python
async def test_fetch_dataset_preserves_jinja2_content(self):
    """Test that entries with Jinja2-like content are preserved via raw wrapping."""
    data_with_html = [
        {
            "conv_id": "good1",
            "user_input": "Normal question",
            "model_output": "Normal answer.",
            "human_annotation": "False",
            "toxicity": 0,
            "jailbreaking": 0,
            "openai_moderation": "[]",
        },
        {
            "conv_id": "html1",
            "user_input": "<!DOCTYPE html>{%block%}broken",
            "model_output": "N/A",
            "human_annotation": "False",
            "toxicity": 0,
            "jailbreaking": 0,
            "openai_moderation": "[]",
        },
    ]
    loader = _ToxicChatDataset()

    with patch.object(loader, "_fetch_from_huggingface", new=AsyncMock(return_value=data_with_html)):
        dataset = await loader.fetch_dataset()

    assert len(dataset.seeds) == 2
    assert dataset.seeds[0].value == "Normal question"
    assert dataset.seeds[1].value == "<!DOCTYPE html>{%block%}broken"
```
Current tests cover raw-wrapping for block-like content and skipping on embedded {% endraw %}, but they don’t cover the case where user_input contains a Jinja2 control-structure token like {% for x in items %}. With the current Seed.render_template_value_silent() logic, that pattern can prevent the raw wrapper from being rendered away, leaving {% raw %} / {% endraw %} in the final SeedPrompt.value. Adding a regression test for a {% for ... %}-style input would help catch this behavior.
Looks good to approve. Minor thought, not blocking: could be worth considering harm_categories down the line - most other loaders populate it, which enables get_values(harm_categories=...) filtering. The binary flags here are a bit different from string labeled categories in other datasets though, so I can see why metadata is the right home for now.
Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real user-chatbot conversations from Chatbot Arena annotated for toxicity and jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.