
FEAT Add ToxicChat dataset loader#1422

Open
romanlutz wants to merge 16 commits into Azure:main from romanlutz:romanlutz/add-toxic-chat-dataset

Conversation

@romanlutz
Contributor

Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real user-chatbot conversations from Chatbot Arena annotated for toxicity and jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.

Copilot AI review requested due to automatic review settings March 1, 2026 14:20
@romanlutz romanlutz changed the title from "Add ToxicChat dataset loader" to "FEAT Add ToxicChat dataset loader" Mar 1, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds a new remote seed dataset loader for the HuggingFace lmsys/toxic-chat dataset so it can be discovered/loaded through PyRIT’s dataset provider framework, with accompanying unit tests and docs update.

Changes:

  • Introduces _ToxicChatDataset remote loader that converts ToxicChat rows into SeedPrompts and skips prompts that fail template parsing.
  • Adds unit tests for the ToxicChat loader.
  • Updates dataset-loading documentation output to include toxic_chat in the available dataset list.
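
The convert-and-skip behavior summarized above can be sketched in plain Python with jinja2. The row fields (`user_input`, `conv_id`, `toxicity`) follow the ToxicChat schema discussed in this PR; the helper itself is illustrative, not PyRIT's actual loader code:

```python
import logging

from jinja2 import Environment, TemplateError

logger = logging.getLogger(__name__)


def convert_rows(rows):
    """Convert ToxicChat-style rows to prompt dicts, skipping unparseable ones."""
    env = Environment()
    prompts = []
    for item in rows:
        user_input = item["user_input"]
        # Wrap in {% raw %} so Jinja2 treats template-like text literally.
        value = "{% raw %}" + user_input + "{% endraw %}"
        try:
            # Raises TemplateSyntaxError (a TemplateError subclass) on input
            # that breaks even the raw wrapper, e.g. an embedded {% endraw %}.
            env.parse(value)
        except TemplateError:
            logger.debug("Skipping entry conv_id=%s", item.get("conv_id", "unknown"))
            continue
        prompts.append({"value": value, "toxicity": item.get("toxicity", "")})
    return prompts
```

Catching `TemplateError` rather than a bare `Exception` lets schema problems (e.g. a missing `user_input` key) propagate instead of being silently swallowed.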

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py: New HuggingFace-backed loader that builds SeedDataset/SeedPrompt entries from ToxicChat rows and skips problematic entries.
  • tests/unit/datasets/test_toxic_chat_dataset.py: New unit tests validating basic loading behavior and custom HF config plumbing.
  • pyrit/datasets/seed_datasets/remote/__init__.py: Exposes/imports the new loader for discovery/registration.
  • doc/code/datasets/1_loading_datasets.ipynb: Updates the displayed list of available datasets to include toxic_chat.
Comments suppressed due to low confidence (1)

tests/unit/datasets/test_toxic_chat_dataset.py:78

  • For consistency with other dataset loader tests (e.g., DarkBench), this test should assert that split (and optionally cache) are forwarded to _fetch_from_huggingface in addition to dataset_name and config; otherwise regressions in argument plumbing won't be caught.
        with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
            dataset = await loader.fetch_dataset()

            assert len(dataset.seeds) == 2
            mock_fetch.assert_called_once()
            call_kwargs = mock_fetch.call_args.kwargs
            assert call_kwargs["dataset_name"] == "custom/toxic-chat"
            assert call_kwargs["config"] == "custom_config"

@romanlutz romanlutz force-pushed the romanlutz/add-toxic-chat-dataset branch 2 times, most recently from cb93f7f to c63374c March 1, 2026 14:54
Copilot AI review requested due to automatic review settings March 1, 2026 14:54
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

tests/unit/datasets/test_toxic_chat_dataset.py:78

  • This test configures split="test" but doesn't assert that fetch_dataset() passes split through to _fetch_from_huggingface (unlike similar tests for other datasets). Adding an assertion for call_kwargs["split"] (and optionally cache) would improve coverage of the custom-config behavior.
        loader = _ToxicChatDataset(
            dataset_name="custom/toxic-chat",
            config="custom_config",
            split="test",
        )

        with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
            dataset = await loader.fetch_dataset()

            assert len(dataset.seeds) == 2
            mock_fetch.assert_called_once()
            call_kwargs = mock_fetch.call_args.kwargs
            assert call_kwargs["dataset_name"] == "custom/toxic-chat"
            assert call_kwargs["config"] == "custom_config"

Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real
user-chatbot conversations from Chatbot Arena annotated for toxicity and
jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz force-pushed the romanlutz/add-toxic-chat-dataset branch from c63374c to 79e2bf9 March 2, 2026 12:58
The HF dataset identifier is now a class constant HF_DATASET_NAME
instead of a constructor parameter, consistent with other loaders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 13:39
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

doc/code/datasets/1_loading_datasets.ipynb:242

  • The notebook metadata language_info.version changed from 3.11.14 to 3.13.5, which is typically just an artifact of the author’s local environment. To reduce churn in docs, consider keeping this stable (or stripping environment-specific metadata) when updating notebooks.
   "version": "3.13.5"

romanlutz and others added 2 commits March 2, 2026 05:55
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 14:02
Raw wrapping preserves original dataset text including Jinja2-like
syntax (HTML, template tags), eliminating the need to catch and skip
TemplateSyntaxError entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
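
As this commit message notes, a `{% raw %}` wrapper makes Jinja2 return template-like text verbatim instead of trying to interpret it. A minimal check of that property, reusing the HTML example from the unit tests:

```python
from jinja2 import Template

original = "<!DOCTYPE html>{%block%}broken"
wrapped = "{% raw %}" + original + "{% endraw %}"

# Rendering the wrapped value yields the original text, untouched:
# the {%block%} token is never interpreted as a template tag.
rendered = Template(wrapped).render()
```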
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py:112

  • Remote dataset loaders in this repo typically wrap prompt text in Jinja2 {% raw %}...{% endraw %} before constructing a SeedPrompt so that SeedPrompt.__post_init__ doesn't render/alter prompt content and so template-like text doesn't trigger parsing errors. Here value=item["user_input"] plus catching TemplateSyntaxError means some prompts will be silently dropped and others may still be mutated by Jinja rendering. Consider wrapping user_input in raw tags (and only skipping if it still fails), and include the conv_id in the debug log for traceability.
                description=description,
                source=source_url,
                authors=authors,
                groups=groups,
                metadata={
                    "toxicity": str(item.get("toxicity", "")),
                    "jailbreaking": str(item.get("jailbreaking", "")),
                    "human_annotation": str(item.get("human_annotation", "")),
                },
            )
            for item in data
        ]

        logger.info(f"Successfully loaded {len(seed_prompts)} prompts from ToxicChat dataset")

        return SeedDataset(seeds=seed_prompts, dataset_name=self.dataset_name)

romanlutz and others added 2 commits March 2, 2026 06:49
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 15:11
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +58 to +61
async def test_fetch_dataset_preserves_jinja2_content(self):
"""Test that entries with Jinja2-like content are preserved via raw wrapping."""
data_with_html = [
{
Copilot AI Mar 2, 2026

The tests don’t exercise the advertised behavior of skipping truly Jinja2-incompatible inputs (e.g., user_input containing {% endraw %} that can break parsing even when wrapped). After adding skip/exception handling in the loader, add a unit test asserting such an entry is skipped while the rest of the dataset still loads.

Comment on lines +91 to +95
seed_prompts = [
SeedPrompt(
value=f"{{% raw %}}{item['user_input']}{{% endraw %}}",
data_type="text",
dataset_name=self.dataset_name,
Copilot AI Mar 2, 2026

fetch_dataset builds SeedPrompt objects in a list comprehension, but SeedPrompt.__post_init__ can raise when Jinja2 cannot parse the template (e.g., if user_input contains {% endraw %} or otherwise breaks the outer {% raw %}...{% endraw %} wrapper). In that case, the entire dataset load will fail rather than “gracefully skipping” bad entries as described. Consider switching to an explicit loop that wraps SeedPrompt(...) construction in a try/except (catch ValueError / Jinja2 TemplateSyntaxError), logs a warning with conv_id, and continues.

romanlutz and others added 3 commits March 2, 2026 14:45
…hat-dataset

# Conflicts:
#	doc/code/datasets/1_loading_datasets.ipynb
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 00:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

pyproject.toml:300

  • The comment says this per-file ignore is "only" for the copyright check, but the rule list now also ignores E402/E501/TCH under doc/**. Update the comment to reflect the broader scope (or narrow the ignores if that wasn’t intended).
# Ignore copyright check only in doc/ directory
"doc/**" = ["CPY001", "E402", "E501", "TCH"]

mock_fetch.assert_called_once()
call_kwargs = mock_fetch.call_args.kwargs
assert call_kwargs["dataset_name"] == "lmsys/toxic-chat"
assert call_kwargs["config"] == "custom_config"
Copilot AI Mar 3, 2026

This test verifies dataset_name and config are passed through to _fetch_from_huggingface, but it doesn't assert split (and optionally cache) are forwarded. Adding those assertions would better match the coverage pattern used in other dataset loader tests (e.g., SALAD-Bench) and prevent regressions.

Suggested change:
-             assert call_kwargs["config"] == "custom_config"
+             assert call_kwargs["config"] == "custom_config"
+             assert call_kwargs["split"] == "test"

…hat-dataset

# Conflicts:
#	pyrit/datasets/seed_datasets/remote/__init__.py
Copilot AI review requested due to automatic review settings March 3, 2026 05:01
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py:104

  • Seed.metadata supports integer values, and other remote dataset loaders often preserve numeric metadata as ints (e.g., grades in vlsu_multimodal_dataset). Converting toxicity/jailbreaking to str here loses type information and can make downstream consumers re-parse these fields. Prefer keeping these fields as int when present (and only fall back to empty string / None when missing).
                metadata={
                    "toxicity": str(item.get("toxicity", "")),
                    "jailbreaking": str(item.get("jailbreaking", "")),
                    "human_annotation": str(item.get("human_annotation", "")),
                },
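
Following this suggestion, the metadata construction could preserve the integer annotations rather than stringifying them. A sketch (field names follow the ToxicChat schema quoted above; the helper name is made up):

```python
def annotation_metadata(item: dict) -> dict:
    """Keep numeric ToxicChat annotations as ints; use None when absent."""
    meta = {}
    for key in ("toxicity", "jailbreaking"):
        value = item.get(key)
        # Preserve 0/1 flags as ints so downstream consumers need not re-parse.
        meta[key] = int(value) if value is not None else None
    meta["human_annotation"] = item.get("human_annotation")
    return meta
```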

Comment on lines +14 to +16
class _ToxicChatDataset(_RemoteDatasetLoader):
"""
Loader for the ToxicChat dataset from HuggingFace.
Copilot AI Mar 3, 2026

This loader likely won’t be discoverable via the normal dataset auto-registration path. pyrit.datasets imports pyrit.datasets.seed_datasets.remote, and remote/__init__.py explicitly imports each concrete loader to trigger SeedDatasetProvider.__init_subclass__ registration—but _ToxicChatDataset isn’t added there, so it won’t appear in SeedDatasetProvider.get_all_dataset_names() unless users import this module directly. Add an import (and __all__ entry) for this loader in pyrit/datasets/seed_datasets/remote/__init__.py and consider adding a unit test asserting "toxic_chat" is registered after importing pyrit.datasets.

Comment on lines +91 to +107
seed_prompts = [
SeedPrompt(
value=f"{{% raw %}}{item['user_input']}{{% endraw %}}",
data_type="text",
dataset_name=self.dataset_name,
description=description,
source=source_url,
authors=authors,
groups=groups,
metadata={
"toxicity": str(item.get("toxicity", "")),
"jailbreaking": str(item.get("jailbreaking", "")),
"human_annotation": str(item.get("human_annotation", "")),
},
)
for item in data
]
Copilot AI Mar 3, 2026

The PR description says the loader “gracefully skips entries with Jinja2-incompatible content,” but this implementation currently wraps every user_input in a {% raw %}...{% endraw %} block and never skips any records (and the tests assert all items are preserved). Either update the PR description to match the implemented behavior, or implement the described skipping (e.g., catch template parsing failures during SeedPrompt creation and continue).

Copilot AI review requested due to automatic review settings March 3, 2026 05:29
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +14 to +32
class _ToxicChatDataset(_RemoteDatasetLoader):
"""
Loader for the ToxicChat dataset from HuggingFace.

ToxicChat contains approximately 10k real user-chatbot conversations from the Chatbot Arena,
annotated for toxicity and jailbreaking attempts. It provides real-world examples of
how users interact with LLMs in adversarial ways.

References:
- https://huggingface.co/datasets/lmsys/toxic-chat
- https://arxiv.org/abs/2310.17389
License: CC BY-NC 4.0

Warning: This dataset contains toxic, offensive, and jailbreaking content from real user
conversations. Consult your legal department before using these prompts for testing.
"""

HF_DATASET_NAME: str = "lmsys/toxic-chat"

Copilot AI Mar 3, 2026

This new remote dataset provider won’t be auto-discoverable unless it’s imported in pyrit/datasets/seed_datasets/remote/__init__.py (that module imports concrete providers to trigger registration via SeedDatasetProvider.__init_subclass__). Add _ToxicChatDataset to the remote __init__.py imports/__all__ so SeedDatasetProvider.get_all_dataset_names() can find it.

Comment on lines +109 to +113
)
except Exception:
conv_id = item.get("conv_id", "unknown")
logger.debug(f"Skipping entry with conv_id={conv_id}: failed to parse as Jinja2 template")

Copilot AI Mar 3, 2026

except Exception: will also swallow non-template failures (e.g., missing user_input due to schema changes) and the log drops the underlying exception details. Prefer validating required keys up-front and/or catching specific Jinja2 parse exceptions, and include exception info in the log (e.g., exc_info=True) so unexpected errors don’t get silently skipped.

romanlutz and others added 2 commits March 2, 2026 21:34
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Catch jinja2.TemplateError instead of broad Exception so schema
  errors (e.g., missing user_input key) propagate instead of being
  silently swallowed
- Extract user_input key access before try block for clarity

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 12:48
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment on lines +95 to +101
user_input = item["user_input"]
try:
seed_prompts.append(
SeedPrompt(
value=f"{{% raw %}}{user_input}{{% endraw %}}",
data_type="text",
dataset_name=self.dataset_name,
Copilot AI Mar 3, 2026

SeedPrompt.__post_init__ calls render_template_value_silent(), which returns early (without rendering) when it detects Jinja2 control-structure tokens like {% for ... %} and the loop collection variable isn’t provided. Because this loader wraps every user_input in {% raw %}...{% endraw %}, a ToxicChat entry containing something like {% for x in items %} can cause render_template_value_silent() to return the raw-wrapped string, leaving the {% raw %} / {% endraw %} markers in the stored prompt value instead of the original text. Consider stripping the wrapper after SeedPrompt creation (only when it’s still present) or forcing a render path that always removes the wrapper, and add a regression test for a {% for ... %} example to prevent this from silently changing prompt content.
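
One way to implement the cleanup suggested here is a post-construction strip that removes the wrapper only when rendering left it in place. A sketch (whether this belongs in the loader or in SeedPrompt itself depends on PyRIT internals not shown in this PR):

```python
RAW_PREFIX = "{% raw %}"
RAW_SUFFIX = "{% endraw %}"


def strip_raw_wrapper(value: str) -> str:
    """Remove a leftover {% raw %} wrapper that rendering did not consume."""
    if value.startswith(RAW_PREFIX) and value.endswith(RAW_SUFFIX):
        return value[len(RAW_PREFIX):-len(RAW_SUFFIX)]
    # Already-rendered values pass through unchanged.
    return value
```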

Comment on lines +58 to +88
async def test_fetch_dataset_preserves_jinja2_content(self):
"""Test that entries with Jinja2-like content are preserved via raw wrapping."""
data_with_html = [
{
"conv_id": "good1",
"user_input": "Normal question",
"model_output": "Normal answer.",
"human_annotation": "False",
"toxicity": 0,
"jailbreaking": 0,
"openai_moderation": "[]",
},
{
"conv_id": "html1",
"user_input": "<!DOCTYPE html>{%block%}broken",
"model_output": "N/A",
"human_annotation": "False",
"toxicity": 0,
"jailbreaking": 0,
"openai_moderation": "[]",
},
]
loader = _ToxicChatDataset()

with patch.object(loader, "_fetch_from_huggingface", new=AsyncMock(return_value=data_with_html)):
dataset = await loader.fetch_dataset()

assert len(dataset.seeds) == 2
assert dataset.seeds[0].value == "Normal question"
assert dataset.seeds[1].value == "<!DOCTYPE html>{%block%}broken"

Copilot AI Mar 3, 2026

Current tests cover raw-wrapping for block-like content and skipping on embedded {% endraw %}, but they don’t cover the case where user_input contains a Jinja2 control-structure token like {% for x in items %}. With the current Seed.render_template_value_silent() logic, that pattern can prevent the raw wrapper from being rendered away, leaving {% raw %} / {% endraw %} in the final SeedPrompt.value. Adding a regression test for a {% for ... %}-style input would help catch this behavior.

@varunj-msft
Contributor

Looks good to approve.

Minor thought, not blocking: could be worth considering harm_categories down the line - most other loaders populate it, which enables get_values(harm_categories=...) filtering. The binary flags here are a bit different from string labeled categories in other datasets though, so I can see why metadata is the right home for now.

3 participants