FEAT: Scientific Translation Converter by jbolor21 · Pull Request #1379 · Azure/PyRIT

jbolor21 · 2026-02-19T19:21:54Z

Description

Adds a scientific translation converter that uses an LLM to rewrite queries in one of several "scientific" modes.

Tests and Documentation

Added unit tests, and added the converter to the text-to-text converters notebook alongside the other LLM-based converters.


pyrit/datasets/prompt_converters/scientific_obfuscation_converter.yaml

romanlutz · 2026-02-19T21:07:55Z

pyrit/datasets/prompt_converters/scientific_translation_converter.yaml

+
+  ## Mode-specific guidelines:
+
+  {% if mode == "academic" %}


I am confused. The template includes only one mode-specific section depending on the mode, BUT at the end there's a combined mode. How will the combined mode know about all the modes if we exclude most of them? The examples below also include all of them.

Also not sure if I understand the combined mode. In the class it's explicitly listed as a mode, but here it's a catchall, so "foobar" would resolve to a combined prompt. I feel like it would be better to just drop "combined" and refer to the default/wildcard as a combined mode

I added the combined mode to merge a couple of the methods into one. I did change the catch-all "else" to an "elif", since any other mode should be rejected with an exception!

romanlutz · 2026-02-19T21:12:59Z

pyrit/prompt_converter/scientific_translation_converter.py

+        Raises:
+            ValueError: If an invalid mode is provided.
+        """
+        valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")


Is there an easier way to check this, given that it's a Literal that's defined above?

I suggested one below by just attaching the valid modes to the class itself, but it's a nit so feel free to disregard

I suppose you could have both.

from typing import Literal, get_args

ObfuscationMode = Literal[
    "academic", "technical", "smiles", "research", "reaction", "combined"
]
OBFUSCATION_MODES = set(get_args(ObfuscationMode))

def is_valid_mode(value: str) -> bool:
    return value in OBFUSCATION_MODES

Copilot

Pull request overview

Adds a new LLM-based prompt converter that rewrites prompts into “scientific/technical” phrasing across multiple modes, along with the seed prompt template, documentation wiring, and unit tests.

Changes:

Introduces ScientificObfuscationConverter (mode-driven) backed by a new YAML seed prompt template.
Exposes the converter via pyrit.prompt_converter exports and API docs.
Adds unit tests and an example usage snippet in the converters documentation notebook.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


File	Description
`pyrit/prompt_converter/scientific_obfuscation_converter.py`	Implements the new LLM-based converter and identifier construction.
`pyrit/datasets/prompt_converters/scientific_obfuscation_converter.yaml`	Adds the mode-parameterized system prompt template used by the converter.
`pyrit/prompt_converter/__init__.py`	Exports the new converter from the prompt_converter package.
`tests/unit/converter/test_scientific_obfuscation_converter.py`	Adds unit tests validating initialization, mode validation, and conversion behavior.
`doc/code/converters/1_text_to_text_converters.py`	Documents example usage of the new converter in the text-to-text converters notebook source.
`doc/code/converters/1_text_to_text_converters.ipynb`	Adds the corresponding notebook cell content for the new converter example.
`doc/api.rst`	Adds the converter to the API reference list.

Comments suppressed due to low confidence (2)

pyrit/prompt_converter/scientific_obfuscation_converter.py:23

The PR title/description refer to a "Scientific Translation Converter", but the implementation and dataset are named "ScientificObfuscationConverter" / "scientific_obfuscation_converter". If this is intended to be a translation-style converter, consider aligning the naming (or update the PR description) to avoid confusion for API consumers and documentation readers.

class ScientificObfuscationConverter(LLMGenericTextConverter):
    """
    Uses an LLM to transform simple or direct prompts into

pyrit/prompt_converter/scientific_obfuscation_converter.py:67

valid_modes duplicates the allowed values already defined in ObfuscationMode. To avoid the tuple and the type alias drifting out of sync, derive the runtime list from the type (e.g., typing.get_args(ObfuscationMode)) or centralize the allowed modes as a single constant reused for both validation and typing.

        valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
        if mode not in valid_modes:
            raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}")
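A minimal sketch of the class-attribute approach suggested in the thread, combined with deriving the values from the `Literal` as Copilot recommends. It assumes the modes stay defined as a `Literal` alias; `ModeConverterSketch` and `VALID_MODES` are hypothetical names for illustration, not the PyRIT class or its attributes.

```python
from typing import ClassVar, Literal, get_args

# Single source of truth for the allowed modes (mirrors the tuple in this PR).
ObfuscationMode = Literal["academic", "technical", "smiles", "research", "reaction", "combined"]


class ModeConverterSketch:
    """Hypothetical stand-in for the converter; only the mode validation is shown."""

    # Derived from the Literal, so the runtime check cannot drift from the type hint.
    VALID_MODES: ClassVar[tuple[str, ...]] = get_args(ObfuscationMode)

    def __init__(self, *, mode: ObfuscationMode) -> None:
        # Reject anything outside the Literal, e.g. "foobar".
        if mode not in self.VALID_MODES:
            raise ValueError(f"Invalid mode '{mode}'. Must be one of: {self.VALID_MODES}")
        self._mode = mode
```

Callers and tests can then reference `ModeConverterSketch.VALID_MODES` instead of repeating the list, and adding a mode means editing the `Literal` only.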


ValbuenaVC · 2026-02-19T22:28:24Z

pyrit/prompt_converter/scientific_translation_converter.py



I suggested one below by just attaching the valid modes to the class itself, but it's a nit so feel free to disregard

ValbuenaVC · 2026-02-19T22:31:12Z

pyrit/datasets/prompt_converters/scientific_translation_converter.yaml



Also not sure if I understand the combined mode. In the class it's explicitly listed as a mode, but here it's a catchall, so "foobar" would resolve to a combined prompt. I feel like it would be better to just drop "combined" and refer to the default/wildcard as a combined mode


hannahwestra25 · 2026-02-24T15:03:39Z

pyrit/models/seeds/seed.py

-            if any(var not in kwargs for var in for_vars):
-                # Don't render if we're missing loop collection variables - preserve the template as-is
+            # Extract variable names from {% if var ... %} and {% elif var ... %} patterns
+            if_vars = re.findall(r"\{%[-\s]*(?:el)?if\s+(\w+)", self.value)


Rather than parsing here and having one YAML, is it possible to have a separate YAML for each mode? Then we can leave seed.py untouched and have scientific_translation_converter.py redirect to the corresponding YAML based on the mode. I'm assuming this parsing is specific to this converter, which I would prefer to avoid since Seed is a generic data structure.
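A sketch of the per-mode redirect being proposed, assuming one YAML per mode named `<mode>_science_converter.yaml` (the naming the PR later adopted, e.g. `academic_science_converter.yaml`). `template_path_for_mode` is a hypothetical helper, and `base_dir` stands in for wherever the converter seed prompts live; the actual converter code may resolve paths differently.

```python
from pathlib import Path
from typing import Literal, get_args

# Assumption: mode names as used in this PR ("math" was added in a later revision).
TranslationMode = Literal["academic", "technical", "smiles", "research", "reaction", "math", "combined"]


def template_path_for_mode(mode: str, base_dir: Path) -> Path:
    """Hypothetical helper: map a mode to its own seed prompt YAML file."""
    valid_modes = get_args(TranslationMode)
    if mode not in valid_modes:
        raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}")
    # One file per mode, so seed.py never has to inspect {% if mode == ... %} blocks.
    return base_dir / f"{mode}_science_converter.yaml"
```

With this shape, each template stays a plain seed prompt and mode selection happens entirely in the converter.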


Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (1)

pyrit/datasets/prompt_converters/academic_science_converter.yaml:21

Spelling in the example prompt: asprin should be aspirin (unless the misspelling is intentional for the example). Fixing this avoids propagating typos into docs/training data.

  Original: "How do I make asprin?"
  Converted: """Organic Chemistry 201 - Design a Synthesis Pathway to an Unknown Final Product, Starting with Salicylic

Copilot · 2026-03-03T23:04:37Z

tests/unit/converter/test_scientific_translation_converter.py

+@pytest.mark.parametrize("mode", ["academic", "technical", "smiles", "research", "reaction", "combined"])
+def test_scientific_translation_converter_init_valid_modes(mode, sqlite_instance):
+    prompt_target = MockPromptTarget()
+    converter = ScientificTranslationConverter(converter_target=prompt_target, mode=mode)
+    assert converter._system_prompt_template
+    assert converter._mode == mode


TranslationMode includes "math", and there is a corresponding math_science_converter.yaml, but the valid-mode parametrized init test doesn’t include math. Adding it here would ensure all built-in modes are covered and prevent silent regressions for that mode.
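One way to close this gap is to derive the parameter list from the `Literal` itself and to check that every mode has a template file. A sketch, assuming the mode names from this PR and the `<mode>_science_converter.yaml` naming of the per-mode files; `ALL_MODES` and `modes_missing_templates` are hypothetical helpers, not part of the test suite.

```python
from pathlib import Path
from typing import Literal, get_args

# Assumed mode list after the "math" mode was added in this PR.
TranslationMode = Literal["academic", "technical", "smiles", "research", "reaction", "math", "combined"]

# Deriving test parameters from the Literal keeps the test in sync with the type.
ALL_MODES = list(get_args(TranslationMode))


def modes_missing_templates(base_dir: Path) -> list[str]:
    """Return the modes that have no <mode>_science_converter.yaml in base_dir."""
    return [m for m in ALL_MODES if not (base_dir / f"{m}_science_converter.yaml").is_file()]
```

In the pytest module this would look like `@pytest.mark.parametrize("mode", ALL_MODES)`, so a newly added mode is exercised without touching the test.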

Copilot · 2026-03-03T23:04:37Z

pyrit/datasets/prompt_converters/technical_science_converter.yaml

+name: scientific_translation_converter_technical_mode
+description: |
+  Converts prompts into a technical mode (ie using scientific/technical language)
+authors: Bolor Jagdagdorj


In seed prompt YAMLs, authors is typically a YAML list (e.g., authors: ['AI Red Team']). Here it’s a scalar string, which will be loaded into the SeedPrompt.authors: Sequence[str] field as an iterable of characters and can break filtering/serialization that expects a list of author names. Change authors to a list form (even if it’s a single author).

Suggested change

authors: Bolor Jagdagdorj

authors: ['Bolor Jagdagdorj']
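To see why the scalar form is a problem: YAML loads `authors: Bolor Jagdagdorj` as a string, and a Python `str` also satisfies `Sequence[str]`, so code that iterates over it gets single characters. The list form in the suggested change is the actual fix; the guard below is only a hypothetical defensive sketch (`normalize_authors` is not a PyRIT function).

```python
from typing import Sequence, Union

# A scalar YAML value loads as a plain str; iterating it yields characters:
#   list("Bolor") -> ['B', 'o', 'l', 'o', 'r']


def normalize_authors(value: Union[str, Sequence[str]]) -> list[str]:
    """Hypothetical guard: wrap a lone author string in a list."""
    if isinstance(value, str):
        return [value]
    return list(value)
```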

Copilot · 2026-03-03T23:04:38Z

pyrit/datasets/prompt_converters/reaction_science_converter.yaml

+name: scientific_translation_converter_reaction_mode
+description: |
+  Converts prompts into a reaction mechanism mode (ie using step-by-step mechanisms)
+authors: Bolor Jagdagdorj


In seed prompt YAMLs, authors is typically a YAML list (e.g., authors: ['AI Red Team']). Here it’s a scalar string, which will be loaded into the SeedPrompt.authors: Sequence[str] field as an iterable of characters and can break filtering/serialization that expects a list of author names. Change authors to a list form (even if it’s a single author).

Suggested change

authors: Bolor Jagdagdorj

authors: ['Bolor Jagdagdorj']

Copilot · 2026-03-03T23:04:38Z

pyrit/datasets/prompt_converters/combined_science_converter.yaml

+name: scientific_translation_converter_combined_mode
+description: |
+  Converts prompts into a complex scientific language mode that combines elements of academic, research, chemical notation, and reaction mechanism styles
+authors: Bolor Jagdagdorj


In seed prompt YAMLs, authors is typically a YAML list (e.g., authors: ['AI Red Team']). Here it’s a scalar string, which will be loaded into the SeedPrompt.authors: Sequence[str] field as an iterable of characters and can break filtering/serialization that expects a list of author names. Change authors to a list form (even if it’s a single author).

Suggested change

authors: Bolor Jagdagdorj

authors: ['Bolor Jagdagdorj']

Copilot · 2026-03-03T23:04:38Z

pyrit/datasets/prompt_converters/academic_science_converter.yaml

+name: scientific_translation_converter_academic_mode
+description: |
+  Converts prompts into an academic mode (ie homework/exam style questioning)
+authors: Bolor Jagdagdorj


In seed prompt YAMLs, authors is typically a YAML list (e.g., authors: ['AI Red Team']). Here it’s a scalar string, which will be loaded into the SeedPrompt.authors: Sequence[str] field as an iterable of characters and can break filtering/serialization that expects a list of author names. Change authors to a list form (even if it’s a single author).

Suggested change

authors: Bolor Jagdagdorj

authors: ["Bolor Jagdagdorj"]

Copilot · 2026-03-03T23:04:39Z

pyrit/models/seeds/seed.py

+            # Extract variable names from {% if var ... %} and {% elif var ... %} patterns
+            if_vars = re.findall(r"\{%[-\s]*(?:el)?if\s+(\w+)", self.value)
+            control_vars = set(for_vars + if_vars)
+            if any(var not in kwargs for var in control_vars):
+                # Don't render if we're missing control structure variables - preserve the template as-is


The new {% if ... %} extraction regex captures the first token after if, which breaks valid Jinja expressions: for {% if not foo %} it captures not instead of foo, and for {% if foo is defined %} it treats foo as required even though the template explicitly handles its absence. Either way, templates can be returned unrendered even when the variables they actually need are provided. Consider updating the pattern to handle an optional not and more general expressions, or use Jinja's AST parsing to determine the undeclared variables used in control structures.
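The `not` case can be reproduced with the standard `re` module. The first pattern is the one from the diff; the second is a hypothetical tweak that skips an optional leading `not`. It still does not understand `is defined`, `and`/`or`, or filters, which is why the AST route (Jinja's `jinja2.meta.find_undeclared_variables` on the result of `Environment().parse(...)`) is the more robust option Copilot points to.

```python
import re

# Pattern from the PR: captures the first word after `if` / `elif`.
ORIGINAL = r"\{%[-\s]*(?:el)?if\s+(\w+)"
# Hypothetical tweak: skip an optional leading `not` before capturing the name.
WITH_NOT = r"\{%[-\s]*(?:el)?if\s+(?:not\s+)?(\w+)"

template = "{% if not foo %}A{% elif bar %}B{% endif %}"

print(re.findall(ORIGINAL, template))  # ['not', 'bar']
print(re.findall(WITH_NOT, template))  # ['foo', 'bar']
```

Note that a name that merely starts with "not" (for example `nothing`) is still captured whole by the tweaked pattern, because the optional group requires whitespace after `not`.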

Copilot · 2026-03-03T23:04:39Z

pyrit/datasets/prompt_converters/smiles_science_converter.yaml

+name: scientific_translation_converter_smiles_mode
+description: |
+  Converts prompts into a SMILES/chemical notation mode (ie using chemical notation and formulas)
+authors: Bolor Jagdagdorj


In seed prompt YAMLs, authors is typically a YAML list (e.g., authors: ['AI Red Team']). Here it’s a scalar string, which will be loaded into the SeedPrompt.authors: Sequence[str] field as an iterable of characters and can break filtering/serialization that expects a list of author names. Change authors to a list form (even if it’s a single author).

Suggested change

authors: Bolor Jagdagdorj

authors: ["Bolor Jagdagdorj"]

Copilot · 2026-03-03T23:04:39Z

pyrit/datasets/prompt_converters/research_science_converter.yaml

+name: scientific_translation_converter_research_mode
+description: |
+  Converts prompts into a research question or hypothesis mode (ie using scientific research framing)
+authors: Bolor Jagdagdorj


In seed prompt YAMLs, authors is typically a YAML list (e.g., authors: ['AI Red Team']). Here it’s a scalar string, which will be loaded into the SeedPrompt.authors: Sequence[str] field as an iterable of characters and can break filtering/serialization that expects a list of author names. Change authors to a list form (even if it’s a single author).

Suggested change

authors: Bolor Jagdagdorj

authors: ["Bolor Jagdagdorj"]

Copilot · 2026-03-03T23:04:40Z

pyrit/datasets/prompt_converters/math_science_converter.yaml

+name: scientific_translation_converter_math_mode
+description: |
+  Converts prompts into a math mode (ie math word problem for homework/exam style questioning)
+authors: Bolor Jagdagdorj


In seed prompt YAMLs, authors is typically a YAML list (e.g., authors: ['AI Red Team']). Here it’s a scalar string, which will be loaded into the SeedPrompt.authors: Sequence[str] field as an iterable of characters and can break filtering/serialization that expects a list of author names. Change authors to a list form (even if it’s a single author).

Suggested change

authors: Bolor Jagdagdorj

authors: ["Bolor Jagdagdorj"]

Copilot · 2026-03-03T23:04:40Z

pyrit/prompt_converter/scientific_translation_converter.py

+
+from pyrit.common.apply_defaults import REQUIRED_VALUE, apply_defaults
+from pyrit.common.path import CONVERTER_SEED_PROMPT_PATH
+from pyrit.identifiers import ConverterIdentifier


ConverterIdentifier is imported from pyrit.identifiers, but that module only exports ComponentIdentifier/Identifiable/helpers; this import will raise ImportError at runtime. Update the import and the _build_identifier return annotation to use the correct identifier type used by other converters (e.g., ComponentIdentifier).

Suggested change

from pyrit.identifiers import ConverterIdentifier

from pyrit.identifiers import ComponentIdentifier

Bolor added 3 commits February 19, 2026 11:19

initial commit adding all changed files new converter

fcb33a8

adding in api

501c1b2

Merge remote-tracking branch 'origin' into users/bjagdagdorj/science_…

30349f1

…converter

jbolor21 changed the title ~~[DRAFT]: FEAT: Scientific Translation Converter~~ FEAT: Scientific Translation Converter Feb 19, 2026

romanlutz reviewed Feb 19, 2026

romanlutz requested a review from Copilot February 19, 2026 21:13

Copilot started reviewing on behalf of romanlutz February 19, 2026 21:14

Copilot AI reviewed Feb 19, 2026

tests/unit/converter/test_scientific_translation_converter.py (outdated, resolved)

tests/unit/converter/test_scientific_translation_converter.py (outdated, resolved)

tests/unit/converter/test_scientific_translation_converter.py (outdated, resolved)

yaml edit

ef716a9

ValbuenaVC reviewed Feb 19, 2026

hannahwestra25 reviewed Feb 20, 2026

pyrit/prompt_converter/scientific_translation_converter.py (outdated, resolved)

Bolor added 3 commits February 23, 2026 11:05

revising yaml instructions adding a mode

e8d15aa

rename file

97cfad5

address feedback

863125e

hannahwestra25 reviewed Feb 24, 2026

pyrit/prompt_converter/scientific_translation_converter.py (outdated, resolved)

hannahwestra25 self-assigned this Feb 24, 2026

Bolor added 3 commits February 25, 2026 09:51

breaking yaml file up into multiple

4760410

Merge remote-tracking branch 'origin' into users/bjagdagdorj/science_…

32bb290

…converter

Merge remote-tracking branch 'origin' into users/bjagdagdorj/science_…

a24a3b2

…converter

Copilot AI review requested due to automatic review settings March 3, 2026 22:59

Copilot started reviewing on behalf of jbolor21 March 3, 2026 22:59

precommit

44f9327

Copilot AI reviewed Mar 3, 2026


fix after main

1de2e16
