
Fix composite transform reproducibility#813

Merged
qiyanjun merged 1 commit into QData:master from mys007:fix-composite-transform-determinism
Apr 17, 2026

Conversation

@mys007
Contributor

@mys007 mys007 commented Mar 31, 2025

What does this PR do?

Summary

Fixes unwanted non-determinism in the order of transformed texts in CompositeTransformation. Not using a set for collecting texts makes reproducible runs possible.

Additions

Changes

Deletions

Checklist

  • The title of your pull request should be a summary of its contribution.
  • Please write a detailed description of what parts have been newly added and what parts have been modified. Please also explain why certain changes were made.
  • [ ] If your pull request addresses an issue, please mention the issue number in the pull request description to make sure they are linked (and people consulting the issue know you are working on it)
  • [ ] To indicate a work in progress please mark it as a draft on Github.
  • [ ] Make sure existing tests pass.
  • [ ] Add relevant tests. No quality testing = no merge.
  • [ ] All public methods must have informative docstrings that work nicely with sphinx. For new modules/files, please add/modify the appropriate .rst file in TextAttack/docs/apidoc.

Collaborator

@yanjunqiAz yanjunqiAz left a comment


Verified the fix: Python's set iteration order varies across runs due to hash randomization (PYTHONHASHSEED), making CompositeTransformation nondeterministic. The ordered deduplication preserves insertion order while still removing duplicates, ensuring reproducible results.

Tested with PYTHONHASHSEED=random — set ordering differs on every run, confirming this is a real reproducibility issue.
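The hash-randomization effect can be sketched by spawning fresh interpreters with different PYTHONHASHSEED values; the sample strings here are illustrative, not taken from the PR:

```python
import os
import subprocess
import sys

def set_order(seed):
    """Run a fresh interpreter with a fixed hash seed and return the
    set iteration order it observes for a small set of strings."""
    code = "print(list({'apple', 'banana', 'cherry', 'date'}))"
    env = {**os.environ, "PYTHONHASHSEED": seed}
    out = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True,
    )
    return out.stdout.strip()

# Different hash seeds can yield different iteration orders for the
# same set, even though the elements are identical.
print(set_order("0"))
print(set_order("42"))
```

With PYTHONHASHSEED unset (the default), the seed is randomized per process, so a `list(set(...))` conversion can differ between two otherwise identical runs.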

@yanjunqiAz
Collaborator

Yes, PR #813 is needed. It fixes a real bug.

The current code uses a set() to collect transformed texts, then converts to a list(). In Python, set iteration order is non-deterministic across runs (it depends on hash seeds, which are randomized by default via PYTHONHASHSEED). This means the same attack with the same seed can produce different candidate orderings between runs, breaking reproducibility.

The PR replaces the set with a list + order-preserving deduplication, which preserves the insertion order from each transformation while still removing duplicates. This is a correct and minimal fix.
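A minimal sketch of order-preserving deduplication, using `dict.fromkeys` as the dedup mechanism (the PR's actual code may use a different but equivalent construct):

```python
def dedup_preserving_order(items):
    """Remove duplicates while keeping first-seen order.

    dict preserves insertion order (Python 3.7+), so building a dict
    keyed by the items and taking its keys deduplicates without the
    hash-order dependence that list(set(items)) would introduce.
    """
    return list(dict.fromkeys(items))

# Hypothetical transformed-text candidates for illustration.
texts = ["good movie", "great movie", "good movie", "fine movie"]
print(dedup_preserving_order(texts))
# → ['good movie', 'great movie', 'fine movie']
```

Because the output order depends only on insertion order, two runs with the same inputs produce the same candidate ordering regardless of PYTHONHASHSEED.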

@qiyanjun qiyanjun merged commit 7f4a993 into QData:master Apr 17, 2026