MAINT Add pre-commit hook to sanitize user paths in notebook outputs #1429
Merged
romanlutz merged 16 commits into Azure:main on Mar 3, 2026
Adds a pre-commit hook that strips user-specific path prefixes (e.g., C:\Users\username\, /home/username/, /Users/username/) from notebook cell outputs. This prevents leaking local environment details and creates cleaner, more reproducible diffs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Run the new sanitize-notebook-paths hook across all 40 notebooks that contained user-specific path prefixes in their outputs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a local pre-commit hook (and supporting script/tests) to strip user-specific home-directory prefixes from Jupyter notebook outputs, reducing accidental leakage of local environment details and making notebook diffs more reproducible.
Changes:
- Introduces `build_scripts/sanitize_notebook_paths.py` to sanitize user home paths inside notebook outputs.
- Adds unit tests validating path stripping and end-to-end notebook sanitization behavior.
- Registers the sanitizer as a local pre-commit hook for `doc/**/*.ipynb`.
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `build_scripts/sanitize_notebook_paths.py` | Implements path-prefix stripping across notebook cell outputs and a CLI entrypoint for pre-commit. |
| `tests/unit/build_scripts/test_sanitize_notebook_paths.py` | Adds unit tests for platform path variants and notebook rewrite/idempotency behavior. |
| `.pre-commit-config.yaml` | Adds a local pre-commit hook to run the sanitizer on documentation notebooks. |
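As a rough illustration of what a "CLI entrypoint for pre-commit" might look like, here is a minimal sketch. It is not the code in `build_scripts/sanitize_notebook_paths.py`: the names `sanitize_file` and `main`, the stand-in `scrub` function, the `indent=1` formatting, and the choice to return exit code 1 when a file was rewritten are all assumptions. pre-commit passes the matching filenames as positional arguments, which is the only behavior relied on here.

```python
import argparse
import json
from pathlib import Path
from typing import Callable, Optional, Sequence


def sanitize_file(path: Path, scrub: Callable[[str], str]) -> bool:
    """Scrub stream-output text in one notebook; return True if the file changed."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    modified = False  # track changes instead of serializing twice to compare
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):
            text = output.get("text")
            if isinstance(text, str):
                new_text = scrub(text)
            elif isinstance(text, list):
                new_text = [scrub(line) for line in text]
            else:
                continue
            if new_text != text:
                output["text"] = new_text
                modified = True
    if modified:
        # ensure_ascii=False keeps emojis and other non-ASCII text readable.
        serialized = json.dumps(notebook, indent=1, ensure_ascii=False)
        path.write_text(serialized + "\n", encoding="utf-8")
    return modified


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Hypothetical entrypoint: sanitize every notebook named on the command line."""
    parser = argparse.ArgumentParser(description="Sanitize user paths in notebooks")
    parser.add_argument("filenames", nargs="*")
    args = parser.parse_args(argv)

    # Stand-in scrubber for this sketch only; a real hook would use a regex.
    def scrub(text: str) -> str:
        return text.replace("/home/testuser/", "./")

    changed = [name for name in args.filenames if sanitize_file(Path(name), scrub)]
    for name in changed:
        print(f"Sanitized user paths in {name}")
    # Assumed convention: non-zero exit tells the committer that files were rewritten.
    return 1 if changed else 0
```

A script like this would typically end with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard so that pre-commit sees the exit code.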
This was referenced Mar 2, 2026
…ook-stderr # Conflicts: # doc/code/datasets/1_loading_datasets.ipynb
Resolve ipynb conflicts by taking main's versions and re-sanitizing user paths with the new pre-commit hook.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address review comments:
- Sanitize traceback and evalue fields in error outputs
- Restrict data MIME sanitization to text/* and application/json
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
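To make the first review fix concrete: in the notebook format, an output with `output_type` equal to `"error"` carries `ename`, `evalue` (a string), and `traceback` (a list of strings). The sketch below shows one way those two fields could be scrubbed. The function names and the `scrub` callback are hypothetical and only illustrate the idea; they are not taken from the PR's script.

```python
from typing import Any, Callable, Dict, Tuple


def sanitize_field(value: Any, scrub: Callable[[str], str]) -> Tuple[Any, bool]:
    """Apply scrub to a str or a list of str; return (new_value, changed)."""
    if isinstance(value, str):
        new_value = scrub(value)
        return new_value, new_value != value
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        new_list = [scrub(item) for item in value]
        return new_list, new_list != value
    # Anything else (numbers, nested structures) is returned untouched.
    return value, False


def sanitize_error_output(output: Dict[str, Any], scrub: Callable[[str], str]) -> bool:
    """Scrub evalue and traceback of an error output in place; return True if changed."""
    if output.get("output_type") != "error":
        return False
    changed = False
    for key in ("evalue", "traceback"):
        if key in output:
            output[key], field_changed = sanitize_field(output[key], scrub)
            changed = changed or field_changed
    return changed
```

Returning a changed flag from each helper lets a caller decide whether the notebook needs to be rewritten at all.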
…romanlutz/PyRIT into romanlutz/strip-notebook-stderr
rlundeen2 reviewed Mar 3, 2026
…rack modified flag
- Paths now normalized to ./ prefix with forward slashes (e.g., ./project/file.py)
- Support lowercase drive letters
- Use ensure_ascii=False to preserve Unicode (emojis etc.)
- Track modified flag instead of double JSON serialization
- Add Unicode preservation test and lowercase drive letter test
- Replace personal username in tests with generic testuser
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
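The normalization described in this commit can be pictured with a small regex sketch. This is not the pattern used in `build_scripts/sanitize_notebook_paths.py`; the name `normalize_user_paths`, the regex itself, and the simplification that path segments contain no whitespace or quotes are assumptions made for illustration.

```python
import re

# One path segment: anything except separators, whitespace, and quotes (assumed).
_SEG = r"""[^\\/\s"']+"""

# Home-directory prefix (Windows with any-case drive letter, Linux, macOS),
# followed by the rest of the path captured as "rest".
_USER_HOME = re.compile(
    r"(?:(?<![A-Za-z0-9])[A-Za-z]:[\\/]+Users[\\/]+" + _SEG
    + r"|(?<![\w.])/(?:home|Users)/" + _SEG + r")"
    + r"(?P<rest>(?:[\\/]+" + _SEG + r")*[\\/]*)"
)


def _to_relative(match: "re.Match[str]") -> str:
    """Turn the remainder of a matched path into ./-relative, forward-slash form."""
    rest = match.group("rest").replace("\\", "/")
    rest = re.sub(r"/+", "/", rest)  # collapse doubled separators
    return "." + rest


def normalize_user_paths(text: str) -> str:
    """Replace user home prefixes in text with a ./ prefix."""
    return _USER_HOME.sub(_to_relative, text)


if __name__ == "__main__":
    print(normalize_user_paths(r"C:\Users\testuser\project\file.py"))  # ./project/file.py
    print(normalize_user_paths("/home/testuser/project/file.py"))      # ./project/file.py
```

The lookbehinds keep the sketch idempotent (an already normalized `./home/...` is not matched again) and avoid rewriting URLs such as `https://example.com/home/page`. Usernames that contain spaces are not handled by this simplified pattern.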
94ee865 to 16a09e6
…ook-stderr # Conflicts: # doc/code/datasets/1_loading_datasets.ipynb
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
application/json outputs in notebooks are typically nested dicts/lists, which _sanitize_output_field cannot handle (it only processes str and list[str]). Remove it from the MIME filter and add a test confirming these outputs are left untouched.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
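The resulting filter can be sketched as follows: only `text/*` entries of an output's `data` bundle are scrubbed, and only when the value is a `str` or `list[str]`. The function name `sanitize_data_bundle` and the `scrub` callback are hypothetical; the PR's own helper is `_sanitize_output_field`, whose exact signature is not shown here.

```python
from typing import Any, Callable, Dict


def sanitize_data_bundle(data: Dict[str, Any], scrub: Callable[[str], str]) -> bool:
    """Scrub text/* entries of a MIME bundle in place; return True if anything changed."""
    changed = False
    for mime, value in list(data.items()):
        # application/json, image/*, and other types are deliberately skipped.
        if not mime.startswith("text/"):
            continue
        if isinstance(value, str):
            new_value: Any = scrub(value)
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            new_value = [scrub(v) for v in value]
        else:
            continue  # unexpected shape: leave it alone
        if new_value != value:
            data[mime] = new_value
            changed = True
    return changed
```

Skipping `application/json` entirely is the conservative choice: a nested payload is left byte-for-byte intact rather than partially rewritten.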
rlundeen2 approved these changes Mar 3, 2026