MAINT Add pre-commit hook to sanitize user paths in notebook outputs #1429

Merged
romanlutz merged 16 commits into Azure:main from romanlutz:romanlutz/strip-notebook-stderr
Mar 3, 2026

@romanlutz
Contributor

Adds a pre-commit hook that strips user-specific path prefixes (e.g., C:\Users\username\, /home/username/, /Users/username/) from notebook cell outputs. This prevents leaking local environment details and creates cleaner, more reproducible diffs.
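
For illustration, prefix stripping of this kind can be sketched with a single regular expression. This is a minimal sketch, not the code from this PR: the pattern and the name `strip_user_path_prefix` are assumptions made for the example.

```python
import re

# Hypothetical pattern (not taken from the PR): a home-directory prefix
# on Windows, Linux, or macOS, including the trailing separator.
_USER_PATH_PREFIX = re.compile(
    r"[A-Za-z]:\\Users\\[^\\/\s]+\\"  # C:\Users\username\
    r"|/home/[^/\s]+/"                # /home/username/
    r"|/Users/[^/\s]+/"               # /Users/username/
)


def strip_user_path_prefix(text: str) -> str:
    """Remove every user-specific home prefix found in ``text``."""
    return _USER_PATH_PREFIX.sub("", text)


# Text without a home prefix passes through unchanged.
print(strip_user_path_prefix("/home/testuser/project/file.py"))
```

A pattern this simple would also match `/home/<name>/` in the middle of a longer path, so a real hook may need tighter anchoring.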

Adds a pre-commit hook that strips user-specific path prefixes
(e.g., C:\Users\username\, /home/username/, /Users/username/) from
notebook cell outputs. This prevents leaking local environment details
and creates cleaner, more reproducible diffs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 13:38
Run the new sanitize-notebook-paths hook across all 40 notebooks
that contained user-specific path prefixes in their outputs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds a local pre-commit hook (and supporting script/tests) to strip user-specific home-directory prefixes from Jupyter notebook outputs, reducing accidental leakage of local environment details and making notebook diffs more reproducible.

Changes:

  • Introduces build_scripts/sanitize_notebook_paths.py to sanitize user home paths inside notebook outputs.
  • Adds unit tests validating path stripping and end-to-end notebook sanitization behavior.
  • Registers the sanitizer as a local pre-commit hook for doc/**/*.ipynb.

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| build_scripts/sanitize_notebook_paths.py | Implements path-prefix stripping across notebook cell outputs and a CLI entrypoint for pre-commit. |
| tests/unit/build_scripts/test_sanitize_notebook_paths.py | Adds unit tests for platform path variants and notebook rewrite/idempotency behavior. |
| .pre-commit-config.yaml | Adds a local pre-commit hook to run the sanitizer on documentation notebooks. |
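
As a rough sketch of what a CLI entrypoint for such a hook could look like: pre-commit passes the matching staged file names as arguments and treats a nonzero exit code as a failure, so the function below returns 1 whenever it rewrote a file. The names `sanitize_notebook` and `main`, the POSIX-only pattern, and the restriction to stream `text` fields are simplifications assumed for this example, not details taken from the PR.

```python
import json
import re
from pathlib import Path

# Simplified, POSIX-only stand-in for the real pattern (assumption).
_PREFIX = re.compile(r"/home/[^/\s]+/|/Users/[^/\s]+/")


def sanitize_notebook(path: Path) -> bool:
    """Rewrite the notebook in place; return True if anything changed."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    modified = False
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):
            text = output.get("text")
            if isinstance(text, list):
                cleaned = [_PREFIX.sub("", line) for line in text]
                if cleaned != text:
                    output["text"] = cleaned
                    modified = True
    if modified:
        path.write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")
    return modified


def main(argv: list[str]) -> int:
    """Sanitize each notebook named in argv; return 1 if any file changed."""
    changed = [name for name in argv if sanitize_notebook(Path(name))]
    for name in changed:
        print(f"Sanitized user paths in {name}")
    return 1 if changed else 0


# A script version would end with: raise SystemExit(main(sys.argv[1:]))
```

Running the function a second time on the same file should return 0, which is the idempotency property the unit tests are described as checking.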

romanlutz and others added 2 commits March 2, 2026 14:42
…ook-stderr

# Conflicts:
#	doc/code/datasets/1_loading_datasets.ipynb
Resolve ipynb conflicts by taking main's versions and re-sanitizing
user paths with the new pre-commit hook.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 23:29
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 43 out of 43 changed files in this pull request and generated 1 comment.

romanlutz and others added 5 commits March 2, 2026 16:01
Address review comments:
- Sanitize traceback and evalue fields in error outputs
- Restrict data MIME sanitization to text/* and application/json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
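
A sketch of how the two review points above might fit together, assuming the standard nbformat output types (`stream`, `error`, `display_data`, `execute_result`). The helper names and the simplified pattern are hypothetical. Only `text/*` payloads are touched here, since a later commit in this PR drops `application/json` from the filter.

```python
import copy
import re

# Simplified stand-in pattern (assumption, POSIX-only).
_PREFIX = re.compile(r"/home/[^/\s]+/|/Users/[^/\s]+/")


def _clean(value):
    """Sanitize a str or list[str]; return any other value unchanged."""
    if isinstance(value, str):
        return _PREFIX.sub("", value)
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return [_PREFIX.sub("", v) for v in value]
    return value


def sanitize_output(output: dict) -> bool:
    """Sanitize one output dict in place; return True if it changed."""
    before = copy.deepcopy(output)
    kind = output.get("output_type")
    if kind == "stream" and "text" in output:
        output["text"] = _clean(output["text"])
    elif kind == "error":
        # Error outputs carry paths in the message and the traceback lines.
        for field in ("evalue", "traceback"):
            if field in output:
                output[field] = _clean(output[field])
    elif kind in ("display_data", "execute_result"):
        data = output.get("data", {})
        for mime in list(data):
            # Skip binary payloads such as image/png.
            if mime.startswith("text/"):
                data[mime] = _clean(data[mime])
    return output != before
```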
Copilot AI review requested due to automatic review settings March 3, 2026 00:42
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 43 out of 43 changed files in this pull request and generated 3 comments.

rlundeen2 self-assigned this Mar 3, 2026
romanlutz and others added 2 commits March 2, 2026 20:39
…rack modified flag

- Paths now normalized to ./ prefix with forward slashes (e.g., ./project/file.py)
- Support lowercase drive letters
- Use ensure_ascii=False to preserve unicode (emojis etc.)
- Track modified flag instead of double JSON serialization
- Add unicode preservation test and lowercase drive letter test
- Replace personal username in tests with generic testuser

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
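
The behavior listed in this commit can be approximated as follows. This is a hedged sketch: the regular expression and the function names `normalize_user_paths`, `sanitize_lines`, and `dump_notebook` are assumptions for illustration, and only the `ensure_ascii=False` argument and the `./project/file.py` target form come from the commit message.

```python
import json
import re

# Hypothetical pattern: a home prefix (upper- or lowercase drive letter),
# followed by the rest of the path up to whitespace or a quote.
_USER_PATH = re.compile(
    r"(?:[A-Za-z]:\\Users\\[^\\/\s]+\\|/home/[^/\s]+/|/Users/[^/\s]+/)"
    r"(?P<rest>[^\s\"']*)"
)


def normalize_user_paths(text: str) -> str:
    """Replace the home prefix with ./ and use forward slashes in the rest."""
    return _USER_PATH.sub(
        lambda m: "./" + m.group("rest").replace("\\", "/"), text
    )


def sanitize_lines(lines: list[str]) -> tuple[list[str], bool]:
    """Return the cleaned lines plus a flag saying whether anything changed."""
    cleaned = [normalize_user_paths(line) for line in lines]
    return cleaned, cleaned != lines


def dump_notebook(notebook: dict) -> str:
    """Serialize without escaping non-ASCII characters such as emojis."""
    return json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
```

Tracking the flag while sanitizing means a caller can decide whether to rewrite the file without serializing the notebook twice and comparing the results.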
romanlutz force-pushed the romanlutz/strip-notebook-stderr branch from 94ee865 to 16a09e6 Compare March 3, 2026 04:45
Copilot AI review requested due to automatic review settings March 3, 2026 04:45
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 50 out of 50 changed files in this pull request and generated 2 comments.

romanlutz and others added 3 commits March 2, 2026 21:23
…ook-stderr

# Conflicts:
#	doc/code/datasets/1_loading_datasets.ipynb
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 05:37
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 50 out of 50 changed files in this pull request and generated 2 comments.

application/json outputs in notebooks are typically nested dicts/lists,
which _sanitize_output_field cannot handle (it only processes str and
list[str]). Remove it from the mime filter and add a test confirming
these outputs are left untouched.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
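
To illustrate the limitation described in this commit, here is a minimal sketch of a field sanitizer that only understands `str` and `list[str]`. The function `sanitize_output_field` below is a hypothetical stand-in, not the PR's `_sanitize_output_field`, whose real signature is not shown in this conversation. `sanitize_data` and the simplified pattern are likewise assumptions.

```python
import re

# Simplified stand-in pattern (assumption, POSIX-only).
_PREFIX = re.compile(r"/home/[^/\s]+/|/Users/[^/\s]+/")


def sanitize_output_field(value):
    """Handle str and list[str]; nested dicts or lists are returned as-is."""
    if isinstance(value, str):
        return _PREFIX.sub("", value)
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return [_PREFIX.sub("", v) for v in value]
    return value


def sanitize_data(data: dict) -> dict:
    """Sanitize only text/* MIME payloads; leave application/json untouched."""
    return {
        mime: sanitize_output_field(payload) if mime.startswith("text/") else payload
        for mime, payload in data.items()
    }


data = {
    "text/plain": "/home/testuser/a.py",
    "application/json": {"path": "/home/testuser/a.py"},
}
print(sanitize_data(data))
```

With this filter the nested `application/json` payload keeps its original path, which matches the "left untouched" behavior the added test is said to confirm.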
Copilot AI review requested due to automatic review settings March 3, 2026 20:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 50 out of 50 changed files in this pull request and generated 2 comments.

romanlutz merged commit afa4315 into Azure:main Mar 3, 2026
42 checks passed
romanlutz deleted the romanlutz/strip-notebook-stderr branch March 3, 2026 21:09

3 participants