MAINT Add pre-commit hook to sanitize user paths in notebook outputs #1429
Merged: romanlutz merged 16 commits into `Azure:main` from `romanlutz:romanlutz/strip-notebook-stderr` on Mar 3, 2026.
Commits (16, all by romanlutz):

- `c358fc5` Add pre-commit hook to sanitize user paths in notebook outputs
- `0ac2028` Sanitize user paths in all existing notebooks
- `6d7964a` Merge remote-tracking branch 'origin/main' into romanlutz/strip-noteb…
- `7ad8df7` Merge main into romanlutz/strip-notebook-stderr
- `1354b66` fix: sanitize traceback/evalue fields, skip binary MIME types
- `4c90a52` merge: resolve conflicts and sanitize notebook paths
- `d3487a8` fix: nbstripout cleanup
- `1316362` Merge branch 'romanlutz/strip-notebook-stderr' of https://github.com/…
- `1d210fe` Merge remote-tracking branch 'origin/main' into romanlutz/strip-noteb…
- `a8fb94a` Merge remote-tracking branch 'origin/main' into romanlutz/strip-noteb…
- `16a09e6` Address review comments: normalize paths to ./, ensure_ascii=False, t…
- `24cb775` Merge remote-tracking branch 'origin/main' into romanlutz/strip-noteb…
- `7257918` Fix PERF401: use list comprehension for modified_files
- `1de1a96` Make Windows path regex case-insensitive and add test
- `5a1efe5` Remove application/json from sanitized mime types
- `8c65a55` Merge branch 'main' into romanlutz/strip-notebook-stderr
New file (`@@ -0,0 +1,129 @@`):

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import json
import re
import sys
from re import Match

# Windows path: captures user prefix AND remaining path for normalization
_WINDOWS_PATH_PATTERN = re.compile(
    r"[A-Za-z]:\\+Users\\+[^\\]+\\+((?:[^\\\s\"',:;]+\\+)*[^\\\s\"',:;]*)",
    re.IGNORECASE,
)
# Unix paths: just match the prefix
_UNIX_PATH_PATTERNS = [
    re.compile(r"/Users/[^/]+/"),  # macOS
    re.compile(r"/home/[^/]+/"),  # Linux
]
```
```python
def _windows_path_replacer(match: Match[str]) -> str:
    """Replace Windows user path prefix with ./ and normalize backslashes to forward slashes."""
    remainder = match.group(1)
    normalized = remainder.replace("\\", "/")
    # Collapse multiple forward slashes from double-backslash paths
    return "./" + re.sub(r"/+", "/", normalized)
```
```python
def sanitize_notebook_paths(file_path: str) -> bool:
    """
    Remove user-specific path prefixes from notebook cell outputs.

    Replaces paths like C:\\Users\\username\\project\\file.py with ./project/file.py.

    Args:
        file_path (str): Path to the .ipynb file.

    Returns:
        bool: True if the file was modified.
    """
    if not file_path.endswith(".ipynb"):
        return False

    with open(file_path, encoding="utf-8") as f:
        content = json.load(f)

    modified = False

    for cell in content.get("cells", []):
        for output in cell.get("outputs", []):
            modified = _sanitize_output_field(output, "text") or modified
            modified = _sanitize_output_field(output, "traceback") or modified
            modified = _sanitize_output_field(output, "evalue") or modified
            if "data" in output:
                for mime_type in output["data"]:
                    if mime_type.startswith("text/"):
                        modified = _sanitize_output_field(output["data"], mime_type) or modified

    if not modified:
        return False

    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(content, f, indent=1, ensure_ascii=False)
        f.write("\n")

    return True
```
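The write-back passes `ensure_ascii=False`, one of the changes listed under the review-comment commit. A likely motivation, sketched here with only the standard `json` module: with the default `ensure_ascii=True`, every non-ASCII character in an output is rewritten as a `\uXXXX` escape, so a notebook that stores such characters unescaped would pick up unrelated churn in its diff even when only a path changed. The stream-output dict below is a hypothetical example, not taken from the PR.

```python
import json

# Hypothetical stream output containing non-ASCII text and an already-sanitized path.
output = {"output_type": "stream", "name": "stdout", "text": ["Résultat: 5 € in ./project/data.csv\n"]}

escaped = json.dumps(output, indent=1)  # default ensure_ascii=True
verbatim = json.dumps(output, indent=1, ensure_ascii=False)  # the setting the hook uses

print("\\u20ac" in escaped)  # True: the euro sign was turned into an escape sequence
print("€" in verbatim)  # True: the character is written as-is

# Both serializations decode to the same data; only the bytes on disk differ.
assert json.loads(escaped) == json.loads(verbatim) == output
```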
```python
def _sanitize_output_field(obj: dict, key: str) -> bool:
    """
    Sanitize a single output field by replacing user path prefixes with ./ normalized paths.

    Args:
        obj (dict): The dict containing the field.
        key (str): The key to sanitize.

    Returns:
        bool: True if the field was modified.
    """
    value = obj.get(key)
    if value is None:
        return False

    modified = False

    if isinstance(value, list):
        new_list = []
        for line in value:
            if isinstance(line, str):
                sanitized = _strip_user_paths(line)
                if sanitized != line:
                    modified = True
                new_list.append(sanitized)
            else:
                new_list.append(line)
        obj[key] = new_list
    elif isinstance(value, str):
        sanitized = _strip_user_paths(value)
        if sanitized != value:
            modified = True
        obj[key] = sanitized

    return modified
```
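To make the traversal concrete: in the nbformat v4 JSON layout, `stream` outputs keep their text under `text`, `error` outputs under `ename`, `evalue` and `traceback`, and `display_data`/`execute_result` outputs hold a MIME bundle under `data`. Any of these text fields may be a list of strings or a single string, which is why `_sanitize_output_field` handles both shapes. The helper `visited_fields` below is hypothetical and not part of the PR; it mirrors the loop in `sanitize_notebook_paths` but only reports which fields would be handed to `_sanitize_output_field`. The notebook dict is made up for illustration.

```python
# Hypothetical helper mirroring the traversal in sanitize_notebook_paths:
# it reports (cell index, field) pairs instead of rewriting anything.
def visited_fields(notebook: dict) -> list[tuple[int, str]]:
    visited = []
    for index, cell in enumerate(notebook.get("cells", [])):
        for output in cell.get("outputs", []):
            for key in ("text", "traceback", "evalue"):
                if output.get(key) is not None:
                    visited.append((index, key))
            for mime_type in output.get("data", {}):
                if mime_type.startswith("text/"):  # binary and JSON payloads are skipped
                    visited.append((index, f"data/{mime_type}"))
    return visited


notebook = {
    "cells": [
        {"cell_type": "code", "outputs": [
            {"output_type": "stream", "name": "stderr",
             "text": ["C:\\Users\\alice\\repo\\run.py:3: UserWarning\n"]},
        ]},
        {"cell_type": "code", "outputs": [
            {"output_type": "error", "ename": "FileNotFoundError",
             "evalue": "'/home/alice/repo/data.csv'",
             "traceback": ["File /home/alice/repo/load.py, line 1\n"]},
        ]},
        {"cell_type": "code", "outputs": [
            {"output_type": "display_data", "metadata": {},
             "data": {"text/plain": "<Figure>", "image/png": "<base64 PNG data>",
                      "application/json": {"path": "/home/alice/repo"}}},
        ]},
        {"cell_type": "markdown", "source": ["Notes from /home/alice/\n"]},
    ]
}

print(visited_fields(notebook))
# [(0, 'text'), (1, 'traceback'), (1, 'evalue'), (2, 'data/text/plain')]
```

Note what is left alone: `ename`, the `image/png` and `application/json` entries (see commits `1354b66` and `5a1efe5`), and every cell's `source`, since only outputs are walked.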
```python
def _strip_user_paths(text: str) -> str:
    """
    Replace user-specific path prefixes with ./ and normalize separators.

    Windows paths are normalized to forward slashes. For example,
    C:\\Users\\alice\\project\\file.py becomes ./project/file.py.

    Args:
        text (str): The text to sanitize.

    Returns:
        str: The sanitized text.
    """
    text = _WINDOWS_PATH_PATTERN.sub(_windows_path_replacer, text)
    for pattern in _UNIX_PATH_PATTERNS:
        text = pattern.sub("./", text)
    return text
```
```python
if __name__ == "__main__":
    modified_files = [file_path for file_path in sys.argv[1:] if sanitize_notebook_paths(file_path)]
    if modified_files:
        print("Sanitized user paths in:", modified_files)
        sys.exit(1)
```
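The entry point exits with status 1 whenever it rewrote at least one file. A non-zero status makes pre-commit report the hook as failed and stop the commit; the author then stages the sanitized notebooks, and the next run finds nothing to change and exits 0. The sketch below walks through that fail-then-pass loop in a temporary directory. Both `sanitize_stream_text` and `hook_exit_code` are hypothetical stand-ins: the sanitizer handles only `/home/<name>/` prefixes in list-valued `text` fields, and the exit-code function returns the status instead of calling `sys.exit`, so it can be run in-process.

```python
import json
import re
import tempfile
from pathlib import Path


def sanitize_stream_text(file_path: str) -> bool:
    """Hypothetical, simplified stand-in for sanitize_notebook_paths."""
    if not file_path.endswith(".ipynb"):
        return False
    path = Path(file_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    modified = False
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):
            lines = output.get("text")
            if isinstance(lines, list):
                new_lines = [re.sub(r"/home/[^/]+/", "./", line) for line in lines]
                if new_lines != lines:
                    output["text"] = new_lines
                    modified = True
    if modified:
        path.write_text(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return modified


def hook_exit_code(argv: list[str]) -> int:
    """Same contract as the __main__ block: 1 if any file was rewritten, else 0."""
    modified_files = [p for p in argv if sanitize_stream_text(p)]
    if modified_files:
        print("Sanitized user paths in:", modified_files)
        return 1
    return 0


with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "demo.ipynb"
    nb = {"cells": [{"cell_type": "code", "outputs": [
        {"output_type": "stream", "name": "stderr", "text": ["/home/alice/repo/run.py:3: warning\n"]},
    ]}]}
    nb_path.write_text(json.dumps(nb, indent=1), encoding="utf-8")

    first = hook_exit_code([str(nb_path)])  # rewrites the file, so the commit is blocked (1)
    second = hook_exit_code([str(nb_path)])  # nothing left to change, so the commit proceeds (0)
    final_text = json.loads(nb_path.read_text(encoding="utf-8"))["cells"][0]["outputs"][0]["text"]

print(first, second, final_text)
```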