polish: unified improvement pass — ZPE-Ink#4
polish: unified improvement pass — ZPE-Ink#4Zer0pa-Architect-Prime wants to merge 3 commits intomainfrom
Conversation
Reviewer's GuideThis PR performs a polish pass across metadata, documentation, CI workflows, and ingestion scripts, with the most substantial change being the introduction of safe archive extraction helpers to prevent path traversal when unpacking external datasets. Sequence diagram for safe archive extraction in gate_e_net_new_ingestionsequenceDiagram
participant Main
participant SafeExtractTar
participant SafeExtractZip
participant ValidatedPath
participant TarFile
participant ZipFile
participant FileSystem
Main->>SafeExtractTar: _safe_extract_tar(math_tgz, math_dir)
SafeExtractTar->>FileSystem: destination_root.resolve()
SafeExtractTar->>TarFile: tarfile.open(archive_path, r:*)
loop for each member in tar archive
SafeExtractTar->>ValidatedPath: _validated_archive_path(destination_root, member.name)
ValidatedPath-->>SafeExtractTar: target Path or ValueError
alt member is directory
SafeExtractTar->>FileSystem: target.mkdir(parents=True, exist_ok=True)
else member is regular file
SafeExtractTar->>TarFile: handle.extractfile(member)
TarFile-->>SafeExtractTar: extracted fileobj
SafeExtractTar->>FileSystem: target.parent.mkdir(parents=True, exist_ok=True)
SafeExtractTar->>FileSystem: write file using shutil.copyfileobj
else member is unsupported type
SafeExtractTar-->>Main: raise ValueError
end
end
SafeExtractTar-->>Main: extraction complete
Main->>SafeExtractZip: _safe_extract_zip(crohme_zip, crohme_dir)
SafeExtractZip->>FileSystem: destination_root.resolve()
SafeExtractZip->>ZipFile: zipfile.ZipFile(archive_path)
loop for each member in zip archive
SafeExtractZip->>ValidatedPath: _validated_archive_path(destination_root, member.filename)
ValidatedPath-->>SafeExtractZip: target Path or ValueError
alt member is directory
SafeExtractZip->>FileSystem: target.mkdir(parents=True, exist_ok=True)
else member is file
SafeExtractZip->>FileSystem: target.parent.mkdir(parents=True, exist_ok=True)
SafeExtractZip->>ZipFile: handle.open(member)
ZipFile-->>SafeExtractZip: extracted fileobj
SafeExtractZip->>FileSystem: write file using shutil.copyfileobj
end
end
SafeExtractZip-->>Main: extraction complete
Class diagram for new safe extraction helpers in gate_e_net_new_ingestionclassDiagram
class gate_e_net_new_ingestion {
+main() int
+_validated_archive_path(destination_root Path, archive_name str) Path
+_safe_extract_tar(archive_path Path, destination_root Path) None
+_safe_extract_zip(archive_path Path, destination_root Path) None
}
class Path {
}
class PurePosixPath {
}
class TarFileModule {
+open(archive_path Path, mode str)
}
class ZipFileModule {
+ZipFile(archive_path Path)
}
class ShutilModule {
+copyfileobj(source, destination)
}
gate_e_net_new_ingestion --> Path
gate_e_net_new_ingestion --> PurePosixPath
gate_e_net_new_ingestion --> TarFileModule
gate_e_net_new_ingestion --> ZipFileModule
gate_e_net_new_ingestion --> ShutilModule
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- In
_validated_archive_path, callingresolve()on the joined path means a pre-existing symlink insidedestination_rootcould still be used to write outside the root; consider validating containment using purely lexical paths (noresolve) or explicitly rejecting symlink components when creating directories/files. - The
permissions: {}block in.github/workflows/auto-add-to-project.ymldisables all default token scopes; double-check that theadd-to-projectjob still has thecontents/projectspermissions it needs, and explicitly grant only those instead of using an empty map.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_validated_archive_path`, calling `resolve()` on the joined path means a pre-existing symlink inside `destination_root` could still be used to write outside the root; consider validating containment using purely lexical paths (no `resolve`) or explicitly rejecting symlink components when creating directories/files.
- The `permissions: {}` block in `.github/workflows/auto-add-to-project.yml` disables all default token scopes; double-check that the `add-to-project` job still has the `contents`/`projects` permissions it needs, and explicitly grant only those instead of using an empty map.
## Individual Comments
### Comment 1
<location path="code/scripts/gate_e_net_new_ingestion.py" line_range="183-180" />
<code_context>
+ shutil.copyfileobj(extracted, output)
+
+
+def _safe_extract_zip(archive_path: Path, destination_root: Path) -> None:
+ destination_root = destination_root.resolve()
+ with zipfile.ZipFile(archive_path) as handle:
+ for member in handle.infolist():
+ if member.filename in {"", ".", "./"}:
+ continue
+ target = _validated_archive_path(destination_root, member.filename)
+ if member.is_dir():
+ target.mkdir(parents=True, exist_ok=True)
+ continue
+ target.parent.mkdir(parents=True, exist_ok=True)
+ with handle.open(member) as extracted, target.open("wb") as output:
+ shutil.copyfileobj(extracted, output)
+
+
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Zip extraction could benefit from a size guard to reduce risk from overly large or zip-bomb-style archives.
As written, this will extract arbitrarily large members and an unbounded number of entries, which can be abused (e.g., zip bombs) if the input isn’t fully trusted.
Consider enforcing:
- A global cap on total uncompressed bytes.
- A per-file size limit, skipping or erroring on oversized entries.
- Optionally, a maximum entry count.
You can enforce these by wrapping `shutil.copyfileobj` in a helper that tracks bytes copied and stops once thresholds are exceeded.
</issue_to_address>
### Comment 2
<location path=".github/workflows/auto-add-to-project.yml" line_range="8" />
<code_context>
pull_request:
types: [opened]
+permissions: {}
+
jobs:
</code_context>
<issue_to_address>
**issue (bug_risk):** Explicitly disabling all default `GITHUB_TOKEN` permissions may break actions that expect at least `contents: read`.
This hard-disables the default `GITHUB_TOKEN` and can break any step that implicitly relies on `contents: read` or other scopes (e.g. reading repo metadata, listing PRs, posting comments). If the workflow truly only uses a separate PAT and never the default token, this is fine; otherwise, consider explicitly granting only the needed scopes, for example:
```yaml
permissions:
contents: read
pull-requests: write # only if needed
```
Please double-check the job steps to confirm none depend on the default token’s permissions before keeping `permissions: {}`.
</issue_to_address>
### Comment 3
<location path="README.md" line_range="71-72" />
<code_context>
+ <td><code>.zpink</code> packet streams</td>
+ </tr>
+ <tr>
+ <td>Binding packaging</td>
+ <td><code>repo-local source surfaces</code></td>
+ </tr>
<tr>
</code_context>
<issue_to_address>
**nitpick (typo):** Consider using consistent terminology for bindings across the README.
Here you use "repo-local source surfaces," but earlier you say "repo-local sources." Choosing one term and using it throughout will make the README clearer.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| if extracted is None: | ||
| raise ValueError(f"missing tar member payload: {member.name}") | ||
| with extracted, target.open("wb") as output: | ||
| shutil.copyfileobj(extracted, output) |
There was a problem hiding this comment.
🚨 suggestion (security): Zip extraction could benefit from a size guard to reduce risk from overly large or zip-bomb-style archives.
As written, this will extract arbitrarily large members and an unbounded number of entries, which can be abused (e.g., zip bombs) if the input isn’t fully trusted.
Consider enforcing:
- A global cap on total uncompressed bytes.
- A per-file size limit, skipping or erroring on oversized entries.
- Optionally, a maximum entry count.
You can enforce these by wrapping shutil.copyfileobj in a helper that tracks bytes copied and stops once thresholds are exceeded.
| pull_request: | ||
| types: [opened] | ||
|
|
||
| permissions: {} |
There was a problem hiding this comment.
issue (bug_risk): Explicitly disabling all default GITHUB_TOKEN permissions may break actions that expect at least contents: read.
This hard-disables the default GITHUB_TOKEN and can break any step that implicitly relies on contents: read or other scopes (e.g. reading repo metadata, listing PRs, posting comments). If the workflow truly only uses a separate PAT and never the default token, this is fine; otherwise, consider explicitly granting only the needed scopes, for example:
permissions:
contents: read
pull-requests: write # only if neededPlease double-check the job steps to confirm none depend on the default token’s permissions before keeping permissions: {}.
| <td>Binding packaging</td> | ||
| <td><code>repo-local source surfaces</code></td> |
There was a problem hiding this comment.
nitpick (typo): Consider using consistent terminology for bindings across the README.
Here you use "repo-local source surfaces," but earlier you say "repo-local sources." Choosing one term and using it throughout will make the README clearer.
31afb4a to
cc6a28b
Compare
code/pyproject.toml:11now readsauthors = [{name = "Zer0pa"}]; package name and license string remain intact.README.md:7now opens with5.590209480060199x structured-tier ratio.README.md:10-15now uses the short phrase opener, install/transfer boundary,$100Mline, and authority pointer.README.md:24-27andREADME.md:63-72now make install unit, transfer unit, and binding packaging explicit.ink-ci.yml:16-17addscontents: read;auto-add-to-project.yml:8addspermissions: {}.gate_e_net_new_ingestion.py:150-195adds path validation and safe tar/zip extraction; extraction sites switched at:241,:280, and:367.$100Mline is presentgrep -c '100M' README.mdreturned1; the line is atREADME.md:14.zer0-point-energyreferencesrg -n 'zer0-point-energy'returned zero hits in the repo.make test PYTHON=.venv/bin/pythonpassed with24tests green.zpe-inkto PyPIZPE-Ink; dispatch constraints forbid touching any other repo.Summary by Sourcery
Strengthen archive extraction safety, refine project metadata, tighten CI permissions, and clarify README positioning and packaging details.
Bug Fixes:
Enhancements:
CI:
Documentation: