
polish: unified improvement pass — ZPE-Ink#4

Open
Zer0pa-Architect-Prime wants to merge 3 commits into main from polish/unified-improvement-pass

Zer0pa-Architect-Prime (Contributor) commented Apr 6, 2026

| # | Action | Status | Evidence |
|---|---|---|---|
| 1 | Normalize package author metadata | ✅ DONE | `code/pyproject.toml:11` now reads `authors = [{name = "Zer0pa"}]`; package name and license string remain intact. |
| 2 | Lead README with the authority metric | ✅ DONE | `README.md:7` now opens with the `5.590209480060199x` structured-tier ratio. |
| 3 | Replace defensive opener with metric-first packaging language | ✅ DONE | `README.md:10-15` now uses the short phrase opener, install/transfer boundary, $100M line, and authority pointer. |
| 4 | Strengthen public transfer and packaging clarity | ✅ DONE | `README.md:24-27` and `README.md:63-72` now make install unit, transfer unit, and binding packaging explicit. |
| 5 | Add workflow permissions blocks | ✅ DONE | `ink-ci.yml:16-17` adds `contents: read`; `auto-add-to-project.yml:8` adds `permissions: {}`. |
| 6 | Remove unsafe archive extraction | ✅ DONE | `gate_e_net_new_ingestion.py:150-195` adds path validation and safe tar/zip extraction; extraction sites switched at `:241`, `:280`, and `:367`. |
| 7 | Ensure no defensive tone in the first 30 README lines | ✅ DONE | `head -n 30 README.md` … |
| 8 | Ensure $100M line is present | ✅ DONE | `grep -c '100M' README.md` returned 1; the line is at `README.md:14`. |
| 9 | Ensure zero `zer0-point-energy` references | ✅ DONE | `rg -n 'zer0-point-energy'` returned zero hits in the repo. |
| 10 | Run existing test suite after changes | ✅ DONE | `make test PYTHON=.venv/bin/python` passed with 24 tests green. |
| 11 | Publish `zpe-ink` to PyPI | ⏳ SKIPPED | Dispatch constraints explicitly say not to publish to PyPI in this pass. |
| 12 | Prosody mention / cross-repo fix | ⏳ SKIPPED | The work item points outside ZPE-Ink; dispatch constraints forbid touching any other repo. |
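Checks 8 and 9 in the table are plain line counts. A minimal Python sketch of the same checks, applied to one file's text; `count_matching_lines` and `readme_checks` are hypothetical helpers, not code from the repository, and unlike the PR's `rg` run (which covered the whole repo) this looks at a single file:

```python
def count_matching_lines(text: str, needle: str) -> int:
    """Count lines containing `needle`; like `grep -c`, several hits on one line count once."""
    return sum(1 for line in text.splitlines() if needle in line)


def readme_checks(text: str) -> dict[str, int]:
    """Return the counts behind the $100M and zer0-point-energy checks."""
    return {
        # Expected to be exactly 1 after this PR (item 8).
        "100M": count_matching_lines(text, "100M"),
        # Expected to be 0 (item 9), here for this file only.
        "zer0-point-energy": count_matching_lines(text, "zer0-point-energy"),
    }
```

Run from the repository root, `readme_checks(Path("README.md").read_text(encoding="utf-8"))` (with `from pathlib import Path`) would give both counts at once.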

Summary by Sourcery

Strengthen archive extraction safety, refine project metadata, tighten CI permissions, and clarify README positioning and packaging details.

Bug Fixes:

  • Replace unsafe tar and zip extraction with path-validated safe extraction helpers to prevent path traversal through crafted archive member names.

Enhancements:

  • Clarify README messaging to emphasize compression metrics, licensing, authority surface, and explicit install/transfer/binding units.
  • Update package author metadata to use the standardized "Zer0pa" author name.

CI:

  • Add explicit minimal permissions blocks to CI workflows to restrict default GitHub token capabilities.

Documentation:

  • Restructure README introduction and quickstart sections to be metric-first and to clearly describe install, transfer, and binding packaging surfaces.


sourcery-ai bot commented Apr 6, 2026

Reviewer's Guide

This PR performs a polish pass across metadata, documentation, CI workflows, and ingestion scripts, with the most substantial change being the introduction of safe archive extraction helpers to prevent path traversal when unpacking external datasets.

Sequence diagram for safe archive extraction in gate_e_net_new_ingestion

sequenceDiagram
    participant Main
    participant SafeExtractTar
    participant SafeExtractZip
    participant ValidatedPath
    participant TarFile
    participant ZipFile
    participant FileSystem

    Main->>SafeExtractTar: _safe_extract_tar(math_tgz, math_dir)
    SafeExtractTar->>FileSystem: destination_root.resolve()
    SafeExtractTar->>TarFile: tarfile.open(archive_path, r:*)
    loop for each member in tar archive
        SafeExtractTar->>ValidatedPath: _validated_archive_path(destination_root, member.name)
        ValidatedPath-->>SafeExtractTar: target Path or ValueError
        alt member is directory
            SafeExtractTar->>FileSystem: target.mkdir(parents=True, exist_ok=True)
        else member is regular file
            SafeExtractTar->>TarFile: handle.extractfile(member)
            TarFile-->>SafeExtractTar: extracted fileobj
            SafeExtractTar->>FileSystem: target.parent.mkdir(parents=True, exist_ok=True)
            SafeExtractTar->>FileSystem: write file using shutil.copyfileobj
        else member is unsupported type
            SafeExtractTar-->>Main: raise ValueError
        end
    end
    SafeExtractTar-->>Main: extraction complete

    Main->>SafeExtractZip: _safe_extract_zip(crohme_zip, crohme_dir)
    SafeExtractZip->>FileSystem: destination_root.resolve()
    SafeExtractZip->>ZipFile: zipfile.ZipFile(archive_path)
    loop for each member in zip archive
        SafeExtractZip->>ValidatedPath: _validated_archive_path(destination_root, member.filename)
        ValidatedPath-->>SafeExtractZip: target Path or ValueError
        alt member is directory
            SafeExtractZip->>FileSystem: target.mkdir(parents=True, exist_ok=True)
        else member is file
            SafeExtractZip->>FileSystem: target.parent.mkdir(parents=True, exist_ok=True)
            SafeExtractZip->>ZipFile: handle.open(member)
            ZipFile-->>SafeExtractZip: extracted fileobj
            SafeExtractZip->>FileSystem: write file using shutil.copyfileobj
        end
    end
    SafeExtractZip-->>Main: extraction complete

Class diagram for new safe extraction helpers in gate_e_net_new_ingestion

classDiagram
    class gate_e_net_new_ingestion {
        +main() int
        +_validated_archive_path(destination_root Path, archive_name str) Path
        +_safe_extract_tar(archive_path Path, destination_root Path) None
        +_safe_extract_zip(archive_path Path, destination_root Path) None
    }

    class Path {
    }

    class PurePosixPath {
    }

    class TarFileModule {
        +open(archive_path Path, mode str)
    }

    class ZipFileModule {
        +ZipFile(archive_path Path)
    }

    class ShutilModule {
        +copyfileobj(source, destination)
    }

    gate_e_net_new_ingestion --> Path
    gate_e_net_new_ingestion --> PurePosixPath
    gate_e_net_new_ingestion --> TarFileModule
    gate_e_net_new_ingestion --> ZipFileModule
    gate_e_net_new_ingestion --> ShutilModule

File-Level Changes

Harden archive extraction in the ingestion script to prevent path traversal and unsafe writes.
  • Introduce _validated_archive_path to normalize member names, reject absolute/parent paths, and enforce containment under a given extraction root.
  • Add _safe_extract_tar that iterates tar members, validates paths, restricts to regular files/directories, and streams contents via shutil.copyfileobj instead of using extractall.
  • Add _safe_extract_zip that validates member paths, creates directories, and streams file contents via shutil.copyfileobj instead of using extractall.
  • Replace direct tarfile.extractall and ZipFile.extractall calls for math, CROHME, and UJI datasets with the new safe helpers.
code/scripts/gate_e_net_new_ingestion.py
Retune README messaging to lead with metrics, clarify install/transfer/binding packaging semantics, and remove defensive wording.
  • Replace initial prose with a metric-first summary including structured-tier ratio, codec summary, and explicit install/transfer/binding units plus licensing and authority pointers.
  • Collapse and rephrase earlier status/what-this-is section into concise bullet points describing the deterministic transfer surface and packaging boundaries.
  • Extend the quickstart/metadata table with explicit rows for install unit, transfer unit, and binding packaging; keep verdict metadata but move defensive tone out of the top section.
README.md
Normalize Python package author metadata for the zpe-ink project.
  • Change pyproject.toml authors field from "Zer0pa Labs" to "Zer0pa" while leaving package name, version, license, and other metadata unchanged.
code/pyproject.toml
Tighten GitHub Actions permissions in CI and project-automation workflows.
  • Add a top-level permissions block with contents: read to the main ink-ci workflow.
  • Add an explicit empty permissions: {} block to the auto-add-to-project workflow so jobs don’t inherit broad default token scopes.
.github/workflows/ink-ci.yml
.github/workflows/auto-add-to-project.yml


sourcery-ai bot left a comment


Hey, I've found 3 issues and left some high-level feedback:

  • In _validated_archive_path, calling resolve() on the joined path means a pre-existing symlink inside destination_root could still be used to write outside the root; consider validating containment using purely lexical paths (no resolve) or explicitly rejecting symlink components when creating directories/files.
  • The permissions: {} block in .github/workflows/auto-add-to-project.yml disables all default token scopes; double-check that the add-to-project job still has the contents/projects permissions it needs, and explicitly grant only those instead of using an empty map.
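The first point can be handled by walking each path component and refusing any that is already a symlink. A minimal sketch, assuming POSIX-style member names; `reject_symlinked_parents` is a hypothetical helper, and a check-then-write approach like this still leaves a race window if another process can change the destination between the check and the write:

```python
from pathlib import Path, PurePosixPath


def reject_symlinked_parents(destination_root: Path, archive_name: str) -> Path:
    """Return the target path, refusing any existing symlink between the root and the target."""
    relative = PurePosixPath(archive_name)
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"unsafe archive member path: {archive_name}")
    current = destination_root
    for part in relative.parts:
        current = current / part
        # is_symlink() inspects the link itself (lstat), not whatever it points to.
        if current.is_symlink():
            raise ValueError(f"refusing to write through symlink: {current}")
    return current
```

Because the check is lexical plus `lstat`, it does not depend on `resolve()` following links at all.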
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="code/scripts/gate_e_net_new_ingestion.py" line_range="183-180" />
<code_context>
+                shutil.copyfileobj(extracted, output)
+
+
+def _safe_extract_zip(archive_path: Path, destination_root: Path) -> None:
+    destination_root = destination_root.resolve()
+    with zipfile.ZipFile(archive_path) as handle:
+        for member in handle.infolist():
+            if member.filename in {"", ".", "./"}:
+                continue
+            target = _validated_archive_path(destination_root, member.filename)
+            if member.is_dir():
+                target.mkdir(parents=True, exist_ok=True)
+                continue
+            target.parent.mkdir(parents=True, exist_ok=True)
+            with handle.open(member) as extracted, target.open("wb") as output:
+                shutil.copyfileobj(extracted, output)
+
+
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Zip extraction could benefit from a size guard to reduce risk from overly large or zip-bomb-style archives.

As written, this will extract arbitrarily large members and an unbounded number of entries, which can be abused (e.g., zip bombs) if the input isn’t fully trusted.

Consider enforcing:
- A global cap on total uncompressed bytes.
- A per-file size limit, skipping or erroring on oversized entries.
- Optionally, a maximum entry count.

You can enforce these by wrapping `shutil.copyfileobj` in a helper that tracks bytes copied and stops once thresholds are exceeded.
</issue_to_address>
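The byte-tracking wrapper suggested above might look like the following sketch. `ExtractionBudget` and `ExtractionLimitError` are hypothetical names and the limits are placeholders; streaming in chunks and counting as it goes lets the same helper serve both the tar and zip paths.

```python
from dataclasses import dataclass
from typing import BinaryIO


class ExtractionLimitError(ValueError):
    """Raised when an archive exceeds a configured size or entry-count limit."""


@dataclass
class ExtractionBudget:
    max_total_bytes: int
    max_file_bytes: int
    max_entries: int
    total_bytes: int = 0
    entries: int = 0

    def start_entry(self) -> None:
        # Call once per archive member before extracting it.
        self.entries += 1
        if self.entries > self.max_entries:
            raise ExtractionLimitError(f"more than {self.max_entries} archive entries")

    def copy(self, source: BinaryIO, destination: BinaryIO, chunk_size: int = 64 * 1024) -> int:
        """Stream source to destination, stopping before any limit would be exceeded."""
        copied = 0
        while chunk := source.read(chunk_size):
            copied += len(chunk)
            if copied > self.max_file_bytes:
                raise ExtractionLimitError(f"entry exceeds {self.max_file_bytes} bytes")
            if self.total_bytes + len(chunk) > self.max_total_bytes:
                raise ExtractionLimitError(f"archive exceeds {self.max_total_bytes} bytes in total")
            self.total_bytes += len(chunk)
            destination.write(chunk)
        return copied
```

Inside `_safe_extract_zip`, one would call `budget.start_entry()` for each member and swap `shutil.copyfileobj(extracted, output)` for `budget.copy(extracted, output)`; on `ExtractionLimitError` the partially written target file should be removed.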

### Comment 2
<location path=".github/workflows/auto-add-to-project.yml" line_range="8" />
<code_context>
   pull_request:
     types: [opened]

+permissions: {}
+
 jobs:
</code_context>
<issue_to_address>
**issue (bug_risk):** Explicitly disabling all default `GITHUB_TOKEN` permissions may break actions that expect at least `contents: read`.

This hard-disables the default `GITHUB_TOKEN` and can break any step that implicitly relies on `contents: read` or other scopes (e.g. reading repo metadata, listing PRs, posting comments). If the workflow truly only uses a separate PAT and never the default token, this is fine; otherwise, consider explicitly granting only the needed scopes, for example:

```yaml
permissions:
  contents: read
  pull-requests: write  # only if needed
```

Please double-check the job steps to confirm none depend on the default token’s permissions before keeping `permissions: {}`.
</issue_to_address>
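A rough first pass at that double-check is to scan the workflows for explicit references to the default token. This is only a heuristic: many actions read the default token implicitly through an input whose default is `github.token`, so an empty result does not show that a job is safe under `permissions: {}`, and each action's documented inputs still need checking. `explicit_token_uses` and `scan_workflows` are hypothetical helpers.

```python
import re
from pathlib import Path

# Explicit `${{ secrets.GITHUB_TOKEN }}` or `${{ github.token }}` expressions.
TOKEN_PATTERN = re.compile(r"\$\{\{\s*(?:secrets\.GITHUB_TOKEN|github\.token)\s*\}\}")


def explicit_token_uses(workflow_text: str) -> list[int]:
    """Return 1-based line numbers that reference the default token explicitly."""
    return [
        number
        for number, line in enumerate(workflow_text.splitlines(), start=1)
        if TOKEN_PATTERN.search(line)
    ]


def scan_workflows(workflow_dir: Path) -> dict[str, list[int]]:
    """Map each workflow file name to the lines that use the default token."""
    hits: dict[str, list[int]] = {}
    paths = sorted([*workflow_dir.glob("*.yml"), *workflow_dir.glob("*.yaml")])
    for path in paths:
        lines = explicit_token_uses(path.read_text(encoding="utf-8"))
        if lines:
            hits[path.name] = lines
    return hits
```

From the repository root, `scan_workflows(Path(".github/workflows"))` lists the candidates to review.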

### Comment 3
<location path="README.md" line_range="71-72" />
<code_context>
+            <td><code>.zpink</code> packet streams</td>
+          </tr>
+          <tr>
+            <td>Binding packaging</td>
+            <td><code>repo-local source surfaces</code></td>
+          </tr>
           <tr>
</code_context>
<issue_to_address>
**nitpick (typo):** Consider using consistent terminology for bindings across the README.

Here you use "repo-local source surfaces," but earlier you say "repo-local sources." Choosing one term and using it throughout will make the README clearer.
</issue_to_address>
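A small check can keep the wording from drifting again. This is a sketch; `term_counts` and `is_consistent` are hypothetical helpers, and the two phrases are the variants named in the comment.

```python
import re


def term_counts(text: str, variants: list[str]) -> dict[str, int]:
    """Count whole-phrase, case-insensitive occurrences of each wording variant."""
    counts = {}
    for variant in variants:
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        counts[variant] = len(pattern.findall(text))
    return counts


def is_consistent(counts: dict[str, int]) -> bool:
    # Consistent means at most one of the variants actually appears.
    return sum(1 for count in counts.values() if count) <= 1
```

Running `term_counts` over `README.md` with `["repo-local sources", "repo-local source surfaces"]` shows which variant to keep.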


Inline thread on `code/scripts/gate_e_net_new_ingestion.py` (tar extraction lines), repeating Comment 1 above:

    if extracted is None:
        raise ValueError(f"missing tar member payload: {member.name}")
    with extracted, target.open("wb") as output:
        shutil.copyfileobj(extracted, output)

Inline thread on `.github/workflows/auto-add-to-project.yml` (`permissions: {}` after the `pull_request: types: [opened]` trigger), repeating Comment 2 above.

Inline thread on `README.md` lines +71 to +72 (the "Binding packaging" row), repeating Comment 3 above.

Zer0pa-Architect-Prime force-pushed the polish/unified-improvement-pass branch from 31afb4a to cc6a28b on April 9, 2026 21:11

Labels

None yet

Projects

None yet


1 participant