Add modular forensic processor and batch downloads by ChrisAdamsdevelopment · Pull Request #22 · ChrisAdamsdevelopment/SpectraCleanseAI

ChrisAdamsdevelopment · 2026-05-13T11:33:02Z

Motivation

Move forensic metadata wipe/inject/verify logic into a focused, testable module and add paid-only batch processing plus one-time secure downloads while preserving existing single-file behavior and headers.
Introduce explicit marker rules and benign/allowed tag classification to reduce false positives and provide richer audit evidence for forensic review.
Add durable tracking for generated outputs and safe cleanup to avoid leaking files from uploads/ and support one-time batch downloads.

Description

Added server/metadataRules.js, exporting MARKER_RULES, ALLOWED_INJECTED_TAGS, isBenign, and isAllowedInjected for suspicious/benign/allowed tag classification.
Added server/processor.js, encapsulating processMediaFile(...) plus helper logic for marker detection, final-state verification, platform-aware metadata writing, and unsupported-cleanse 422 errors.
Added server/cleanup.js, persisting a SQLite-backed cleanup_queue with upload-dir safety guards, immediate deletion, periodic sweeping, and row pruning.
Added server/downloadTokens.js, persisting one-time download_tokens with TTL, account validation, consumption semantics, and expiry cleanup.
Integrated the new modules into server.js while preserving auth (req.user.sub), free-tier usage checks, Stripe/Gemini behavior, MP3 server-cleanse rejection, existing /api/process behavior, and SPA/API route ordering.
Added authenticated POST /api/process-batch and GET /api/download/:token before the API catch-all.
Extended frontend forensic report typing in app.tsx with optional richer audit fields while preserving legacy fields (removedCount, removedTags, timestamp).
Added concise README.md notes for batch processing, one-time download tokens, MP3 server-side exclusion, and deployment/body-size guard expectations.

Explicit exclusions preserved

Did not implement any Spotify/Apple Music album consent modal.
Did not add AlbumConsentModal.
Did not add or use albumOverrideConfirmed.
Did not change Spotify/Apple Music behavior; backend still writes Album = safeTitle, Description, Year, and optional Lyrics-eng.
Did not introduce @google/generative-ai; Gemini remains REST/fetch-based.
Did not implement server-side MP3 processing; MP3 server cleanse remains rejected.

Testing

Validated under supported Node runtime 20.20.2:

npm install passed.
npm run build passed.
node --check server.js passed.
node --check server/processor.js passed.
node --check server/metadataRules.js passed.
node --check server/cleanup.js passed.
node --check server/downloadTokens.js passed.

Runtime smoke checks performed under Node 20.20.2:

Started server with safe local environment.
GET /api/health returned JSON 200.
GET /api/nonexistent returned JSON 404.
GET /nonexistent-frontend-route served index.html when dist/ existed.
GET /api/download/badtoken without auth returned JSON 401.
After registering/authenticating a test user, GET /api/download/badtoken returned JSON 404 (Download link not found.).

Code/contract checks confirmed:

req.user.sub is still used for authenticated user identity.
users.monthly_usage is not referenced.
jobs.status is not referenced.
@google/generative-ai is not referenced.
AlbumConsentModal is not referenced.
albumOverrideConfirmed is not referenced.
/api/process-batch and /api/download/:token are before the /api catch-all.
/api/generate-seo still uses REST fetch.
Stripe routes remain present.
Existing /api/process route remains present and MP3 server-side processing remains rejected.

sourcery-ai · 2026-05-13T11:33:09Z

Reviewer's Guide

Refactors the forensic metadata cleanse/inject pipeline into modular server-side components, adds a richer three-class forensic report model, and introduces paid-only batch processing with one-time secure downloads and durable cleanup of generated files, while keeping existing single-file behavior and headers compatible.

Sequence diagram for batch processing and one-time download flow

sequenceDiagram
  actor User
  participant Server
  participant Processor
  participant DownloadTokens
  participant Cleanup

  User->>Server: POST /api/process-batch
  Server->>Server: validatePlanAndSize
  alt plan is free or size too large
    Server-->>User: 4xx error
  else paid and within limit
    loop each file
      Server->>Processor: processMediaFile(outputPath, platform, metadata)
      Processor-->>Server: report
      Server->>DownloadTokens: createToken(userId, filePath, downloadName)
      DownloadTokens-->>Server: token
      Server->>Cleanup: registerForCleanup([filePath])
    end
    Server-->>User: JSON { results[ { report, downloadToken } ] }
  end

  User->>Server: GET /api/download/:token
  Server->>DownloadTokens: consumeToken(token, userId)
  alt token valid
    DownloadTokens-->>Server: { filePath, downloadName }
    Server-->>User: res.download(filePath, downloadName)
    Server->>Cleanup: deleteImmediately(filePath)
  else invalid/expired
    DownloadTokens-->>Server: { error, code }
    Server-->>User: code error
  end

File-Level Changes

Change	Details	Files
Modularized media processing pipeline and reused it for single-file processing.	Extracted exiftool-based read→wipe→verify→inject→final-verify flow into server/processor.js as processMediaFile plus helpers. Replaced inline /api/process implementation in server.js to delegate to processMediaFile, while preserving auth, plan checks, MP3 rejection, and response headers. Standardized error handling from processMediaFile to map 422-format errors and generic failures into appropriate HTTP responses.	`server.js` `server/processor.js`
Introduced explicit marker rules and benign/allowed tag classification for forensic analysis.	Added MARKER_RULES plus isBenign and isAllowedInjected helpers to classify tags into suspicious, benign, and allowed-injected sets. Implemented detectMarkers and verifyFinalState to generate detailed forensic evidence and residual marker detection. Wired processor.js to use the rule set to compute statuses, summaries, and audit fields in the returned report.	`server/metadataRules.js` `server/processor.js`
Added durable cleanup tracking for generated outputs under uploads/.	Created cleanup_queue table and sweeper in server/cleanup.js to track and delete files after a TTL with safety checks for uploads/ paths. Registered cleanup module in server.js and used registerForCleanup/deleteImmediately in single-file and batch flows. Ensured cleanup records are marked deleted and aged-out rows are pruned to keep the table bounded.	`server/cleanup.js` `server.js`
Implemented one-time download tokens for batch outputs.	Created download_tokens table and token management in server/downloadTokens.js with TTL-based sweeping and upload-dir safety validation. Initialized download token module in server.js and added GET /api/download/:token route gated by auth. Made batch processing issue per-file tokens tied to user and file path, enforcing one-time, account-scoped downloads.	`server/downloadTokens.js` `server.js`
Added paid-only batch processing endpoint with usage headers and soft size limits.	Added POST /api/process-batch route that accepts up to 20 files, enforces paid plan requirement, and applies a 2GB soft total size guard. Sequentially processes each non-MP3 file via processMediaFile, records jobs, and registers outputs for cleanup. Returns a structured JSON result with per-file reports and download tokens plus X-Usage-* headers consistent with single-file processing.	`server.js`
Extended frontend typing and UI to consume richer forensic reports while remaining backward compatible.	Defined MarkerHit, ResidualTag, and ForensicReport interfaces in app.tsx with optional status, summaries, and audit arrays. Updated QueueItem.report to use ForensicReport, preserving legacy fields removedCount, removedTags, and timestamp. Prepared for a compact expandable audit UI (inferred from typing changes) without breaking existing consumers.	`app.tsx`
Documented new batch and download APIs and operational constraints.	Extended README with descriptions of /api/process-batch, /api/download/:token, MP3 limitations, and 2GB batch soft guard. Clarified that MP3 server cleanse remains unsupported and that infra limits must still enforce upload/body size and disk constraints.	`README.md`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

The new removedCount in processMediaFile is now based on marker hits (beforeMarkers.length) rather than the number of tags removed (removedTags.length), which changes the meaning of the X-Forensic-Removed header compared to the old implementation; if consumers expect the old semantics, consider either keeping the previous behavior or exposing both counts explicitly.
In /api/process-batch the JSON response always returns usage.limit: null even though the X-Usage-Limit header still carries the per-plan limit; if clients read the JSON usage object, consider populating limit consistently with the header value.
Several new code paths (e.g., the /api/process, /api/process-batch handlers and processMediaFile) compress complex logic into single-line try/catch blocks and long inline expressions, which makes the control flow difficult to follow; consider expanding these into multi-line statements or extracting helpers to improve readability and future maintainability.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The new `removedCount` in `processMediaFile` is now based on marker hits (`beforeMarkers.length`) rather than the number of tags removed (`removedTags.length`), which changes the meaning of the `X-Forensic-Removed` header compared to the old implementation; if consumers expect the old semantics, consider either keeping the previous behavior or exposing both counts explicitly.
- In `/api/process-batch` the JSON response always returns `usage.limit: null` even though the `X-Usage-Limit` header still carries the per-plan limit; if clients read the JSON usage object, consider populating `limit` consistently with the header value.
- Several new code paths (e.g., the `/api/process`, `/api/process-batch` handlers and `processMediaFile`) compress complex logic into single-line `try/catch` blocks and long inline expressions, which makes the control flow difficult to follow; consider expanding these into multi-line statements or extracting helpers to improve readability and future maintainability.

## Individual Comments

### Comment 1
<location path="server.js" line_range="523" />
<code_context>
+  const usedNow = getMonthlyJobCount(userId);
+  res.setHeader('X-Usage-This-Month', usedNow);
+  res.setHeader('X-Usage-Limit', userPlan === 'free' ? FREE_MONTHLY_LIMIT : 'unlimited');
+  return res.json({ results, usage: { thisMonth: usedNow, limit: null } });
+});
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Usage `limit` is always `null`, even for limited plans.

Here `usage.limit` is always `null` even though the headers differentiate between free and unlimited plans. This makes the batch response inconsistent with `/api/process` and can break clients that rely on the JSON body to display or enforce limits. Please set `usage.limit` in line with the headers, e.g. `userPlan === 'free' ? FREE_MONTHLY_LIMIT : null`.
</issue_to_address>

### Comment 2
<location path="server/processor.js" line_range="53-54" />
<code_context>
+  const finalTags = await exiftool.read(outputPath);
+  const finalMarkers = detectMarkers(finalTags);
+  const verification = verifyFinalState(finalTags);
+  const removedTags = beforeKeys.filter((k) => !(k in finalTags));
+  const removedCount = beforeMarkers.length;
+  const status = (!wipeVerificationPassed || finalMarkers.length > 0) ? 'review_required' : (verification.unexpectedDescriptive.length > 0 ? 'clean_with_notes' : 'clean');
+  const seo = '';
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Mismatch between `removedCount` and `removedTags` may confuse downstream consumers.

`removedTags` tracks tags that disappeared between pre- and post-state, but `removedCount` is based on `beforeMarkers.length` (forensic markers), not removed tags. Since callers may interpret `removedCount` as “tags removed”, this mismatch can be misleading, especially when multiple markers share a tag or when non-marker tags are removed. Please either rename the field (e.g. `removedMarkersCount`) or make it derive from `removedTags.length` to keep the semantics consistent.

```suggestion
  const removedTags = beforeKeys.filter((k) => !(k in finalTags));
  const removedCount = removedTags.length;
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-05-13T11:35:20Z

+  const usedNow = getMonthlyJobCount(userId);
+  res.setHeader('X-Usage-This-Month', usedNow);
+  res.setHeader('X-Usage-Limit', userPlan === 'free' ? FREE_MONTHLY_LIMIT : 'unlimited');
+  return res.json({ results, usage: { thisMonth: usedNow, limit: null } });


issue (bug_risk): Usage limit is always null, even for limited plans.

Here usage.limit is always null even though the headers differentiate between free and unlimited plans. This makes the batch response inconsistent with /api/process and can break clients that rely on the JSON body to display or enforce limits. Please set usage.limit in line with the headers, e.g. userPlan === 'free' ? FREE_MONTHLY_LIMIT : null.

sourcery-ai · 2026-05-13T11:35:20Z

+  const removedTags = beforeKeys.filter((k) => !(k in finalTags));
+  const removedCount = beforeMarkers.length;


suggestion (bug_risk): Mismatch between removedCount and removedTags may confuse downstream consumers.

removedTags tracks tags that disappeared between pre- and post-state, but removedCount is based on beforeMarkers.length (forensic markers), not removed tags. Since callers may interpret removedCount as “tags removed”, this mismatch can be misleading, especially when multiple markers share a tag or when non-marker tags are removed. Please either rename the field (e.g. removedMarkersCount) or make it derive from removedTags.length to keep the semantics consistent.

Suggested change

const removedTags = beforeKeys.filter((k) => !(k in finalTags));

const removedCount = beforeMarkers.length;

const removedTags = beforeKeys.filter((k) => !(k in finalTags));

const removedCount = removedTags.length;

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cc92395f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T11:36:03Z

+  const verification = verifyFinalState(finalTags);
+  const removedTags = beforeKeys.filter((k) => !(k in finalTags));
+  const removedCount = beforeMarkers.length;
+  const status = (!wipeVerificationPassed || finalMarkers.length > 0) ? 'review_required' : (verification.unexpectedDescriptive.length > 0 ? 'clean_with_notes' : 'clean');


Base review status on filtered residual markers

verifyFinalState explicitly filters benign and allowed injected tags, but the status computation ignores that and uses finalMarkers.length > 0 from the unfiltered tag set. This means normal user-provided metadata (for example, an allowed Title containing a brand term like "OpenAI") can force review_required even when finalVerificationPassed is true, creating false-positive forensic failures.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T11:36:03Z

+  const finalMarkers = detectMarkers(finalTags);
+  const verification = verifyFinalState(finalTags);
+  const removedTags = beforeKeys.filter((k) => !(k in finalTags));
+  const removedCount = beforeMarkers.length;


Keep removedCount aligned with removedTags/header semantics

removedCount is now set to beforeMarkers.length, which counts marker-rule hits, not removed metadata keys. Because one key/value can trigger multiple rules (or none), this can diverge from removedTags and from prior X-Forensic-Removed behavior, causing inconsistent audit output and misleading usage/UI metrics that assume the count corresponds to removed tags.

Useful? React with 👍 / 👎.

Add modular forensic processor and batch downloads

7cc9239

ChrisAdamsdevelopment added the codex label May 13, 2026 — with ChatGPT Codex Connector

ChrisAdamsdevelopment merged commit af0ee06 into main May 13, 2026
4 of 5 checks passed

sourcery-ai Bot reviewed May 13, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add modular forensic processor and batch downloads#22

Add modular forensic processor and batch downloads#22
ChrisAdamsdevelopment merged 1 commit into
mainfrom
codex/create-new-branch-for-architecture-upgrade

ChrisAdamsdevelopment commented May 13, 2026 •

edited

Loading

Uh oh!

sourcery-ai Bot commented May 13, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot May 13, 2026

Uh oh!

sourcery-ai Bot May 13, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		const removedTags = beforeKeys.filter((k) => !(k in finalTags));
		const removedCount = beforeMarkers.length;

Conversation

ChrisAdamsdevelopment commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Description

Explicit exclusions preserved

Testing

Uh oh!

sourcery-ai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for batch processing and one-time download flow

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ChrisAdamsdevelopment commented May 13, 2026 •

edited

Loading

sourcery-ai Bot commented May 13, 2026 •

edited

Loading