Harden metadata parsing and marker detection; sanitize Quick Cleanse ID3 writes#11
Conversation
Reviewer's GuideAdds defensive metadata parsing with diagnostic errors, boundary-aware regex-based AI marker detection, and sanitized Quick Cleanse ID3 tag writing, including updated typings for the new parseError field. Sequence diagram for hardened readFileMetadata parsing and marker detectionsequenceDiagram
participant Caller
participant readFileMetadata
participant parseBlob
participant markerToRegex
participant MARKER_REGEX_CACHE
Caller->>readFileMetadata: readFileMetadata(file)
activate readFileMetadata
readFileMetadata->>parseBlob: parseBlob(file)
activate parseBlob
parseBlob-->>readFileMetadata: parsedMetadata or throws
deactivate parseBlob
alt parseBlob succeeds
readFileMetadata->>readFileMetadata: collectStrings(parsed)
else parseBlob throws
readFileMetadata->>readFileMetadata: set parsed to null
readFileMetadata->>readFileMetadata: set parseError from error
readFileMetadata->>readFileMetadata: collectStrings(null)
end
loop for each marker in AI_MARKERS
readFileMetadata->>markerToRegex: markerToRegex(marker)
activate markerToRegex
alt regex cached
markerToRegex->>MARKER_REGEX_CACHE: get(marker)
MARKER_REGEX_CACHE-->>markerToRegex: cachedRegex
else regex not cached
markerToRegex->>markerToRegex: escapeRegex(marker)
markerToRegex->>markerToRegex: build boundaryRegex
markerToRegex->>MARKER_REGEX_CACHE: set(marker, regex)
end
markerToRegex-->>readFileMetadata: regex
deactivate markerToRegex
readFileMetadata->>readFileMetadata: regex.test(searchable)
end
readFileMetadata-->>Caller: FileMetadataResult(format, title, artist, genre, detectedMarkers, provenanceRisk, raw, parseError)
deactivate readFileMetadata
Class diagram for updated metadata utilities and FileMetadataResultclassDiagram
class FileMetadataResult {
+string format
+string title
+string artist
+string genre
+string[] detectedMarkers
+string provenanceRisk
+unknown raw
+string parseError
}
class MetadataUtils {
+AI_MARKERS string[]
+MARKER_REGEX_CACHE Map
+escapeRegex(value)
+markerToRegex(marker)
+collectStrings(metadata)
+readFileMetadata(file)
+writeMP3Metadata(file, metadata)
}
class QuickCleanseWriter {
+safeText(value)
}
class ID3WriterExternal {
+constructor(buffer)
+removeTag()
+setFrame(frameId, value)
+addTag()
+getBlob()
}
MetadataUtils ..> FileMetadataResult : returns
MetadataUtils ..> ID3WriterExternal : uses
MetadataUtils ..> QuickCleanseWriter : uses
QuickCleanseWriter ..> MetadataUtils : used by writeMP3Metadata
class ReadFileMetadataFlow {
+file File
+parsedMetadata
+parseError
+detectedMarkers string[]
}
ReadFileMetadataFlow ..> MetadataUtils : uses escapeRegex
ReadFileMetadataFlow ..> MetadataUtils : uses markerToRegex
ReadFileMetadataFlow ..> FileMetadataResult : builds
class WriteMP3MetadataFlow {
+file File
+metadata
+title string
+artist string
+genre string
}
WriteMP3MetadataFlow ..> QuickCleanseWriter : uses safeText
WriteMP3MetadataFlow ..> ID3WriterExternal : sets frames
WriteMP3MetadataFlow ..> MetadataUtils : writeMP3Metadata entrypoint
class MarkerRegexBehavior {
+marker string
+regex RegExp
}
MarkerRegexBehavior ..> MetadataUtils : markerToRegex
MarkerRegexBehavior ..> MetadataUtils : MARKER_REGEX_CACHE
MarkerRegexBehavior ..> MetadataUtils : escapeRegex
class ProvenanceRiskLogic {
+detectedMarkers string[]
+provenanceRisk string
}
ProvenanceRiskLogic ..> MetadataUtils : readFileMetadata
ProvenanceRiskLogic ..> FileMetadataResult : sets provenanceRisk
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="src/utils/metadata.js" line_range="66-69" />
<code_context>
- if (metadata.genre) writer.setFrame('TCON', [metadata.genre]);
- writer.setFrame('TENC', 'SpectraCleanseAI Browser Quick Cleanse');
+
+ const safeText = (value) => {
+ if (typeof value !== 'string') return '';
+ return value.replace(/\u0000/g, '').trim().slice(0, 500);
+ };
+
+ const title = safeText(metadata?.title);
+ const artist = safeText(metadata?.artist);
+ const genre = safeText(metadata?.genre);
+
+ if (title) writer.setFrame('TIT2', title);
</code_context>
<issue_to_address>
**suggestion (bug_risk):** `safeText` drops non-string values instead of preserving them, which may be overly strict
Currently, non-string inputs result in `''`, so values like numbers, `String`-likes, or already-joined `genre` arrays are effectively dropped and no frame is written. To avoid losing valid data, consider coercing non-nullish values to strings (e.g. `if (value == null) return ''; const text = String(value); ...`) before sanitizing (null removal, trim, truncate).
```suggestion
const safeText = (value) => {
if (value == null) return '';
const text = String(value);
return text.replace(/\u0000/g, '').trim().slice(0, 500);
};
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| const safeText = (value) => { | ||
| if (typeof value !== 'string') return ''; | ||
| return value.replace(/\u0000/g, '').trim().slice(0, 500); | ||
| }; |
There was a problem hiding this comment.
suggestion (bug_risk): safeText drops non-string values instead of preserving them, which may be overly strict
Currently, non-string inputs result in '', so values like numbers, String-likes, or already-joined genre arrays are effectively dropped and no frame is written. To avoid losing valid data, consider coercing non-nullish values to strings (e.g. if (value == null) return ''; const text = String(value); ...) before sanitizing (null removal, trim, truncate).
| const safeText = (value) => { | |
| if (typeof value !== 'string') return ''; | |
| return value.replace(/\u0000/g, '').trim().slice(0, 500); | |
| }; | |
| const safeText = (value) => { | |
| if (value == null) return ''; | |
| const text = String(value); | |
| return text.replace(/\u0000/g, '').trim().slice(0, 500); | |
| }; |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: beac7fa616
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| artist: parsed?.common?.artist || '', | ||
| genre: parsed?.common?.genre?.[0] || '', | ||
| detectedMarkers, | ||
| provenanceRisk: detectedMarkers.length > 0 ? 'High' : 'Low', |
There was a problem hiding this comment.
Treat parse failures as unknown, not low provenance risk
When parseBlob(file) throws, parsed stays null, which makes detectedMarkers empty and forces provenanceRisk to 'Low'. In this failure path the app has no metadata evidence at all, so reporting low risk is a misleading false negative (especially for unsupported/corrupt files) rather than a real classification; this should return an explicit unknown/error state or propagate the parse failure signal into risk evaluation.
Useful? React with 👍 / 👎.
| const escaped = escapeRegex(marker); | ||
| const regex = marker.length <= 2 | ||
| ? new RegExp(`\\b${escaped}\\b`, 'i') | ||
| : new RegExp(`(?:^|\\W)${escaped}(?:$|\\W)`, 'i'); |
There was a problem hiding this comment.
Match markers across underscore separators
The boundary regex for markers longer than 2 chars uses \W, but _ is considered a word character in JavaScript regex, so strings like ai_generated or content_credentials no longer match generated/content credentials. This is a regression from the previous substring-based detection and can silently miss common machine-generated marker formats, lowering provenance detection accuracy.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a7c14bbaf6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| parseBlobLoader = import('music-metadata-browser').then((mod) => { | ||
| const fn = mod?.parseBlob || mod?.default?.parseBlob; | ||
| if (typeof fn !== 'function') { | ||
| throw new Error('music-metadata-browser parseBlob export not found'); | ||
| } | ||
| return fn; | ||
| }); |
There was a problem hiding this comment.
Clear failed parseBlob loader cache before retrying
If the dynamic import('music-metadata-browser') fails once (for example due to a transient chunk/network load error in the browser), parseBlobLoader is left as a rejected promise and every later readFileMetadata() call will fail in the same session even after conditions recover. This turns a temporary import hiccup into a persistent metadata outage until page reload; clear or rebuild the cache on rejection so subsequent calls can retry.
Useful? React with 👍 / 👎.
Motivation
readFileMetadata()from crashing flows and provide useful diagnostics whenparseBlob()fails.aithat match inside unrelated words.music-metadata-browserinstead of relying on brittle static import resolution.Description
parseBlob(file)in atry/catchand return file-derived fallbacks when parsing fails, adding aparseErrorstring to the returned metadata for diagnostics.markerToRegex()and aMARKER_REGEX_CACHEto reduce false positives for short markers.escapeRegex()to safely escape marker strings before building regexes.safeText()(strip null bytes, trim, cap length) and only write the encoder (TENC) frame when at least one metadata field is present.music-metadata-browserparser resolution throughgetParseBlob()so Vite/module export differences do not break production builds.parseErrorinFileMetadataResult.Testing
music-metadata-browserparser.npm installsuccessfully.npm run buildsuccessfully.git status --shortwas clean after final verification.