Merged
1 change: 1 addition & 0 deletions src/utils/metadata.d.ts
@@ -6,6 +6,7 @@ export interface FileMetadataResult {
detectedMarkers: string[];
provenanceRisk: 'High' | 'Low';
raw: unknown;
parseError?: string | null;
}

export function readFileMetadata(file: File): Promise<FileMetadataResult>;
75 changes: 62 additions & 13 deletions src/utils/metadata.js
@@ -1,9 +1,37 @@
import { parseBlob } from 'music-metadata-browser';
import ID3Writer from 'browser-id3-writer';

const AI_MARKERS = ['ai','generated','suno','udio','boomy','aiva','soundraw','mubert','stable audio','provenance','c2pa','content credentials','watermark','synthetic','elevenlabs'];
const MARKER_REGEX_CACHE = new Map();

function collectStrings(metadata) {
let parseBlobLoader = null;

async function getParseBlob() {
if (parseBlobLoader) return parseBlobLoader;
parseBlobLoader = import('music-metadata-browser').then((mod) => {
const fn = mod?.parseBlob || mod?.default?.parseBlob;
if (typeof fn !== 'function') {
throw new Error('music-metadata-browser parseBlob export not found');
}
return fn;
});
Comment on lines +10 to +16

P2: Clear failed parseBlob loader cache before retrying

If the dynamic import('music-metadata-browser') fails once (for example due to a transient chunk/network load error in the browser), parseBlobLoader is left as a rejected promise and every later readFileMetadata() call will fail in the same session even after conditions recover. This turns a temporary import hiccup into a persistent metadata outage until page reload; clear or rebuild the cache on rejection so subsequent calls can retry.
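
One way to address this, sketched under assumptions: wrap the loader in a small memoizing helper that forgets a rejected promise so the next call starts a fresh attempt. `createRetryingLoader` is a hypothetical name, not part of this PR or of music-metadata-browser, and the load function is passed in so the sketch runs without the real package.

```javascript
// Hypothetical helper (not in the PR): memoize an async loader, but drop
// the cached promise when it rejects so a later call can retry.
function createRetryingLoader(load) {
  let pending = null;

  return function get() {
    if (pending) return pending;

    pending = Promise.resolve()
      .then(load)
      .catch((error) => {
        // Forget the rejected promise; the next get() calls load() again.
        pending = null;
        throw error;
      });

    return pending;
  };
}
```

In `metadata.js`, `getParseBlob` could then be built as `createRetryingLoader(() => import('music-metadata-browser').then(...))`, keeping the existing export check inside the callback. Successful loads stay cached; only failures are retried.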

return parseBlobLoader;
}

function escapeRegex(value) {
return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function markerToRegex(marker) {
if (MARKER_REGEX_CACHE.has(marker)) return MARKER_REGEX_CACHE.get(marker);
const escaped = escapeRegex(marker);
const regex = marker.length <= 2
? new RegExp(`\\b${escaped}\\b`, 'i')
: new RegExp(`(?:^|\\W)${escaped}(?:$|\\W)`, 'i');

P2: Match markers across underscore separators

The boundary regex for markers longer than 2 chars uses \W, but _ is considered a word character in JavaScript regex, so strings like ai_generated or content_credentials no longer match generated/content credentials. This is a regression from the previous substring-based detection and can silently miss common machine-generated marker formats, lowering provenance detection accuracy.
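
A possible fix, as a sketch rather than the PR's actual change: define the boundary as "not a letter or digit" instead of `\W`, so `_` counts as a separator, and let the words of a multi-word marker be joined by whitespace, `_`, or `-`. The separator set `[\s_-]` is an assumption about which formats should match; the regex cache from the PR is omitted here for brevity.

```javascript
function escapeRegex(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Sketch: one pattern for all marker lengths.
// - Boundaries are "start/end of string or any non-alphanumeric character",
//   so '_' and '-' separate markers just like spaces do.
// - Words inside a multi-word marker may be joined by whitespace, '_' or '-'.
function markerToRegex(marker) {
  const body = marker
    .trim()
    .split(/\s+/)
    .map(escapeRegex)
    .join('[\\s_-]+');
  return new RegExp(`(?:^|[^a-z0-9])${body}(?:$|[^a-z0-9])`, 'i');
}
```

With this pattern, `ai_generated` matches both `ai` and `generated`, and `content_credentials` matches `content credentials`, while substrings inside longer words (such as `ai` in `rain`) still do not match. The separate `\b` branch for short markers is no longer needed because the same boundary rule covers them.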

MARKER_REGEX_CACHE.set(marker, regex);
return regex;
}

function collectStrings(metadata, fileName = '') {
const common = metadata?.common || {};
const native = metadata?.native || {};
const values = [common.title,common.artist,common.album,...(common.genre || []),...(common.comment || []),common.encodedby,common.publisher]
@@ -16,32 +44,53 @@ function collectStrings(metadata) {
if (frame?.value && typeof frame.value === 'object') values.push(JSON.stringify(frame.value));
});
});
if (fileName) values.push(String(fileName));
return values.join(' | ').toLowerCase();
}

export async function readFileMetadata(file) {
const parsed = await parseBlob(file);
const searchable = collectStrings(parsed);
const detectedMarkers = AI_MARKERS.filter(marker => searchable.includes(marker));
let parsed = null;
let parseError = null;

try {
const parseBlob = await getParseBlob();
parsed = await parseBlob(file);
} catch (error) {
parseError = error;
}

const searchable = collectStrings(parsed, file?.name || '');
const detectedMarkers = AI_MARKERS.filter((marker) => markerToRegex(marker).test(searchable));
return {
format: parsed.format?.container || file.type || 'unknown',
title: parsed.common?.title || file.name.replace(/\.[^.]+$/, ''),
artist: parsed.common?.artist || '',
genre: parsed.common?.genre?.[0] || '',
format: parsed?.format?.container || file.type || 'unknown',
title: parsed?.common?.title || file.name.replace(/\.[^.]+$/, ''),
artist: parsed?.common?.artist || '',
genre: parsed?.common?.genre?.[0] || '',
detectedMarkers,
provenanceRisk: detectedMarkers.length > 0 ? 'High' : 'Low',

P2: Treat parse failures as unknown, not low provenance risk

When parseBlob(file) throws, parsed stays null, which makes detectedMarkers empty and forces provenanceRisk to 'Low'. In this failure path the app has no metadata evidence at all, so reporting low risk is a misleading false negative (especially for unsupported/corrupt files) rather than a real classification; this should return an explicit unknown/error state or propagate the parse failure signal into risk evaluation.
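
A minimal sketch of the explicit-unknown option, assuming a third `'Unknown'` value is acceptable to callers. `classifyProvenanceRisk` is a hypothetical helper, and the `provenanceRisk` type in `metadata.d.ts` (currently `'High' | 'Low'`) would need to be widened to allow the new value.

```javascript
// Hypothetical helper (not in the PR). 'Unknown' is an assumed third state.
function classifyProvenanceRisk(detectedMarkers, parseError) {
  // Markers can still be found in the file name even when parsing failed,
  // so positive evidence takes priority over the error state.
  if (detectedMarkers.length > 0) return 'High';
  // No markers and no parsed metadata: there is nothing to base 'Low' on.
  if (parseError) return 'Unknown';
  return 'Low';
}
```

`readFileMetadata` could then set `provenanceRisk: classifyProvenanceRisk(detectedMarkers, parseError)`, so that `'Low'` is only reported when the metadata was actually parsed and searched.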

raw: parsed,
parseError: parseError ? String(parseError?.message || parseError) : null,
};
}

export async function writeMP3Metadata(file, metadata) {
const buffer = await file.arrayBuffer();
const writer = new ID3Writer(buffer);
writer.removeTag();
if (metadata.title) writer.setFrame('TIT2', metadata.title);
if (metadata.artist) writer.setFrame('TPE1', [metadata.artist]);
if (metadata.genre) writer.setFrame('TCON', [metadata.genre]);
writer.setFrame('TENC', 'SpectraCleanseAI Browser Quick Cleanse');

const safeText = (value) => {
if (typeof value !== 'string') return '';
return value.replace(/\u0000/g, '').trim().slice(0, 500);
};
Comment on lines +81 to +84

suggestion (bug_risk): safeText drops non-string values instead of preserving them, which may be overly strict

Currently, non-string inputs result in '', so values like numbers, String-likes, or already-joined genre arrays are effectively dropped and no frame is written. To avoid losing valid data, consider coercing non-nullish values to strings (e.g. if (value == null) return ''; const text = String(value); ...) before sanitizing (null removal, trim, truncate).

Suggested change
- const safeText = (value) => {
- if (typeof value !== 'string') return '';
- return value.replace(/\u0000/g, '').trim().slice(0, 500);
- };
+ const safeText = (value) => {
+ if (value == null) return '';
+ const text = String(value);
+ return text.replace(/\u0000/g, '').trim().slice(0, 500);
+ };


const title = safeText(metadata?.title);
const artist = safeText(metadata?.artist);
const genre = safeText(metadata?.genre);

if (title) writer.setFrame('TIT2', title);
if (artist) writer.setFrame('TPE1', [artist]);
if (genre) writer.setFrame('TCON', [genre]);
if (title || artist || genre) writer.setFrame('TENC', 'SpectraCleanseAI Browser Quick Cleanse');
writer.addTag();
return new Blob([writer.getBlob()], { type: 'audio/mpeg' });
}