The substring check entry.sourceUrl.includes('/profile_images/') at three sites in src/bookmark-media.ts is used to classify URLs as either "profile image" (deduplicated by URL alone) or "asset" (keyed by tweetId + sourceUrl). Twitter's default-pfp URL, served for any account with no profile photo set, uses a path containing default_profile_images (underscore before profile, not a slash), so the substring check fails. Default-pfp URLs are therefore misclassified as assets and slip past the URL deduplication that #79 added for real pfps.
Effect: every new bookmark whose author uses the default avatar triggers a fresh fetch of the same 2KB PNG, attributed to that bookmark's tweetId in the manifest.
Concrete evidence
Sample manifest excerpt from a ~5,200-bookmark archive: seven manifest entries for the identical URL, each with a different bookmarkId:
2026-04-08T15:14:15 bookmarkId=2037126739286253911 url=https://abs.twimg.com/sticky/default_profile_images/default_profile_400x400.png
2026-04-08T16:10:19 bookmarkId=2029867961624793147 (same url)
2026-04-08T18:05:51 bookmarkId=2009833980183376247 (same url)
2026-04-08T18:22:08 bookmarkId=2008208287939076351 (same url)
2026-04-08T18:31:54 bookmarkId=2005656385732964618 (same url)
2026-04-24T07:37:39 bookmarkId=2040426976050086218 (same url)
2026-05-10T19:41:43 bookmarkId=2047271408246395145 (same url)
Seven downloads of the same 2KB PNG across a month, one per distinct bookmark whose author has the default avatar.
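To check another archive for the same pattern, one option is to group manifest entries by URL and flag any URL recorded under more than one bookmarkId. This is a sketch only: `ManifestEntry` is a hypothetical, simplified entry shape modelled on the excerpt above, and the project's real manifest format may differ.

```typescript
// Hypothetical, simplified manifest entry; field names mirror the excerpt above.
interface ManifestEntry {
  bookmarkId: string;
  url: string;
}

// Maps each URL recorded under more than one distinct bookmarkId to those IDs.
function findDuplicateUrls(entries: ManifestEntry[]): Map<string, string[]> {
  const byUrl = new Map<string, Set<string>>();
  for (const { bookmarkId, url } of entries) {
    const ids = byUrl.get(url) ?? new Set<string>();
    ids.add(bookmarkId);
    byUrl.set(url, ids);
  }
  const duplicates = new Map<string, string[]>();
  byUrl.forEach((ids, url) => {
    if (ids.size > 1) duplicates.set(url, Array.from(ids));
  });
  return duplicates;
}

// The seven entries from the excerpt above.
const DEFAULT_PFP =
  "https://abs.twimg.com/sticky/default_profile_images/default_profile_400x400.png";
const sample: ManifestEntry[] = [
  "2037126739286253911",
  "2029867961624793147",
  "2009833980183376247",
  "2008208287939076351",
  "2005656385732964618",
  "2040426976050086218",
  "2047271408246395145",
].map((bookmarkId) => ({ bookmarkId, url: DEFAULT_PFP }));

console.log(findDuplicateUrls(sample).get(DEFAULT_PFP)?.length); // 7
```

Entries that repeat the same bookmarkId are not reported, since only distinct bookmarks count toward the duplication described here.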
Why
Two URL shapes for profile images:
Real user pfp: https://pbs.twimg.com/profile_images/<id>/<filename>_400x400.jpg
Default pfp: https://abs.twimg.com/sticky/default_profile_images/default_profile_400x400.png
The substring '/profile_images/' (with surrounding slashes) matches the first but not the second: in default_profile_images, the character before profile_images is an underscore, not a slash.
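The mismatch is easy to reproduce in isolation. The sketch below evaluates the current check, copied from the three call sites, against both URL shapes; the `<id>` and `<filename>` parts of the real-pfp URL are made-up placeholder values, while the default-pfp URL is the one from the manifest excerpt.

```typescript
// REAL_PFP uses made-up <id>/<filename> values; DEFAULT_PFP is the manifest URL.
const REAL_PFP =
  "https://pbs.twimg.com/profile_images/1234567890/example_400x400.jpg";
const DEFAULT_PFP =
  "https://abs.twimg.com/sticky/default_profile_images/default_profile_400x400.png";

// The predicate currently used at L80, L214 and L223 of src/bookmark-media.ts.
function currentIsProfileImage(url: string): boolean {
  return url.includes("/profile_images/");
}

console.log(currentIsProfileImage(REAL_PFP));    // true
console.log(currentIsProfileImage(DEFAULT_PFP)); // false: "_profile_images/" has no leading slash
```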
Cascading effect
Three call sites in src/bookmark-media.ts use the same .includes('/profile_images/') check:
// L80: builds the cache key; the third argument is `isProfileImage`
function mediaEntryKeyFromEntry(entry: MediaFetchEntry): string {
  return mediaEntryKey(entry.tweetId, entry.sourceUrl, entry.sourceUrl.includes('/profile_images/'));
}

// L214: builds coveredAssetKeys (non-pfp entries, keyed by tweetId::sourceUrl)
  .filter((entry) => !entry.sourceUrl.includes('/profile_images/'))
  .filter((entry) => isCoveredEntry(entry, maxBytes))
  .map((entry) => `${entry.tweetId}::${entry.sourceUrl}`),

// L223: builds coveredProfileImageUrls (pfp entries, keyed by URL alone)
  .filter((entry) => entry.sourceUrl.includes('/profile_images/'))
  .filter((entry) => isCoveredEntry(entry, maxBytes))
  .map((entry) => entry.sourceUrl),
For a default-pfp URL:
- Line 80 passes isProfileImage = false, so the cache key includes tweetId.
- Line 214's filter passes it through (the check is negated), adding it to coveredAssetKeys.
- Line 223's filter excludes it (includes returns false), so the URL is never added to coveredProfileImageUrls.
When a new bookmark from a default-pfp author arrives, the resolver looks up a key like <newTweetId>::<defaultUrl> in coveredAssetKeys. That key isn't there (different tweetId) → URL classified as pending → fetched again. The manifest gets a new entry attributed to the new bookmark.
--skip-profile-images can't help because the URL isn't classified as a profile image.
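The sequence above can be replayed in a small sketch. It is not the project's resolver: `Entry` is a hypothetical two-field stand-in for MediaFetchEntry, `buildCovered` mirrors the L214/L223 filters but leaves out `isCoveredEntry(entry, maxBytes)`, `isPending` is a hypothetical version of the lookup described above, and the tweet IDs are made up.

```typescript
// Hypothetical, reduced version of MediaFetchEntry: only the fields used here.
interface Entry {
  tweetId: string;
  sourceUrl: string;
}

const DEFAULT_PFP =
  "https://abs.twimg.com/sticky/default_profile_images/default_profile_400x400.png";

// Current predicate from L80 / L214 / L223.
const isProfileImage = (url: string): boolean => url.includes("/profile_images/");

// Mirrors the two filters quoted above (size check omitted).
function buildCovered(entries: Entry[]) {
  const coveredAssetKeys = new Set<string>(
    entries
      .filter((e) => !isProfileImage(e.sourceUrl))
      .map((e) => `${e.tweetId}::${e.sourceUrl}`),
  );
  const coveredProfileImageUrls = new Set<string>(
    entries.filter((e) => isProfileImage(e.sourceUrl)).map((e) => e.sourceUrl),
  );
  return { coveredAssetKeys, coveredProfileImageUrls };
}

// Hypothetical lookup: profile images by URL alone, everything else by tweetId::sourceUrl.
function isPending(entry: Entry, covered: ReturnType<typeof buildCovered>): boolean {
  if (isProfileImage(entry.sourceUrl)) {
    return !covered.coveredProfileImageUrls.has(entry.sourceUrl);
  }
  return !covered.coveredAssetKeys.has(`${entry.tweetId}::${entry.sourceUrl}`);
}

// An earlier bookmark (tweet 1001) already fetched the default pfp...
const covered = buildCovered([{ tweetId: "1001", sourceUrl: DEFAULT_PFP }]);
console.log(covered.coveredAssetKeys.has(`1001::${DEFAULT_PFP}`)); // true
console.log(covered.coveredProfileImageUrls.size);                 // 0
// ...yet a new bookmark (tweet 1002) with the same avatar is still pending.
console.log(isPending({ tweetId: "1002", sourceUrl: DEFAULT_PFP }, covered)); // true
```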
Relationship to #79
#79 ("profile images are re-downloaded and duplicated on disk for every bookmark from the same author", closed) was the broader pfp-dedup fix that introduced the URL-keyed coveredProfileImageUrls set. That fix works correctly for real-user pfps (the URL pbs.twimg.com/profile_images/... matches the substring). Default-pfp URLs were missed because the substring is anchored too tightly, so they regress to the pre-#79 per-bookmark fetch pattern.
Fix
The three identical substring checks need to be broadened from '/profile_images/' to 'profile_images' (dropping the surrounding slashes):
// L80
entry.sourceUrl.includes('profile_images')
// L214
.filter((entry) => !entry.sourceUrl.includes('profile_images'))
// L223
.filter((entry) => entry.sourceUrl.includes('profile_images'))
Both URL shapes now match:
…/profile_images/<id>/… contains profile_images ✓
…/default_profile_images/default_profile_… contains profile_images ✓
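Because the same check appears at three call sites, one option is to route all three through a single helper so they cannot drift apart again. `isProfileImageUrl` is a hypothetical name, not an existing function in src/bookmark-media.ts; its body is just the broadened check proposed above. The sample URLs use made-up IDs.

```typescript
// Hypothetical shared helper (not an existing function in src/bookmark-media.ts).
// The body is the broadened check from the fix, so it matches both
// .../profile_images/<id>/... and .../default_profile_images/... URLs.
function isProfileImageUrl(url: string): boolean {
  return url.includes("profile_images");
}

// The three call sites would then read:
//   L80:  mediaEntryKey(entry.tweetId, entry.sourceUrl, isProfileImageUrl(entry.sourceUrl))
//   L214: .filter((entry) => !isProfileImageUrl(entry.sourceUrl))
//   L223: .filter((entry) => isProfileImageUrl(entry.sourceUrl))

// Made-up <id>/<filename> values.
const REAL_PFP =
  "https://pbs.twimg.com/profile_images/1234567890/example_400x400.jpg";
const DEFAULT_PFP =
  "https://abs.twimg.com/sticky/default_profile_images/default_profile_400x400.png";
// Made-up tweet-media URL, which must stay classified as an asset.
const TWEET_MEDIA = "https://pbs.twimg.com/media/EXAMPLE.jpg";

console.log(isProfileImageUrl(REAL_PFP));    // true
console.log(isProfileImageUrl(DEFAULT_PFP)); // true
console.log(isProfileImageUrl(TWEET_MEDIA)); // false
```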
Alternative if matching boundaries matters: a regex like /\/profile_images\//.test(url) || /\/default_profile_images\//.test(url). The simpler substring drop is sufficient, since no other twimg.com URL path contains profile_images as a substring outside these two contexts.
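If boundaries do matter, the two alternatives can be folded into one anchored pattern. The sketch contrasts it with the plain substring check; the third URL is a made-up path used only to show what the anchors rule out.

```typescript
// Substring variant from the fix: matches "profile_images" anywhere in the URL.
const bySubstring = (url: string): boolean => url.includes("profile_images");

// Anchored variant: "profile_images" must be a whole path segment, optionally
// prefixed by "default_". Equivalent to
// /\/profile_images\//.test(url) || /\/default_profile_images\//.test(url).
const PROFILE_IMAGE_SEGMENT = /\/(?:default_)?profile_images\//;
const byRegex = (url: string): boolean => PROFILE_IMAGE_SEGMENT.test(url);

const urls = [
  "https://pbs.twimg.com/profile_images/1234567890/example_400x400.jpg",
  "https://abs.twimg.com/sticky/default_profile_images/default_profile_400x400.png",
  "https://example.com/my_profile_images_backup/x.png", // made-up non-pfp path
];
for (const url of urls) {
  console.log(bySubstring(url), byRegex(url));
}
// true true
// true true
// true false
```

Both variants agree on the two twimg.com shapes; they differ only on paths that merely contain profile_images, which is why the substring drop is enough if no such paths occur.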
After the fix, the existing coveredProfileImageUrls URL-dedup logic catches default-pfp URLs too, so future fetches short-circuit. No schema change, no migration. Existing duplicated default-pfp files on disk persist harmlessly (or can be cleaned up by a fetch-media --prune-style command if/when that lands).
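To see what the change does to the evidence above, the sketch replays the seven default-avatar bookmarks through a simplified dedup loop once with each predicate and counts fetches. It assumes, for illustration only, that each bookmarkId from the manifest is also the entry's tweetId and that every fetch succeeds and is recorded as covered immediately; `countFetches` is hypothetical and skips the real resolver's size check.

```typescript
const DEFAULT_PFP =
  "https://abs.twimg.com/sticky/default_profile_images/default_profile_400x400.png";
// bookmarkIds from the manifest excerpt, treated here as tweetIds.
const BOOKMARK_IDS = [
  "2037126739286253911",
  "2029867961624793147",
  "2009833980183376247",
  "2008208287939076351",
  "2005656385732964618",
  "2040426976050086218",
  "2047271408246395145",
];

type Predicate = (url: string) => boolean;
const oldCheck: Predicate = (url) => url.includes("/profile_images/");
const newCheck: Predicate = (url) => url.includes("profile_images");

// Simplified dedup loop: profile images are keyed by URL alone, other media by
// tweetId::sourceUrl. A single set suffices because the two key forms never collide.
function countFetches(tweetIds: string[], sourceUrl: string, isProfileImage: Predicate): number {
  const covered = new Set<string>();
  let fetches = 0;
  for (const tweetId of tweetIds) {
    const key = isProfileImage(sourceUrl) ? sourceUrl : `${tweetId}::${sourceUrl}`;
    if (!covered.has(key)) {
      fetches += 1;
      covered.add(key);
    }
  }
  return fetches;
}

console.log(countFetches(BOOKMARK_IDS, DEFAULT_PFP, oldCheck)); // 7
console.log(countFetches(BOOKMARK_IDS, DEFAULT_PFP, newCheck)); // 1
```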