404 File Fixer: https://beak2825.github.io/epstein-files-archive
Old VS New: see old vs. new redactions
If you have seen evidence of files being deleted or modified in the Epstein Files, you probably want to know how many were modified or deleted, or to see a list of them. This repo exists for exactly that: it tracks current modifications and records when each file was modified.
This repo does not archive the files themselves, only the server responses, which are useful for checksums and Last-Modified dates. ETags from justice.gov are in MD5 format. (Edit: the -part suffix at the end of an ETag is not part of the MD5, and zip files do not have the hash you might expect because the DOJ direct zip downloads include extra folders like DATA, IMAGES, VOLUME.)
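A minimal sketch of checking a local copy against a stored ETag, assuming (as noted above) that the ETag is the file's MD5 in hex with an optional -part suffix. The helper names etag_to_md5, md5_of_file and matches_etag are made up for illustration and are not part of this repo.

```python
import hashlib


def etag_to_md5(etag):
    """Reduce an ETag header value to its bare hex digest (assumed to be MD5)."""
    value = etag.strip()
    if value.startswith("W/"):  # weak-validator prefix, if present
        value = value[2:]
    value = value.strip('"')
    # Drop the "-part" suffix, e.g. "abc123-2" -> "abc123"
    return value.split("-", 1)[0].lower()


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_etag(path, etag):
    """True if the local file's MD5 equals the digest stored in the ETag."""
    return md5_of_file(path) == etag_to_md5(etag)
```

If matches_etag returns False for a file you downloaded earlier, that only tells you the copy differs from what the server described at the time the header was saved; it does not say what changed.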
This Python script fetches metadata (HTTP headers) for files from the U.S. Department of Justice (DOJ) Epstein disclosures datasets available at https://www.justice.gov/epstein/doj-disclosures. It processes each dataset sequentially, handling pagination, and saves selected response headers to text files without downloading the actual file contents. It also compiles a universal log of file names with their Last-Modified dates and ETags.
The neat thing about this program is that it doesn't download any files. It only gets the response headers using HEAD instead of GET, which makes the process roughly 10x faster.
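For anyone curious what a HEAD-only request looks like, here is a rough sketch using only the standard library. It is not the code from 2fetcher.py; the function names and the SELECTED_HEADERS list are assumptions chosen for the example.

```python
import urllib.request

# Assumption: which headers are worth keeping. The real script may save others.
SELECTED_HEADERS = ("Last-Modified", "ETag", "Content-Length", "Content-Type")


def build_head_request(url):
    """Build a request that asks for headers only (no response body)."""
    return urllib.request.Request(url, method="HEAD")


def select_headers(headers, wanted=SELECTED_HEADERS):
    """Keep only the wanted headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name.lower()] for name in wanted if name.lower() in lowered}


def fetch_headers(url, timeout=30):
    """Send the HEAD request and return the selected response headers."""
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return select_headers(dict(resp.headers.items()))
```

Because the server never sends the file body, each request transfers only a few hundred bytes of headers regardless of how large the file is.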
Table of Data Sets and known deleted/changed files (we know which ones were deleted/changed)
| Data Set # | Files Changed/Deleted |
|---|---|
| 1 | 65 Changed, 8 Deleted |
| 2 | 1 Changed, 1 Deleted |
| 3 | 3 Changed, 2 Deleted |
| 4 | 2 Changed |
| 5 | 1 Changed |
| 6 | 2 Changed |
| 7 | N/A |
| 8 | 21 Changed, 10 Deleted |
| 9 | 401 Changed, 866 Deleted |
| 10 | 262 Changed (and counting), 40 Deleted |
| 11 | 92 Changed, 29 Deleted |
| 12 | 2 New, 1 Deleted |
| 13-23 | Unreleased |
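Counts like the ones in the table can be produced by comparing two snapshots of saved headers. The sketch below assumes each snapshot is a dict mapping a file name to its ETag, with None meaning the server no longer returned the file. That data layout and the name classify_changes are illustrative assumptions, not the repo's actual format.

```python
def classify_changes(old, new):
    """Compare two {file_name: etag_or_None} snapshots.

    A file is "changed" if it exists in both snapshots with different ETags,
    "deleted" if it existed before and is now missing or None,
    and "new" if it only appears in the newer snapshot.
    """
    changed = sorted(
        name for name, etag in old.items()
        if etag is not None and new.get(name) is not None and new[name] != etag
    )
    deleted = sorted(
        name for name, etag in old.items()
        if etag is not None and new.get(name) is None
    )
    added = sorted(
        name for name, etag in new.items()
        if etag is not None and name not in old
    )
    return {"changed": changed, "deleted": deleted, "new": added}
```

Example with made-up names and ETags:

```python
old = {"a.pdf": "111", "b.pdf": "222", "c.pdf": "333"}
new = {"a.pdf": "111", "b.pdf": "999", "d.pdf": "444"}
result = classify_changes(old, new)
# result["changed"] is ["b.pdf"], result["deleted"] is ["c.pdf"], result["new"] is ["d.pdf"]
```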
It would be appreciated if people ran 2fetcher.py and opened pull requests so we can speed up this project.
ALL Data Set ZIP files have been removed from the DOJ website. The exact date is unknown, but it happened between Feb 9 and Feb 13.
Data Set 10 is actively being modified, with the most recent change being EFTA01286686 on 2/12/26 at 9 AM EST.
They are deleting/redacting mentions of "Trump" see https://github.com/beak2825/epstein-files-archive/commit/ae6e32bed1d135dcb3c14e84795cad1faf8ef5f5 and https://github.com/beak2825/epstein-files-archive/commit/71f55ba47d72d428fbdbb7f5c8e47e830dd22688
As you can see, the number of mentions is now 4731 instead of 4732.
EFTA00020685 has been modified, but no changes are visible; the text pixels are slightly shifted. Someone should check the binary data (old, new).
It seems the DOJ does not update the .zip files after they are posted, so if a document has been updated, use the zip files to get the version as originally posted.
They unredacted part of a file 3 hours after @RepThomasMassie spotlighted it EFTA00173201
They deleted all mentions of "Juan Ruiz Toro" EFTA00031428 EFTA00009897
They deleted EFTA00020508 a few days after the media spotlighted it for certain statements of Donald Trump
They rotated EFTA00001931 EFTA00000531
They redacted a painting/photo that did not depict a victim EFTA00001225
They deleted a file that contained statements about Donald Trump's Mar-a-Lago Club in Palm Beach, Florida EFTA00261604
Some files had redactions added 5 hours after they were posted, and the old versions are lost media: EFT00156482 EFTA00158898 EFTA00158891 EFTA00151816 EFTA00151209 EFTA00094156 EFTA00081180
https://www.justice.gov/epstein/doj-disclosures/data-set-9-files?page=17 At this page, the pagination on Data Set 9 starts to break, which possibly leaves files unlisted. (Someone verify: I googled some files and they returned a lot of results, but a suspected unlisted one returned only 1 result.)
They scribbled over and then fully redacted a screenshot, and what is possibly Epstein's Facebook profile picture is visible EFTA00037168
More soon.
I really want to add something that lets you scrape even faster because there are millions of files to go through, so using proxies sounds like a good idea, or someone can make a pull request if they have better ideas.
I don't know, but does anyone think I should make a Telegram channel or Discord server for this repo, or just communities for the files in general? Point me in the right direction.
This is an easy thing you can do right now with your folder: open set_mtime_from_last_modified.py.
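One idea that needs no proxies is to send several HEAD requests at once from a small thread pool. This is only a sketch and is not in the repo; head_status, head_many and the default of 8 workers are assumptions, and a low worker count is probably wise so the server is not hammered.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def head_status(url, timeout=30):
    """Return (status_code, headers) for one URL using a HEAD request."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # 404 and similar responses are still useful: they mark deleted files.
        return err.code, dict(err.headers.items())


def head_many(urls, worker=head_status, max_workers=8):
    """Run the worker over all URLs concurrently and map each URL to its result."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its own result.
        return dict(zip(urls, pool.map(worker, urls)))
```

The worker is a parameter so a stub can be swapped in to try the function without touching the network.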
A helper script set_mtime_from_last_modified.py is included in the repository. It scans all .txt files under the EFTA/ folder (recursively), extracts the Last-Modified header inside each file, converts it to Eastern time, and sets the filesystem modification time (mtime) to that value.
Quick usage:
- Preview what would change without modifying files:
  python set_mtime_from_last_modified.py --dry-run
- Apply changes (updates file mtimes):
  python set_mtime_from_last_modified.py

Options:
- --root/-r: root folder to scan (default: EFTA)
- --ext: file extension to search for (default: .txt)
- --use-est-wallclock: interpret the Eastern time as a wallclock time and set the file mtime so the file displays that Eastern local time on the current machine
This is useful because Git does not preserve the original Last-Modified timestamps from the DOJ responses; the script sets the local file timestamps to match the stored headers.
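To illustrate the idea, here is a simplified sketch of what such a script could do. It is not the actual set_mtime_from_last_modified.py. It assumes each saved .txt file contains a line of the form "Last-Modified: <HTTP date>", and the function names are made up for this example.

```python
import os
import re
from email.utils import parsedate_to_datetime
from pathlib import Path
from zoneinfo import ZoneInfo

# Assumption: the header is stored on its own line as "Last-Modified: <date>".
LAST_MODIFIED_RE = re.compile(r"^Last-Modified:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def parse_last_modified(text):
    """Return the Last-Modified value as a datetime, or None if absent or invalid."""
    match = LAST_MODIFIED_RE.search(text)
    if match is None:
        return None
    try:
        return parsedate_to_datetime(match.group(1).strip())
    except (TypeError, ValueError):
        return None


def target_timestamp(moment, use_est_wallclock=False):
    """Pick the epoch timestamp to write as the file's mtime."""
    if not use_est_wallclock:
        return moment.timestamp()  # the exact instant the server reported
    # Take the Eastern clock reading and reinterpret it as local time,
    # so file managers on this machine display the Eastern wall-clock time.
    eastern = moment.astimezone(ZoneInfo("America/New_York"))
    return eastern.replace(tzinfo=None).timestamp()


def apply_mtimes(root="EFTA", ext=".txt", dry_run=False, use_est_wallclock=False):
    """Set mtimes for matching files under root; return how many had a usable header."""
    count = 0
    for path in Path(root).rglob(f"*{ext}"):
        moment = parse_last_modified(path.read_text(encoding="utf-8", errors="replace"))
        if moment is None:
            continue
        stamp = target_timestamp(moment, use_est_wallclock)
        if dry_run:
            print(f"{path}: would set mtime to {moment.isoformat()}")
        else:
            os.utime(path, (stamp, stamp))
        count += 1
    return count
```

Note that an mtime is stored as an absolute instant, so converting to Eastern time only matters for how the value is displayed, or when the wallclock option deliberately shifts it.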