Find and safely delete duplicate files, across two drives or within one. Zero dependencies, cross-platform, with an undo log.
View the Roadmap: see what's planned for future versions
- Smart Detection: SHA256 hashing finds true duplicates regardless of filename
- Performance: Two-pass scan filters by size first, then hashes only size-collision candidates
- Safety First: Always asks before deleting, creates undo logs, detects read-only files
- Cross-Platform: macOS, Linux, Windows with native progress bars (Rich UI + ANSI fallback)
- Rich Reports: CSV/JSON output with file paths, sizes, hashes, and deletion recommendations
- Flexible Modes: Compare two drives, clean a single drive, interactive deletion, batch operations
- Zero Dependencies: Pure Python, optional Rich UI, works everywhere Python runs
- Multiple Install Options: pip, pipx, standalone binaries (Homebrew coming in v1.1)
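The two-pass scan in the feature list can be illustrated with a minimal sketch. This is not diskcomp's actual code: the function names `sha256_of` and `find_duplicates` are hypothetical, and the sketch assumes a single directory tree and ignores the OS-noise and minimum-size filters.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root that share identical content."""
    # Pass 1: group by size. A file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files whose size collides with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[sha256_of(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Grouping by size first is cheap because it only reads file metadata. The expensive content hash then runs only on files that could plausibly match.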
diskcomp 1.0.0 is production-ready and actively maintained. The core deduplication engine is covered by 285 tests that exercise edge cases, cross-platform compatibility, and error handling.
- Feature Complete: All planned v1.0 features implemented
- Well Tested: 285 tests, CI on 3 platforms × 3 Python versions
- Production Ready: Used for real data cleanup with safety guarantees
- Cross-Platform: Native builds for macOS, Linux, Windows
- Multiple Distribution Channels: PyPI, GitHub Releases (Homebrew coming in v1.1)
Download binary (no Python required):
macOS:
# Direct download (recommended)
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-macos
chmod +x diskcomp
./diskcomp --help
# Homebrew (coming in v1.1)
# brew tap w1lkns/diskcomp
# brew install diskcomp

Linux:
# Download directly
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-linux
chmod +x diskcomp
./diskcomp --help

Windows:
# Download diskcomp-windows.exe from GitHub Releases
# https://github.com/w1lkns/diskcomp/releases/latest
diskcomp-windows.exe --help

Python install (if you have Python):
pipx (recommended; handles PATH automatically):
pipx install diskcomp
diskcomp --help

Don't have pipx? Run `brew install pipx` on macOS, or `pip install pipx` elsewhere.
pip install:
pip install diskcomp
diskcomp --help

Single-file version (no install, no dependencies):
curl -O https://raw.githubusercontent.com/w1lkns/diskcomp/main/diskcomp.py
python3 diskcomp.py --help

Interactive mode (no arguments; clears the screen and shows a menu):
diskcomp

The launch menu offers:
1) Compare two drives
2) Clean up a single drive
3) Load previous report
4) Help
5) Quit
Compare two drives (command-line):
diskcomp --keep /Volumes/backup --other /Volumes/external

Clean up a single drive (find internal duplicates):
diskcomp --single /Volumes/my-drive

Dry-run (count files without hashing):
diskcomp --keep /path/A --other /path/B --dry-run

Load a previous report (skip re-scanning):
diskcomp --delete-from ./diskcomp-report-20260322-235800.csv

Interactive mode startup:
[diskcomp ASCII-art banner]
Find duplicates. Free space. Stay safe.
v1.0.0
What would you like to do?
1) Compare two drives
2) Clean up a single drive
3) Load previous report
4) Help
5) Quit
Progress display:
Drive Health: Keep=/Volumes/Photos (2TB APFS), Other=/Volumes/Backup (4TB NTFS)
Scanning: ████████████████████████████████ 1,847 files found
Hashing candidates: ██████████████████████████████████ 234/234 files (23.4 MB/s)
Found 42 duplicates. You could free 1.2 GB from /Volumes/Backup. Ready to review?
Your files are safe. diskcomp prioritizes safety over convenience:
- No Automatic Deletion: Every file deletion requires explicit user confirmation
- Undo Logs: Complete audit trail written before any file is deleted
- Read-Only Detection: Automatically detects and warns about read-only drives
- Dry-Run Mode: Preview operations without any file system changes
- Abort Anytime: Press `Ctrl+C` at any prompt to stop safely
- Interactive Mode: Review each file individually before deletion
- SHA256 Verification: Cryptographic hashing ensures only true duplicates are identified
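The "confirm, log, then delete" ordering behind these guarantees might look like the sketch below. It is an illustration under assumptions, not diskcomp's implementation: `delete_with_audit` is a hypothetical helper, the log is written as JSON Lines for simplicity, and the field names are invented for the example.

```python
import json
import os
import time


def delete_with_audit(path, log_path, confirm=input):
    """Delete path only after the user confirms and the log entry is on disk."""
    answer = confirm(f"Delete {path}? [y/N] ")
    if answer.strip().lower() != "y":
        return False  # anything other than an explicit "y" keeps the file

    entry = {
        "path": path,
        "size_bytes": os.path.getsize(path),
        "deleted_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    # Audit first: append and flush the record before touching the file,
    # so a crash between the two steps never leaves an unlogged deletion.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
        log.flush()
        os.fsync(log.fileno())

    os.remove(path)
    return True
```

Passing `confirm` as a parameter keeps the prompt testable: a script or test can supply its own answer function instead of reading from the terminal.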
| Flag | Description | Example |
|---|---|---|
| `--keep PATH` | Path to the "keep" drive (files to retain). Required unless interactive. | `--keep /Volumes/backup` |
| `--other PATH` | Path to the "other" drive (duplicates deleted from here). Required unless interactive. | `--other /Volumes/external` |
| `--single PATH` | Scan one drive for internal duplicates (redundant copies on the same drive). | `--single /Volumes/photos` |
| `--dry-run` | Walk and count files without hashing (quick preview). | `--dry-run` |
| `--limit N` | Hash only first N files per drive (testing only). | `--limit 100` |
| `--output PATH` | Custom report path (default: `~/diskcomp-report-YYYYMMDD-HHMMSS.csv`). | `--output ./my-report.csv` |
| `--format csv\|json` | Report format: csv or json (default: csv). | `--format json` |
| `--min-size SIZE` | Minimum file size to include (default: 1KB). Accepts bytes, KB, MB, GB. | `--min-size 10MB` |
| `--delete-from PATH` | Load an existing report and start deletion workflow (skip re-scanning). | `--delete-from ./diskcomp-report-20260322.csv` |
| `--undo PATH` | View the audit log of a previous deletion session. | `--undo ./diskcomp-undo-20260322.json` |
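To show how a `--min-size` value such as `10MB` could be turned into a byte count, here is a small sketch. The helper name `parse_size` is hypothetical. The use of binary multiples (1KB = 1024 bytes) is an assumption, since the table does not say which convention diskcomp uses.

```python
import re

# Assumption: binary multiples. diskcomp itself may use decimal units instead.
_UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text):
    """Convert strings like '512', '1KB', or '1.5GB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if not match or match.group(2).upper() not in _UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2).upper()])


print(parse_size("10MB"))  # 10485760
```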
- Drive Health Checks (pre-scan, two-drive mode):
  - Space summary for both drives
  - Filesystem detection (HFS+, NTFS, ext4, exFAT, etc.)
  - Read-only detection (warns if "keep" drive is read-only)
  - Read speed benchmark (128MB)
  - Optional SMART data (if `smartmontools` is available)
- Scanning & Hashing:
  - Walks drives recursively
  - Skips OS noise (`.DS_Store`, `Thumbs.db`, `System Volume Information`, etc.)
  - Two-pass optimization: size-filter candidates first, then SHA256 hash
  - Live progress bar with speed and ETA
- Reporting:
  - CSV or JSON report saved to `~/diskcomp-report-YYYYMMDD-HHMMSS.{csv,json}`
  - Atomic writes (temp → rename, safe against crashes mid-write)
- Deletion Workflow (optional):
  - Mode A (Interactive): Shows both copies numbered `(1)` and `(2)`; you pick which to delete, skip, or abort. The running total of space freed is shown after each deletion.
  - Mode B (Batch): Dry-run preview with file type breakdown → type `DELETE` to confirm → progress bar
  - Undo log written before each deletion (audit-first pattern)
  - Always abortable with `Ctrl+C`
  - Can re-run from a saved report without re-scanning (option 3 in menu or `--delete-from`)
- Undo Log (`--undo` flag):
  - JSON file listing all deleted files with paths, sizes, hashes, and timestamps
  - Deletion is permanent: the log is an audit trail, not a restore mechanism
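The atomic write listed under Reporting (temp file, then rename) is a general pattern that can be sketched as follows. This is an illustration under assumptions rather than diskcomp's code, and `write_report_atomically` is a hypothetical name. It relies on `os.replace`, which swaps the file in a single step when source and destination are on the same filesystem.

```python
import csv
import os
import tempfile


def write_report_atomically(rows, fieldnames, dest):
    """Write a CSV so readers see either the old file or the complete new one."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Create the temp file next to the destination so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as tmp:
            writer = csv.DictWriter(tmp, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, dest)  # the single step that publishes the report
    except BaseException:
        # Clean up the partial temp file; the destination is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies mid-write, only the temporary file is incomplete. A previous report at `dest`, if any, remains intact.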
CSV format (default, spreadsheet-friendly):
status,original_file,duplicate_file,size_mb,verification_hash
DELETE_FROM_OTHER,/Volumes/keep/photos/pic1.jpg,/Volumes/other/photos/pic1.jpg,2.5,abc123...
UNIQUE_IN_KEEP,/Volumes/keep/docs/resume.pdf,,0.1,def456...
UNIQUE_IN_OTHER,,/Volumes/other/temp/junk.tmp,5.0,ghi789...

| Column | Values |
|---|---|
| `status` | `DELETE_FROM_OTHER`, `UNIQUE_IN_KEEP`, `UNIQUE_IN_OTHER` |
| `original_file` | Path to the copy to keep |
| `duplicate_file` | Path to the copy to delete |
| `size_mb` | File size in MB |
| `verification_hash` | SHA256 hex string |
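Because the report is plain CSV with the columns above, it can be post-processed with the standard library. The sketch below sums the space that would be reclaimed by the `DELETE_FROM_OTHER` rows. `reclaimable_mb` is a hypothetical helper, and the sample rows reuse the example above with the hashes shortened.

```python
import csv
import io


def reclaimable_mb(report_file):
    """Sum size_mb over the rows that are marked for deletion."""
    reader = csv.DictReader(report_file)
    return sum(
        float(row["size_mb"])
        for row in reader
        if row["status"] == "DELETE_FROM_OTHER"
    )


sample = io.StringIO(
    "status,original_file,duplicate_file,size_mb,verification_hash\n"
    "DELETE_FROM_OTHER,/Volumes/keep/photos/pic1.jpg,/Volumes/other/photos/pic1.jpg,2.5,abc123\n"
    "UNIQUE_IN_KEEP,/Volumes/keep/docs/resume.pdf,,0.1,def456\n"
    "UNIQUE_IN_OTHER,,/Volumes/other/temp/junk.tmp,5.0,ghi789\n"
)
print(reclaimable_mb(sample))  # 2.5
```

For a real report, open the file with `open(path, newline="", encoding="utf-8")` and pass the handle in place of `sample`.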
JSON format (programmatic use):
diskcomp --keep /Volumes/keep --other /Volumes/other --format json

NTFS (Windows filesystem) drives are read-only on macOS and Linux by default:
- diskcomp can scan and identify duplicates on NTFS drives
- diskcomp cannot delete files from NTFS drives without a third-party driver
Workaround:
- macOS: ntfs-3g with macFUSE or Tuxera NTFS
- Linux: `sudo apt install ntfs-3g` (Debian/Ubuntu) or `sudo dnf install ntfs-3g` (Fedora)
diskcomp detects this and warns during health checks.
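One simple way to check up front whether a mount point accepts deletions is sketched below. This is an assumption about how such a check could work, not a description of diskcomp's health check, and `is_writable` is a hypothetical name. `os.access` is best-effort: on Windows it mainly reflects the read-only attribute, so a real tool may need further checks.

```python
import os


def is_writable(mount_point):
    """Best-effort check that entries can be created or removed in this directory."""
    # Removing a file needs write and search permission on its directory.
    # On a read-only mount the write check fails.
    return os.path.isdir(mount_point) and os.access(mount_point, os.W_OK | os.X_OK)


if not is_writable("/Volumes/Backup"):  # example path taken from the sample output
    print("Drive is missing or read-only: scanning works, deletion will not.")
```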
Rich library (professional progress bars and color styling):
pip install diskcomp[rich]

smartmontools (enables SMART data display):
- macOS: `brew install smartmontools`
- Linux: `apt-get install smartmontools` or `pacman -S smartmontools`
- Windows: `wmic logicaldisk` (built-in, no install needed)
Without these, diskcomp uses ANSI progress bars and skips SMART data.
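The optional-dependency behavior described above follows a common pattern: try the import and fall back to plain terminal output. Below is a minimal sketch, assuming a hypothetical `with_progress` helper. It uses `rich.progress.track` when Rich is installed, and the fallback is a simplified stand-in for diskcomp's ANSI progress bar.

```python
import sys

try:
    from rich.progress import track  # optional dependency
    HAVE_RICH = True
except ImportError:
    HAVE_RICH = False


def with_progress(items, label):
    """Yield items while reporting progress, using Rich if it is installed."""
    if HAVE_RICH:
        yield from track(items, description=label)
        return
    total = len(items)
    for done, item in enumerate(items, start=1):
        # \r returns to the start of the line so the counter updates in place.
        sys.stderr.write(f"\r{label}: {done}/{total}")
        sys.stderr.flush()
        yield item
    sys.stderr.write("\n")


for _name in with_progress(["a.jpg", "b.jpg", "c.jpg"], "Hashing"):
    pass  # hash each file here
```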
CI validates diskcomp on 9 combinations:
- macOS (latest) × Python 3.8, 3.10, 3.12
- Linux (Ubuntu latest) × Python 3.8, 3.10, 3.12
- Windows (latest) × Python 3.8, 3.10, 3.12
All tests pass and the single-file build is verified on each combination.
Run tests locally:
python -m pytest tests/

Generate single-file version:
python build_single.py
python diskcomp.py --help

- Planning & Roadmap: View what's planned for future versions
- Found a bug? Report it on GitHub Issues
- Feature request? Check the roadmap or share your idea
- Documentation? Improve the README
- Want to contribute code? Check good first issues or fork & submit a PR

Like diskcomp? Star it on GitHub to show support!
MIT. See the LICENSE file for details.