A curated collection of high-quality malware and benign datasets for cybersecurity and AI security research, machine learning, and malware analysis.

| Dataset | Year | Size / Samples | Labels | Format | Description | Access | Link |
|---|---|---|---|---|---|---|---|
| VirusShare | 2010–Present | Millions | Malware families | Binary | Large archive of malware binaries (access granted on request) | On request | Link |
| Malimg | 2011 | ~9,458 images | Malware families | Image | Grayscale images for malware classification | Public | Link |
| Android Malware Genome | 2012 | 1,260 malware | Malware Families | APK | Historic dataset of early Android malware | Public | Link |
| Virus-MNIST | 2017 | ~10,000 images | Malware | Image | Dataset for malware detection using image-based methods | Public | Link |
| CICAndMal2017 | 2017 | 10,854 samples | Adware, Ransomware, Scareware, SMS Malware | APK, PCAP, CSV | Real-device collected malware samples with network and behavior data | Public | Link |
| CIC-AAGM2017 | 2017 | 1,900 apps | Adware, General Malware, Benign | PCAP, CSV | Real-device collected network traffic from Android adware and general malware apps | Public | Link |
| Microsoft Malware Prediction | 2019 | ~8M rows | Binary labels | CSV | Windows machine telemetry for predicting the probability that a machine will be infected by malware | Public | Link |
| Malware Bazaar | 2019–Present | 10M+ samples | Malware | Binary | Community malware sample exchange | Public | Link |
| VxHeaven | 2019 | 595–2,955 files | Malware/Benign | CSV | Static and dynamic features extracted from VxHeaven and VirusTotal datasets, with 1,087 features for classification | Public | Link |
| SOREL-20M | 2020 | ~20M samples (8TB) | Malicious/Benign | Binary/Features | Large scale benchmark dataset for malicious PE detection, including malware samples, feature vectors, and models. | Public | Link |
| DikeDataset | 2020 | N/A | Malware/Benign | Binary | Benign and malicious PE binaries | Public | Link |
| Dumpware 10 | 2020 | ~4,294 images | Malware/Benign | Image (RGB) | Malware images | Public | Link |
| MalMem-2022 | 2022 | 29,298 benign, 29,298 malicious | Malware/Benign | CSV | Features extracted from memory dumps for obfuscated malware detection | Public | Link |
| MalRadar | 2022 | 4,534 malware samples | Malware | Various | A growing Android malware dataset, manually verified, containing 4,534 samples across 121 families | Restricted | Link |
| CIC-Evasive-PDFMal2022 | 2022 | 10,025 records | Malicious/Benign | CSV, PDF | 5,557 malicious and 4,468 benign PDF records, including samples that attempt to evade common detection techniques | Public | Link |
| Microsoft BIG 2015 | 2015 | ~20K | Malware types | Binary | PE malware binaries | Public | Link |
| EMBER2017-2018 | 2018 | ~3.2M files | Malware/Benign | Features/metadata | Large public benchmark for malware classifiers | Public | Link |
| BODMAS | 2021 | 57,293 malware, 77,142 benign | Malware/Benign | Binary | Blue Hexagon dataset with malware samples and family info | On request | Link |
| EMBER2024 (New Benchmark) | 2025 | ~3.2M files | Malware/Benign | Features/metadata | Large public benchmark for malware classifiers | Public | Link |
| Android-Malware-2023 (AIM-2023) | 2023 | 250K apps | Malware/Benign | APK, CSV | New Android malware + benign apps with detailed metadata | Public | Link |
| AndroZoo (2022+) | 2022 | 25M+ samples | Malware/Benign | APK | The largest Android dataset; malware + benign apps | Restricted | Link |
| Kronodroid | N/A | 70,000+ samples | Malware/Benign | CSV, APK | A dataset designed to study concept drift and cross-device detection issues, with 289 dynamic and 200 static features | Public | Link |
| ContagioDump | N/A | N/A | Malware | Binary | Collection of malware samples for research | Public | Link |

| Dataset | Year | Size / Samples | Labels | Format | Description | Access | Link |
|---|---|---|---|---|---|---|---|
| Android Malware Genome | 2012 | 1,260 malware | Malware Families | APK | Historic dataset of early Android malware | Public | Link |
| Drebin | 2014 | 5,560 malware | Malware | APK, Features | One of the most famous Android malware datasets | Public | Link |
| CICAndMal2017 | 2017 | 10,854 samples | Adware, Ransomware, Scareware, SMS Malware | APK, PCAP, CSV | Real-device collected malware samples with network and behavior data | Public | Link |
| CIC-AAGM2017 | 2017 | 1,900 apps | Adware, General Malware, Benign | PCAP, CSV | Real-device collected network traffic from Android adware and general malware apps | Public | Link |
| PRAGuard Android Dataset | 2017 | 25K apps | Malware/Benign | APK, CFG | Focuses on obfuscation + packed apps | Public | Link |
| Kronodroid | N/A | 70,000+ samples | Malware/Benign | CSV, APK | A dataset designed to study concept drift and cross-device detection issues, with 289 dynamic and 200 static features | Public | Link |
| CICMalDroid-2020 | 2020 | 17,341 samples | Adware, Banking, SMS, Riskware, Benign | APK, CSV | Comprehensive Android malware dataset with dynamic and static features | Public | Link |
| MalNet | 2021 | 1,262,024 images/graphs | Malware/Benign | Image/Graph | Large-scale dataset of Android malware with function call graphs and images | Public | Link |
| MalRadar | 2022 | 4,534 malware samples | Malware | Various | A growing Android malware dataset, manually verified, containing 4,534 samples across 121 families | Restricted | Link |
| CIC-Evasive-PDFMal2022 | 2022 | 10,025 records | Malicious/Benign | CSV, PDF | 5,557 malicious and 4,468 benign PDF records, including samples that attempt to evade common detection techniques | Public | Link |
| AMD 2.0 | 2022 | 150K malware | Malware Families | APK, JSON | Updated AMD with modern Android malware families | Public | Link |
| AndroZoo (2022+) | 2022 | 25M+ samples | Malware/Benign | APK | The largest Android dataset; malware + benign apps | Restricted | Link |
| Android-Malware-2023 (AIM-2023) | 2023 | 250K apps | Malware/Benign | APK, CSV | New Android malware + benign apps with detailed metadata | Public | Link |

| Dataset | Year | Size / Samples | Labels | Format | Description | Access | Link |
|---|---|---|---|---|---|---|---|
| Malicious PDF Generator | 2020 | 10 PDFs (generated) | Malicious | PDF | Generates 10 different malicious PDFs with phone-home functionality for penetration testing | Public | Link |
| Dike | 2020 | 1,871 documents | Malware/Benign | doc, docx, docm, xls, xlsx, xlsm, ppt, pptx, pptm | A dataset containing various document formats (doc, xls, ppt) for malware detection. | Public | Link |
| CIC-Evasive-PDFMal2022 | 2022 | 10,025 records | Malicious/Benign | CSV, PDF | 5,557 malicious and 4,468 benign PDF records, including samples that attempt to evade common detection techniques | Public | Link |

Contributions are always welcome 🤝
Thank you for helping make this project better! Please review the Contribution Guidelines.

This repository is licensed under a Creative Commons Attribution 4.0 International License.



