We'd like to work on a project that involves variable naming practices and vulnerabilities. Here's what we need.
For as many vulnerabilities as we know about, we need:
- CVE identifier
- Fix patch(es) in a file
- Full file that was vulnerable (i.e. before the fix)
- Full file that was fixed (i.e. after the fix)
Let's use a folder structure that looks like this. For example, say we have CVE-123 and CVE-456:
/
├── CVE-123/
│   ├── abcdef.patch
│   ├── abcdef-old/
│   │   └── file1.py
│   └── abcdef-new/
│       └── file1.py
└── CVE-456/
    ├── abc123.patch
    ├── abc123-old/
    │   └── file2.py
    └── abc123-new/
        └── file2.py
Notes:
- The abcdef and abc123 here are fix commit hashes. These will be 40-character hexadecimal strings normally.
- In the -old and -new folders there's no need for recreating the original folder structure from the repo - just get the full copies of the files involved
- The .patch files should be generated from the Git diff (which pydriller can calculate)
- All of these should be grabbed locally from repositories, not hitting an API - it'll be much faster that way.
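To make the layout concrete, here is a minimal sketch of a helper that writes one fix commit into that structure. The function name `write_fix_artifacts` and its parameters are made up for illustration; it assumes the patch text and file contents have already been collected as strings.

```python
from pathlib import Path


def write_fix_artifacts(root, cve_id, fix_hash, patch_text, old_files, new_files):
    """Write one fix commit as <CVE>/<hash>.patch, <hash>-old/ and <hash>-new/.

    old_files and new_files map a file name (or repo path) to its full source.
    """
    cve_dir = Path(root) / cve_id
    cve_dir.mkdir(parents=True, exist_ok=True)
    (cve_dir / f"{fix_hash}.patch").write_text(patch_text, encoding="utf-8")

    for suffix, files in (("old", old_files), ("new", new_files)):
        target = cve_dir / f"{fix_hash}-{suffix}"
        target.mkdir(exist_ok=True)
        for name, source in files.items():
            # Path(name).name drops any directory part, so the repo's
            # original folder structure is not recreated.
            (target / Path(name).name).write_text(source, encoding="utf-8")
    return cve_dir
```

One thing this sketch does not settle: if a single commit touches two files with the same name in different directories, flattening makes the second overwrite the first.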
I recommend pydriller for these. You can get most of what you need from this: https://pydriller.readthedocs.io/en/latest/modifiedfile.html#modifiedfile-toplevel
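As a rough sketch of how that could look, the code below pulls the diff and the before/after sources out of a single fix commit in a local clone. It relies on pydriller's `Repository(..., single=...)` and the `ModifiedFile` attributes `diff`, `source_code`, `source_code_before`, `old_path`, `new_path` and `filename`. The helper names `collect_fix` and `export_commit` are hypothetical, and the `---`/`+++` headers are my assumption about what the .patch should contain, since `diff` only carries the hunks.

```python
def collect_fix(commit):
    """Gather patch text and before/after sources from a pydriller Commit."""
    patch_parts, old_files, new_files = [], {}, {}
    for mod in commit.modified_files:
        # ModifiedFile.diff holds only the hunks, so file headers are added
        # here (an assumption about the desired .patch format).
        old_header = f"a/{mod.old_path}" if mod.old_path else "/dev/null"
        new_header = f"b/{mod.new_path}" if mod.new_path else "/dev/null"
        diff = mod.diff if mod.diff.endswith("\n") else mod.diff + "\n"
        patch_parts.append(f"--- {old_header}\n+++ {new_header}\n{diff}")

        # source_code_before is None for added files and source_code is
        # None for deleted ones, so each side is stored only when present.
        if mod.source_code_before is not None:
            old_files[mod.filename] = mod.source_code_before
        if mod.source_code is not None:
            new_files[mod.filename] = mod.source_code
    return "".join(patch_parts), old_files, new_files


def export_commit(repo_path, fix_hash):
    """Return (patch, old_files, new_files) for one hash in a local clone."""
    # Imported lazily so collect_fix can be tried without pydriller installed.
    from pydriller import Repository

    for commit in Repository(str(repo_path), single=fix_hash).traverse_commits():
        return collect_fix(commit)
    return None
```

Something like `export_commit("path/to/local/clone", fix_hash)` would then give the three pieces needed for one CVE folder; the path here is a placeholder.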
The two places we have vulnerabilities are:
- cves folder for every .yml file. Parse that yml and get any non-empty entries from the fixes key. That will give you the fix hash. For the original repos, just google how to get that repo cloned. Also: skip chromium (you'll thank me later...)
- drill_scripts folder for how we generated that in the first place.
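The yml step could look roughly like the sketch below, which uses PyYAML's `yaml.safe_load`. The exact shape of the fixes entries is not spelled out here, so the code assumes each entry is either a bare hash string or a mapping with a `commit` key. That assumption, the use of the file name as the CVE identifier, and the helper names are all guesses to be checked against the real files.

```python
from pathlib import Path


def extract_fix_hashes(doc):
    """Return the non-empty fix hashes from one parsed CVE yml document."""
    if not isinstance(doc, dict):
        return []
    hashes = []
    for entry in doc.get("fixes") or []:
        # Assumption: an entry is either a bare hash or a mapping with a
        # "commit" key; adjust to whatever the yml files actually contain.
        value = entry.get("commit") if isinstance(entry, dict) else entry
        if value is not None and str(value).strip():
            hashes.append(str(value).strip())
    return hashes


def fixes_by_cve(cves_dir):
    """Map each .yml file in cves_dir to its list of non-empty fix hashes."""
    import yaml  # PyYAML, imported lazily so the helper above has no dependency

    result = {}
    for yml_path in sorted(Path(cves_dir).glob("*.yml")):
        with open(yml_path, encoding="utf-8") as handle:
            hashes = extract_fix_hashes(yaml.safe_load(handle))
        if hashes:
            # Assumption: the file name (minus .yml) identifies the CVE.
            result[yml_path.stem] = hashes
    return result
```

Files with no usable fixes entries are simply left out of the result, which keeps the later pydriller pass from visiting CVEs that have no known fix commit.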