William Johnson, Kellan Christ (we are open to additional interested members)
The bottleneck in software security lies in the supply chain. Developers often use code written by other developers through packages and libraries, but they may be unaware of vulnerabilities and exploits in that code. Additionally, packages may not be updated when new vulnerabilities are discovered.
With CVE Identification (and Secure Git), the goal is to determine one or more points of interest for a CVE. There are two main procedures:
- Locate a file containing the vulnerability (e.g., with grep) and record its metadata, including its hash
- Build a hashmap that maps each point of interest to other projects across GitHub, found through World of Code.
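The two procedures above can be sketched as follows. This is a minimal illustration, not a settled design: it assumes the file hash is the git blob SHA-1 (the value `git hash-object` prints) and that a blob-to-projects mapping exported from World of Code is available as a plain dictionary. The names `InterestPoint`, `make_interest_point`, `build_index`, and `blob_to_projects` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def git_blob_sha1(data: bytes) -> str:
    """Hash bytes the way git hashes a blob: sha1(b"blob <size>\\0" + data)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


@dataclass(frozen=True)
class InterestPoint:
    """Metadata kept for one vulnerable file (hypothetical record layout)."""
    cve_id: str
    path: str
    blob_sha1: str
    size: int


def make_interest_point(cve_id: str, path: str, data: bytes) -> InterestPoint:
    return InterestPoint(cve_id, path, git_blob_sha1(data), len(data))


def build_index(points, blob_to_projects):
    """Map each interest point to the set of projects containing the same blob.

    `blob_to_projects` is an assumed dict {blob_sha1: iterable of project names}.
    """
    return {p: set(blob_to_projects.get(p.blob_sha1, ())) for p in points}
```

Using the git blob hash rather than a hash of the raw bytes is a design choice that keeps the keys comparable with hashes already stored in git repositories.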
(Preliminary) Naive Identification:
- Use the file hash to find exact hash matches with files in other projects (i.e., cases where a file in a directly cloned project has not been modified)
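A minimal sketch of the naive approach, assuming a local checkout and a dictionary of known-vulnerable file hashes. SHA-256 is used here only for illustration; the function names and the `vulnerable_hashes` layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_matches(root, vulnerable_hashes):
    """Return (relative path, CVE id) pairs for files whose hash is known.

    `vulnerable_hashes` is an assumed dict {sha256 hex digest: CVE id}.
    Files under a `.git` directory are skipped.
    """
    root = Path(root)
    matches = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if not path.is_file() or ".git" in relative.parts:
            continue
        cve_id = vulnerable_hashes.get(sha256_file(path))
        if cve_id is not None:
            matches.append((relative.as_posix(), cve_id))
    return matches
```

An exact match only catches unmodified copies; a single changed byte produces a different hash, which is the motivation for the advanced identification step.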
Advanced Identification:
- Find specific code snippets containing the vulnerability
- Use a sliding window over a range of tokens to find vulnerable code
- Sandboxed fuzzing? (Requires executing code)
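The sliding-window idea can be sketched as below. This is one possible reading, not a fixed design: the source and the vulnerable snippet are split into tokens with a simple regular expression, a window the length of the snippet slides over the source, and a window counts as a hit when the fraction of position-wise equal tokens reaches a threshold. The tokenizer, the `threshold` parameter, and the function names are assumptions.

```python
import re

# Identifiers, integer literals, or any single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split source text into tokens, ignoring whitespace."""
    return TOKEN_RE.findall(source)


def find_snippet(source, snippet, threshold=1.0):
    """Return token offsets in `source` where a window matches `snippet`.

    A window matches when (equal tokens at the same position) / (snippet
    length) is at least `threshold`; 1.0 requires an exact token match.
    """
    src, pattern = tokenize(source), tokenize(snippet)
    n = len(pattern)
    if n == 0 or n > len(src):
        return []
    hits = []
    for start in range(len(src) - n + 1):
        window = src[start:start + n]
        same = sum(a == b for a, b in zip(window, pattern))
        if same / n >= threshold:
            hits.append(start)
    return hits
```

Because whitespace is dropped during tokenization, reformatted copies of a snippet still match exactly, and lowering the threshold tolerates small edits such as a renamed variable.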
Collection:
- Collect CVEs, prioritized from most critical to least critical
- Each CVE links to a collection of GitHub projects containing the vulnerability
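One way to represent the collection is sketched here. The notes above do not say how criticality is measured, so this sketch assumes a numeric severity score such as a CVSS base score; `CveRecord`, `cvss_score`, and `prioritize` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CveRecord:
    """One CVE and the GitHub projects found to contain it."""
    cve_id: str
    cvss_score: float  # assumed severity measure, higher is more critical
    projects: set = field(default_factory=set)


def prioritize(records):
    """Order records from most to least critical, breaking ties by CVE id."""
    return sorted(records, key=lambda r: (-r.cvss_score, r.cve_id))
```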
Cleaning:
- Remove projects with orphaned or unused files
- Remove projects with the least activity
- Implement the sgit utility (a layer built on top of git)
  - Example: verify a project and ask the user to confirm before cloning
  - Operations: hash lookup, secure scan
- Build the hashmap database and the CVE mappings
- Collect metrics (what % of projects have vulnerabilities)
- Secure Git utility
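The "verify, then confirm before cloning" behavior of sgit could look roughly like the sketch below. It is an illustration under assumptions: `secure_clone` and the `scan` hook (which would wrap the hash lookup and secure scan operations) are hypothetical, and the sketch prompts only when the scan reports findings. The only external command used is `git clone`.

```python
import subprocess


def secure_clone(url, dest, scan, confirm=input, run=subprocess.run):
    """Clone `url` into `dest` only after a scan and, if needed, user consent.

    `scan(url)` is an assumed hook returning a list of CVE ids matched for
    the project. `confirm` and `run` are injectable so the flow can be
    exercised without a terminal or network access.
    Returns True if the clone was started, False if the user declined.
    """
    findings = scan(url)
    if findings:
        prompt = (
            f"{url} matches {len(findings)} known CVE(s): "
            f"{', '.join(findings)}. Clone anyway? [y/N] "
        )
        if confirm(prompt).strip().lower() not in ("y", "yes"):
            return False
    # "--" stops option parsing so a hostile URL cannot be read as a flag.
    run(["git", "clone", "--", url, dest], check=True)
    return True
```

Defaulting to "no" on anything other than an explicit yes keeps the utility conservative when the scan finds a match.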