Add brief semgrep command to generate rules from sink data #39

Generate a Semgrep/Opengrep ruleset tailored to the detected project stack. brief doesn't become a scanner; it outputs a `.semgrep.yml` that you run with an existing engine. The value is automated curation: a Django+SQLAlchemy project gets Django and SQLAlchemy rules, not Rails or GORM.

```
brief semgrep [flags] [path]
  --output FILE           Write to file (default stdout)
  --min-severity LEVEL    Filter rules (low/medium/high)
```

Each sink in the KB becomes a Semgrep rule with a pattern, language, severity, CWE metadata, and a message pulled from the sink note.
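A minimal sketch of that assembly step, using the `where` sink from this issue. The `Sink` fields, the `Rule` type, the ID scheme, and the hand-rolled YAML rendering are all assumptions for illustration; the real struct in `kb/kb.go` may differ, and a real implementation would likely use a YAML library instead of `%q`.

```go
package main

import (
	"fmt"
	"strings"
)

// Sink holds only the fields this issue mentions (hypothetical shape).
type Sink struct {
	Symbol string
	Threat string
	CWE    string
	Note   string
}

// Rule holds the pieces of one generated Semgrep rule.
type Rule struct {
	ID        string
	Pattern   string
	Languages []string
	Severity  string
	CWE       string
	Message   string
}

// ruleFromSink combines a sink with an already-generated pattern,
// language list, and severity. The "threat-symbol" ID is an assumption.
func ruleFromSink(s Sink, pattern string, langs []string, severity string) Rule {
	return Rule{
		ID:        s.Threat + "-" + s.Symbol,
		Pattern:   pattern,
		Languages: langs,
		Severity:  severity,
		CWE:       s.CWE,
		Message:   s.Note,
	}
}

// YAML renders the rule as one entry of a Semgrep `rules:` list.
// %q emits a Go double-quoted string, used here as an approximation
// of YAML double-quoted scalars.
func (r Rule) YAML() string {
	var b strings.Builder
	fmt.Fprintf(&b, "  - id: %s\n", r.ID)
	fmt.Fprintf(&b, "    pattern: %q\n", r.Pattern)
	fmt.Fprintf(&b, "    languages: [%s]\n", strings.Join(r.Languages, ", "))
	fmt.Fprintf(&b, "    severity: %s\n", r.Severity)
	fmt.Fprintf(&b, "    message: %q\n", r.Message)
	fmt.Fprintf(&b, "    metadata:\n      cwe: %q\n", r.CWE)
	return b.String()
}

func main() {
	s := Sink{
		Symbol: "where",
		Threat: "sql_injection",
		CWE:    "CWE-89",
		Note:   "With string interpolation; safe with hash",
	}
	r := ruleFromSink(s, `$MODEL.where("..." + $X)`, []string{"ruby"}, "ERROR")
	fmt.Print("rules:\n" + r.YAML())
}
```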

A prototype run of pattern generation against all 771 sinks produced:
- **675 structural patterns** (88%): auto-generated from symbol names. `html_safe` becomes `$X.html_safe`, `requests.get` becomes `requests.get(...)`, `eval` becomes `eval(...)`. These work correctly as-is.
- **55 multi-patterns** (7%): Ruby methods that can be called with or without parens, generating both `$X.method` and `$X.method(...)`.
- **41 regex fallbacks** (5%): template syntax (`{{{`, `<%-`, `v-html`, `|safe`, `dangerouslySetInnerHTML`) that isn't code. These use Semgrep's `pattern-regex` with `languages: [generic]`.
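The three classes could be told apart along the following lines. This is a rough sketch, not the prototype's code: the `Hint` flags are hypothetical, since this issue does not say how the prototype decides that `html_safe` needs only `$X.html_safe` while other Ruby methods need both forms. Identifier-like template attributes such as `dangerouslySetInnerHTML` would need an explicit marker that the sketch omits.

```go
package main

import (
	"fmt"
	"regexp"
)

// Kind names the three pattern classes.
type Kind string

const (
	Structural Kind = "structural"
	Multi      Kind = "multi"
	Regex      Kind = "regex"
)

// Hint carries hypothetical per-symbol flags.
type Hint struct {
	Receiver bool // symbol is a method called on an object ($X.symbol)
	NoArgs   bool // method is always called without arguments
}

// Generated is the result of auto-generation for one symbol.
type Generated struct {
	Kind     Kind
	Patterns []string
}

// identifier matches plain or dotted names such as eval or requests.get.
var identifier = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$`)

// generate picks a pattern class for a symbol.
func generate(symbol, ecosystem string, h Hint) Generated {
	switch {
	case !identifier.MatchString(symbol):
		// Not code (for example "{{{" or "|safe"): match the text literally.
		return Generated{Regex, []string{regexp.QuoteMeta(symbol)}}
	case h.Receiver && h.NoArgs:
		return Generated{Structural, []string{"$X." + symbol}}
	case h.Receiver && ecosystem == "ruby":
		// Ruby calls may omit parens, so emit both forms.
		return Generated{Multi, []string{"$X." + symbol, "$X." + symbol + "(...)"}}
	case h.Receiver:
		return Generated{Structural, []string{"$X." + symbol + "(...)"}}
	default:
		return Generated{Structural, []string{symbol + "(...)"}}
	}
}

func main() {
	fmt.Println(generate("requests.get", "python", Hint{}))
	fmt.Println(generate("html_safe", "ruby", Hint{Receiver: true, NoArgs: true}))
	fmt.Println(generate("where", "ruby", Hint{Receiver: true}))
	fmt.Println(generate("{{{", "node", Hint{}))
}
```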

The prototype found three areas where the auto-generation needs refinement:
- `::` is a module separator in Ruby, a scope operator in C++, static member access in PHP, and a path separator in Rust. `Digest::MD5` should become `Digest::MD5.new(...)` in Ruby, where `new Digest::MD5(...)` doesn't make sense. `::` symbols need per-language handling.
- Capitalized single-word symbols like `ProcessStartInfo`, `Random`, `SqlCommand` are constructors in C#/Java, but the heuristic treats them as method calls. It should generate `new X(...)` for those languages.
- Symbols like backtick and `!{` produce empty slugs, which leads to duplicate rule IDs.
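For the duplicate-ID problem, one possible fix is a placeholder slug plus a numeric suffix on repeats. The sketch below is an assumption about how that might look; the `slug` rules, the `symbol` placeholder, and the `IDs` type are hypothetical, not existing code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// slug lowercases the input and collapses every run of other
// characters into a single "-", trimming "-" from both ends.
func slug(s string) string {
	return strings.Trim(nonAlnum.ReplaceAllString(strings.ToLower(s), "-"), "-")
}

// IDs hands out unique rule IDs.
type IDs struct{ seen map[string]int }

// NewIDs returns an empty ID allocator.
func NewIDs() *IDs { return &IDs{seen: map[string]int{}} }

// Next builds an ID from threat and symbol. Symbols made only of
// punctuation slug to "", so they fall back to the placeholder
// "symbol"; repeated IDs get a numeric suffix.
func (g *IDs) Next(threat, symbol string) string {
	s := slug(symbol)
	if s == "" {
		s = "symbol"
	}
	id := slug(threat) + "-" + s
	g.seen[id]++
	if n := g.seen[id]; n > 1 {
		id = fmt.Sprintf("%s-%d", id, n)
	}
	return id
}

func main() {
	ids := NewIDs()
	for _, sym := range []string{"html_safe", "{{{", "!{"} {
		fmt.Println(ids.Next("xss", sym))
	}
}
```

A suffix keeps IDs unique but makes them depend on sink order; deriving the placeholder from the symbol's bytes would be a more stable alternative.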

An optional `patterns` field on the Sink struct would let individual sinks override auto-generation for the ~15% of cases where heuristics aren't enough:

```toml
[[security.sinks]]
symbol = "where"
threat = "sql_injection"
cwe = "CWE-89"
note = "With string interpolation; safe with hash"
patterns = ['$MODEL.where("..." + $X)', '$MODEL.where("...#{...}")']
```

When present, use those patterns directly. When absent, fall back to auto-generation.
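The fallback is small enough to sketch directly. `autoGenerate` below is a stand-in for the real heuristics and only emits the plain call form; the `Sink` shape is assumed from the TOML above.

```go
package main

import "fmt"

// Sink carries the symbol and the proposed optional override.
type Sink struct {
	Symbol   string
	Patterns []string // empty means "auto-generate"
}

// autoGenerate is a placeholder for the heuristics.
func autoGenerate(symbol string) []string {
	return []string{symbol + "(...)"}
}

// patternsFor prefers explicit patterns and falls back to auto-generation.
func patternsFor(s Sink) []string {
	if len(s.Patterns) > 0 {
		return s.Patterns
	}
	return autoGenerate(s.Symbol)
}

func main() {
	override := Sink{
		Symbol:   "where",
		Patterns: []string{`$MODEL.where("..." + $X)`, `$MODEL.where("...#{...}")`},
	}
	fmt.Println(patternsFor(override))
	fmt.Println(patternsFor(Sink{Symbol: "eval"}))
}
```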

Ecosystem to Semgrep language mapping: ruby→ruby, python→python, node→javascript+typescript, go→go, java→java, php→php, csharp→csharp, rust→rust, kotlin→kotlin, scala→scala, swift→swift, c→c, cpp→cpp. Elixir, Dart, Perl, Lua fall back to generic with regex.
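As a lookup table this might look like the following sketch. The names `semgrepLanguages` and `languagesFor` are hypothetical; the entries are the ones listed above.

```go
package main

import "fmt"

// semgrepLanguages maps a brief ecosystem to Semgrep language IDs.
var semgrepLanguages = map[string][]string{
	"ruby":   {"ruby"},
	"python": {"python"},
	"node":   {"javascript", "typescript"},
	"go":     {"go"},
	"java":   {"java"},
	"php":    {"php"},
	"csharp": {"csharp"},
	"rust":   {"rust"},
	"kotlin": {"kotlin"},
	"scala":  {"scala"},
	"swift":  {"swift"},
	"c":      {"c"},
	"cpp":    {"cpp"},
}

// languagesFor returns the Semgrep languages for an ecosystem and
// whether its rules must be emitted as pattern-regex against generic.
func languagesFor(ecosystem string) (langs []string, regexOnly bool) {
	if l, ok := semgrepLanguages[ecosystem]; ok {
		return l, false
	}
	return []string{"generic"}, true
}

func main() {
	for _, eco := range []string{"node", "ruby", "elixir"} {
		langs, regexOnly := languagesFor(eco)
		fmt.Println(eco, langs, regexOnly)
	}
}
```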

Severity mapping from threat IDs: `sql_injection`/`command_injection`/`code_injection`/`deserialization` → ERROR; `xss`/`ssrf`/`path_traversal`/`ssti` → WARNING; `weak_crypto`/`open_redirect`/`dos` → INFO. Adding a `severity` field to the threat registry in `_threats.toml` would make this data rather than code.
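A sketch of the hard-coded variant, together with a `--min-severity` filter. Two things here are assumptions not stated in this issue: that `low`/`medium`/`high` correspond to INFO/WARNING/ERROR, and that threats without a mapping are dropped.

```go
package main

import "fmt"

// severityByThreat is the mapping listed above.
var severityByThreat = map[string]string{
	"sql_injection":     "ERROR",
	"command_injection": "ERROR",
	"code_injection":    "ERROR",
	"deserialization":   "ERROR",
	"xss":               "WARNING",
	"ssrf":              "WARNING",
	"path_traversal":    "WARNING",
	"ssti":              "WARNING",
	"weak_crypto":       "INFO",
	"open_redirect":     "INFO",
	"dos":               "INFO",
}

// rank orders Semgrep severities from least to most severe.
var rank = map[string]int{"INFO": 1, "WARNING": 2, "ERROR": 3}

// levelToSeverity is an assumed mapping for the --min-severity flag.
var levelToSeverity = map[string]string{
	"low":    "INFO",
	"medium": "WARNING",
	"high":   "ERROR",
}

// severityFor returns the severity for a threat ID, or false if unmapped.
func severityFor(threat string) (string, bool) {
	sev, ok := severityByThreat[threat]
	return sev, ok
}

// keep reports whether a rule for the threat passes the minimum level.
// An unrecognised level has rank 0, so it filters nothing.
func keep(threat, minLevel string) bool {
	sev, ok := severityFor(threat)
	if !ok {
		return false
	}
	return rank[sev] >= rank[levelToSeverity[minLevel]]
}

func main() {
	for _, t := range []string{"sql_injection", "xss", "dos"} {
		sev, _ := severityFor(t)
		fmt.Println(t, sev, keep(t, "medium"))
	}
}
```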

Implementation:
- [ ] Add optional `patterns []string` field to `Sink` in `kb/kb.go`
- [ ] Add `severity` field to `ThreatDef` in `kb/kb.go` and seed in `_threats.toml`
- [ ] `detect/semgrep.go` — pattern generation heuristics, language mapping, rule assembly
- [ ] `report/semgrep.go` — YAML output (not JSON, Semgrep expects YAML)
- [ ] `cmd/brief/semgrep.go` — command wiring via `runDetection`
- [ ] Tests: pattern generation unit tests for each symbol class, golden-file test for YAML output shape
- [ ] Validate generated rules parse: `semgrep --validate --config .semgrep.yml`
