Semantic prompt injection detection via LLM classifier

### Problem or use case

Current prompt injection detection uses regex pattern matching against known phrases ("ignore previous instructions", "you are now", etc.). Real-world attacks use paraphrasing, obfuscation, homoglyphs, and multi-step decomposition that regex cannot catch. SkillScan-Security ships a fine-tuned DeBERTa classifier for this. Trail of Bits demonstrated that moderate evasion effort bypasses all pattern-based detection.

### Proposed solution

Add optional `--deep-scan` mode using an offline ML classifier:

```bash
agentsec scan --deep-scan     # Enable semantic detection
```

Use a fine-tuned DeBERTa or DistilBERT adapter that classifies text segments as benign/injection. Runs fully offline (no API calls). Optional dependency (`agentsec-ai[ml]`).

Falls back to regex-only when ML dependency not installed.

### Area

Skill scanner / MCP scanner

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Semantic prompt injection detection via LLM classifier #59

Problem or use case

Proposed solution

Area

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Semantic prompt injection detection via LLM classifier #59

Description

Problem or use case

Proposed solution

Area

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions