Problem or use case
Current prompt injection detection uses regex pattern matching against known phrases ("ignore previous instructions", "you are now", etc.). Real-world attacks use paraphrasing, obfuscation, homoglyphs, and multi-step decomposition that regex cannot catch. SkillScan-Security ships a fine-tuned DeBERTa classifier for this. Trail of Bits demonstrated that moderate evasion effort bypasses all pattern-based detection.
Proposed solution
Add optional --deep-scan mode using an offline ML classifier:
agentsec scan --deep-scan # Enable semantic detection
Use a fine-tuned DeBERTa or DistilBERT adapter that classifies text segments as benign/injection. Runs fully offline (no API calls). Optional dependency (agentsec-ai[ml]).
Falls back to regex-only when ML dependency not installed.
Area
Skill scanner / MCP scanner
Problem or use case
Current prompt injection detection uses regex pattern matching against known phrases ("ignore previous instructions", "you are now", etc.). Real-world attacks use paraphrasing, obfuscation, homoglyphs, and multi-step decomposition that regex cannot catch. SkillScan-Security ships a fine-tuned DeBERTa classifier for this. Trail of Bits demonstrated that moderate evasion effort bypasses all pattern-based detection.
Proposed solution
Add optional
--deep-scanmode using an offline ML classifier:agentsec scan --deep-scan # Enable semantic detectionUse a fine-tuned DeBERTa or DistilBERT adapter that classifies text segments as benign/injection. Runs fully offline (no API calls). Optional dependency (
agentsec-ai[ml]).Falls back to regex-only when ML dependency not installed.
Area
Skill scanner / MCP scanner