Skip to content

feat(evaluators): ATR regex-based threat detection evaluator #169

@eeee2345

Description

@eeee2345

Problem

Agent Control's evaluator ecosystem has Cisco AI Defense (cloud API) and Galileo Luna (LLM-based), but no local, regex-based evaluator for detecting known AI agent threat patterns without API keys or network calls.

Proposed solution

A contrib evaluator using ATR (Agent Threat Rules) — community-maintained regex rules for AI agent threats.

# Usage
from agent_control_evaluator_atr.threat_rules import ATREvaluator, ATRConfig

evaluator = ATREvaluator(ATRConfig(
    min_severity="medium",
    categories=["prompt-injection", "tool-poisoning"],
))
result = await evaluator.evaluate("Ignore all previous instructions...")
# EvaluatorResult(matched=True, confidence=0.9, metadata={findings: [...]})

Key characteristics:

  • atr.threat_rules evaluator name, auto-discovered via entry points
  • 20 rules, 306 patterns covering OWASP Agentic Top 10
  • Configurable: min_severity, categories filter, block_on_match, on_error (fail-open/closed)
  • Pure regex, no API keys, <5ms evaluation
  • Returns all matching rules (not just first match) with metadata
  • Follows the Cisco evaluator pattern exactly (pyproject.toml, Makefile, entry points)
  • Rules maintained at agentthreatrule.org (MIT licensed)
  • ATR is already used by Cisco AI Defense

Willingness to contribute

Yes — full implementation ready with 22 tests covering detection, false-positive safety, config options, error handling, and multi-match behavior. Happy to submit a PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions