dortort/ai-tool-guard

AI Tool Guard

Policy enforcement middleware for Vercel AI SDK tool calls.

Guards, approvals, argument validation, rate limiting, output filtering, prompt-injection detection, MCP drift detection, and OpenTelemetry observability — as a composable middleware layer around your AI SDK tools.

Read the full documentation

npm install @dortort/ai-tool-guard

Quick start

import { createToolGuard, defaultPolicy } from "@dortort/ai-tool-guard";
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// 1. Define your tools as usual.
const getWeather = tool({
  description: "Get the weather for a city",
  parameters: z.object({ city: z.string() }),
  execute: async ({ city }) => `Weather in ${city}: sunny, 72°F`,
});

const deleteUser = tool({
  description: "Delete a user account",
  parameters: z.object({ userId: z.string() }),
  execute: async ({ userId }) => `User ${userId} deleted`,
});

// 2. Create a guard with policy rules.
const guard = createToolGuard({
  rules: defaultPolicy(),
  onApprovalRequired: async (token) => {
    console.log(`Approval needed for ${token.toolName}:`, token.originalArgs);
    return { approved: true, approvedBy: "admin" };
  },
  onDecision: (record) => {
    console.log(`[${record.verdict}] ${record.toolName}: ${record.reason}`);
  },
});

// 3. Wrap tools with per-tool risk levels.
const tools = guard.guardTools({
  getWeather: { tool: getWeather, riskLevel: "low" },
  deleteUser: { tool: deleteUser, riskLevel: "high" },
});

// 4. Use with AI SDK as normal.
const result = await generateText({
  model: openai("gpt-4o"),
  tools,
  prompt: "What's the weather in Tokyo?",
});

Features

| Feature | Description |
| --- | --- |
| Policy engine | Rule-based allow/deny/require-approval with glob patterns, risk levels, priorities, and async conditions |
| External policy backends | Adapter interface for OPA/Rego, Cedar, or custom ABAC engines |
| Decision records | Structured audit output for every evaluation (matched rules, risk category, attributes, redactions) |
| Dry-run / simulation | Evaluate policies across recorded traces without executing tools |
| Conversation-aware policies | Policies can incorporate session risk score, prior failures, recent approvals |
| Approve with edits | Approval handler can patch arguments before execution |
| Approval correlation | Payload-hash tokens with TTL prevent mismatch between request and resolution |
| Argument guards | Zod schemas, allowlists, denylists, regex, PII scanning per field |
| Injection detection | Heuristic prompt-injection detector that can deny or downgrade to approval |
| Output filtering | Secrets stripping, PII redaction, custom filters on tool results |
| Rate limiting | Sliding-window rate limits + concurrency caps with reject or queue backpressure |
| OpenTelemetry | Opinionated spans for policy eval, approval wait, tool execution, redaction |
| MCP drift detection | SHA-256 schema fingerprinting, drift detection, actionable remediation |

Architecture

Every guarded tool call passes through a 7-stage execution pipeline: injection detection, argument validation, policy evaluation, approval flow, rate limiting, tool execution, and output filtering. Each stage emits an OpenTelemetry span.

See the architecture overview for the full pipeline diagram.
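
The stage order can be pictured with a small runner. This is a hypothetical model of the ordering listed above, not the library's internals: `STAGES`, `runPipeline`, and the handler shape are illustrative names, and the real stages are asynchronous.

```typescript
// Hypothetical model of the documented stage order (not the library's code).
const STAGES = [
  "injection-detection",
  "argument-validation",
  "policy-evaluation",
  "approval-flow",
  "rate-limiting",
  "tool-execution",
  "output-filtering",
] as const;

type StageName = (typeof STAGES)[number];
type ToolCall = { toolName: string; args: unknown };
// A handler rejects the call by throwing; returning normally lets it continue.
type StageHandler = (call: ToolCall) => void;

function runPipeline(
  call: ToolCall,
  handlers: Partial<Record<StageName, StageHandler>>,
): StageName[] {
  const completed: StageName[] = [];
  for (const stage of STAGES) {
    const handler = handlers[stage];
    if (handler) handler(call); // a throw here skips every later stage
    completed.push(stage);
  }
  return completed;
}
```

In this model a rejection during policy evaluation means the tool-execution handler never runs, which is the property the ordering is meant to provide.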

API reference

createToolGuard(options)

Creates a ToolGuard instance. All options are optional.

interface GuardOptions {
  rules?: PolicyRule[];           // Built-in policy rules
  backend?: PolicyBackend;        // External policy backend
  defaultRiskLevel?: RiskLevel;   // Default risk for unconfigured tools ("low")
  onApprovalRequired?: ApprovalHandler;  // Approval callback
  injectionDetection?: InjectionDetectorConfig;
  defaultRateLimit?: RateLimitConfig;
  defaultMaxConcurrency?: number;
  otel?: OtelConfig;
  dryRun?: boolean;               // Simulation mode
  onDecision?: (record: DecisionRecord) => void | Promise<void>;
  resolveUserAttributes?: () => Record<string, unknown> | Promise<Record<string, unknown>>;
  resolveConversationContext?: () => ConversationContext | Promise<ConversationContext>;
}

guard.guardTool(name, tool, config?)

Wrap a single AI SDK tool.

const guarded = guard.guardTool("sendEmail", sendEmailTool, {
  riskLevel: "medium",
  riskCategories: ["network", "pii"],
  argGuards: [piiGuard("body")],
  outputFilters: [secretsFilter()],
  rateLimit: { maxCalls: 10, windowMs: 60_000 },
  maxConcurrency: 2,
});

guard.guardTools(map)

Wrap multiple tools at once. Returns a flat tools map compatible with generateText({ tools }).

const tools = guard.guardTools({
  readFile:  { tool: readFileTool,  riskLevel: "low" },
  writeFile: { tool: writeFileTool, riskLevel: "high", requireApproval: true },
  search:    { tool: searchTool },
});

Policy rules

Built-in rule builders

import { allow, deny, requireApproval } from "@dortort/ai-tool-guard";

const rules = [
  allow({ tools: "read*", description: "Allow all read tools" }),
  requireApproval({ tools: "write*", riskLevels: ["medium", "high"] }),
  deny({
    tools: "delete*",
    condition: (ctx) => ctx.userAttributes.role !== "admin",
    description: "Only admins can delete",
    priority: 10,
  }),
];
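
To make the interaction between glob patterns and priorities concrete, here is a minimal sketch of one way such rules could be resolved. It is not the library's evaluator: `SketchRule`, `globToRegExp`, and `evaluateRules` are hypothetical names, and the sketch assumes that higher `priority` values are checked first, that the first matching rule wins, that only `*` is a wildcard, and that unmatched tools fall back to "deny".

```typescript
type Verdict = "allow" | "deny" | "require-approval";

// Hypothetical, simplified rule shape (no risk levels or conditions).
interface SketchRule {
  tools: string; // glob pattern such as "read*"
  verdict: Verdict;
  priority?: number; // assumption: higher values are evaluated first
}

// Convert a glob to an anchored RegExp; "*" is the only wildcard handled here.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

function evaluateRules(
  rules: SketchRule[],
  toolName: string,
  fallback: Verdict = "deny",
): Verdict {
  // Array.prototype.sort is stable, so equal priorities keep declaration order.
  const ordered = [...rules].sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
  const match = ordered.find((rule) => globToRegExp(rule.tools).test(toolName));
  return match ? match.verdict : fallback;
}
```

Under these assumptions a `delete*` rule with `priority: 10` takes effect even when a broader `*` rule is declared earlier.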

Preset policies

import { defaultPolicy, readOnlyPolicy } from "@dortort/ai-tool-guard";

// low → allow, medium → require-approval, high/critical → deny
const defaultRules = defaultPolicy();

// Allow specific tools, deny everything else
const readOnlyRules = readOnlyPolicy(["getUser", "listItems", "search*"]);

External policy backend (OPA, Cedar, custom)

import type { PolicyBackend } from "@dortort/ai-tool-guard";

const opaBackend: PolicyBackend = {
  name: "opa",
  async evaluate(ctx) {
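    const res = await fetch("http://opa:8181/v1/data/tool_policy", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: ctx }),
    });
    // Fail closed if OPA is unreachable or returns an error status.
    if (!res.ok) {
      return { verdict: "deny", reason: `OPA returned HTTP ${res.status}`, matchedRules: [] };
    }
    const data = await res.json();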
    return {
      verdict: data.result.allow ? "allow" : "deny",
      reason: data.result.reason,
      matchedRules: data.result.matched_rules ?? [],
    };
  },
};

const guard = createToolGuard({ backend: opaBackend });

Approval flow

The approval handler receives an ApprovalToken and returns an ApprovalResolution.

Basic approval

const guard = createToolGuard({
  rules: [requireApproval({ tools: "payment*" })],
  onApprovalRequired: async (token) => {
    const answer = await askUser(
      `Allow ${token.toolName} with args ${JSON.stringify(token.originalArgs)}?`
    );
    return { approved: answer === "yes" };
  },
});

Approve with edits (parameter patching)

onApprovalRequired: async (token) => {
  // User can modify the amount before approving
  const editedAmount = await showEditableForm(token.originalArgs);
  return {
    approved: true,
    patchedArgs: { amount: editedAmount },
    approvedBy: "finance-team",
  };
},

The ApprovalToken includes a payloadHash for correlation — the SHA-256 of the canonical { toolName, args } object. This prevents mismatch bugs when message history is reshaped.
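
The correlation idea can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the canonical form is assumed to be JSON with sorted object keys, and `SketchToken`, `issueToken`, and `tokenMatches` are hypothetical names (the real `ApprovalToken` may carry other fields).

```typescript
import { createHash } from "node:crypto";

// Rebuild a value with object keys sorted, so key order cannot change the hash.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => canonicalize(item));
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) sorted[key] = canonicalize(obj[key]);
    return sorted;
  }
  return value;
}

function payloadHash(toolName: string, args: unknown): string {
  const canonical = JSON.stringify(canonicalize({ toolName, args }));
  return createHash("sha256").update(canonical).digest("hex");
}

// Hypothetical token shape used only for this sketch.
interface SketchToken {
  toolName: string;
  payloadHash: string;
  expiresAt: number; // epoch milliseconds
}

function issueToken(toolName: string, args: unknown, ttlMs: number, now: number): SketchToken {
  return { toolName, payloadHash: payloadHash(toolName, args), expiresAt: now + ttlMs };
}

// A resolution counts only for the exact payload that was shown to the approver,
// and only while the token is still within its TTL.
function tokenMatches(token: SketchToken, toolName: string, args: unknown, now: number): boolean {
  return now <= token.expiresAt && token.payloadHash === payloadHash(toolName, args);
}
```

Because the hash is computed over a canonical form, reordering keys in the arguments does not invalidate an approval, while changing any value does.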


Argument guards

Validate tool arguments before policy evaluation.

import {
  zodGuard, allowlist, denylist, regexGuard, piiGuard
} from "@dortort/ai-tool-guard";
import { z } from "zod";

const guarded = guard.guardTool("queryDb", queryTool, {
  argGuards: [
    // Zod schema validation
    zodGuard({ field: "limit", schema: z.number().int().min(1).max(100) }),

    // Allowlist
    allowlist("table", ["users", "orders", "products"]),

    // Denylist
    denylist("operation", ["DROP", "TRUNCATE"]),

    // Regex: must match allowed domain
    regexGuard("url", /^https:\/\/.*\.example\.com/, {
      message: "Only example.com URLs are allowed",
    }),

    // Regex: must NOT match forbidden pattern
    regexGuard("query", /DROP\s+TABLE/i, {
      mustMatch: false,
      message: "SQL injection detected",
    }),

    // PII scanning
    piiGuard("userInput", { allowedTypes: ["email"] }),
  ],
});

Guards support dot-path field access for nested arguments:

allowlist("config.region", ["us-east-1", "eu-west-1"])
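
A dot-path lookup of this kind can be sketched in a few lines. `getByPath` and `isAllowed` are hypothetical helpers written for illustration; the sketch assumes plain `.`-separated keys with no array-index or escaping syntax, which may differ from what the library supports.

```typescript
// Walk a "a.b.c" path through nested objects; returns undefined if any step is missing.
function getByPath(source: unknown, path: string): unknown {
  let current: unknown = source;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Allowlist check against the value found at the given path.
function isAllowed(args: unknown, field: string, allowed: readonly unknown[]): boolean {
  return allowed.includes(getByPath(args, field));
}
```

In this sketch a missing field resolves to `undefined`, so it fails the allowlist rather than passing silently.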

Output filtering

Control what comes back from tool execution.

import { secretsFilter, piiOutputFilter, customFilter } from "@dortort/ai-tool-guard";

const guarded = guard.guardTool("fetchData", fetchTool, {
  outputFilters: [
    // Strip AWS keys, GitHub tokens, JWTs, API keys, bearer tokens, private keys
    secretsFilter(),

    // Redact emails, SSNs, phone numbers, credit card numbers
    piiOutputFilter({ allowedTypes: ["email"] }),

    // Custom filter
    customFilter("size-limit", async (result) => {
      const str = JSON.stringify(result);
      if (str.length > 100_000) {
        return { verdict: "block", output: null };
      }
      return { verdict: "pass", output: result };
    }),
  ],
});

Filters run in order after tool execution. If any filter returns "block", the filter chain stops, the tool result is discarded, and a ToolGuardError is thrown.
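
The chain semantics described above can be sketched like this. The names `SketchFilter` and `runOutputFilters` are hypothetical, the error is a plain `Error` standing in for `ToolGuardError`, and the sketch assumes each filter receives the previous filter's output.

```typescript
type SketchFilterResult = { verdict: "pass" | "block"; output: unknown };

interface SketchFilter {
  name: string;
  apply: (result: unknown) => Promise<SketchFilterResult>;
}

async function runOutputFilters(filters: SketchFilter[], result: unknown): Promise<unknown> {
  let current = result;
  for (const filter of filters) {
    const { verdict, output } = await filter.apply(current);
    if (verdict === "block") {
      // Stand-in for ToolGuardError with code "output-blocked"; later filters never run.
      throw Object.assign(new Error(`Output blocked by ${filter.name}`), {
        code: "output-blocked",
      });
    }
    current = output; // assumption: the next filter sees the already-filtered value
  }
  return current;
}
```

Ordering matters in this model: placing redaction filters before a size limit means the limit is checked against the redacted output.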


Injection detection

Heuristic prompt-injection detection at the tool boundary.

const guard = createToolGuard({
  injectionDetection: {
    threshold: 0.5,    // Suspicion score 0-1
    action: "deny",    // "deny" | "downgrade" | "log"
  },
});

  • deny: Block the tool call entirely.
  • downgrade: Convert the call to require approval.
  • log: Allow but flag in the decision record.
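
The relationship between the score, the threshold, and the three actions can be sketched as a pure function. `resolveInjection` and `InjectionOutcome` are hypothetical names, and the sketch assumes a call is treated as suspicious when its score is greater than or equal to the threshold; the library may compare differently.

```typescript
type InjectionAction = "deny" | "downgrade" | "log";

type InjectionOutcome =
  | { verdict: "allow"; flagged: boolean }
  | { verdict: "deny" }
  | { verdict: "require-approval" };

function resolveInjection(
  score: number,
  threshold: number,
  action: InjectionAction,
): InjectionOutcome {
  // Assumption: scores below the threshold are not considered suspicious.
  if (score < threshold) return { verdict: "allow", flagged: false };
  if (action === "deny") return { verdict: "deny" };
  if (action === "downgrade") return { verdict: "require-approval" };
  return { verdict: "allow", flagged: true }; // "log": allow, but mark the record
}
```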

Custom detectors (e.g., LLM-as-judge):

injectionDetection: {
  threshold: 0.7,
  action: "downgrade",
  detect: async (args) => {
    const score = await myLlmJudge(JSON.stringify(args));
    return score; // 0-1
  },
},

Rate limiting and concurrency

const guard = createToolGuard({
  // Global defaults
  defaultRateLimit: { maxCalls: 100, windowMs: 60_000, strategy: "reject" },
  defaultMaxConcurrency: 5,
});

// Per-tool overrides
guard.guardTool("expensiveApi", tool, {
  rateLimit: { maxCalls: 5, windowMs: 60_000, strategy: "queue" },
  maxConcurrency: 1,
});

  • reject: Immediately throw ToolGuardError with code "rate-limited".
  • queue: Wait for a slot to become available (backpressure).
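
For readers unfamiliar with sliding-window limiting, the following is a minimal sketch of the technique with the "reject" behavior only. `SlidingWindowLimiter` is a hypothetical class, not the library's `RateLimiter`; the clock is injectable so the behavior can be exercised without waiting.

```typescript
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly maxCalls: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxCalls: number, windowMs: number, now: () => number = Date.now) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the call if a slot is free, false otherwise.
  tryAcquire(): boolean {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Drop calls that have slid out of the window.
    this.timestamps = this.timestamps.filter((ts) => ts > cutoff);
    if (this.timestamps.length >= this.maxCalls) return false;
    this.timestamps.push(current);
    return true;
  }
}
```

A "queue" strategy would, instead of returning `false`, wait until the oldest timestamp leaves the window and then retry.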

Dry-run / simulation mode

Evaluate policies without executing tools.

Global dry-run

const guard = createToolGuard({ dryRun: true, rules: [...] });
// All tool calls return { dryRun: true, toolName, args } instead of executing.

Trace simulation

import { simulate, defaultPolicy } from "@dortort/ai-tool-guard";

const result = await simulate(
  [
    { toolName: "readFile", args: { path: "/etc/passwd" } },
    { toolName: "deleteUser", args: { id: "123" } },
    { toolName: "getWeather", args: { city: "NYC" } },
  ],
  { rules: defaultPolicy() },
  {
    readFile: { riskLevel: "medium" },
    deleteUser: { riskLevel: "critical" },
    getWeather: { riskLevel: "low" },
  },
);

console.log(result.summary);
// { total: 3, allowed: 1, denied: 1, requireApproval: 1 }

console.log(result.blocked);
// [{ toolCall: { toolName: "deleteUser", ... }, decision: { verdict: "deny", ... } }, ...]

Decision records

Every policy evaluation produces a structured DecisionRecord:

interface DecisionRecord {
  id: string;                    // Unique correlation id
  timestamp: string;             // ISO-8601
  verdict: "allow" | "deny" | "require-approval";
  toolName: string;
  matchedRules: string[];        // Rule ids that matched
  riskLevel: RiskLevel;
  riskCategories: RiskCategory[];
  attributes: Record<string, unknown>;  // User attributes consumed
  reason: string;                // Human-readable explanation
  redactions?: string[];         // Fields redacted in output
  evalDurationMs: number;        // Policy eval time
  dryRun: boolean;
}

Subscribe via onDecision:

const guard = createToolGuard({
  onDecision: (record) => {
    auditLog.write(record);
    if (record.verdict === "deny") {
      alerting.fire("tool-denied", record);
    }
  },
});

Conversation-aware policies

Policies can incorporate conversation metadata for contextual decisions.

const guard = createToolGuard({
  resolveConversationContext: () => ({
    sessionId: currentSession.id,
    riskScore: currentSession.riskScore,
    priorFailures: currentSession.failureCount,
    recentApprovals: currentSession.approvedTools,
  }),
  rules: [
    deny({
      tools: "*",
      condition: (ctx) => (ctx.conversation?.riskScore ?? 0) > 0.8,
      description: "Block all tools when conversation risk is high",
    }),
    requireApproval({
      tools: "*",
      condition: (ctx) => (ctx.conversation?.priorFailures ?? 0) > 3,
      description: "Require approval after repeated failures",
    }),
  ],
});

MCP drift detection

Pin tool schemas and detect when MCP servers change.

import {
  pinFingerprint, detectDrift, FingerprintStore
} from "@dortort/ai-tool-guard/mcp";

// Pin fingerprints for your MCP tools
const store = new FingerprintStore();
store.set(await pinFingerprint("readFile", "fs-server", readFileSchema, "production"));
store.set(await pinFingerprint("queryDb", "db-server", queryDbSchema, "production"));

// Before using tools, check for drift
const result = await detectDrift(store.getAll(), [
  { toolName: "readFile", serverId: "fs-server", schema: currentReadFileSchema },
  { toolName: "queryDb",  serverId: "db-server",  schema: currentQueryDbSchema },
]);

if (result.drifted) {
  for (const change of result.changes) {
    console.error(change.remediation);
    // "Tool "queryDb" from server "db-server" has changed since it was pinned
    //  at 2025-01-15T... Expected hash: a1b2c3..., got: d4e5f6...
    //  Re-pin with pinFingerprint() after reviewing the schema change."
  }
  throw new Error("MCP schema drift detected. Aborting.");
}
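
The fingerprinting idea behind this can be sketched independently of the library. `stableStringify`, `schemaFingerprint`, and `findDrift` are hypothetical helpers; the sketch assumes the fingerprint is a SHA-256 over a key-sorted JSON serialization of the schema and that tools without a pin are ignored.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys in sorted order at every depth.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    const json = JSON.stringify(value) as string | undefined;
    return json ?? "null";
  }
  if (Array.isArray(value)) return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
  return `{${entries.join(",")}}`;
}

function schemaFingerprint(schema: unknown): string {
  return createHash("sha256").update(stableStringify(schema)).digest("hex");
}

interface SketchPin { toolName: string; serverId: string; hash: string }
interface SketchLiveTool { toolName: string; serverId: string; schema: unknown }

// Return the live tools whose schema hash no longer matches the pinned hash.
function findDrift(pins: SketchPin[], live: SketchLiveTool[]): SketchLiveTool[] {
  return live.filter((tool) => {
    const pin = pins.find((p) => p.toolName === tool.toolName && p.serverId === tool.serverId);
    return pin !== undefined && pin.hash !== schemaFingerprint(tool.schema);
  });
}
```

Sorting keys before hashing means a server that merely reorders properties in its schema does not trigger a false drift report, while a changed type or description does.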

Persist fingerprints:

import * as fs from "node:fs";

// Export to file
fs.writeFileSync("fingerprints.json", store.export());

// Import from file
store.import(fs.readFileSync("fingerprints.json", "utf-8"));

OpenTelemetry

Automatic spans when @opentelemetry/api is installed.

const guard = createToolGuard({
  otel: {
    enabled: true,
    tracerName: "my-app",
    defaultAttributes: { "service.name": "ai-agent" },
  },
});

Spans emitted:

| Span name | When | Key attributes |
| --- | --- | --- |
| ai_tool_guard.policy_eval | Every policy evaluation | tool.name, tool.risk_level, decision.verdict, decision.reason |
| ai_tool_guard.tool_execute | Tool execution | tool.name |
| ai_tool_guard.approval_wait | Waiting for approval | tool.name, approval.token_id |
| ai_tool_guard.injection_check | Injection suspected | injection.score, injection.suspected |
| ai_tool_guard.rate_limit | Rate limit hit | rate_limit.allowed |
| ai_tool_guard.output_filter | Output redacted/blocked | output.redacted, output.blocked |

All attribute keys are exported as ATTR for custom span creation.


Error handling

All guard failures throw ToolGuardError with a machine-readable code:

import { ToolGuardError } from "@dortort/ai-tool-guard";

try {
  await generateText({ model, tools, prompt: "..." });
} catch (err) {
  // AI SDK wraps tool errors in ToolExecutionError; unwrap with .cause.
  // A ToolGuardError thrown directly (no wrapper) is used as-is.
  const cause =
    err instanceof ToolGuardError ? err : (err as { cause?: unknown } | null)?.cause;
  if (cause instanceof ToolGuardError) {
    switch (cause.code) {
      case "policy-denied":         // Policy rule blocked the call
      case "approval-denied":       // Human denied approval
      case "no-approval-handler":   // Approval required but no handler set
      case "arg-validation-failed": // Argument guard failed
      case "injection-detected":    // Prompt injection suspected
      case "rate-limited":          // Rate limit exceeded
      case "output-blocked":        // Output filter blocked the result
      case "mcp-drift":             // MCP schema drift detected
    }
    console.log(cause.toolName);   // Which tool
    console.log(cause.decision);   // Full DecisionRecord (if available)
  }
}
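
When guard failures surface through an HTTP route, the error codes above can be mapped to response statuses. The mapping below is purely illustrative: the status choices and the names `GUARD_ERROR_STATUS` and `statusForGuardError` are assumptions for this sketch, not something the library defines.

```typescript
// Hypothetical mapping from ToolGuardError codes to HTTP statuses.
const GUARD_ERROR_STATUS = new Map<string, number>([
  ["policy-denied", 403],
  ["approval-denied", 403],
  ["no-approval-handler", 500],
  ["arg-validation-failed", 400],
  ["injection-detected", 400],
  ["rate-limited", 429],
  ["output-blocked", 502],
  ["mcp-drift", 409],
]);

// Unknown codes fall back to 500 so that new error kinds are never reported as success.
function statusForGuardError(code: string): number {
  return GUARD_ERROR_STATUS.get(code) ?? 500;
}
```

A route handler could call `statusForGuardError(cause.code)` after the `instanceof ToolGuardError` check shown above.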

TypeScript

The library is written in TypeScript and exports all types:

import type {
  // Core
  RiskLevel, RiskCategory, DecisionVerdict, DecisionRecord,
  PolicyContext, ConversationContext, GuardOptions,
  // Policy
  PolicyRule, PolicyBackend, PolicyBackendResult,
  // Tools
  ToolGuardConfig, AiSdkTool,
  // Guards
  ArgGuard, ZodArgGuard, OutputFilter, OutputFilterResult,
  // Approval
  ApprovalToken, ApprovalResolution, ApprovalHandler,
  // Rate limiting
  RateLimitConfig, RateLimitState,
  // Injection
  InjectionDetectorConfig,
  // MCP
  McpToolFingerprint, McpDriftResult, McpDriftChange,
  // OTel
  OtelConfig,
} from "@dortort/ai-tool-guard";

Subpath exports

import { evaluatePolicy, allow, deny } from "@dortort/ai-tool-guard/policy";
import { ApprovalManager } from "@dortort/ai-tool-guard/approval";
import { zodGuard, secretsFilter, RateLimiter } from "@dortort/ai-tool-guard/guards";
import { createTracer, ATTR } from "@dortort/ai-tool-guard/otel";
import { detectDrift, FingerprintStore } from "@dortort/ai-tool-guard/mcp";

Examples

Full worked examples are available in the documentation:

  • Next.js Integration — App Router setup with per-tool config, approval flow, and error mapping
  • Chatbot Safety — Multi-layered defense for a customer support chatbot (5 risk levels, injection detection, PII redaction)
  • Multi-Tenant Policies — SaaS platform with plan/role-based access and per-tenant audit logs
  • Audit Logging — Structured audit system with denial alerting and OpenTelemetry correlation
  • MCP Drift Detection — Schema fingerprinting, drift detection, and environment-scoped pinning
  • Simulation & Testing — Policy validation with recorded traces and CI/CD integration

License

MIT
