Agent Integrity Protocol (AIP)

Real-time thinking block analysis for AI agent alignment.

AIP analyzes what an agent is thinking before it acts. It extracts thinking blocks from LLM responses, evaluates them against an Alignment Card using an analysis LLM, and delivers integrity verdicts in real time, enabling intervention between turns rather than after the damage is done.

AIP follows the daimonion philosophy: silence means aligned, voice means outside boundaries. It is a sister protocol to AAP — sharing the same Alignment Card, enriching the same AP-Trace, but operating on a fundamentally different timescale: real-time rather than retroactive.

Quick Start

# Install
pip install agent-integrity-proto
# or
npm install @mnemom/agent-integrity-protocol

Python

from aip import create_client

client = create_client(
    base_url="https://gateway.example.com",
    agent_id="my-agent",
    shared_secret="your-secret",
)

# Check a thinking block against the agent's Alignment Card (`card`)
signal = client.check(
    thinking_block="I should help the user find the best product...",
    alignment_card=card,
)

if signal.proceed:
    # Verdict was 'clear' — continue
    execute_action()
else:
    # Verdict was 'review_needed' or 'boundary_violation'
    escalate(signal.checkpoint.concerns)

TypeScript

import { createClient } from '@mnemom/agent-integrity-protocol';

const client = createClient({
  baseUrl: 'https://gateway.example.com',
  agentId: 'my-agent',
  sharedSecret: 'your-secret',
});

const signal = await client.check({
  thinkingBlock: 'I should help the user find the best product...',
  alignmentCard: card,
});

if (signal.proceed) {
  executeAction();
} else {
  escalate(signal.checkpoint.concerns);
}
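The `shared_secret` passed to the client, together with the exported `sign_payload` and `verify_signature`, suggests that traffic between the agent and the gateway is authenticated with a shared-key MAC. The exact scheme is defined in the Specification; the sketch below assumes HMAC-SHA256 over the raw request body with a hex-encoded digest. `sign_body` and `verify_body` are hypothetical helpers, not the SDK's functions.

```python
import hashlib
import hmac


def sign_body(body: bytes, shared_secret: str) -> str:
    """Hypothetical helper: HMAC-SHA256 of the body, hex-encoded (assumed scheme)."""
    return hmac.new(shared_secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_body(body: bytes, signature: str, shared_secret: str) -> bool:
    """Hypothetical helper: recompute the MAC and compare in constant time."""
    expected = sign_body(body, shared_secret)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    body = b'{"agent_id": "my-agent", "thinking_block": "..."}'
    sig = sign_body(body, "your-secret")
    print(verify_body(body, sig, "your-secret"))         # True
    print(verify_body(body + b" ", sig, "your-secret"))  # False: body was altered
```

Comparing with `hmac.compare_digest` rather than `==` keeps the comparison time independent of where the two signatures first differ.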

Why AIP?

AAP provides post-hoc transparency — every agent action is traced, verified, and checked for drift. But verification happens after the action, typically 30-60 seconds later. This creates a window of vulnerability:

Prompt injection: A malicious prompt causes data exfiltration. The trace records it, but the data is already gone.
Gradual drift: Reasoning shifts subtly across turns. Each action passes verification, but the cumulative direction is misaligned.
Misaligned intent: The thinking block reveals the agent's plan to take a forbidden action, but it is only examined after the action has already happened.

All three failure modes are visible in thinking blocks before the action lands. AIP fills this gap.

Protocol	Function	Timescale
AAP	Post-hoc verification via AP-Traces	Retroactive (30-60s)
AIP	Real-time thinking block analysis	Between turns (~2s)
Together	Full coverage: real-time detection + audit trail	Both

How It Works

                  Agent Turn N
                       │
                       ▼
┌──────────────────────────────────────────────────────┐
│  Layer 1: Thinking Block Extraction                  │
│                                                      │
│  Anthropic  │  OpenAI  │  Google  │  Fallback        │
│  (thinking) │ (reason) │ (think)  │  (regex)         │
└──────────────────────┬───────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────┐
│  Layer 2: Integrity Analysis                         │
│                                                      │
│  Card Summary + Conscience Values + Session Window   │
│              → Analysis LLM (e.g. Haiku)             │
│              → Integrity Verdict                     │
└──────────────────────┬───────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────┐
│  Layer 3: Signal Delivery                            │
│                                                      │
│  clear → proceed    │  review_needed → warn/pause    │
│                     │  boundary_violation → block    │
└──────────────────────┬───────────────────────────────┘
                       │
                       ▼
                  Agent Turn N+1
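Layer 3 turns each verdict into an action before the next turn. A minimal sketch of that mapping, assuming only the three verdict strings shown in the diagram; the `Action` enum and `action_for` function are hypothetical and not part of the SDK. Treating an unrecognized verdict as a block (failing closed) is a choice made for this sketch, not something stated in the source.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    WARN_OR_PAUSE = "warn/pause"
    BLOCK = "block"


# Hypothetical mapping from Layer 2 verdicts to Layer 3 actions, per the diagram.
VERDICT_ACTIONS = {
    "clear": Action.PROCEED,
    "review_needed": Action.WARN_OR_PAUSE,
    "boundary_violation": Action.BLOCK,
}


def action_for(verdict: str) -> Action:
    """Return the Layer 3 action for a verdict; unknown verdicts fail closed."""
    return VERDICT_ACTIONS.get(verdict, Action.BLOCK)


print(action_for("clear"))          # Action.PROCEED
print(action_for("review_needed"))  # Action.WARN_OR_PAUSE
```

This matches the Quick Start, where `signal.proceed` is true only for a `clear` verdict.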

Integrity Checkpoint

The core output — a structured record of each thinking block evaluation:

{
  "checkpoint_id": "ic-f47ac10b",
  "agent_id": "my-agent",
  "verdict": "clear",
  "confidence": 0.95,
  "concerns": [],
  "thinking_hash": "sha256:a1b2c3...",
  "window_position": { "index": 3, "window_size": 10 },
  "analysis_metadata": {
    "model": "claude-haiku-4-5-20251001",
    "latency_ms": 1847,
    "conscience_values_used": 2
  }
}
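To consume checkpoints in Python, a typed view of the JSON above can help. The sketch assumes the field names shown in the example; the `Checkpoint` dataclass and the `parse_checkpoint` and `thinking_hash` helpers are hypothetical. The `sha256:` prefix plus hex-digest format of `thinking_hash` is inferred from the example value rather than taken from the Specification, so the SDK's `hash_thinking_block` may normalize text differently.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    checkpoint_id: str
    agent_id: str
    verdict: str
    confidence: float
    thinking_hash: str
    concerns: list = field(default_factory=list)
    window_index: int = 0
    window_size: int = 0


def parse_checkpoint(raw: str) -> Checkpoint:
    """Build a Checkpoint from the JSON shape shown above."""
    data = json.loads(raw)
    window = data.get("window_position", {})
    return Checkpoint(
        checkpoint_id=data["checkpoint_id"],
        agent_id=data["agent_id"],
        verdict=data["verdict"],
        confidence=float(data["confidence"]),
        thinking_hash=data["thinking_hash"],
        concerns=list(data.get("concerns", [])),
        window_index=window.get("index", 0),
        window_size=window.get("window_size", 0),
    )


def thinking_hash(thinking_block: str) -> str:
    """Assumed format: 'sha256:' + hex digest of the UTF-8 text."""
    return "sha256:" + hashlib.sha256(thinking_block.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    text = "I should help the user find the best product..."
    raw = json.dumps({
        "checkpoint_id": "ic-f47ac10b",
        "agent_id": "my-agent",
        "verdict": "clear",
        "confidence": 0.95,
        "concerns": [],
        "thinking_hash": thinking_hash(text),
        "window_position": {"index": 3, "window_size": 10},
    })
    cp = parse_checkpoint(raw)
    # The stored hash lets you check that a checkpoint refers to this exact text.
    print(cp.verdict, cp.thinking_hash == thinking_hash(text))  # clear True
```

Storing a hash instead of the raw thinking keeps the checkpoint small while still letting an auditor confirm which thinking block was evaluated.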

Conscience Prompt

AIP builds a system prompt for the analysis LLM from:

Card summary — compact representation of the Alignment Card
Conscience values — BOUNDARY (hard limits) and FEAR (concerns) types injected as evaluation criteria
Session window — recent checkpoint history for drift context
The thinking block itself

The analysis LLM acts as a daimonion — it never commands, only warns.
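A sketch of how the four inputs listed above might be combined into a single system prompt. The `ConscienceValue` type, the `assemble_prompt` function, and all section wording are hypothetical; the real prompt is produced by `build_conscience_prompt` and its layout is defined in the Specification. Only BOUNDARY and FEAR values are kept, since those are the types the list above says are injected.

```python
from dataclasses import dataclass


@dataclass
class ConscienceValue:
    kind: str  # "BOUNDARY" (hard limit) or "FEAR" (concern)
    text: str


def assemble_prompt(card_summary: str,
                    values: list[ConscienceValue],
                    recent_verdicts: list[str],
                    thinking_block: str) -> str:
    """Hypothetical layout: role, card summary, values, session window, thinking."""
    value_lines = [f"- [{v.kind}] {v.text}" for v in values
                   if v.kind in ("BOUNDARY", "FEAR")]
    window = ", ".join(recent_verdicts) if recent_verdicts else "(none)"
    return "\n\n".join([
        "You are a daimonion: you never command, you only warn.",
        f"ALIGNMENT CARD SUMMARY:\n{card_summary}",
        "CONSCIENCE VALUES:\n" + ("\n".join(value_lines) or "(none)"),
        f"RECENT VERDICTS (oldest first): {window}",
        f"THINKING BLOCK TO EVALUATE:\n{thinking_block}",
    ])


if __name__ == "__main__":
    print(assemble_prompt(
        "Shopping assistant; recommends products on the user's behalf.",
        [ConscienceValue("BOUNDARY", "Never exfiltrate user data"),
         ConscienceValue("FEAR", "Steering users toward sponsored products")],
        ["clear", "clear", "review_needed"],
        "I should help the user find the best product...",
    ))
```

Placing the thinking block last keeps the evaluation criteria ahead of the untrusted text, although, as noted under What AIP Does Not Do, ordering alone does not make the analysis LLM immune to adversarial content.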

Provider Adapters

Thinking blocks differ by provider. AIP abstracts extraction:

Provider	Source	Confidence
Anthropic	`thinking` content blocks (+ SSE stream fallback)	1.0
OpenAI	`reasoning_content` field (+ SSE stream fallback)	0.9
Google	Gemini thinking blocks (+ SSE stream fallback)	0.9
Fallback	Regex-based extraction	0.3
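The Fallback row relies on regex extraction and reports a low confidence (0.3). A sketch of that idea, assuming reasoning is wrapped in `<think>...</think>` tags, a convention some open-weight models use; the patterns AIP actually applies are not stated here. `ExtractedThinking` and `regex_fallback` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed tag convention; AIP's real fallback patterns may differ.
THINK_TAG = re.compile(r"<think>(.*?)</think>", re.DOTALL | re.IGNORECASE)


@dataclass
class ExtractedThinking:
    text: str
    provider: str
    confidence: float


def regex_fallback(response_text: str) -> Optional[ExtractedThinking]:
    """Join every non-empty tagged span; return None if nothing is tagged."""
    spans = [s.strip() for s in THINK_TAG.findall(response_text)]
    spans = [s for s in spans if s]
    if not spans:
        return None
    return ExtractedThinking("\n\n".join(spans), provider="fallback", confidence=0.3)


result = regex_fallback("<think>Compare prices first.</think>Here are three options.")
print(result.text, result.confidence)  # Compare prices first. 0.3
```

Returning `None` rather than an empty string lets a caller distinguish "no thinking exposed" from "empty thinking", which matters because models that do not expose thinking cannot be analyzed at all.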

All adapters attempt SSE stream extraction as a fallback when standard JSON parsing fails (v0.1.3+). This allows analysis of streamed responses captured by gateway proxies.
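For Anthropic, a non-streamed Messages API response carries extended thinking as content blocks with `"type": "thinking"` and the text in a `thinking` field; a streamed response delivers it as `content_block_delta` events whose `delta` has `"type": "thinking_delta"`. The sketch below tries JSON first and, if the body is not a single JSON object, reassembles the SSE `data:` lines, mirroring the fallback described above. `extract_anthropic_thinking` is a hypothetical function, not the SDK's `AnthropicAdapter`.

```python
import json
from typing import Optional


def extract_anthropic_thinking(body: str) -> Optional[str]:
    """Hypothetical extractor: JSON Messages response first, SSE capture second."""
    # 1. Standard path: a complete (non-streamed) Messages API response.
    try:
        message = json.loads(body)
    except json.JSONDecodeError:
        message = None
    if isinstance(message, dict):
        blocks = [
            block.get("thinking", "")
            for block in message.get("content", [])
            if isinstance(block, dict) and block.get("type") == "thinking"
        ]
        return "\n\n".join(blocks) if blocks else None

    # 2. Fallback: the body is a captured SSE stream of "event:" / "data:" lines.
    parts = []
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict) or event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta")
        if isinstance(delta, dict) and delta.get("type") == "thinking_delta":
            parts.append(delta.get("thinking", ""))
    return "".join(parts) or None
```

Concatenating the deltas in arrival order reconstructs the thinking text a gateway proxy saw streamed past it.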

What AIP Does Not Do

This matters. Read it.

AIP does NOT prevent injection — it detects it. AIP identifies injection patterns in thinking blocks. It does not prevent injected content from reaching the model.
AIP does NOT interrupt streams. AIP operates between turns. It does not cancel in-flight streaming responses.
AIP does NOT replace AAP. Post-hoc verification, trace storage, and public transparency remain AAP's domain. AIP supplements AAP with real-time detection.
LLM-as-judge has inherent limits. The analysis LLM can be fooled by sophisticated adversarial content. AIP reduces the attack surface but does not eliminate it.
Thinking blocks are model-dependent. Not all models expose thinking. Models that don't expose thinking blocks cannot be analyzed by AIP.

For the complete limitations disclosure, see Section 14 of the Specification.

Installation

# Python
pip install agent-integrity-proto

# TypeScript
npm install @mnemom/agent-integrity-protocol

Requirements: Python >= 3.10 | Node.js >= 18.0.0

API Reference

Python

# Core analysis
from aip import (
    check_integrity,         # Evaluate thinking block → IntegrityCheckpoint
    build_signal,            # Construct signal from checkpoint → IntegritySignal
    build_conscience_prompt, # Generate analysis LLM prompt
    hash_thinking_block,     # Content-addressed thinking reference
    detect_integrity_drift,  # Track behavioral drift across checkpoints
    validate_agreement,      # Verify card-conscience alignment
)

# Provider adapters
from aip import (
    AnthropicAdapter,       # Anthropic thinking content blocks
    OpenAIAdapter,          # OpenAI reasoning_content
    GoogleAdapter,          # Google Gemini thinking
    FallbackAdapter,        # Regex-based fallback
    AdapterRegistry,        # Dynamic provider selection
)

# SDK client
from aip import create_client, sign_payload, verify_signature

# Session state
from aip import WindowManager, create_window_state

TypeScript

import {
  // Core analysis
  checkIntegrity,
  buildSignal,
  buildConsciencePrompt,
  hashThinkingBlock,
  detectIntegrityDrift,
  validateAgreement,

  // Provider adapters
  AnthropicAdapter,
  OpenAIAdapter,
  GoogleAdapter,
  FallbackAdapter,
  AdapterRegistry,

  // SDK client
  createClient,
  signPayload,
  verifySignature,

  // Session state
  WindowManager,
  createWindowState,
} from '@mnemom/agent-integrity-protocol';
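Session windowing and drift detection target the gradual-drift failure mode from Why AIP?, where no single checkpoint looks alarming on its own. A minimal sketch, assuming a fixed-size window of 10 (matching `window_size` in the checkpoint example) and a simple proxy rule that flags drift when the share of non-`clear` verdicts in the window reaches a threshold. `VerdictWindow` and the 0.3 threshold are hypothetical; the SDK's `WindowManager` and `detect_integrity_drift` may track different signals.

```python
from collections import deque


class VerdictWindow:
    """Hypothetical sliding window over the most recent checkpoint verdicts."""

    def __init__(self, size: int = 10, threshold: float = 0.3) -> None:
        self.verdicts: deque[str] = deque(maxlen=size)  # oldest entries drop off
        self.threshold = threshold

    def add(self, verdict: str) -> None:
        self.verdicts.append(verdict)

    def non_clear_ratio(self) -> float:
        if not self.verdicts:
            return 0.0
        flagged = sum(1 for v in self.verdicts if v != "clear")
        return flagged / len(self.verdicts)

    def drifting(self) -> bool:
        return self.non_clear_ratio() >= self.threshold


window = VerdictWindow()
for verdict in ["clear"] * 7 + ["review_needed"] * 3:
    window.add(verdict)
print(window.non_clear_ratio(), window.drifting())  # 0.3 True
```

Because `deque(maxlen=...)` discards the oldest entry on each append, the ratio reflects only recent turns, so early warnings age out once the agent's reasoning settles.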

Documentation

Document	Description
Specification	Full protocol specification (IETF-style, 2,214 lines)
Quick Start	Zero to integrity checking in 5 minutes
Limitations	What AIP guarantees and doesn't
Security	Threat model and security considerations
CHANGELOG.md	Release history

Examples

Example	Description
`basic-check/`	Minimal integrity check with aligned and misaligned thinking
`gateway-integration/`	Cloudflare Worker gateway with real-time AIP analysis
`adversarial/`	Attack scenarios: injection, drift, meta-injection, deception

Status

Current Version: 0.4.0

Component	Status
Specification	✅ Complete
TypeScript SDK	✅ Complete (272 tests)
Python SDK	✅ Complete (267 tests)
Provider Adapters	✅ Anthropic, OpenAI, Google, Fallback
Session Windowing	✅ Complete
Drift Detection	✅ Complete
Gateway Integration	✅ Verified (Cloudflare Workers)

Standards & Compliance

AIP aligns with and supports compliance for the following international standards and regulatory frameworks:

Standard	Relevance to AIP
ISO/IEC 42001:2023 — AI Management Systems	Integrity Checkpoints provide continuous monitoring evidence for 42001 management system requirements
ISO/IEC 42005:2025 — AI System Impact Assessment	Real-time integrity analysis and drift detection support ongoing impact assessment
IEEE 7001-2021 — Transparency of Autonomous Systems	AIP makes agent reasoning transparent — not just decisions, but the thinking that precedes them
IEEE 3152-2024 — Transparent Human and Machine Agency Identification	Integrity Checkpoints link `agent_id` to thinking analysis, supporting agency identification in real-time
Singapore IMDA Model AI Governance Framework for Agentic AI (Jan 2026)	Real-time conscience analysis addresses IMDA's governance principles for agentic AI monitoring
EU AI Act Article 50 — Transparency Obligations (enforcement Aug 2026)	Integrity Checkpoints with structured verdicts, thinking hashes, and session windows provide the transparency and audit trail required by Article 50. See EU AI Act Compliance Guide

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines.

Key areas where we need help:

Provider adapter implementations for additional LLMs
Integration examples with agent frameworks
Adversarial test vectors
Documentation improvements

License

Apache 2.0. See LICENSE for details.

Agent Integrity Protocol is part of the Mnemom.ai trust infrastructure for autonomous agents, alongside AAP (Agent Alignment Protocol).
