14 changes: 10 additions & 4 deletions docs/adapter-designs/DESIGN-01.md
@@ -350,7 +350,7 @@ means no observed violation. It does not mean no violation occurred.
| Surface | Value | Notes |
|---|---|---|
| contextInjectionSurface | `rules_file` + `agents_md` | .cursor/rules/*.mdc + AGENTS.md |
| eventSurface | `hook_rich` | 18 hook events including subagentStart/Stop |
| eventSurface | `hook_rich` | ~21 hook events including subagentStart/Stop; includes Tab hooks and workspaceOpen |
| controlSurface | `tool_level` | failClosed: true blocks on hook failure |
| approvalSurface | `pre_tool` + `pre_session` | AWF intake gate + per-tool via hooks |
| artifactSurface | `git_diff` + `pr_url` + `test_results` + `handoff_note` | Cloud agent REST API artifacts |
@@ -361,15 +361,21 @@ means no observed violation. It does not mean no violation occurred.
| sandboxSurface | `sandbox_mode` + `worktree` | --sandbox + --worktree flags |
| trustSubjectType | `agent` | .cursor/agents/ definitions |
| enforcementDepth | `tool_level` | Event-rich, comparable to Claude Code |
| currentEvidenceStrength | E1 | Artifact level until hooks wired to AWF |
| targetEvidenceStrength | E3 | Local hooks / E2 cloud agents (SSE) |
| currentEvidenceStrength | per-mode (see below) | Mode A local: E1 until hooks wired. Mode B cloud: E1 before adapter / E2 after Gate 2 SSE verification. Mode C self-hosted: E1. Mode D human: E1. |
| targetEvidenceStrength | per-mode (see below) | Mode A local: E3 (conditions in DESIGN-06 §8). Mode B cloud: E2 hard ceiling. Mode C self-hosted: E3 pending worker hook test. Mode D human: E3-observed not E3-enforced. |
| requiresRuntimeHookInstall | true | AWF hooks deployed to .cursor/hooks.json |
| supportsFailClosed | true | failClosed: true blocks on hook failure |
| supportsFailClosed | true — requires failClosed: true per blocking hook entry. Default is fail-open. E3 claims require Cursor build >= 2026-05-14 patched release. | See DESIGN-06 §8 |

**Notes:**
- Cloud agent outbound webhooks: "coming soon." Use SSE stream as substitute.
- Cursor is classified event-rich/controlled. Closer to Claude Code than Codex.

**Known gaps:**
- Fail-closed requires explicit per-hook opt-in; default is fail-open.
- E3 claims require Cursor build >= 2026-05-14 patch.
- SDK has no canUseTool-equivalent; hooks are the only synchronous enforcement path.
- Enterprise audit log is admin-action only, not a tool-call audit surface.
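
The per-mode ceilings in the table and the fail-open caveat above can be read as a simple capping rule. The following is a minimal sketch under stated assumptions: the mode labels (`A_local`, `B_cloud`, ...), the function name `effective_evidence`, and the fallback to E2 when hooks are not fail-closed are all hypothetical. The actual E3 conditions are defined in DESIGN-06 §8, which is not reproduced here.

```python
from enum import IntEnum


class Evidence(IntEnum):
    E0 = 0
    E1 = 1
    E2 = 2
    E3 = 3


# Target ceilings per Cursor mode, taken from the targetEvidenceStrength row.
# The dictionary keys are hypothetical labels, not AWF identifiers.
TARGET_CEILING = {
    "A_local": Evidence.E3,
    "B_cloud": Evidence.E2,        # hard ceiling for cloud agents
    "C_self_hosted": Evidence.E3,  # pending worker hook test
    "D_human": Evidence.E3,        # E3-observed, not E3-enforced
}


def effective_evidence(mode: str, claimed: Evidence, hooks_fail_closed: bool) -> Evidence:
    """Cap a claimed evidence strength at what the mode can support."""
    ceiling = TARGET_CEILING[mode]
    # Assumption: the default is fail-open, so without an explicit
    # failClosed opt-in on each blocking hook an E3 claim is capped at E2.
    if not hooks_fail_closed:
        ceiling = min(ceiling, Evidence.E2)
    return min(claimed, ceiling)
```

Keeping the ceiling in one lookup table means an adapter cannot record a stronger claim than the mode allows, whatever the caller passes in.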

---

### Devin
27 changes: 17 additions & 10 deletions docs/adapter-designs/DESIGN-02.md
@@ -79,7 +79,7 @@ OpenClaw (sub-agents via ACP)

**subject_key pattern:**
```
{runtime}::subagent::{parent_agent_id}::{task_class}::{session_id}
{runtime}::subagent::{parent_agent_id}::{task_class}::{workspace}::{repo}::{session_id}
```

**Examples:**
@@ -148,7 +148,7 @@ No internal agent decomposition is visible to AWF.

**subject_key pattern:**
```
{runtime}::session::{workspace}::{task_type}::{playbook_or_config}
{runtime}::session::{workspace}::{repo}::{task_type}::{playbook_or_config}
```

**Examples:**
@@ -215,7 +215,7 @@ not an autonomous agent's behavior.

**subject_key pattern:**
```
{runtime}::human::{human_actor_id}::{workspace}::{task_type}
{runtime}::human::{human_actor_id}::{workspace}::{repo}::{task_type}
```

**Examples:**
@@ -348,11 +348,11 @@ the same subject_key.
| Trust type | Stable components | Unstable (exclude) |
|---|---|---|
| agent | runtime, "agent", agent_name, workspace, repo | session_id, run_id |
| subagent | runtime, "subagent", parent_agent_id, task_class, session_id | run_id |
| role_profile | runtime, role, workspace, task_type, risk_lane, sandbox_mode, skill_set | session_id |
| session | runtime, "session", workspace, task_type, playbook | session_id, run_id |
| subagent | runtime, "subagent", parent_agent_id, task_class, workspace, repo, session_id | run_id |
| role_profile | runtime, role, workspace, repo, task_type, risk_lane, sandbox_mode, skill_set | session_id |
| session | runtime, "session", workspace, repo, task_type, playbook | session_id, run_id |
| graph_node | "langgraph", graph_id, node_name, workspace | invocation_id |
| human_runtime | runtime, "human", human_actor_id, workspace, task_type | session_id |
| human_runtime | runtime, "human", human_actor_id, workspace, repo, task_type | session_id |
| task | runtime, "task", workspace, task_class, work_item_id | run_id |
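
As an illustration of how the stable components combine into a key, here is a minimal sketch. The helper names `normalize` and `build_subject_key` are hypothetical, and the `playbook_or_config` value `default` is an invented example; the templates are copied from the subject_key patterns in this document. Unstable values such as `run_id` are simply not part of any template, so passing them has no effect on the key.

```python
# Key templates for three subject types, copied from the patterns in this document.
PATTERNS = {
    "subagent": "{runtime}::subagent::{parent_agent_id}::{task_class}::{workspace}::{repo}::{session_id}",
    "session": "{runtime}::session::{workspace}::{repo}::{task_type}::{playbook_or_config}",
    "human_runtime": "{runtime}::human::{human_actor_id}::{workspace}::{repo}::{task_type}",
}


def normalize(component: str) -> str:
    """Lowercase a component and replace runs of whitespace with hyphens."""
    return "-".join(component.strip().lower().split())


def build_subject_key(subject_type: str, **components: str) -> str:
    """Fill the template for subject_type; extra (unstable) components are ignored."""
    cleaned = {name: normalize(value) for name, value in components.items()}
    return PATTERNS[subject_type].format(**cleaned)


example_key = build_subject_key(
    "session",
    runtime="devin",
    workspace="ruvoni",
    repo="family-trip-ai",
    task_type="Frontend Bugfix",
    playbook_or_config="default",  # hypothetical value
    run_id="run-0042",             # unstable, not in the template, so ignored
)
print(example_key)  # devin::session::ruvoni::family-trip-ai::frontend-bugfix::default
```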

---
@@ -409,7 +409,7 @@ confidence in a STANDARD tier (not enough sessions to be certain).
An agent-srv with HIGH trust on Claude Code starts PROVISIONAL on Codex.

2. **Trust subjects are runtime-scoped.**
`claude_code::agent::agent-srv::ruvoni` and `codex::worker::ruvoni::backend::medium-risk::workspace-write`
`claude_code::agent::agent-srv::ruvoni::family-trip-ai` and `codex::worker::ruvoni::family-trip-ai::backend::medium-risk::workspace-write::backend-skill`
are different trust subjects even if they represent the same "backend agent concept."

3. **Workspace isolation is enforced.**
@@ -433,6 +433,11 @@ confidence in a STANDARD tier (not enough sessions to be certain).
trust subject was created. It covers agent instruction files, role TOML configs,
Devin playbook content and skill_set composition.

config_hash covers agent/role/playbook definition, allowed tools, skill set,
sandbox mode, and governance-relevant runtime config. Changing allowed tools
or sandbox mode resets the confidence band even if the prompt definition is
unchanged.
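
One way to make the hash sensitive to every covered input is to hash a canonical serialization of all of them together. The sketch below is illustrative only: the function name `compute_config_hash`, the choice of SHA-256, the canonical JSON encoding, and the decision to sort tool and skill lists (so that ordering alone does not change the hash) are assumptions, not part of the specification.

```python
import hashlib
import json


def compute_config_hash(definition, allowed_tools, skill_set, sandbox_mode, runtime_config):
    """Hash every governance-relevant input, not only the prompt definition."""
    payload = {
        "definition": definition,
        "allowed_tools": sorted(allowed_tools),  # assumption: order is not significant
        "skill_set": sorted(skill_set),          # assumption: order is not significant
        "sandbox_mode": sandbox_mode,
        "runtime_config": runtime_config,
    }
    # Canonical form: sorted keys and fixed separators give a stable byte string.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = compute_config_hash("You are the backend agent.", ["read", "edit"], ["backend-skill"], "workspace-write", {})
wider = compute_config_hash("You are the backend agent.", ["read", "edit"], ["backend-skill"], "full-access", {})
print(base != wider)  # the prompt is identical, but the sandbox mode changed
```

Under this scheme a sandbox or tool change produces a different hash even when the instruction text is byte-for-byte identical, which is the behavior the paragraph above requires.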

**When config_hash changes:**
- The trust_subject_id remains the same (history is preserved)
- confidence_band resets to LOW unless the change is verified cosmetic
@@ -449,8 +454,10 @@ Devin playbook content and skill_set composition.

## Open Questions

1. Should subagent trust_subject_id accumulate across sessions for named subagents in Claude Code?
Currently specified as session-scoped. Needs decision.
1. OQ-1 RESOLVED: Subagent trust remains session-scoped by default. If a runtime exposes
persistent named subagents with stable definitions and reliable lifecycle events, AWF may
promote them to agent trust subjects in a future adapter version. Until then, subagents are
contributing subjects only, not long-lived primary trust subjects.
2. For Codex role_profile subjects, if the .codex/agents/*.toml file is modified mid-sprint,
should config_hash check fire before or after session start?
3. LangGraph: should graph_node trust_subject_id be per-graph version or per-graph-name?
4 changes: 3 additions & 1 deletion schemas/README.md
@@ -3,13 +3,14 @@
Generic, product-agnostic JSON Schemas for the Agentic Workforce Framework.
All schemas are written for **AJV with JSON Schema Draft 2020-12**.

These schemas describe five governance artifacts that any single-workspace
These schemas describe six governance artifacts that any single-workspace
deployment must produce and consume:

- **AgentTaskManifest**: the dispatch contract. No manifest = no dispatch.
- **QAVerdict**: structured QA result with defect classification and trust impact.
- **FailureRecord**: entry in the self-learning failure library (17-class taxonomy).
- **TrustScore**: D1-D4 session score plus the 8-dimension long-term profile.
- **TrustSubject**: accountable identity AWF scores: agent, subagent, role_profile, session, graph_node, human_runtime, or task (DESIGN-02).

## v1 schemas (current)

@@ -19,6 +20,7 @@ deployment must produce and consume:
| [`v1/qa-verdict.schema.json`](v1/qa-verdict.schema.json) | `QAVerdict` | Structured QA verdict with per-finding evidence and trust delta |
| [`v1/failure-record.schema.json`](v1/failure-record.schema.json) | `FailureRecord` | 17-class taxonomy, recurrence count, prevention artifacts, agents involved |
| [`v1/trust-score.schema.json`](v1/trust-score.schema.json) | `TrustScore` | D1-D4 session score plus 8-dimension continuous profile, trust tier, confidence band |
| [`v1/trust-subject.schema.json`](v1/trust-subject.schema.json) | `TrustSubject` | Accountable trust subject (agent, subagent, role_profile, session, graph_node, human_runtime, task) with subject_key construction rules, config_hash, archive lifecycle (DESIGN-02) |
| [`v1/agent-spawn-sidecar.schema.json`](v1/agent-spawn-sidecar.schema.json) | `AgentSpawnSidecar` | Hook-readable spawn authorization record. Written by the Orchestrator before the Agent tool call and validated by the PreToolUse hook. The enforcement artifact for agent spawn governance. |

> **Schema dependency:** The AgentSpawnSidecar schema is the
117 changes: 117 additions & 0 deletions schemas/v1/trust-subject.schema.json
@@ -0,0 +1,117 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://github.com/agentic-workforce-framework/schemas/v1/trust-subject.schema.json",
"title": "TrustSubject",
"description": "Accountable trust subject for every unit of agentic work. The trust subject is what AWF actually scores, replacing the V4.3 assumption that every runtime exposes a stable agent identity. The subject_type is determined by what the runtime exposes (agent, subagent, role_profile, session, graph_node, human_runtime, or task), not by what we would prefer it to expose. Trust subjects are runtime-scoped, workspace-isolated, and immutable once created (archive, never delete). See DESIGN-02 (Trust Subject Model) for the full specification.",
"version": "1.0",
"type": "object",
"required": [
"id",
"subject_type",
"runtime_provider",
"subject_key",
"workspace_id",
"created_at"
],
"properties": {
"id": {
"type": "string",
"format": "uuid",
"description": "Immutable UUID primary key. Once assigned, this value never changes. Failure records and capability profiles reference this id. To retire a subject, set archived_at — do not delete."
},
"subject_type": {
"type": "string",
"enum": [
"agent",
"subagent",
"role_profile",
"session",
"graph_node",
"human_runtime",
"task"
],
"description": "Which of the seven trust subject types this row represents. Determined by what the runtime exposes: agent (named agent with stable identity, e.g. Claude Code, Cursor, OpenClaw), subagent (spawned within a session), role_profile (Codex worker/explorer/custom roles), session (Devin, Multica — runtime exposes no internal agent structure), graph_node (LangGraph node/edge with embedded AWF governance), human_runtime (human + runtime pair for IDE-assisted work), task (lowest-granularity fallback when no higher subject type is identifiable)."
},
"runtime_provider": {
"type": "string",
"enum": [
"claude_code",
"codex",
"cursor",
"openclaw",
"devin",
"multica",
"langgraph"
],
"description": "Runtime that produces sessions for this subject. Trust does not transfer across runtimes — the same logical agent on a new runtime starts PROVISIONAL. Implementations may extend this enum when onboarding additional runtimes; doing so does not loosen the per-runtime isolation rule."
},
"subject_key": {
"type": "string",
"pattern": "^[a-z0-9_-]+(::[a-z0-9_-]+)+$",
"maxLength": 200,
"description": "Stable, deterministic identity string for this subject. Same runtime + same config MUST produce the same subject_key. Construction rules: (1) all lowercase; (2) components separated by '::'; (3) no spaces (use hyphens within a component); (4) include only components stable across sessions for this subject_type; (5) maximum 200 characters. Patterns by subject_type. agent: {runtime}::agent::{agent_name}::{workspace}::{repo}; subagent: {runtime}::subagent::{parent_agent_id}::{task_class}::{workspace}::{repo}::{session_id}; role_profile: codex::{role}::{workspace}::{repo}::{task_type}::{risk_lane}::{sandbox_mode}::{skill_set}; session: {runtime}::session::{workspace}::{repo}::{task_type}::{playbook_or_config}; graph_node: langgraph::{graph_id}::{node_name}::{workspace}; human_runtime: {runtime}::human::{human_actor_id}::{workspace}::{repo}::{task_type}; task: {runtime}::task::{workspace}::{task_class}::{work_item_id}. Unique per (workspace_id, runtime_provider, subject_key)."
},
"workspace_id": {
"type": "string",
"format": "uuid",
"description": "Workspace this subject belongs to. Trust is workspace-isolated: history accumulated in workspace A does not count toward workspace B, even for the same subject_key string."
},
"repo": {
"type": ["string", "null"],
"description": "Repository scope for this subject. Required for agent and role_profile subject_types when the workspace contains multiple repos, because trust must not bleed across repos. Also part of the subject_key for subagent, session, and human_runtime subjects (DESIGN-02). Null for subject_types where repo is not part of the key (graph_node, task)."
},
"task_type": {
"type": ["string", "null"],
"description": "Task classification this subject is scoped to (e.g. frontend-bugfix, auth-change, database-migration). Part of the subject_key for role_profile, session, and human_runtime types. A subject trusted for one task_type starts PROVISIONAL for a new task_type — see cross-runtime trust rule 4 in DESIGN-02."
},
"risk_lane": {
"type": ["string", "null"],
"description": "Risk lane this subject operates in (e.g. low-risk, medium-risk, high-risk, restricted). Part of the subject_key for role_profile so that the same role at different risk lanes has separate trust history."
},
"sandbox_mode": {
"type": ["string", "null"],
"description": "Sandbox/permission mode the subject runs under (e.g. read-only, workspace-write, full-access). Part of the trust identity for role_profile — a workspace-write role and a read-only role are different subjects even with otherwise identical configuration."
},
"skill_set": {
"type": ["array", "null"],
"items": { "type": "string" },
"description": "Skill bundle composition for this subject (e.g. ['frontend-skill'], ['awf-risk-plan']). Part of the subject_key for role_profile. Changes to skill_set composition are a material configuration change and trigger config_hash recomputation."
},
"human_actor_id": {
"type": ["string", "null"],
"format": "uuid",
"description": "Required and non-null when subject_type is 'human_runtime' — identifies the human driving the IDE-assisted session. Null for all other subject_types. Trust for human_runtime subjects tracks human+runtime productivity patterns and is used for analytics, not autonomy gating."
},
"config_hash": {
"type": ["string", "null"],
"description": "Hash of the underlying configuration (agent instruction file, role TOML, Devin playbook content, skill_set composition) captured at subject creation. Recomputed by the adapter at each session start and compared. When it changes: trust_subject_id remains stable, confidence_band resets to LOW unless the change is verified cosmetic (whitespace, comments, formatting), and a new entry is logged in subject metadata."
},
"subject_version": {
"type": ["string", "null"],
"description": "Human-readable version label for the subject configuration. Bumped for cosmetic changes that keep config_hash unchanged, and also recorded alongside config_hash changes for audit readability."
},
"archived_at": {
"type": ["string", "null"],
"format": "date-time",
"description": "ISO 8601 timestamp when this subject was retired. Subjects are never deleted — archiving preserves history and keeps existing failure records and capability profiles intact. Null while the subject is active."
},
"created_at": {
"type": "string",
"format": "date-time",
"description": "ISO 8601 timestamp when the subject row was created."
}
},
"additionalProperties": false,
"$defs": {
"trustLevel": {
"type": "string",
"enum": ["PROVISIONAL", "RESTRICTED", "STANDARD", "HIGH", "PROBATION"],
"description": "Autonomy tier derived from total D1-D4 score across recent sessions, attached to the trust subject (not to a session). PROVISIONAL (<60): read-only analysis, every task needs human approval — new subjects always start here. RESTRICTED (60-74): draft plans and draft PRs only, human reviews before merge, 3+ sessions to exit. STANDARD (75-89): PR creation allowed after CI, no high-risk files without approval, 5+ sessions at STANDARD to reach HIGH. HIGH (90-100): low-risk task lane, human merge approval still required (Invariant 5 — never waived). PROBATION (any session <40): immediate demotion regardless of prior tier, 3 consecutive clean sessions required to exit. Lives on trust_capability_profiles, not on the trust_subject row itself — defined here for cross-schema reuse."
},
"evidenceStrength": {
"type": "string",
"enum": ["E0", "E1", "E2", "E3"],
"description": "Strength of evidence backing a trust assessment. E0: post-hoc / session-outcome only (Devin, Multica when AWF runs above it). E1: session-outcome plus surfaced telemetry. E2: per-decision evidence from runtime hooks. E3: full per-action evidence including pre-tool and post-tool hook coverage. Lives on trust_capability_profiles — defined here for cross-schema reuse."
}
}
}
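
The schema above enforces the `subject_key` shape through `pattern` and `maxLength`, but the rule that `human_actor_id` is required only for `human_runtime` subjects appears only in a description, not as a schema constraint. The following is a minimal sketch of checking both in application code, assuming a plain dictionary input. The function name `check_subject` and the message strings are hypothetical, and the sketch does not replace full validation with AJV.

```python
import re

# Same pattern and length limit as the subject_key property in the schema.
SUBJECT_KEY_RE = re.compile(r"^[a-z0-9_-]+(::[a-z0-9_-]+)+$")
MAX_KEY_LENGTH = 200


def check_subject(subject: dict) -> list[str]:
    """Return a list of problems found in a TrustSubject-like dict (empty if none)."""
    problems = []

    key = subject.get("subject_key", "")
    if len(key) > MAX_KEY_LENGTH:
        problems.append("subject_key exceeds 200 characters")
    if not SUBJECT_KEY_RE.fullmatch(key):
        problems.append("subject_key does not match the '::'-separated lowercase pattern")

    # Rule stated only in the human_actor_id description of the schema.
    is_human = subject.get("subject_type") == "human_runtime"
    has_actor = subject.get("human_actor_id") is not None
    if is_human and not has_actor:
        problems.append("human_runtime subjects need a non-null human_actor_id")
    if not is_human and has_actor:
        problems.append("human_actor_id must be null unless subject_type is human_runtime")

    return problems
```

A call such as `check_subject({"subject_type": "agent", "subject_key": "claude_code::agent::agent-srv::ruvoni::family-trip-ai"})` returns an empty list, while a key containing uppercase letters or spaces is reported.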