Feat/add AgentCore Policy integration with user identity propagation #97

Open
MichaelBMC wants to merge 9 commits into main from feat/agentcore-identity-policy

@MichaelBMC
Contributor

feat: add AgentCore Policy integration with user identity propagation

Summary

Adds AgentCore Policy integration to FAST by propagating user identity from frontend JWT through M2M tokens to Cedar policy evaluation at the AgentCore Gateway. This enables fine-grained, user-level access control on Gateway tools — for example, allowing finance users to access billing tools while denying guest users. All 6 agent patterns are updated with identity-aware Gateway authentication, and 4 pre-existing ZIP packager bugs are fixed.

Motivation

AgentCore Policy enables fine-grained access control on Gateway tools by evaluating Cedar policies against user claims (e.g., department, role) in the request token. However, the existing M2M authentication flow used pure machine credentials — the M2M token contained no user identity information. Without user claims in the token, the Policy Engine has nothing to evaluate against.

This PR bridges that gap by:

  • Propagating user identity into M2M tokens via Cognito V3 Pre-Token Lambda and aws_client_metadata
  • Managing the Cedar Policy Engine lifecycle and attaching it to the Gateway via Custom Resource
  • Providing department-based Cedar policies that the Policy Engine evaluates before tool execution

Changes

Infrastructure (infra-cdk/)

New Files

  • lambdas/cedar-policy/index.py: Custom Resource Lambda for Cedar Policy Engine lifecycle (Create, Update, Delete)

    • Three-step process: create Policy Engine → create Cedar Policy → attach to Gateway
    • Uses official boto3 waiters for all operations (policy_engine_active, policy_engine_deleted, policy_active, policy_deleted)
    • Custom polling for Gateway status changes (no official waiter available)
    • Shared _delete_managed_policies helper with stale ID fallback via naming convention
    • Returns same PhysicalResourceId on Update to prevent CloudFormation cleanup Delete
  • lambdas/cedar-policy/requirements.txt: boto3>=1.42.0 dependency

  • lambdas/pretoken-v3/index.py: Cognito V3 Pre-Token Generation Lambda

    • Fires on M2M token generation (Client Credentials flow), skips user login flows
    • Reads verified_user_id from clientMetadata (passed via aws_client_metadata)
    • Injects user_id, department, and role claims into M2M access token
    • Hardcoded group mapping for demo (with instructions on replacing with DynamoDB, directory service, or Cognito AdminGetUser lookup)
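The Pre-Token Lambda behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in lambdas/pretoken-v3/index.py: the mapping contents, the default attributes, and the handler name are assumptions, and the trigger source string and response shape follow the Cognito pre token generation event format as I understand it for V3 events.

```python
# Hypothetical demo mapping. The PR describes a hardcoded group mapping
# but does not show its contents, so these values are illustrative only.
DEMO_USER_ATTRIBUTES = {
    "user-123": {"department": "finance", "role": "analyst"},
}
DEFAULT_ATTRIBUTES = {"department": "guest", "role": "viewer"}

# Trigger source Cognito is expected to send for Client Credentials (M2M) flows.
M2M_TRIGGER_SOURCE = "TokenGeneration_ClientCredentials"


def lambda_handler(event, context):
    """Add user claims to M2M access tokens; pass every other flow through."""
    if event.get("triggerSource") != M2M_TRIGGER_SOURCE:
        return event  # user login flows are left untouched

    metadata = event.get("request", {}).get("clientMetadata") or {}
    user_id = metadata.get("verified_user_id")
    if not user_id:
        return event  # no identity was passed, so there is nothing to inject

    attributes = DEMO_USER_ATTRIBUTES.get(user_id, DEFAULT_ATTRIBUTES)
    event.setdefault("response", {})["claimsAndScopeOverrideDetails"] = {
        "accessTokenGeneration": {
            "claimsToAddOrOverride": {
                "user_id": user_id,
                "department": attributes["department"],
                "role": attributes["role"],
            }
        }
    }
    return event
```

Replacing the dictionary lookup with a DynamoDB, directory service, or Cognito AdminGetUser call would change only the line that computes `attributes`.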

Modified Files

  • lib/backend-stack.ts:

    • Added Cedar Policy Custom Resource with Lambda, IAM permissions, and provider
    • Added Gateway role permissions for Policy Engine operations (GetPolicyEngine, AuthorizeAction, PartiallyAuthorizeActions)
    • Cedar policy loaded from gateway/policies/policy.cedar with comment stripping and {{GATEWAY_ARN}} replacement
    • ZIP packager fixes:
      1. Recursive readPythonFiles() to include tools/ subdirectory
      2. Added patterns/utils/ to deployment package
      3. Renamed repo-root tools/ to agentcore_tools/ to avoid conflict with pattern's tools/ directory
      4. Dynamic entry point detection instead of hardcoded basic_agent.py
  • lib/cognito-stack.ts:

    • Added featurePlan: cognito.FeaturePlan.ESSENTIALS (required for V3 triggers)
    • Added Pre-Token Lambda using Code.fromAsset with path import
    • Added Cognito invoke permission and V3 trigger via L1 escape hatch (addPropertyOverride)
    • Fixed domain creation ordering: create without v2 → add branding → update to v2 via L1 escape hatch (resolves "Internal error from downstream service" with newer CDK versions)

Cedar Policy (gateway/)

New Files

  • policies/policy.cedar: Cedar policy file with two versions
    • Version 1 (active): All departments including guest can access tools
    • Version 2 (commented out): Only finance and engineering can access tools (guest denied by Cedar deny-by-default)
    • {{GATEWAY_ARN}} placeholder replaced by CDK at deploy time
    • Comment lines stripped before sending to AgentCore create_policy API
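The comment stripping and placeholder replacement described above could look something like the sketch below. In the PR this step lives in the CDK stack (TypeScript); the Python version here only illustrates the idea. The function name, the sample template, and the entity type inside it are assumptions, and the sketch handles whole-line `//` comments only.

```python
def render_cedar_policy(template: str, gateway_arn: str) -> str:
    """Drop whole-line // comments, then substitute the Gateway ARN."""
    kept = [
        line
        for line in template.splitlines()
        if not line.lstrip().startswith("//")
    ]
    return "\n".join(kept).replace("{{GATEWAY_ARN}}", gateway_arn).strip()


# Illustrative template only; the real policy.cedar content is not shown in the PR.
SAMPLE_TEMPLATE = """\
// Version 1: every department may call Gateway tools
permit(
  principal,
  action,
  resource == AgentCore::Gateway::"{{GATEWAY_ARN}}"
);
// Version 2 would go here, commented out
"""

rendered = render_cedar_policy(SAMPLE_TEMPLATE, "arn:aws:example:gateway/demo")
```

Stripping comments before substitution means a commented-out policy version never reaches the create_policy call, even if it also contains the placeholder.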

Agent Patterns (patterns/)

Modified Files

  • utils/auth.py:

    • Added get_secret() for Secrets Manager access with explicit exception handling
    • Replaced decorator-based get_gateway_access_token() with direct Cognito /oauth2/token call accepting user_id parameter
    • Passes user_id as aws_client_metadata[verified_user_id] for V3 Pre-Token Lambda enrichment
  • strands-single-agent/tools/gateway.py: Two authentication approaches

    • Approach 1 (active): create_gateway_mcp_client(user_id) with direct Cognito call
    • Approach 2 (commented out): @requires_access_token decorator for pure M2M
    • Includes 5-step switching instructions in module docstring
  • langgraph-single-agent/tools/gateway.py: Async version of above for LangGraph/MultiServerMCPClient

  • agui-strands-agent/tools/gateway.py: Same as strands-single-agent version

  • agui-langgraph-agent/tools/gateway.py: Same as langgraph-single-agent async version

  • strands-single-agent/basic_agent.py: create_gateway_mcp_client() → create_gateway_mcp_client(user_id)

  • langgraph-single-agent/langgraph_agent.py: create_langgraph_agent() → create_langgraph_agent(user_id: str), passes user_id through

  • agui-strands-agent/agent.py: create_gateway_mcp_client() → create_gateway_mcp_client(user_id)

  • agui-langgraph-agent/agent.py:

    • Added ActorAwareLangGraphAgent.__init__ storing _user_id with placeholder graph for newer copilotkit validation
    • create_langgraph_agent() → create_langgraph_agent(user_id: str)
  • claude-agent-sdk-single-agent/agent.py: get_gateway_access_token() → get_gateway_access_token(user_id)

  • claude-agent-sdk-multi-agent/agent.py: Same as single-agent

  • langgraph-single-agent/requirements.txt: langgraph==1.1.3 → langgraph>=1.1.5 (fixes ServerInfo import error)

  • agui-langgraph-agent/requirements.txt: copilotkit>=0.1.84, langchain>=1.2.10, langgraph>=1.1.5 (fixes ExecutionInfo import error + copilotkit compatibility)
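For the utils/auth.py change above, a direct Client Credentials request that carries the user ID might be assembled along these lines. This is a sketch under assumptions, not the PR's implementation: the function name and return shape are invented for illustration, and it assumes Cognito accepts `aws_client_metadata` as a form field holding a JSON object, which is then URL-encoded with the rest of the body.

```python
import base64
import json
from urllib.parse import urlencode


def build_token_request(token_url, client_id, client_secret, scope, user_id):
    """Build (but do not send) a Cognito /oauth2/token request for M2M auth."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "scope": scope,
            # Read by the Pre-Token Lambda as clientMetadata["verified_user_id"].
            "aws_client_metadata": json.dumps({"verified_user_id": user_id}),
        }
    )
    return {
        "url": token_url,
        "headers": {
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "data": body,
    }
```

The result can be sent with any HTTP client, for example `requests.post(req["url"], headers=req["headers"], data=req["data"])`. Keeping request construction separate from sending makes the payload easy to unit test without network access.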

Documentation

New Files

  • docs/IDENTITY_POLICY.md: Comprehensive guide covering end-to-end identity propagation flow, all components (Cognito ESSENTIALS, Pre-Token Lambda, Cedar Policy, Policy Engine Custom Resource, Gateway Authorizer), Cedar policy syntax guide, two authentication approaches with switching instructions, and customization (group assignment, new claims, VPC mode)

Modified Files

  • CHANGELOG.md: Added entries under [Unreleased] for Added, Changed, and Fixed
  • README.md: Updated Flow 3 architecture description, added new files to project structure (cedar-policy/, pretoken-v3/, policy.cedar, IDENTITY_POLICY.md)
  • docs/DEPLOYMENT.md: Updated NAT Gateway section — required for Approach 1 (direct Cognito call), not required for Approach 2 (AgentCore Identity handles token exchange server-side)
  • docs/GATEWAY.md: Added references to IDENTITY_POLICY.md and RUNTIME_GATEWAY_AUTH.md in Related Documentation
  • docs/RUNTIME_GATEWAY_AUTH.md: Added cross-reference to IDENTITY_POLICY.md for identity propagation
  • docs/AGENT_CONFIGURATION.md: Updated code snippet from create_gateway_mcp_client(access_token) to create_gateway_mcp_client(user_id)
  • infra-cdk/README.md: Updated Cognito Stack description (ESSENTIALS tier, Pre-Token Lambda) and Gateway deployment step (includes Cedar Policy Engine)
  • patterns/strands-single-agent/README.md: Updated Gateway auth description to include identity propagation
  • patterns/langgraph-single-agent/README.md: Same as above
  • patterns/claude-agent-sdk-single-agent/README.md: Same as above
  • patterns/claude-agent-sdk-multi-agent/README.md: Same as above

Security Considerations

Identity Chain Security

  • User identity (user_id) is extracted from the validated JWT sub claim in the Runtime's Session Context, not from the LLM or request payload
  • The aws_client_metadata parameter carries the verified user_id to Cognito, where the V3 Pre-Token Lambda injects claims
  • Cedar policies evaluate these claims at the Gateway before tool execution
  • No user identity data touches the LLM at any point in the authentication flow

Two Authentication Approaches

  • Approach 1 (active): Direct Cognito call — requires outbound HTTPS to Cognito hosted domain (NAT Gateway needed in VPC mode)
  • Approach 2 (commented out): @requires_access_token decorator — AgentCore Identity handles token exchange server-side (no NAT Gateway needed)

Testing

  • CDK Deployment: Stack deploys successfully with Cedar Policy Engine, Pre-Token Lambda, and all supporting resources
  • Strands Agent (Docker): Identity propagation and Gateway tool execution verified
  • Strands Agent (ZIP): Identity propagation and Gateway tool execution verified
  • LangGraph Agent (Docker): Identity propagation and Gateway tool execution verified
  • LangGraph Agent (ZIP): Identity propagation and Gateway tool execution verified with dynamic entry point detection
  • AgUI Strands Agent (Docker): Identity propagation and Gateway tool execution verified
  • AgUI LangGraph Agent (Docker): Identity propagation and Gateway tool execution verified
  • Claude Agent SDK Single (Docker): Identity propagation and Gateway tool execution verified
  • Claude Agent SDK Multi (Docker): Identity propagation and Gateway tool execution verified
  • Cedar Policy V1 (Allow): Guest user can access Gateway tool
  • Cedar Policy V2 (Deny): Guest user denied, agent falls back to Code Interpreter
  • Policy Update (V1→V2): In-place policy update via cdk deploy without recreating Policy Engine
  • Policy Delete with Stale ID: Fallback to naming convention cleanup works correctly
  • Pre-Token Lambda: CloudWatch logs confirm claim injection (department, role, user_id)
  • Frontend: End-to-end flow verified via Amplify-hosted frontend
  • Long-Term Memory: Cross-session fact recall verified after merge with main
  • PII/Credential Scan: No credentials, account IDs, or PII in committed code

Verifying Policy Allow/Deny via Tracing (Optional)

To verify Cedar policy decisions in CloudWatch logs:

  1. Go to AWS Console → Bedrock AgentCore → Runtime
  2. Click on your runtime (e.g., FAST_stack_FASTAgent) from the Runtime resources section
  3. Scroll down to Tracing, click Edit, and toggle Enable tracing to Enable
  4. Go to Bedrock AgentCore → Gateways
  5. Click on your gateway (e.g., FAST-stack-gateway), scroll down to Tracing, click Edit, and toggle Enable tracing to Enable
  6. Run a query from the frontend that triggers a tool call
  7. Go to CloudWatch Console → Log Management → Log groups
  8. Find and click on the aws/spans log group, then click on the default log stream
  9. In the Filter events search box, type policy
  10. Look for the AgentCore.Policy.PartiallyAuthorizeActions span — it contains:
    • aws.agentcore.policy.allowed_tools: tools the user is permitted to use
    • aws.agentcore.policy.denied_tools: tools the user is denied access to
    • aws.agentcore.gateway.policy.mode: should show ENFORCE
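The console steps above can also be approximated from a script. The sketch below assumes the spans land in the aws/spans log group named in step 8 and that a plain `policy` filter term matches the relevant events; the function name is hypothetical and only the first page of results is read.

```python
def find_policy_spans(logs_client, log_group="aws/spans", term="policy", limit=50):
    """Return raw messages of log events in log_group that match term."""
    response = logs_client.filter_log_events(
        logGroupName=log_group,
        filterPattern=term,
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]
```

With credentials configured, `find_policy_spans(boto3.client("logs"))` would return the matching span records, which can then be searched for the allowed_tools and denied_tools attributes. Passing the client in as a parameter keeps the function testable with a stub.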

Files Changed (31)

Added (5)

  1. gateway/policies/policy.cedar - Cedar policy with V1 (allow) and V2 (deny) versions
  2. infra-cdk/lambdas/cedar-policy/index.py - Custom Resource Lambda for Policy Engine lifecycle
  3. infra-cdk/lambdas/cedar-policy/requirements.txt - boto3 dependency
  4. infra-cdk/lambdas/pretoken-v3/index.py - Cognito V3 Pre-Token Generation Lambda
  5. docs/IDENTITY_POLICY.md - Identity propagation and Cedar policy documentation

Modified (26)

  1. CHANGELOG.md - Added entries under [Unreleased]
  2. README.md - Updated architecture flow and project structure
  3. docs/AGENT_CONFIGURATION.md - Updated gateway client code snippet
  4. docs/DEPLOYMENT.md - Updated NAT Gateway section for identity propagation
  5. docs/GATEWAY.md - Added related documentation references
  6. docs/RUNTIME_GATEWAY_AUTH.md - Added cross-reference to IDENTITY_POLICY.md
  7. infra-cdk/README.md - Updated stack descriptions
  8. infra-cdk/lib/backend-stack.ts - Cedar Policy Custom Resource, Gateway role permissions, ZIP packager fixes
  9. infra-cdk/lib/cognito-stack.ts - ESSENTIALS tier, Pre-Token Lambda, domain ordering fix
  10. patterns/utils/auth.py - Added get_secret(), replaced get_gateway_access_token() with identity-aware version
  11. patterns/strands-single-agent/basic_agent.py - Pass user_id to gateway client
  12. patterns/strands-single-agent/tools/gateway.py - Two auth approaches with identity propagation
  13. patterns/strands-single-agent/README.md - Updated gateway auth description
  14. patterns/langgraph-single-agent/langgraph_agent.py - Pass user_id through agent creation
  15. patterns/langgraph-single-agent/tools/gateway.py - Async two auth approaches with identity propagation
  16. patterns/langgraph-single-agent/requirements.txt - langgraph>=1.1.5
  17. patterns/langgraph-single-agent/README.md - Updated gateway auth description
  18. patterns/agui-strands-agent/agent.py - Pass user_id to gateway client
  19. patterns/agui-strands-agent/tools/gateway.py - Two auth approaches with identity propagation
  20. patterns/agui-langgraph-agent/agent.py - ActorAwareLangGraphAgent with user_id and placeholder graph
  21. patterns/agui-langgraph-agent/tools/gateway.py - Async two auth approaches with identity propagation
  22. patterns/agui-langgraph-agent/requirements.txt - copilotkit, langchain, langgraph version bumps
  23. patterns/claude-agent-sdk-single-agent/agent.py - Pass user_id to get_gateway_access_token
  24. patterns/claude-agent-sdk-single-agent/README.md - Updated gateway auth description
  25. patterns/claude-agent-sdk-multi-agent/agent.py - Pass user_id to get_gateway_access_token
  26. patterns/claude-agent-sdk-multi-agent/README.md - Updated gateway auth description

Key Architectural Decisions

  1. Direct Cognito Call (Approach 1): The @requires_access_token decorator does not support aws_client_metadata, so a direct Cognito /oauth2/token call is required to propagate user identity into M2M tokens.

  2. Custom Resource for Policy Engine: No L1/L2 CDK construct exists for AgentCore Policy Engine or Cedar Policy. A Custom Resource Lambda manages the full lifecycle, following the same pattern as the existing OAuth2 Credential Provider.

  3. Same PhysicalResourceId on Update: The Custom Resource returns the same PhysicalResourceId during updates to prevent CloudFormation from interpreting the change as a resource replacement (which would trigger a cleanup Delete that detaches the Policy Engine from the Gateway).

  4. Two Approaches Preserved: Both authentication approaches are kept in each pattern's tools/gateway.py with clear switching instructions, allowing users to choose based on their needs (identity-aware vs pure M2M).

  5. Cognito Domain Ordering Fix: Newer CDK versions fail when ESSENTIALS tier + managed login v2 + branding are created simultaneously. Fixed by creating domain without v2 first, adding branding, then updating to v2 via L1 escape hatch.
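Decision 3 can be illustrated with a stripped-down handler. This is a sketch assuming the CDK custom resource provider framework, where the handler returns a dictionary containing PhysicalResourceId; the three helper functions are hypothetical stand-ins for the AgentCore calls the real Lambda makes, and the ID value is invented.

```python
# Hypothetical stand-ins that only record which operation ran.
CALLS = []


def create_engine_and_policy(props):
    CALLS.append("create")
    return "policy-engine-demo-id"  # illustrative ID, not a real format


def update_policy(engine_id, props):
    CALLS.append("update")


def delete_engine_and_policies(engine_id):
    CALLS.append("delete")


def on_event(event, context):
    """Dispatch on RequestType and keep the physical ID stable on Update."""
    request_type = event["RequestType"]
    props = event.get("ResourceProperties", {})

    if request_type == "Create":
        return {"PhysicalResourceId": create_engine_and_policy(props)}

    physical_id = event["PhysicalResourceId"]
    if request_type == "Update":
        update_policy(physical_id, props)
    elif request_type == "Delete":
        delete_engine_and_policies(physical_id)
    else:
        raise ValueError(f"Unsupported RequestType: {request_type}")

    # Returning the incoming ID tells CloudFormation that this is the same
    # resource, so no follow-up Delete is issued for an "old" one.
    return {"PhysicalResourceId": physical_id}
```

If the Update branch returned a new ID instead, CloudFormation would treat the change as a replacement and send a Delete for the previous ID once the update completed.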

Compliance with FAST Tenets

  • Security: User identity extracted from validated JWT. Cedar policies enforce access control at the Gateway before tool execution.
  • Simplicity: Single cdk deploy sets up the entire identity propagation chain. Policy changes require only editing policy.cedar and redeploying.
  • Adoptability through Documentation: New IDENTITY_POLICY.md covers the full feature with architecture flow, component details, Cedar policy guide, and customization instructions.
  • Vibe Friendly: Two clearly documented approaches with step-by-step switching instructions. AI coding assistants can easily understand and modify the Cedar policy or Pre-Token Lambda.

Additional Notes

  • Terraform: The infra-terraform/ directory is not updated in this PR. The Terraform maintainer can mirror the CDK changes using this PR as reference.
  • VPC Mode: Approach 1 requires NAT Gateway for outbound HTTPS to Cognito hosted domain. Approach 2 does not require NAT Gateway. Updated in docs/DEPLOYMENT.md.
  • ZIP Deployment: Four pre-existing bugs fixed (recursive reader, utils inclusion, agentcore_tools naming, dynamic entry point). All patterns tested with both Docker and ZIP where supported.
  • CDK Version Compatibility: The Cognito domain ordering fix resolves an issue introduced when aws-cdk-lib was bumped from 2.241.0 to 2.243.0 in main.

- Add Cognito V3 Pre-Token Lambda for M2M token claim injection
- Add Cedar Policy Engine lifecycle via Custom Resource Lambda
- Add user identity propagation from frontend JWT to Gateway Cedar policies
- Move Cedar policy to gateway/policies/policy.cedar
- Add direct Cognito token call with aws_client_metadata for user identity
- Update backend-stack.ts with Policy Engine, GatewayRole permissions
- Update cognito-stack.ts with featurePlan ESSENTIALS and V3 trigger
Add user identity propagation from frontend JWT through M2M tokens to
AgentCore Policy Cedar policy evaluation at the Gateway. Includes Cedar
Policy Engine lifecycle management via Custom Resource Lambda, Cognito V3
Pre-Token Generation Lambda for claim injection, and department-based
Cedar policy for fine-grained access control.

All 6 agent patterns updated with two authentication approaches:
- Approach 1 (active): direct Cognito call with aws_client_metadata
- Approach 2 (commented out): @requires_access_token decorator

New documentation: docs/IDENTITY_POLICY.md

Fixed: ZIP packager (recursive reader, utils inclusion, agentcore_tools
naming, dynamic entry point), Cognito domain ordering for newer CDK,
langgraph/copilotkit version bumps.
@MichaelBMC MichaelBMC requested a review from a team April 23, 2026 18:57
@github-actions github-actions Bot added the documentation, backend, and infrastructure labels Apr 23, 2026
github-actions Bot commented Apr 23, 2026

Latest scan for commit: 2444dd7 | Updated: 2026-04-23 19:23:41 UTC

Security Scan Results

Scan Metadata

  • Project: ASH
  • Scan executed: 2026-04-23T19:22:20+00:00
  • ASH version: 3.2.2

Summary

Scanner Results

The table below shows findings by scanner, with status based on severity thresholds and dependencies:

Column Explanations:

Severity Levels (S/C/H/M/L/I):

  • Suppressed (S): Security findings that have been explicitly suppressed/ignored and don't affect the scanner's pass/fail status
  • Critical (C): The most severe security vulnerabilities requiring immediate remediation (e.g., SQL injection, remote code execution)
  • High (H): Serious security vulnerabilities that should be addressed promptly (e.g., authentication bypasses, privilege escalation)
  • Medium (M): Moderate security risks that should be addressed in normal development cycles (e.g., weak encryption, input validation issues)
  • Low (L): Minor security concerns with limited impact (e.g., information disclosure, weak recommendations)
  • Info (I): Informational findings for awareness with minimal security risk (e.g., code quality suggestions, best practice recommendations)

Other Columns:

  • Time: Duration taken by each scanner to complete its analysis
  • Action: Total number of actionable findings at or above the configured severity threshold that require attention

Scanner Results:

  • PASSED: Scanner found no security issues at or above the configured severity threshold - code is clean for this scanner
  • FAILED: Scanner found security vulnerabilities at or above the threshold that require attention and remediation
  • MISSING: Scanner could not run because required dependencies/tools are not installed or available
  • SKIPPED: Scanner was intentionally disabled or excluded from this scan
  • ERROR: Scanner encountered an execution error and could not complete successfully

Severity Thresholds (Thresh Column):

  • CRITICAL: Only Critical severity findings cause scanner to fail
  • HIGH: High and Critical severity findings cause scanner to fail
  • MEDIUM (MED): Medium, High, and Critical severity findings cause scanner to fail
  • LOW: Low, Medium, High, and Critical severity findings cause scanner to fail
  • ALL: Any finding of any severity level causes scanner to fail

Threshold Source: Values in parentheses indicate where the threshold is configured:

  • (g) = global: Set in the global_settings section of ASH configuration
  • (c) = config: Set in the individual scanner configuration section
  • (s) = scanner: Default threshold built into the scanner itself

Statistics calculation:

  • All statistics are calculated from the final aggregated SARIF report
  • Suppressed findings are counted separately and do not contribute to actionable findings
  • Scanner status is determined by comparing actionable findings to the threshold
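| Scanner | S | C | H | M | L | I | Time | Action | Result | Thresh |
|---|---|---|---|---|---|---|---|---|---|---|
| bandit | 0 | 0 | 0 | 0 | 0 | 0 | 937ms | 0 | PASSED | MED (g) |
| cdk-nag | 0 | 0 | 0 | 0 | 0 | 0 | 34.0s | 0 | PASSED | MED (g) |
| cfn-nag | 0 | 0 | 0 | 0 | 0 | 0 | 6ms | 0 | PASSED | MED (g) |
| checkov | 0 | 0 | 0 | 0 | 0 | 0 | 6.3s | 0 | PASSED | MED (g) |
| detect-secrets | 0 | 0 | 0 | 0 | 0 | 0 | 898ms | 0 | PASSED | MED (g) |
| grype | 0 | 0 | 0 | 0 | 0 | 0 | 41.5s | 0 | PASSED | MED (g) |
| npm-audit | 0 | 0 | 0 | 0 | 0 | 0 | 187ms | 0 | PASSED | MED (g) |
| opengrep | 11 | 3 | 0 | 0 | 0 | 0 | 23.1s | 3 | FAILED | MED (g) |
| semgrep | 0 | 0 | 0 | 0 | 0 | 0 | <1ms | 0 | MISSING | MED (g) |
| syft | 0 | 0 | 0 | 0 | 0 | 0 | 2.0s | 0 | PASSED | MED (g) |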

Detailed Findings

3 actionable findings

Finding 1: python.lang.security.audit.logging.logger-credential-leak.python-logger-credential-disclosure

  • Severity: HIGH
  • Scanner: opengrep
  • Rule ID: python.lang.security.audit.logging.logger-credential-leak.python-logger-credential-disclosure
  • Location: patterns/utils/auth.py:161-163

Description:
Detected a python logger call with a potential hardcoded secret "Getting access token for stack: %s, region: %s" being logged. This may lead to secret credentials being exposed. Make sure that the logger is not logging sensitive information.

Code Snippet:

logger.info(
    "Getting access token for stack: %s, region: %s", stack_name, region
)  # nosemgrep: python.lang.security.audit.logging.logger-credential-leak.python-logger-credential-disclosure

Finding 2: python.lang.security.audit.logging.logger-credential-leak.python-logger-credential-disclosure

  • Severity: HIGH
  • Scanner: opengrep
  • Rule ID: python.lang.security.audit.logging.logger-credential-leak.python-logger-credential-disclosure
  • Location: patterns/utils/auth.py:196-198

Description:
Detected a python logger call with a potential hardcoded secret "Requesting token from: %s" being logged. This may lead to secret credentials being exposed. Make sure that the logger is not logging sensitive information.

Code Snippet:

logger.info(
    "Requesting token from: %s", token_url
)  # nosemgrep: python.lang.security.audit.logging.logger-credential-leak.python-logger-credential-disclosure

Finding 3: python.lang.security.audit.logging.logger-credential-leak.python-logger-credential-disclosure

  • Severity: HIGH
  • Scanner: opengrep
  • Rule ID: python.lang.security.audit.logging.logger-credential-leak.python-logger-credential-disclosure
  • Location: patterns/utils/auth.py:205-207

Description:
Detected a python logger call with a potential hardcoded secret "Token request failed: %s" being logged. This may lead to secret credentials being exposed. Make sure that the logger is not logging sensitive information.

Code Snippet:

logger.error(
    "Token request failed: %s", response.status_code
)  # nosemgrep: python.lang.security.audit.logging.logger-credential-leak.python-logger-credential-disclosure

Report generated by Automated Security Helper (ASH) at 2026-04-23T19:22:13+00:00

Comment thread docs/IDENTITY_POLICY.md

This document describes how FAST propagates user identity from the frontend through to AgentCore Gateway Cedar policies, enabling fine-grained, user-level access control on Gateway tools.

## Overview

I'd love even a little more "explain like I'm 5" content in this overview, maybe just a few more sentences.

Comment thread docs/IDENTITY_POLICY.md
@@ -0,0 +1,226 @@
# Identity Propagation & Cedar Policy Guide

This document describes how FAST propagates user identity from the frontend through to AgentCore Gateway Cedar policies, enabling fine-grained, user-level access control on Gateway tools.

Maybe this is an obvious question, but: are these Cedar policies designed only for users accessing tools? E.g., David is not allowed to use XYZ tool no matter what agent he uses?
Or are there more capabilities, and we're just highlighting/demonstrating that one? I wouldn't mind a few sentences saying exactly what can be done with AC Policy, even if we don't implement it all.
