From 75d0c9f82ac2373c84a886c4817f4e8af158135f Mon Sep 17 00:00:00 2001
From: micheleRP
Date: Tue, 21 Apr 2026 15:40:29 -0600
Subject: [PATCH] docs: rewrite AI Gateway for ADP GA release

The AI Gateway feature was fully rewritten upstream for the ADP
2026-06-15 GA, and today's pages describe an obsolete architecture
(gateway-as-resource, AI Hub / Custom modes, CEL routing rules, MCP
aggregation at the gateway, gateway discovery).

Replace the page set with a structure matching ai-agents/agents/:

- index.adoc (thin landing)
- overview.adoc (conceptual, absorbs what-is-ai-gateway.adoc and all
  deleted-concept aliases)
- configure-provider.adoc (primary how-to, grounded in
  llm_provider.proto and adp-ui form labels)
- connect-agent.adoc (client-side how-to)

Page aliases redirect every old URL to its natural successor. UI
specifics that depend on the upcoming standalone ADP product surface
(sign-in URL, IAM model, OIDC endpoints) are marked with // TODO: for
the live-environment verification pass.

Planning: https://redpandadata.atlassian.net/wiki/spaces/DOC/pages/1862598662

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 modules/ROOT/nav.adoc                         |   13 +-
 .../pages/ai-gateway/admin/setup-guide.adoc   |  396 -------
 .../builders/connect-your-agent.adoc          |  665 -----------
 .../builders/discover-gateways.adoc           |  310 -----
 .../ai-gateway/cel-routing-cookbook.adoc      |  953 ----------------
 .../pages/ai-gateway/configure-provider.adoc  |  255 +++++
 .../pages/ai-gateway/connect-agent.adoc       |  364 ++++++
 .../ai-gateway/gateway-architecture.adoc      |  221 ----
 .../pages/ai-gateway/gateway-quickstart.adoc  |  542 ---------
 modules/ai-agents/pages/ai-gateway/index.adoc |    5 +-
 .../ai-gateway/mcp-aggregation-guide.adoc     | 1005 -----------------
 .../ai-agents/pages/ai-gateway/overview.adoc  |  118 ++
 .../pages/ai-gateway/what-is-ai-gateway.adoc  |  194 ----
 13 files changed, 742 insertions(+), 4299 deletions(-)
 delete mode 100644 modules/ai-agents/pages/ai-gateway/admin/setup-guide.adoc
 delete mode 100644 modules/ai-agents/pages/ai-gateway/builders/connect-your-agent.adoc
 delete mode 100644 modules/ai-agents/pages/ai-gateway/builders/discover-gateways.adoc
 delete mode 100644 modules/ai-agents/pages/ai-gateway/cel-routing-cookbook.adoc
 create mode 100644 modules/ai-agents/pages/ai-gateway/configure-provider.adoc
 create mode 100644 modules/ai-agents/pages/ai-gateway/connect-agent.adoc
 delete mode 100644 modules/ai-agents/pages/ai-gateway/gateway-architecture.adoc
 delete mode 100644 modules/ai-agents/pages/ai-gateway/gateway-quickstart.adoc
 delete mode 100644 modules/ai-agents/pages/ai-gateway/mcp-aggregation-guide.adoc
 create mode 100644 modules/ai-agents/pages/ai-gateway/overview.adoc
 delete mode 100644 modules/ai-agents/pages/ai-gateway/what-is-ai-gateway.adoc

diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc
index 51ea42248..5c9a49763 100644
--- a/modules/ROOT/nav.adoc
+++ b/modules/ROOT/nav.adoc
@@ -64,16 +64,9 @@
 *** xref:ai-agents:observability/transcripts.adoc[View Transcripts]
 *** 
xref:ai-agents:observability/ingest-custom-traces.adoc[Ingest Traces from Custom Agents]
 ** xref:ai-agents:ai-gateway/index.adoc[AI Gateway]
-*** xref:ai-agents:ai-gateway/what-is-ai-gateway.adoc[Overview]
-*** xref:ai-agents:ai-gateway/gateway-quickstart.adoc[Quickstart]
-*** xref:ai-agents:ai-gateway/gateway-architecture.adoc[Architecture]
-*** For Administrators
-**** xref:ai-agents:ai-gateway/admin/setup-guide.adoc[Setup Guide]
-*** For Builders
-**** xref:ai-agents:ai-gateway/builders/discover-gateways.adoc[Discover Gateways]
-**** xref:ai-agents:ai-gateway/builders/connect-your-agent.adoc[Connect Your Agent]
-**** xref:ai-agents:ai-gateway/cel-routing-cookbook.adoc[CEL Routing Patterns]
-**** xref:ai-agents:ai-gateway/mcp-aggregation-guide.adoc[MCP Gateway]
+*** xref:ai-agents:ai-gateway/overview.adoc[Overview]
+*** xref:ai-agents:ai-gateway/configure-provider.adoc[Configure an LLM Provider]
+*** xref:ai-agents:ai-gateway/connect-agent.adoc[Connect Your Agent]
 //*** Observability
 //**** xref:ai-agents:ai-gateway/observability-logs.adoc[Request Logs]
 //**** xref:ai-agents:ai-gateway/observability-metrics.adoc[Metrics and Analytics]
diff --git a/modules/ai-agents/pages/ai-gateway/admin/setup-guide.adoc b/modules/ai-agents/pages/ai-gateway/admin/setup-guide.adoc
deleted file mode 100644
index 15acbcd86..000000000
--- a/modules/ai-agents/pages/ai-gateway/admin/setup-guide.adoc
+++ /dev/null
@@ -1,396 +0,0 @@
-= AI Gateway Setup Guide
-:description: Set up AI Gateway for your organization. Enable providers, configure failover for high availability, set budget controls, and create gateways with team-level isolation.
-:page-topic-type: how-to
-:personas: platform_admin
-:learning-objective-1: Enable LLM providers and models in the catalog
-:learning-objective-2: Create and configure gateways with routing policies, rate limits, and spend limits
-:learning-objective-3: Set up MCP tool aggregation for AI agents
-
-include::ai-agents:partial$adp-la.adoc[]
-
-This guide walks administrators through the setup process for AI Gateway, from enabling LLM providers to configuring routing policies and MCP tool aggregation.
-
-After completing this guide, you will be able to:
-
-* [ ] Enable LLM providers and models in the catalog
-* [ ] Create and configure gateways with routing policies, rate limits, and spend limits
-* [ ] Set up MCP tool aggregation for AI agents
-
-== Prerequisites
-
-* Access to the Redpanda Cloud Console with administrator privileges
-* API keys for at least one LLM provider (OpenAI, Anthropic, Google AI)
-* (Optional) MCP server endpoints if you plan to use tool aggregation
-
-== Enable a provider
-
-Providers represent upstream services (Anthropic, OpenAI, Google AI) and associated credentials. Providers are disabled by default and must be enabled explicitly by an administrator.
-
-. In the Redpanda Cloud Console, navigate to *Agentic* → *AI Gateway* → *Providers*.
-. Select a provider (for example, Anthropic).
-. On the Configuration tab for the provider, click *Add configuration*.
-. Enter your API Key for the provider.
-+
-TIP: Store provider API keys securely. Each provider configuration can have multiple API keys for rotation and redundancy.
-
-. Click *Save* to enable the provider.
-
-Repeat this process for each LLM provider you want to make available through AI Gateway.
-
-== Enable models
-
-The model catalog is the set of models made available through the gateway. Models are disabled by default. After enabling a provider, you can enable its models.
-
-The infrastructure that serves the model differs based on the provider you select. 
For example, OpenAI has different reliability and availability metrics than Anthropic. When you consider all metrics, you can design your gateway to use different providers for different use cases.
-
-. Navigate to *Agentic* → *AI Gateway* → *Models*.
-. Review the list of available models from enabled providers.
-. For each model you want to expose through gateways, toggle it to *Enabled*. For example:
-+
---
-* `openai/gpt-5.2`
-* `openai/gpt-5.2-mini`
-* `anthropic/claude-sonnet-4.5`
-* `anthropic/claude-opus-4.6`
---
-
-. Click *Save changes*.
-
-Only enabled models will be accessible through gateways. You can enable or disable models at any time without affecting existing gateways.
-
-=== Model naming convention
-
-Model requests must use the `vendor/model_id` format in the model property of the request body. This format allows AI Gateway to route requests to the appropriate provider. For example:
-
-* `openai/gpt-5.2`
-* `anthropic/claude-sonnet-4.5`
-* `openai/gpt-5.2-mini`
-
-ifdef::ai-hub-available[]
-== Choose a gateway mode
-
-Before creating a gateway, decide which mode fits your needs.
-
-*AI Hub Mode* is ideal when you:
-
-* Want to minimize configuration complexity
-* Need to quickly enable LLM access for multiple teams
-* Want pre-configured intelligent routing with automatic provider failover
-* Are satisfied with managed routing rules and backend pools (17 pre-configured rules)
-* Need only basic customization (provider credentials, 6 preference toggles)
-* Use OpenAI and/or Anthropic providers
-
-*Custom Mode* is ideal when you:
-
-* Need custom routing rules based on specific business logic
-* Require full control over backend pool configuration
-* Want to implement custom failover strategies
-* Need to integrate with custom infrastructure (Azure OpenAI, AWS Bedrock, other providers)
-* Have specialized requirements not covered by AI Hub's pre-configured rules
-
-[TIP]
-====
-You can start with AI Hub mode and later eject to Custom mode if you need more control. Ejection is a one-way transition. See xref:ai-gateway/admin/eject-to-custom-mode.adoc[].
-====
-
-For detailed comparison, see xref:ai-gateway/gateway-modes.adoc[].
-
-*Next sections:*
-
-* *AI Hub Mode*: See xref:ai-gateway/admin/configure-ai-hub.adoc[] for setup instructions
-* *Custom Mode*: Continue with "Create a gateway" below for manual configuration
-endif::[]
-
-== Create a gateway
-
-A gateway is a logical configuration boundary (policies + routing + observability) on top of a single deployment. It's a "virtual gateway" that you can create per team, environment (staging/production), product, or customer.
-
-. Navigate to *Agentic* → *AI Gateway* → *Gateways*.
-. Click *Create Gateway*.
-. Configure the gateway:
-+
---
-* *Name*: Choose a descriptive name (for example, `production-gateway`, `team-ml-gateway`, `staging-gateway`)
-* *Workspace*: Select the workspace this gateway belongs to
-+
-TIP: A workspace is conceptually similar to a resource group in Redpanda streaming.
-+
-* *Description* (optional): Add context about this gateway's purpose
-* *Tags* (optional): Add metadata for organization and filtering
---
-
-. Click *Create*.
-
-. After creation, note the following information:
-+
---
-* *Gateway endpoint*: URL for API requests (for example, `https://example/gateways/d633lffcc16s73ct95mg/v1`)
-+
-The gateway ID is embedded in the URL.
---
-
-You'll share the gateway endpoint with users who need to access this gateway.
-
-== Configure LLM routing
-
-On the gateway details page, select the *LLM* tab to configure rate limits, spend limits, routing, and provider pools with fallback options.
-
-The LLM routing pipeline visually represents the request lifecycle:
-
-. *Rate Limit*: Global rate limit (for example, 100 requests/second)
-. *Spend Limit / Monthly Budget*: Monthly budget with blocking enforcement (for example, $15K/month)
-. *Routing*: Primary provider pool with optional fallback provider pools
-
-=== Configure rate limits
-
-Rate limits control how many requests can be processed within a time window.
-
-. In the *LLM* tab, locate the *Rate Limit* section.
-. Click *Add rate limit*.
-. Configure the limit:
-+
---
-* *Requests per second*: Maximum requests per second (for example, `100`)
-* *Burst allowance* (optional): Allow temporary bursts above the limit
---
-
-. Click *Save*.
-
-Rate limits apply to all requests through this gateway, regardless of model or provider.
-
-=== Configure spend limits and budgets
-
-Spend limits prevent runaway costs by blocking requests after a monthly budget is exceeded.
-
-. In the *LLM* tab, locate the *Spend Limit* section.
-. Click *Configure budget*.
-. Set the budget:
-+
---
-* *Monthly budget*: Maximum spend per month (for example, `$15000`)
-* *Enforcement*: Choose *Block* to reject requests after the budget is exceeded, or *Alert* to notify but allow requests
-* *Notification threshold* (optional): Alert when X% of budget is consumed (for example, `80%`)
---
-
-. Click *Save*.
-
-Budget tracking uses estimated costs based on token usage and public provider pricing.
-
-=== Configure routing and provider pools
-
-Provider pools define which LLM providers handle requests, with support for primary and fallback configurations.
-
-. In the *LLM* tab, locate the *Routing* section.
-. Click *Add provider pool*.
-. Configure the primary pool:
-+
---
-* *Name*: For example, `primary-anthropic`
-* *Providers*: Select one or more providers (for example, Anthropic)
-* *Models*: Choose which models to include (for example, `anthropic/claude-sonnet-4.5`)
-* *Load balancing*: If multiple providers are selected, choose distribution strategy (round-robin, weighted, etc.)
---
-
-. (Optional) Click *Add fallback pool* to configure automatic failover:
-+
---
-* *Name*: For example, `fallback-openai`
-* *Providers*: Select fallback provider (for example, OpenAI)
-* *Models*: Choose fallback models (for example, `openai/gpt-5.2`)
-* *Trigger conditions*: When to activate fallback:
-  ** Rate limit exceeded (429 from primary)
-  ** Timeout (primary provider slow)
-  ** Server errors (5xx from primary)
---
-
-. Configure routing rules using CEL expressions (optional):
-+
-For simple routing, select *Route all requests to primary pool*.
-+
-For advanced routing based on request properties, use CEL expressions. See xref:ai-gateway/cel-routing-cookbook.adoc[] for examples.
-+
-Example CEL expression for tier-based routing:
-+
-[source,cel]
-----
-request.headers["x-user-tier"] == "premium"
-  ? "anthropic/claude-opus-4.6"
-  : "anthropic/claude-sonnet-4.5"
-----
-
-. Click *Save routing configuration*.
-
-TIP: Provider pool (UI) = Backend pool (API)
-
-=== Load balancing and multi-provider distribution
-
-If a provider pool contains multiple providers, you can distribute traffic to balance load or optimize for cost/performance:
-
-* Round-robin: Distribute evenly across all providers
-* Weighted: Assign weights (for example, 80% to Anthropic, 20% to OpenAI)
-* Least latency: Route to fastest provider based on recent performance
-* Cost-optimized: Route to cheapest provider for each model
-
-== Configure MCP tools (optional)
-
-If your users will build glossterm:AI agent[,AI agents] that need access to glossterm:MCP tool[,tools] via glossterm:MCP[,Model Context Protocol (MCP)], configure MCP tool aggregation.
-
-On the gateway details page, select the *MCP* tab to configure tool discovery and execution. The MCP proxy aggregates multiple glossterm:MCP server[,MCP servers], allowing agents to find and call tools through a single endpoint.
-
-=== Configure MCP rate limits
-
-Rate limits for MCP work the same way as LLM rate limits.
-
-. In the *MCP* tab, locate the *Rate Limit* section.
-. Click *Add rate limit*.
-. Configure the maximum requests per second and optional burst allowance.
-. Click *Save*.
-
-=== Add MCP servers
-
-. In the *MCP* tab, click *Create MCP Server*.
-. Configure the server:
-+
---
-* *Server ID*: Unique identifier for this server
-* *Display Name*: Human-readable name (for example, `database-server`, `slack-server`)
-* *Server Address*: Endpoint URL for the MCP server (for example, `https://mcp-database.example.com`)
---
-
-. 
Configure server settings:
-+
---
-* *Timeout (seconds)*: Maximum time to wait for a response from this server
-* *Enabled*: Whether this server is active and accepting requests
-* *Defer Loading Override*: Controls whether tools from this server are loaded upfront or on demand
-+
-[cols="1,2"]
-|===
-|Option |Description
-
-|Inherit from gateway
-|Use the gateway-level deferred loading setting (default)
-
-|Enabled
-|Always defer loading from this server. Agents receive only a search tool initially and query for specific tools when needed. This can reduce token usage by 80-90%.
-
-|Disabled
-|Always load all tools from this server upfront.
-|===
-
-* *Forward OIDC Token Override*: Controls whether the client's OIDC token is forwarded to this MCP server
-+
-[cols="1,2"]
-|===
-|Option |Description
-
-|Inherit from gateway
-|Use the gateway-level OIDC forwarding setting (default)
-
-|Enabled
-|Always forward the OIDC token to this server
-
-|Disabled
-|Never forward the OIDC token to this server
-|===
---
-
-. Click *Save* to add the server to this gateway.
-
-Repeat for each MCP server you want to aggregate.
-
-See xref:ai-gateway/mcp-aggregation-guide.adoc[] for detailed information about MCP aggregation.
-
-=== Configure the MCP orchestrator
-
-The MCP orchestrator is a built-in MCP server that enables programmatic tool calling. Agents can generate JavaScript code to call multiple tools in a single orchestrated step, reducing the number of round trips.
-
-Example: A workflow requiring 47 file reads can be reduced from 49 round trips to just 1 round trip using the orchestrator.
-
-The orchestrator is pre-configured when you initialize the MCP gateway. Its server configuration (Server ID, Display Name, Transport, Command, and Timeout) is system-managed and cannot be modified.
-
-You can configure blocked tool patterns to prevent specific tools from being called through the orchestrator:
-
-. In the *MCP* tab, select the orchestrator server to edit it.
-. 
Under *Blocked Tools*, click *Add Pattern* to add glob patterns for tools that should be blocked from execution.
-+
-Example patterns:
-+
---
-* `server_id:*` - Block all tools from a specific server
-* `*:dangerous_tool` - Block a specific tool across all servers
-* `specific:tool` - Block a single tool on a specific server
---
-+
-NOTE: The orchestrator's own tools are blocked by default to prevent recursive execution.
-
-. Click *Save*.
-
-== Verify your setup
-
-After completing the setup, verify that the gateway is working correctly:
-
-=== Test the gateway endpoint
-
-[source,bash]
-----
-curl ${GATEWAY_ENDPOINT}/models \
-  -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}"
-----
-
-Expected result: List of enabled models.
-
-=== Send a test request
-
-[source,bash]
-----
-curl ${GATEWAY_ENDPOINT}/chat/completions \
-  -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "openai/gpt-5.2-mini",
-    "messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
-    "max_tokens": 50
-  }'
-----
-
-Expected result: Successful completion response.
-
-=== Check the gateway overview
-
-. Navigate to *Gateways* → Select your gateway → *Overview*.
-. Check the aggregate metrics to verify your test request was processed:
-+
---
-* Total Requests: Should have incremented
-* Total Tokens: Should show tokens consumed
-* Total Cost: Should show estimated cost
---
-
-== Share access with users
-
-Now that your gateway is configured, share access with users (builders):
-
-. Provide the *Gateway Endpoint* (for example, `https://example/gateways/gw_abc123/v1`)
-. Share API credentials (Redpanda Cloud tokens with appropriate permissions)
-. (Optional) Document available models and any routing policies
-. (Optional) Share rate limits and budget information
-
-Users can then discover and connect to the gateway using the information provided. See xref:ai-gateway/builders/discover-gateways.adoc[] for user documentation.
-
-== Next steps
-
-*Configure and optimize:*
-
-// * xref:ai-gateway/admin/manage-gateways.adoc[Manage Gateways] - List, edit, and delete gateways
-* xref:ai-gateway/cel-routing-cookbook.adoc[CEL Routing Cookbook] - Advanced routing patterns
-// * xref:ai-gateway/admin/networking-configuration.adoc[Networking Configuration] - Configure private endpoints and connectivity
-
-//*Monitor and observe:*
-//
-
-ifdef::integrations-available[]
-*Integrate tools:*
-
-* xref:ai-gateway/integrations/index.adoc[Integrations] - Admin guides for Claude Code, Cursor, and other tools
-endif::[]
diff --git a/modules/ai-agents/pages/ai-gateway/builders/connect-your-agent.adoc b/modules/ai-agents/pages/ai-gateway/builders/connect-your-agent.adoc
deleted file mode 100644
index 37b435958..000000000
--- a/modules/ai-agents/pages/ai-gateway/builders/connect-your-agent.adoc
+++ /dev/null
@@ -1,665 +0,0 @@
-= Connect Your Agent
-:description: Integrate your AI agent or application with Redpanda Agentic Data Plan for unified LLM access.
-:page-topic-type: how-to
-:personas: app_developer
-:learning-objective-1: Configure your application to use AI Gateway with OpenAI-compatible SDKs
-:learning-objective-2: Make LLM requests through the gateway and handle responses appropriately
-:learning-objective-3: Validate your integration end-to-end
-
-include::ai-agents:partial$adp-la.adoc[]
-
-This guide shows you how to connect your glossterm:AI agent[] or application to Redpanda Agentic Data Plan. This is also called "Bring Your Own Agent" (BYOA). You'll configure your client SDK, make your first request, and validate the integration.
-
-After completing this guide, you will be able to:
-
-* [ ] Configure your application to use AI Gateway with OpenAI-compatible SDKs
-* [ ] Make LLM requests through the gateway and handle responses appropriately
-* [ ] Validate your integration end-to-end
-
-== Prerequisites
-
-* You have discovered an available gateway and noted its Gateway ID and endpoint.
-+
-If not, see xref:ai-gateway/builders/discover-gateways.adoc[].
-
-* You have a service account with OIDC client credentials. See xref:security:cloud-authentication.adoc[].
-* You have a development environment with your chosen programming language.
-
-== Integration overview
-
-Connecting to AI Gateway requires two configuration changes:
-
-. *Change the base URL*: Point to the gateway endpoint instead of the provider's API. The gateway ID is embedded in the endpoint URL.
-. *Add authentication*: Use an OIDC access token from your service account instead of provider API keys.
-
-[[authenticate-with-oidc]]
-== Authenticate with OIDC
-
-AI Gateway uses OIDC through service accounts that can be used as a `client_credentials` grant to authenticate and exchange for access and ID tokens.
-
-=== Create a service account
-
-. In the Redpanda Cloud UI, go to https://cloud.redpanda.com/organization-iam?tab=service-accounts[*Organization IAM* > *Service account*^].
-. Create a new service account and note the *Client ID* and *Client Secret*.
-
-For details, see xref:security:cloud-authentication.adoc#authenticate-to-the-cloud-api[Authenticate to the Cloud API].
-
-=== Configure your OIDC client
-
-Use the following OIDC configuration:
-
-[cols="1,2", options="header"]
-|===
-|Parameter |Value
-
-|Discovery URL
-|`\https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration`
-
-|Token endpoint
-|`\https://auth.prd.cloud.redpanda.com/oauth/token`
-
-|Audience
-|`cloudv2-production.redpanda.cloud`
-
-|Grant type
-|`client_credentials`
-|===
-
-The discovery URL returns OIDC metadata, including the token endpoint and other configuration details. Use an OIDC client library that supports metadata discovery (such as `openid-client` for Node.js) so that endpoints are resolved automatically. If your library does not support discovery, you can fetch the discovery URL directly and extract the required endpoints from the JSON response.
-
-[tabs]
-====
-cURL::
-+
---
-[source,bash]
-----
-AUTH_TOKEN=$(curl -s --request POST \
-  --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \
-  --header 'content-type: application/x-www-form-urlencoded' \
-  --data grant_type=client_credentials \
-  --data client_id= \
-  --data client_secret= \
-  --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token)
-----
-
-Replace `` and `` with your service account credentials.
---
-
-Python (authlib)::
-+
---
-[source,python]
-----
-from authlib.integrations.requests_client import OAuth2Session
-
-client = OAuth2Session(
-    client_id="",
-    client_secret="",
-)
-
-# Discover token endpoint from OIDC metadata
-import requests
-metadata = requests.get(
-    "https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration"
-).json()
-token_endpoint = metadata["token_endpoint"]
-
-token = client.fetch_token(
-    token_endpoint,
-    grant_type="client_credentials",
-    audience="cloudv2-production.redpanda.cloud",
-)
-
-access_token = token["access_token"]
-----
-
-This example performs a one-time token fetch. For automatic token renewal on subsequent requests, pass `token_endpoint` to the `OAuth2Session` constructor. Note that for `client_credentials` grants, `authlib` obtains a new token rather than using a refresh token.
--
-
-Node.js (openid-client)::
-+
-[source,javascript]
-----
-import { Issuer } from 'openid-client';
-
-const issuer = await Issuer.discover(
-  'https://auth.prd.cloud.redpanda.com'
-);
-
-const client = new issuer.Client({
-  client_id: '',
-  client_secret: '',
-});
-
-const tokenSet = await client.grant({
-  grant_type: 'client_credentials',
-  audience: 'cloudv2-production.redpanda.cloud',
-});
-
-const accessToken = tokenSet.access_token;
-----
-====
-
-=== Make authenticated requests
-
-Requests require two headers:
-
-* `Authorization: Bearer ` - your OIDC access token
-* `rp-aigw-id: ` - your AI Gateway ID
-
-Set these environment variables for consistent configuration:
-
-[source,bash]
-----
-export REDPANDA_GATEWAY_URL=""
-export REDPANDA_GATEWAY_ID=""
-----
-
-[tabs]
-====
-Python (OpenAI SDK)::
-+
-[source,python]
-----
-import os
-from openai import OpenAI
-
-# Configure client to use AI Gateway with OIDC token
-client = OpenAI(
-    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
-    api_key=access_token, # OIDC access token from Step 2
-)
-
-# Make a request
-response = client.chat.completions.create(
-    model="openai/gpt-5.2-mini", # Note: vendor/model_id format
-    messages=[{"role": "user", "content": "Hello, AI Gateway!"}],
-    max_tokens=100
-)
-
-print(response.choices[0].message.content)
-----
-
-Python (Anthropic SDK)::
-+
-The Anthropic SDK can also route through AI Gateway using the OpenAI-compatible endpoint:
-+
-[source,python]
-----
-import os
-from anthropic import Anthropic
-
-client = Anthropic(
-    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
-    api_key=access_token, # OIDC access token from Step 2
-)
-
-# Make a request
-message = client.messages.create(
-    model="anthropic/claude-sonnet-4.5",
-    max_tokens=100,
-    messages=[{"role": "user", "content": "Hello, AI Gateway!"}]
-)
-
-print(message.content[0].text)
-----
-
-Node.js (OpenAI SDK)::
-+
-[source,javascript]
-----
-import OpenAI from 'openai';
-
-const openai = new OpenAI({
-  baseURL: 
process.env.REDPANDA_GATEWAY_URL,
-  apiKey: accessToken, // OIDC access token from Step 2
-});
-
-// Make a request
-const response = await openai.chat.completions.create({
-  model: 'openai/gpt-5.2-mini',
-  messages: [{ role: 'user', content: 'Hello, AI Gateway!' }],
-  max_tokens: 100
-});
-
-console.log(response.choices[0].message.content);
-----
-
-cURL::
-+
-[source,bash]
-----
-curl ${REDPANDA_GATEWAY_URL}/chat/completions \
-  -H "Authorization: Bearer ${AUTH_TOKEN}" \
-  -H "Content-Type: application/json" \
-  -H "rp-aigw-id: ${REDPANDA_GATEWAY_ID}" \
-  -d '{
-    "model": "openai/gpt-5.2-mini",
-    "messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
-    "max_tokens": 100
-  }'
-----
-====
-
-=== Token lifecycle management
-
-IMPORTANT: Your agent is responsible for refreshing tokens before they expire. OIDC access tokens have a limited time-to-live (TTL), determined by the identity provider, and are not automatically renewed by the AI Gateway. Check the `expires_in` field in the token response for the exact duration.
-
-* Proactively refresh tokens at approximately 80% of the token's TTL to avoid failed requests.
-* `authlib` (Python) can handle token renewal automatically when you pass `token_endpoint` to the `OAuth2Session` constructor. For `client_credentials` grants, it obtains a new token rather than using a refresh token.
-* For other languages, cache the token and its expiry time, then request a new token before the current one expires.
-
-== Model naming convention
-
-When making requests through AI Gateway, use the `vendor/model_id` format for the model parameter:
-
-* `openai/gpt-5.2`
-* `openai/gpt-5.2-mini`
-* `anthropic/claude-sonnet-4.5`
-* `anthropic/claude-opus-4.6`
-
-This format tells AI Gateway which provider to route the request to. For example:
-
-[source,python]
-----
-# Route to OpenAI
-response = client.chat.completions.create(
-    model="openai/gpt-5.2",
-    messages=[...]
-)
-
-# Route to Anthropic (same client, different model)
-response = client.chat.completions.create(
-    model="anthropic/claude-sonnet-4.5",
-    messages=[...]
-)
-----
-
-// To see which models are available in your gateway, see xref:ai-gateway/builders/available-models.adoc[].
-
-== Handle responses
-
-Responses from AI Gateway follow the OpenAI API format:
-
-[source,python]
-----
-response = client.chat.completions.create(
-    model="openai/gpt-5.2-mini",
-    messages=[{"role": "user", "content": "Explain AI Gateway"}],
-    max_tokens=200
-)
-
-# Access the response
-message_content = response.choices[0].message.content
-finish_reason = response.choices[0].finish_reason # 'stop', 'length', etc.
-
-# Token usage
-prompt_tokens = response.usage.prompt_tokens
-completion_tokens = response.usage.completion_tokens
-total_tokens = response.usage.total_tokens
-
-print(f"Response: {message_content}")
-print(f"Tokens: {prompt_tokens} prompt + {completion_tokens} completion = {total_tokens} total")
-----
-
-== Handle errors
-
-AI Gateway returns standard HTTP status codes:
-
-[source,python]
-----
-from openai import OpenAI, OpenAIError
-
-client = OpenAI(
-    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
-    api_key=access_token, # OIDC access token
-)
-
-try:
-    response = client.chat.completions.create(
-        model="openai/gpt-5.2-mini",
-        messages=[{"role": "user", "content": "Hello"}]
-    )
-    print(response.choices[0].message.content)
-
-except OpenAIError as e:
-    if e.status_code == 400:
-        print("Bad request - check model name and parameters")
-    elif e.status_code == 401:
-        print("Authentication failed - check OIDC token")
-    elif e.status_code == 404:
-        print("Model not found - check available models")
-    elif e.status_code == 429:
-        print("Rate limit exceeded - slow down requests")
-    elif e.status_code >= 500:
-        print("Gateway or provider error - retry with exponential backoff")
-    else:
-        print(f"Error: {e}")
-----
-
-Common error codes:
-
-* *400*: Bad request (invalid parameters, 
malformed JSON)
-* *401*: Authentication failed (invalid or expired OIDC token)
-* *403*: Forbidden (no access to this gateway)
-* *404*: Model not found (model not enabled in gateway)
-* *429*: Rate limit exceeded (too many requests)
-* *500/502/503*: Server error (gateway or provider issue)
-
-== Streaming responses
-
-AI Gateway supports streaming for real-time token generation:
-
-[source,python]
-----
-response = client.chat.completions.create(
-    model="openai/gpt-5.2-mini",
-    messages=[{"role": "user", "content": "Write a short poem"}],
-    stream=True # Enable streaming
-)
-
-# Process chunks as they arrive
-for chunk in response:
-    if chunk.choices[0].delta.content:
-        print(chunk.choices[0].delta.content, end='', flush=True)
-
-print() # New line after streaming completes
-----
-
-== Switch between providers
-
-One of AI Gateway's key benefits is easy provider switching without code changes:
-
-[source,python]
-----
-# Try OpenAI
-response = client.chat.completions.create(
-    model="openai/gpt-5.2",
-    messages=[{"role": "user", "content": "Explain quantum computing"}]
-)
-
-# Try Anthropic (same code, different model)
-response = client.chat.completions.create(
-    model="anthropic/claude-sonnet-4.5",
-    messages=[{"role": "user", "content": "Explain quantum computing"}]
-)
-----
-
-Compare responses, latency, and cost to determine the best model for your use case.
-
-== Validate your integration
-
-=== Test connectivity
-
-[source,python]
-----
-import os
-from openai import OpenAI
-
-def test_gateway_connection(access_token):
-    """Test basic connectivity to AI Gateway"""
-    client = OpenAI(
-        base_url=os.getenv("REDPANDA_GATEWAY_URL"),
-        api_key=access_token, # OIDC access token
-    )
-
-    try:
-        # Simple test request
-        response = client.chat.completions.create(
-            model="openai/gpt-5.2-mini",
-            messages=[{"role": "user", "content": "test"}],
-            max_tokens=10
-        )
-        print("✓ Gateway connection successful")
-        return True
-    except Exception as e:
-        print(f"✗ Gateway connection failed: {e}")
-        return False
-
-if __name__ == "__main__":
-    token = get_oidc_token() # Your OIDC token retrieval
-    test_gateway_connection(token)
-----
-
-=== Test multiple models
-
-[source,python]
-----
-def test_models():
-    """Test multiple models through the gateway"""
-    models = [
-        "openai/gpt-5.2-mini",
-        "anthropic/claude-sonnet-4.5"
-    ]
-
-    for model in models:
-        try:
-            response = client.chat.completions.create(
-                model=model,
-                messages=[{"role": "user", "content": "Say hello"}],
-                max_tokens=10
-            )
-            print(f"✓ {model}: {response.choices[0].message.content}")
-        except Exception as e:
-            print(f"✗ {model}: {e}")
-----
-
-// === Check request logs
-//
-// After making requests, verify they appear in observability:
-//
-// . Navigate to *AI Gateway* → *Gateways* → Select your gateway → *Logs*
-// . Filter by your request timestamp
-// . Verify your requests are logged with correct model, tokens, and cost
-
-// See xref:ai-gateway/builders/monitor-your-usage.adoc[] for details.
- -== Integrate with AI development tools - -[tabs] -==== -Claude Code:: -+ -Configure Claude Code to use AI Gateway: -+ -[source,bash] ----- -claude mcp add --transport http redpanda-aigateway ${REDPANDA_GATEWAY_URL}/mcp \ - --header "Authorization: Bearer ${AUTH_TOKEN}" ----- -+ -Or edit `~/.claude/config.json`: -+ -[source,json] ----- -{ - "mcpServers": { - "redpanda-ai-gateway": { - "transport": "http", - "url": "/mcp", - "headers": { - "Authorization": "Bearer " - } - } - } -} ----- -+ -ifdef::integrations-available[] -See xref:ai-gateway/integrations/claude-code-user.adoc[] for complete setup. -endif::[] - -VS Code Continue Extension:: -+ -Edit `~/.continue/config.json`: -+ -[source,json] ----- -{ - "models": [ - { - "title": "AI Gateway - GPT-5.2", - "provider": "openai", - "model": "openai/gpt-5.2", - "apiBase": "", - "apiKey": "" - } - ] -} ----- -+ -ifdef::integrations-available[] -See xref:ai-gateway/integrations/continue-user.adoc[] for complete setup. -endif::[] - -Cursor IDE:: -+ -. Open Cursor Settings (*Cursor* → *Settings* or `Cmd+,`) -. Navigate to *AI* settings -. Add custom OpenAI-compatible provider: -+ -[source,json] ----- -{ - "cursor.ai.providers.openai.apiBase": "" -} ----- -+ -ifdef::integrations-available[] -See xref:ai-gateway/integrations/cursor-user.adoc[] for complete setup. 
-endif::[] -==== - -== Best practices - -=== Use environment variables - -Store configuration in environment variables instead of hardcoding it: - -[source,python] ----- -# Good -base_url = os.getenv("REDPANDA_GATEWAY_URL") - -# Bad -base_url = "https://gw.ai.panda.com" # Don't hardcode URLs or credentials ----- - -=== Implement retry logic - -Implement exponential backoff for transient errors: - -[source,python] ----- -import time -from openai import OpenAI, APIStatusError - -def make_request_with_retry(client, max_retries=3): - for attempt in range(max_retries): - try: - return client.chat.completions.create( - model="openai/gpt-5.2-mini", - messages=[{"role": "user", "content": "Hello"}] - ) - except APIStatusError as e: # Only HTTP status errors carry status_code - if e.status_code >= 500 and attempt < max_retries - 1: - wait_time = 2 ** attempt # Exponential backoff - print(f"Retrying in {wait_time}s...") - time.sleep(wait_time) - else: - raise ----- - -=== Monitor your usage - -Regularly check your usage to avoid unexpected costs: - -[source,python] ----- -# Track tokens in your application -total_tokens = 0 -request_count = 0 - -for request in requests: - response = client.chat.completions.create(...) - total_tokens += response.usage.total_tokens - request_count += 1 - -print(f"Total tokens: {total_tokens} across {request_count} requests") ----- - -// See xref:ai-gateway/builders/monitor-your-usage.adoc[] for detailed monitoring. - -=== Handle rate limits gracefully - -Respect rate limits and implement backoff: - -[source,python] ----- -try: - response = client.chat.completions.create(...) -except APIStatusError as e: - if e.status_code == 429: - # Rate limited - wait and retry - retry_after = int(e.response.headers.get('Retry-After', 60)) - print(f"Rate limited. 
Waiting {retry_after}s...") - time.sleep(retry_after) - # Retry request ----- - -== Troubleshooting - -=== "Authentication failed" - -Problem: 401 Unauthorized - -Solutions: - -* Check that your OIDC token has not expired and refresh it if necessary -* Verify the audience is set to `cloudv2-production.redpanda.cloud` -* Check that the service account has access to the specified gateway -* Ensure the `Authorization` header is formatted correctly: `Bearer ` - -=== "Model not found" - -Problem: 404 Model not found - -Solutions: - -* Verify the model name uses `vendor/model_id` format -// * Check available models: See xref:ai-gateway/builders/available-models.adoc[] -* Confirm the model is enabled in your gateway (contact administrator) - -=== "Rate limit exceeded" - -Problem: 429 Too Many Requests - -Solutions: - -* Reduce request rate -* Implement exponential backoff -* Contact administrator to review rate limits -* Consider using a different gateway if available - -=== "Connection timeout" - -Problem: Request times out - -Solutions: - -* Check network connectivity to the gateway endpoint -* Verify the gateway endpoint URL is correct -* Check if the gateway is operational (contact administrator) -* Increase client timeout if processing complex requests - -//== Next steps - -//Now that your agent is connected: - -// * xref:ai-gateway/builders/available-models.adoc[Available Models] - Learn about model selection and routing -// * xref:ai-gateway/builders/use-mcp-tools.adoc[Use MCP Tools] - Access tools from MCP servers (if enabled) -// * xref:ai-gateway/builders/monitor-your-usage.adoc[Monitor Your Usage] - Track requests and costs -ifdef::integrations-available[] -* xref:ai-gateway/integrations/index.adoc[Integrations] - Configure specific tools and IDEs -endif::[] diff --git a/modules/ai-agents/pages/ai-gateway/builders/discover-gateways.adoc b/modules/ai-agents/pages/ai-gateway/builders/discover-gateways.adoc deleted file mode 100644 index e6d612e1b..000000000 --- 
a/modules/ai-agents/pages/ai-gateway/builders/discover-gateways.adoc +++ /dev/null @@ -1,310 +0,0 @@ -= Discover Available Gateways -:description: Find which AI Gateways you can access and their configurations. -:page-topic-type: how-to -:personas: app_developer -:learning-objective-1: List all AI Gateways you have access to and retrieve their endpoints and IDs -:learning-objective-2: View which models and MCP tools are available through each gateway -:learning-objective-3: Test gateway connectivity before integration - -include::ai-agents:partial$adp-la.adoc[] - -As a builder, you need to know which gateways are available to you before integrating your agent or application. This page shows you how to discover accessible gateways, understand their configurations, and verify connectivity. - -After reading this page, you will be able to: - -* [ ] List all AI Gateways you have access to and retrieve their endpoints and IDs -* [ ] View which models and MCP tools are available through each gateway -* [ ] Test gateway connectivity before integration - -== Before you begin - -* You have a Redpanda Cloud account with access to at least one AI Gateway -* You have access to the Redpanda Cloud Console or API credentials - -== List your accessible gateways - -[tabs] -==== -Using the Console:: -+ -. Navigate to *Agentic* > *AI Gateway* > *Gateways* in the Redpanda Cloud Console. -. Review the list of gateways you can access. For each gateway, you'll see the gateway name, ID, endpoint URL, status, available models, and provider performance. -+ -Click the Configuration, API, MCP Tools, and Changelog tabs for additional information. 
- -Using the API:: -+ -To list gateways programmatically: -+ -[source,bash] ----- -curl https://api.redpanda.com/v1/gateways \ - -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" ----- -+ -Response: -+ -[source,json] ----- -{ - "gateways": [ - { - "id": "gw_abc123", - "name": "production-gateway", - "mode": "ai_hub", - "endpoint": "https://gw.ai.panda.com", - "status": "active", - "workspace_id": "ws_xyz789", - "created_at": "2025-01-15T10:30:00Z" - }, - { - "id": "gw_def456", - "name": "staging-gateway", - "mode": "custom", - "endpoint": "https://gw-staging.ai.panda.com", - "status": "active", - "workspace_id": "ws_xyz789", - "created_at": "2025-01-10T08:15:00Z" - } - ] -} ----- -==== - -== Understand gateway information - -Each gateway provides specific information you'll need for integration: - -=== Gateway endpoint - -The gateway endpoint is the URL where you send all API requests. It replaces direct provider URLs (like `api.openai.com` or `api.anthropic.com`). The gateway ID is embedded directly in the endpoint URL. - -Example: -[source,bash] ----- -https://example/gateways/gw_abc123/v1 ----- - -Your application configures this as the `base_url` in your SDK client. - -=== Available models - -Each gateway exposes specific models based on administrator configuration. 
Models use the `vendor/model_id` format: - -* `openai/gpt-5.2` -* `anthropic/claude-sonnet-4.5` -* `openai/gpt-5.2-mini` - -To see which models are available through a specific gateway: - -[source,bash] ----- -curl ${GATEWAY_ENDPOINT}/models \ - -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" ----- - -Response: - -[source,json] ----- -{ - "object": "list", - "data": [ - { - "id": "openai/gpt-5.2", - "object": "model", - "owned_by": "openai" - }, - { - "id": "anthropic/claude-sonnet-4.5", - "object": "model", - "owned_by": "anthropic" - }, - { - "id": "openai/gpt-5.2-mini", - "object": "model", - "owned_by": "openai" - } - ] -} ----- - -=== Rate limits and quotas - -Each gateway may have configured rate limits and monthly budgets. Check the console or contact your administrator to understand: - -* Requests per minute/hour/day -* Monthly spend limits -* Token usage quotas - -These limits help control costs and ensure fair resource allocation across teams. - -=== MCP Tools - -If glossterm:MCP[,Model Context Protocol (MCP)] aggregation is enabled for your gateway, you can access glossterm:MCP tool[,tools] from multiple glossterm:MCP server[,MCP servers] through a single endpoint. - -To discover available MCP tools: - -[source,bash] ----- -curl ${GATEWAY_ENDPOINT}/mcp/tools \ - -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ - -H "rp-aigw-mcp-deferred: true" ----- - -With deferred loading enabled, you'll receive search and orchestrator tools initially. You can then query for specific tools as needed. - -// See xref:ai-gateway/builders/use-mcp-tools.adoc[] for more details. - -ifdef::ai-hub-available[] -== Identify gateway mode - -Gateways can operate in two modes: AI Hub mode or Custom mode. Understanding which mode your gateway uses helps you know what to expect. 
- -include::ai-agents:partial$ai-hub-mode-indicator.adoc[] - -=== What it means for builders - -*AI Hub Mode:* - -* Routing is pre-configured and intelligent -* Models are automatically routed based on system-managed rules -* You cannot see or modify routing rules (they're managed by Redpanda) -* Limited customization via administrator-configured preference toggles -* See xref:ai-gateway/builders/use-ai-hub-gateway.adoc[] for AI Hub-specific guidance - -*Custom Mode:* - -* Routing is configured by your administrator -* You can view configured routing rules in the console -* Administrator has full control over backend pools and policies -* Standard discovery and usage patterns apply (rest of this page) - -[TIP] -==== -If you need specific routing behavior or custom configuration that AI Hub doesn't support, ask your administrator about ejecting to Custom mode or creating a Custom mode gateway. -==== -endif::[] - -== Check gateway availability - -Before integrating your application, verify that you can successfully connect to the gateway: - -=== Test connectivity - -[source,bash] ----- -curl ${GATEWAY_ENDPOINT}/models \ - -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ - -v ----- - -Expected result: HTTP 200 response with a list of available models. - -=== Test a simple request - -Send a minimal chat completion request to verify end-to-end functionality: - -[source,bash] ----- -curl ${GATEWAY_ENDPOINT}/chat/completions \ - -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ - -H "Content-Type: application/json" \ - -d '{ - "model": "openai/gpt-5.2-mini", - "messages": [{"role": "user", "content": "Hello"}], - "max_tokens": 10 - }' ----- - -Expected result: HTTP 200 response with a completion. - -=== Troubleshoot connectivity issues - -If you cannot connect to a gateway: - -. *Verify authentication*: Ensure your API token is valid and has not expired -. *Check gateway endpoint*: Confirm the endpoint URL includes the correct gateway ID -. 
*Verify endpoint URL*: Check for typos in the gateway endpoint -. *Check permissions*: Confirm with your administrator that you have access to this gateway -. *Review network connectivity*: Ensure your network allows outbound HTTPS connections - -== Choose the right gateway - -If you have access to multiple gateways, consider which one to use based on your needs: - -=== By environment - -Organizations often create separate gateways for different environments: - -* Production gateway: Higher rate limits, access to all models, monitoring enabled -* Staging gateway: Lower rate limits, restricted models, aggressive cost controls -* Development gateway: Minimal limits, all models for experimentation - -Choose the gateway that matches your deployment environment. - -=== By team or project - -Gateways may be organized by team or project for cost tracking and isolation: - -* team-ml-gateway: For machine learning team -* team-product-gateway: For product team -* customer-facing-gateway: For production customer workloads - -Use the gateway designated for your team to ensure proper cost attribution. 
- -=== By capability - -Different gateways may have different features enabled: - -* Gateway with MCP tools: Use if your agent needs to call tools -* Gateway without MCP: Use for simple LLM completions -* Gateway with specific models: Use if you need access to particular models - -== Example: Complete discovery workflow - -Here's a complete workflow to discover and validate gateway access: - -[source,bash] ----- -#!/bin/bash - -# Set your API token -export REDPANDA_CLOUD_TOKEN="your-token-here" - -# Step 1: List all accessible gateways -echo "=== Discovering gateways ===" -curl -s https://api.redpanda.com/v1/gateways \ - -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ - | jq '.gateways[] | {name: .name, id: .id, endpoint: .endpoint}' - -# Step 2: Select a gateway (example) -export GATEWAY_ENDPOINT="https://example/gateways/gw_abc123/v1" - -# Step 3: List available models -echo -e "\n=== Available models ===" -curl -s ${GATEWAY_ENDPOINT}/models \ - -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ - | jq '.data[] | .id' - -# Step 4: Test with a simple request -echo -e "\n=== Testing request ===" -curl -s ${GATEWAY_ENDPOINT}/chat/completions \ - -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \ - -H "Content-Type: application/json" \ - -d '{ - "model": "openai/gpt-5.2-mini", - "messages": [{"role": "user", "content": "Say hello"}], - "max_tokens": 10 - }' \ - | jq '.choices[0].message.content' - -echo -e "\n=== Gateway validated successfully ===" ----- - -== Next steps - -* xref:ai-gateway/builders/connect-your-agent.adoc[Connect Your Agent] - Integrate your application -// * xref:ai-gateway/builders/available-models.adoc[Available Models] - Learn about model selection and routing -// * xref:ai-gateway/builders/use-mcp-tools.adoc[Use MCP Tools] - Access tools from MCP servers -// * xref:ai-gateway/builders/monitor-your-usage.adoc[Monitor Your Usage] - Track requests and costs diff --git a/modules/ai-agents/pages/ai-gateway/cel-routing-cookbook.adoc 
b/modules/ai-agents/pages/ai-gateway/cel-routing-cookbook.adoc deleted file mode 100644 index d0b982198..000000000 --- a/modules/ai-agents/pages/ai-gateway/cel-routing-cookbook.adoc +++ /dev/null @@ -1,953 +0,0 @@ -= CEL Routing Cookbook -:description: CEL routing cookbook for Redpanda AI Gateway with common patterns, examples, and best practices. -:page-topic-type: cookbook -:personas: app_developer, platform_admin -:learning-objective-1: Write CEL expressions to route requests based on user tier or custom headers -:learning-objective-2: Test CEL routing logic using the UI editor or test requests -:learning-objective-3: Troubleshoot common CEL errors using safe patterns - -include::ai-agents:partial$adp-la.adoc[] - -Redpanda AI Gateway uses CEL (Common Expression Language) for dynamic request routing. CEL expressions evaluate request properties (headers, body, context) and determine which model or provider should handle each request. - -CEL enables: - -* User-based routing (free vs premium tiers) -* Content-based routing (by prompt topic, length, complexity) -* Environment-based routing (staging vs production models) -* Cost controls (reject expensive requests in test environments) -* A/B testing (route percentage of traffic to new models) -* Geographic routing (by region header) -* Custom business logic (any condition you can express) - -== CEL basics - -=== What is CEL? - -CEL (Common Expression Language) is a non-Turing-complete expression language designed for fast, safe evaluation. It's used by Google (Firebase, Cloud IAM), Kubernetes, Envoy, and other systems. 
- -Key properties: - -* Safe: Cannot loop infinitely or access system resources -* Fast: Evaluates in microseconds -* Readable: Similar to Python/JavaScript expressions -* Type-safe: Errors caught at configuration time, not runtime - -=== CEL syntax primer - -Comparison operators: - -[source,cel] ----- -== // equal -!= // Not equal -< // Less than -> // Greater than -<= // Less than or equal ->= // Greater than or equal ----- - - -Logical operators: - -[source,cel] ----- -&& // AND -|| // OR -! // NOT ----- - - -Ternary operator (most common pattern): - -[source,cel] ----- -condition ? value_if_true : value_if_false ----- - - -Functions: - -[source,cel] ----- -.size() // Length of string or array -.contains("text") // String contains substring -.startsWith("x") // String starts with -.endsWith("x") // String ends with -.matches("regex") // Regex match -has(field) // Check if field exists ----- - - -Examples: - -[source,cel] ----- -// Simple comparison -request.headers["tier"] == "premium" - -// Ternary (if-then-else) -request.headers["tier"] == "premium" ? "openai/gpt-5.2" : "openai/gpt-5.2-mini" - -// Logical AND -request.headers["tier"] == "premium" && request.headers["region"] == "us" - -// String contains -request.body.messages[0].content.contains("urgent") - -// Size check -request.body.messages.size() > 10 ----- - - -== Request object schema - -CEL expressions evaluate against the `request` object, which contains: - -=== `request.headers` (map) - -All HTTP headers (lowercase keys). - -[source,cel] ----- -request.headers["x-user-tier"] // Custom header -request.headers["x-customer-id"] // Custom header -request.headers["user-agent"] // Standard header -request.headers["x-request-id"] // Standard header ----- - - -NOTE: Header names are case-insensitive in HTTP, but CEL requires lowercase keys. - -=== `request.body` (object) - -The JSON request body (for `/chat/completions`). 
- -[source,cel] ----- -request.body.model // String: Requested model -request.body.messages // Array: Conversation messages -request.body.messages[0].role // String: "system", "user", "assistant" -request.body.messages[0].content // String: Message content -request.body.messages.size() // Int: Number of messages -request.body.max_tokens // Int: Max completion tokens (if set) -request.body.temperature // Float: Temperature (if set) -request.body.stream // Bool: Streaming enabled (if set) ----- - - -NOTE: Fields are optional. Use `has()` to check existence: - -[source,cel] ----- -has(request.body.max_tokens) ? request.body.max_tokens : 1000 ----- - - -=== `request.path` (string) - -The request path. - -[source,cel] ----- -request.path == "/v1/chat/completions" -request.path.startsWith("/v1/") ----- - - -=== `request.method` (string) - -The HTTP method. - -[source,cel] ----- -request.method == "POST" ----- - - -== CEL routing patterns - -Each pattern follows this structure: - -* When to use: Scenario description -* Expression: CEL code -* What happens: Routing behavior -* Verify: How to test -* Cost/performance impact: Implications - -=== Tier-based routing - -When to use: Different user tiers (free, pro, enterprise) should get different model quality - -Expression: - -[source,cel] ----- -request.headers["x-user-tier"] == "enterprise" ? "openai/gpt-5.2" : -request.headers["x-user-tier"] == "pro" ? 
"anthropic/claude-sonnet-4.5" : -"openai/gpt-5.2-mini" ----- - - -What happens: - -* Enterprise users → GPT-5.2 (best quality) -* Pro users → Claude Sonnet 4.5 (balanced) -* Free users → GPT-5.2-mini (cost-effective) - -Verify: - -[source,python] ----- -# Test enterprise -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": "Test"}], - extra_headers={"x-user-tier": "enterprise"} -) -# Check logs: Should route to openai/gpt-5.2 - -# Test free -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": "Test"}], - extra_headers={"x-user-tier": "free"} -) -# Check logs: Should route to openai/gpt-5.2-mini ----- - - -Cost impact: - -* Enterprise: ~$5.00 per 1K requests -* Pro: ~$3.50 per 1K requests -* Free: ~$0.50 per 1K requests - -Use case: SaaS product with tiered pricing where model quality is a differentiator - -=== Environment-based routing - -When to use: Prevent staging from using expensive models - -Expression: - -[source,cel] ----- -request.headers["x-environment"] == "production" - ? 
"openai/gpt-5.2" - : "openai/gpt-5.2-mini" ----- - - -What happens: - -* Production → GPT-5.2 (best quality) -* Staging/dev → GPT-5.2-mini (10x cheaper) - -Verify: - -[source,python] ----- -# Set environment header -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": "Test"}], - extra_headers={"x-environment": "staging"} -) -# Check logs: Should route to gpt-5.2-mini ----- - - -Cost impact: - -* Prevents staging from inflating costs -* Example: Staging with 100K test requests/day - * GPT-5.2: $500/day ($15K/month) - * GPT-5.2-mini: $50/day ($1.5K/month) - * *Savings: $13.5K/month* - -Use case: Protect against runaway staging costs - - -=== Content-length guard rails - -When to use: Block or downgrade long prompts to prevent cost spikes - -//// -Expression (Block): - -[source,cel] ----- -request.body.messages.size() > 10 || request.body.max_tokens > 4000 - ? "reject" - : "openai/gpt-5.2" ----- - -What happens: -* Requests with >10 messages or >4000 max_tokens -> Rejected with 400 error -* Normal requests -> GPT-5.2 -//// - -Expression (Downgrade): - -[source,cel] ----- -request.body.messages.size() > 10 || request.body.max_tokens > 4000 - ? 
"openai/gpt-5.2-mini" // Cheaper model - : "openai/gpt-5.2" // Normal model ----- - - -What happens: - -* Long conversations → Downgraded to cheaper model -* Short conversations → Premium model - -Verify: - -[source,python] ----- -# Test rejection -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": f"Message {i}"} for i in range(15)], - max_tokens=5000 -) -# Should return 400 error (rejected) - -# Test normal -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": "Short message"}], - max_tokens=100 -) -# Should route to gpt-5.2 ----- - - -Cost impact: - -* Prevents unexpected bills from verbose prompts -* Example: Block requests >10K tokens (would cost $0.15 each) - -Use case: Staging cost controls, prevent prompt injection attacks that inflate token usage - -=== Topic-based routing - -When to use: Route different question types to specialized models - -Expression: - -[source,cel] ----- -request.body.messages[0].content.contains("code") || -request.body.messages[0].content.contains("debug") || -request.body.messages[0].content.contains("programming") - ? 
"openai/gpt-5.2" // Better at code - : "anthropic/claude-sonnet-4.5" // Better at general writing ----- - - -What happens: - -* Coding questions → GPT-5.2 (optimized for code) -* General questions → Claude Sonnet (better prose) - -Verify: - -[source,python] ----- -# Test code question -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": "Debug this Python code: ..."}] -) -# Check logs: Should route to gpt-5.2 - -# Test general question -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": "Write a blog post about AI"}] -) -# Check logs: Should route to claude-sonnet-4.5 ----- - - -Cost impact: - -* Optimize model selection for task type -* Could improve quality without increasing costs - -Use case: Multi-purpose chatbot with both coding and general queries - - -=== Geographic/regional routing - -When to use: Route by user region to different providers or gateways for compliance or latency optimization - -Expression: - -[source,cel] ----- -request.headers["x-user-region"] == "eu" - ? "anthropic/claude-sonnet-4.5" // EU traffic to Anthropic - : "openai/gpt-5.2" // Other traffic to OpenAI ----- - - -What happens: - -* EU users -> Anthropic (for EU data processing requirements) -* Other users -> OpenAI (default provider) - -NOTE: To achieve true data residency, configure separate gateways per region with provider pools that meet your compliance requirements. 
- -Verify: - -[source,python] ----- -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": "Test"}], - extra_headers={"x-user-region": "eu"} -) -# Check logs: Should route to anthropic/claude-sonnet-4.5 ----- - - -Cost impact: Varies by provider pricing - -Use case: GDPR compliance, data residency requirements - - -=== Customer-specific routing - -When to use: Different customers have different model access (enterprise features) - -Expression: - -[source,cel] ----- -request.headers["x-customer-id"] == "customer_vip_123" - ? "anthropic/claude-opus-4.6" // Most expensive, best quality - : "anthropic/claude-sonnet-4.5" // Standard ----- - - -What happens: - -* VIP customer → Best model -* Standard customers → Normal model - -Verify: - -[source,python] ----- -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": "Test"}], - extra_headers={"x-customer-id": "customer_vip_123"} -) -# Check logs: Should route to anthropic/claude-opus-4.6 ----- - - -Cost impact: - -* VIP: ~$7.50 per 1K requests -* Standard: ~$3.50 per 1K requests - -Use case: Enterprise contracts with premium model access - - -//// -=== A/B testing (percentage-based routing) - -When to use: Test new models with a percentage of traffic - -PLACEHOLDER: Confirm if CEL can access random functions or if A/B testing requires different mechanism - -Expression (if random is available): - -[source,cel] ----- -PLACEHOLDER: Verify CEL random function availability -random() < 0.10 - ? "anthropic/claude-opus-4.6" // 10% traffic to new model - : "openai/gpt-5.2" // 90% traffic to existing model ----- - - -Alternative (hash-based): - -[source,cel] ----- -// Use customer ID hash for stable routing -hash(request.headers["x-customer-id"]) % 100 < 10 - ? 
"anthropic/claude-opus-4.6" - : "openai/gpt-5.2" ----- - - -What happens: - -* 10% of requests -> New model (Opus 4) -* 90% of requests -> Existing model (GPT-5.2) - -Verify: - -[source,python] ----- -# Send 100 requests, count which model was used -for i in range(100): - response = client.chat.completions.create( - model="openai/gpt-5.2", - messages=[{"role": "user", "content": f"Test {i}"}], - extra_headers={"x-customer-id": f"customer_{i}"} - ) -# Check logs: ~10 should use opus-4.6, ~90 should use gpt-5.2 ----- - - -Cost impact: - -* Allows safe, incremental rollout of new models -* Monitor quality/cost for new model before full adoption - -Use case: Evaluate new models in production with real traffic -//// - -=== Complexity-based routing - -When to use: Route simple queries to cheap models, complex queries to expensive models - -Expression: - -[source,cel] ----- -request.body.messages.size() == 1 && -request.body.messages[0].content.size() < 100 - ? "openai/gpt-5.2-mini" // Simple, short question - : "openai/gpt-5.2" // Complex or long conversation ----- - - -What happens: - -* Single short message (<100 chars) → Cheap model -* Multi-turn or long messages → Premium model - -Verify: - -[source,python] ----- -# Test simple -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[{"role": "user", "content": "Hi"}] # 2 chars -) -# Check logs: Should route to gpt-5.2-mini - -# Test complex -response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=[ - {"role": "user", "content": "Long question here..." 
* 10}, - {"role": "assistant", "content": "Response"}, - {"role": "user", "content": "Follow-up"} - ] -) -# Check logs: Should route to gpt-5.2 ----- - - -Cost impact: - -* Can reduce costs significantly if simple queries are common -* Example: 50% of queries are simple, save 90% on those = 45% total savings - -Use case: FAQ chatbot with mix of simple lookups and complex questions - -//// -=== Time-based routing - -When to use: Use cheaper models during off-peak hours - -PLACEHOLDER: Confirm if CEL has access to current timestamp - -Expression (if time functions available): - -[source,cel] ----- -PLACEHOLDER: Verify CEL time function availability -now().hour >= 22 || now().hour < 6 // 10pm - 6am - ? "openai/gpt-5.2-mini" // Off-peak: cheaper model - : "openai/gpt-5.2" // Peak hours: best model ----- - - -What happens: - -* Off-peak hours (10pm-6am) -> Cheap model -* Peak hours (6am-10pm) -> Premium model - -Cost impact: - -* Optimize for user experience during peak usage -* Save costs during low-traffic hours - -Use case: Consumer apps with time-zone-specific usage patterns -//// - - -=== Fallback chain (multi-level) - -When to use: Complex fallback logic beyond simple primary/secondary - -Expression: - -[source,cel] ----- -request.headers["x-priority"] == "critical" - ? "openai/gpt-5.2" // First choice for critical - : request.headers["x-user-tier"] == "premium" - ? 
"anthropic/claude-sonnet-4.5" // Second choice for premium - : "openai/gpt-5.2-mini" // Default for everyone else ----- - - -What happens: - -* Critical requests → Always GPT-5.2 -* Premium non-critical → Claude Sonnet -* Everyone else → GPT-5.2-mini - -Verify: Test with different header combinations - -Cost impact: Ensures SLA for critical requests while optimizing costs elsewhere - -Use case: Production systems with SLA requirements - - -== Advanced CEL patterns - -=== Default values with `has()` - -Problem: Field might not exist in request - -Expression: - -[source,cel] ----- -has(request.body.max_tokens) && request.body.max_tokens > 2000 - ? "openai/gpt-5.2" // Long response expected - : "openai/gpt-5.2-mini" // Short response ----- - - -What happens: Safely checks if `max_tokens` exists before comparing - -=== Multiple conditions with parentheses - -Expression: - -[source,cel] ----- -(request.headers["x-user-tier"] == "premium" || - request.headers["x-customer-id"] == "vip_123") && -request.headers["x-environment"] == "production" - ? "openai/gpt-5.2" - : "openai/gpt-5.2-mini" ----- - - -What happens: Premium users OR VIP customer, AND production → GPT-5.2 - -=== Regex matching - -Expression: - -[source,cel] ----- -request.body.messages[0].content.matches("(?i)(urgent|asap|emergency)") - ? "openai/gpt-5.2" // Route urgent requests to best model - : "openai/gpt-5.2-mini" ----- - - -What happens: Messages containing "urgent", "ASAP", or "emergency" (case-insensitive) → GPT-5.2 - -=== String array contains - -Expression: - -[source,cel] ----- -["customer_1", "customer_2", "customer_3"].exists(c, c == request.headers["x-customer-id"]) - ? "openai/gpt-5.2" // Whitelist of customers - : "openai/gpt-5.2-mini" ----- - - -What happens: Only specific customers get premium model - -//// -=== Reject invalid requests - -Expression: - -[source,cel] ----- -!has(request.body.messages) || request.body.messages.size() == 0 - ? 
"reject" // PLACEHOLDER: Confirm "reject" is supported - : "openai/gpt-5.2" ----- - -What happens: Requests without messages are rejected (400 error) -//// - -== Test CEL expressions - -=== Option 1: CEL editor in UI (if available) - -1. Navigate to *Agentic* → *AI Gateway* → *Gateways* → *Routing Rules* -2. Enter CEL expression -3. Click "Test" -4. Input test headers/body -5. View evaluated result - -=== Option 2: Send test requests - -[source,python] ----- -def test_cel_routing(headers, messages): - """Test CEL routing with specific headers and messages""" - response = client.chat.completions.create( - model="openai/gpt-5.2", # CEL routing rules override model selection - messages=messages, - extra_headers=headers, - max_tokens=10 # Keep it cheap - ) - - # Check logs to see which model was used - print(f"Headers: {headers}") - print(f"Routed to: {response.model}") - -# Test tier-based routing -test_cel_routing( - {"x-user-tier": "premium"}, - [{"role": "user", "content": "Test"}] -) -test_cel_routing( - {"x-user-tier": "free"}, - [{"role": "user", "content": "Test"}] -) ----- - - -//// -=== Option 3: CLI test (if available) - -[source,bash] ----- -# PLACEHOLDER: If CLI tool exists for testing CEL -rpk cloud ai-gateway test-cel \ - --gateway-id gw_abc123 \ - --expression 'request.headers["tier"] == "premium" ? 
"openai/gpt-5.2" : "openai/gpt-5.2-mini"' \ - --header 'tier: premium' \ - --body '{"messages": [{"role": "user", "content": "Test"}]}' - -# Expected output: openai/gpt-5.2 ----- -//// - - -== Common CEL errors - -=== Error: "unknown field" - -Symptom: - -[source,text] ----- -Error: Unknown field 'request.headers.x-user-tier' ----- - - -Cause: Wrong syntax (dot notation instead of bracket notation for headers) - -Fix: - -[source,cel] ----- -// Wrong -request.headers.x-user-tier - -// Correct -request.headers["x-user-tier"] ----- - - -=== Error: "type mismatch" - -Symptom: - -[source,text] ----- -Error: Type mismatch: expected bool, got string ----- - - -Cause: Forgot comparison operator - -Fix: - -[source,cel] ----- -// Wrong (returns string) -request.headers["tier"] - -// Correct (returns bool) -request.headers["tier"] == "premium" ----- - - -=== Error: "field does not exist" - -Symptom: - -[source,text] ----- -Error: No such key: max_tokens ----- - - -Cause: Accessing field that doesn't exist in request - -Fix: -[source,cel] ----- -// Wrong (crashes if max_tokens not in request) -request.body.max_tokens > 1000 - -// Correct (checks existence first) -has(request.body.max_tokens) && request.body.max_tokens > 1000 ----- - - -=== Error: "index out of bounds" - -Symptom: - -[source,text] ----- -Error: Index 0 out of bounds for array of size 0 ----- - - -Cause: Accessing array element that doesn't exist - -Fix: - -[source,cel] ----- -// Wrong (crashes if messages empty) -request.body.messages[0].content.contains("test") - -// Correct (checks size first) -request.body.messages.size() > 0 && request.body.messages[0].content.contains("test") ----- - - -== CEL performance considerations - -=== Expression complexity - -Fast (<1ms evaluation): - -[source,cel] ----- -request.headers["tier"] == "premium" ? 
"openai/gpt-5.2" : "openai/gpt-5.2-mini" ----- - - -Slower (~5-10ms evaluation): - -[source,cel] ----- -request.body.messages[0].content.matches("complex.*regex.*pattern") ----- - - -Recommendation: Keep expressions simple. Complex regex can add latency. - -=== Number of evaluations - -Each request evaluates CEL expression once. Total latency impact: -* Simple expression: <1ms -* Complex expression: ~5-10ms - -*Acceptable for most use cases.* - -== CEL function reference - -=== String functions - -[cols="2,3,3"] -|=== -| Function | Description | Example - -| `size()` -| String length -| `"hello".size() == 5` - -| `contains(s)` -| String contains -| `"hello".contains("ell")` - -| `startsWith(s)` -| String starts with -| `"hello".startsWith("he")` - -| `endsWith(s)` -| String ends with -| `"hello".endsWith("lo")` - -| `matches(regex)` -| Regex match -| `"hello".matches("h.*o")` -|=== - -=== Array functions - -[cols="2,3,3"] -|=== -| Function | Description | Example - -| `size()` -| Array length -| `[1,2,3].size() == 3` - -| `exists(x, cond)` -| Any element matches -| `[1,2,3].exists(x, x > 2)` - -| `all(x, cond)` -| All elements match -| `[1,2,3].all(x, x > 0)` -|=== - -=== Utility functions - -[cols="2,3,3"] -|=== -| Function | Description | Example - -| `has(field)` -| Field exists -| `has(request.body.max_tokens)` -|=== - -== Next steps - -* *Apply CEL routing*: See the gateway configuration options available in the Redpanda Cloud console. diff --git a/modules/ai-agents/pages/ai-gateway/configure-provider.adoc b/modules/ai-agents/pages/ai-gateway/configure-provider.adoc new file mode 100644 index 000000000..884db7846 --- /dev/null +++ b/modules/ai-agents/pages/ai-gateway/configure-provider.adoc @@ -0,0 +1,255 @@ += Configure an LLM Provider +:description: Create an LLM provider in AI Gateway to proxy requests to OpenAI, Anthropic, Google Gemini, AWS Bedrock, or any OpenAI-compatible endpoint through a managed Redpanda URL. 
+:page-topic-type: how-to +:personas: platform_admin, app_developer +:page-aliases: ai-gateway/admin/setup-guide.adoc, ai-gateway/gateway-quickstart.adoc +:learning-objective-1: Create an LLM provider for OpenAI, Anthropic, Google Gemini, AWS Bedrock, or an OpenAI-compatible endpoint +:learning-objective-2: Select the models you want to expose through the provider +:learning-objective-3: Verify the provider is reachable by sending a test request through its proxy URL + +include::ai-agents:partial$adp-la.adoc[] + +An LLM provider is the primary resource in AI Gateway. When you create one, Redpanda gives you a managed proxy URL that your applications can point at: Redpanda handles the upstream API keys, forwards requests to the provider, and records usage for you. This guide walks you through creating a provider for each supported upstream. + +After completing this guide, you will be able to: + +* [ ] Create an LLM provider for OpenAI, Anthropic, Google Gemini, AWS Bedrock, or an OpenAI-compatible endpoint +* [ ] Select the models you want to expose through the provider +* [ ] Verify the provider is reachable by sending a test request through its proxy URL + +== Prerequisites + +* Access to the Redpanda Agentic Data Plane. ++ +// TODO: confirm how users obtain access to ADP once the standalone UI launches (sign-up flow, tenancy model, role requirements). +* An API key (or AWS credentials for Bedrock) for the upstream provider you want to configure. +* One or more secrets already created in the ADP secret store for the provider's credentials. Secret references must use `UPPER_SNAKE_CASE`. For example: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `AWS_ACCESS_KEY_ID`. ++ +// TODO: xref the secrets-management page for ADP once confirmed. + +== Open the LLM Providers page + +// TODO: confirm the exact sign-in URL for the standalone ADP UI. + +. Sign in to the ADP UI and open *LLM Providers*. ++ +// TODO: screenshot: LLM Providers list, empty state vs. populated list. +. 
Click *Create provider*. + +== Choose a provider type + +AI Gateway supports five provider types: + +[cols="1,3"] +|=== +|Type |Use when + +|OpenAI +|You have an OpenAI API account. Default endpoint is the public OpenAI API; override `Base URL` for Azure OpenAI or another OpenAI-hosted region. + +|Anthropic +|You have an Anthropic API account. Supports forwarding client `Authorization` headers to Anthropic for enterprise and Max-plan subscription passthrough (see <<anthropic-authorization-passthrough>>). + +|Google (Gemini) +|You have a Google AI API key. Uses the standard Generative Language API. + +|AWS Bedrock +|You want to use foundation models hosted in AWS Bedrock. Requires an AWS region and credentials (static, STS-assumed role, or the default credential chain). + +|OpenAI-compatible +|You have a self-hosted endpoint that implements the OpenAI API surface (for example, vLLM, Ollama, LM Studio, or LocalAI). Requires a `Base URL`; authentication is optional. +|=== + +== Fill in the provider form + +Every provider has the same identity fields plus a type-specific configuration block. + +=== Identity fields (all types) + +[cols="1,1,3"] +|=== +|Field |Required |Notes + +|Name +|Yes +|Machine identifier. Must start with a lowercase letter and contain only lowercase letters, numbers, and hyphens (`^[a-z][a-z0-9-]*$`), up to 63 characters. Immutable after creation. Appears in the proxy URL (`/llm/v1/providers//...`). + +|Display name +|No +|Human-readable label shown in the UI. Up to 253 characters. + +|Description +|No +|Operator-visible description. Shown next to the provider name in the admin UI. + +|Models +|No +|Model identifiers you want to expose through this provider. Leave empty to allow all models. See <<select-models>>. + +|Enabled +|Yes (toggle) +|Disabled providers reject all API requests. +|=== + +=== Type-specific configuration + +[tabs] +====== +OpenAI:: ++ +[cols="1,3"] +|=== +|Field |Notes + +|Base URL +|Optional. Leave empty for the standard OpenAI API (`https://api.openai.com/v1`). Override for Azure OpenAI or other OpenAI-hosted endpoints. 
+ +|API key reference +|Required. Secret-store key for the OpenAI API key. Must be `UPPER_SNAKE_CASE`, for example `OPENAI_API_KEY`. +|=== + +Anthropic:: ++ +[cols="1,3"] +|=== +|Field |Notes + +|Base URL +|Optional. Leave empty for the standard Anthropic API. + +|API key reference +|Required unless *Authorization passthrough* is enabled. `UPPER_SNAKE_CASE`, for example `ANTHROPIC_API_KEY`. + +|Authorization passthrough +|Optional toggle. When on, the client's `Authorization` header is forwarded to Anthropic instead of using a server-side API key. Used for enterprise and Max-plan OAuth passthrough: each client authenticates with its own Anthropic subscription. Leave the API key reference empty when using passthrough. +|=== + +Google (Gemini):: ++ +[cols="1,3"] +|=== +|Field |Notes + +|Base URL +|Optional. Leave empty for the standard Google AI API (`https://generativelanguage.googleapis.com`). + +|API key reference +|Required. Secret-store key for the Google AI API key. `UPPER_SNAKE_CASE`, for example `GOOGLE_AI_API_KEY`. +|=== ++ +[IMPORTANT] +==== +Gemini uses the `x-goog-api-key` header for authentication, not `Authorization: Bearer`. This matters when you wire up clients. See xref:ai-agents:ai-gateway/connect-agent.adoc[]. +==== + +AWS Bedrock:: ++ +[cols="1,3"] +|=== +|Field |Notes + +|Region +|Required. AWS region where the Bedrock endpoint is deployed, for example `us-east-1`. + +|Base URL +|Optional. Override the default regional Bedrock endpoint. + +|Credential type +|Choose one of: + +* *Default credential chain* (leave unset). Uses environment variables, IRSA, EKS Pod Identity, or instance profile. +* *Static credentials*. Secret references for `access_key_id_ref` and `secret_access_key_ref`, both `UPPER_SNAKE_CASE`. +* *Assume role*. Uses `role_arn`, `external_id` (optional, required when the role's trust policy mandates it), and `session_name` (for CloudTrail audit). 
+|=== + +OpenAI-compatible:: ++ +[cols="1,3"] +|=== +|Field |Notes + +|Base URL +|Required. URL of your self-hosted endpoint, for example `http://vllm.internal:8000/v1` or `http://ollama.local:11434/v1`. + +|API key reference +|Optional. Leave empty for no-auth endpoints (common for local runtimes). `UPPER_SNAKE_CASE` if set. +|=== ++ +TIP: Unlike first-party (1P) provider catalogs, OpenAI-compatible endpoints can serve any model. Enter the exact model identifiers your upstream server exposes (for example, `meta-llama/Llama-3.3-70B-Instruct` or `qwen3:8b`). +====== + +[[select-models]] +== Select models + +Leave the *Models* field empty to allow every model the upstream exposes, or enter explicit identifiers to restrict the provider to a subset. + +// TODO: confirm whether the UI offers a picker (backed by the provider's model catalog) or a freeform text list, and how this interacts with the model catalog two-way split once that design lands. + +Model IDs are provider-native. Examples: `gpt-4o`, `claude-sonnet-4-5`, `gemini-2.0-flash`, `anthropic.claude-3-5-sonnet-20241022-v2:0`. Check the upstream provider's documentation for the exact identifiers it supports. + +== Save and verify + +. Click *Create* (or *Save*). ++ +// TODO: confirm exact button label. +. On the provider's detail page, open the *Overview* tab. Copy the *Proxy URL*. This is the URL your applications send requests to. +. Open the *API Examples* tab. The UI generates working curl, Python, Node, and Claude Code snippets tailored to this provider. Copy one and run it from your terminal to confirm the provider is reachable. ++ +// TODO: screenshot: API Examples tab with a curl snippet. + +A successful test request confirms that the provider's credentials, region (Bedrock), and network path are all correct. If the call fails, see <<troubleshooting>>. 
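If you prefer to script the verification step rather than copy a generated snippet, the following is a minimal sketch of what such a test request could look like. It assumes an OpenAI-type provider (so the upstream path is `v1/chat/completions`) and a bearer token for gateway authentication. The helper names `build_test_request` and `send_test_request` are hypothetical and not part of any Redpanda tooling; the snippets on the *API Examples* tab remain the authoritative reference.

```python
import json
import urllib.request


def build_test_request(proxy_url: str, token: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a minimal chat completion request.

    Assumption: the provider is an OpenAI-type provider, so the upstream
    path appended to the proxy URL is v1/chat/completions.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 5,  # keep the test request cheap
    }
    return urllib.request.Request(
        url=f"{proxy_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_test_request(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses,
    so a returned value means the gateway answered with a success status.
    """
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

To use it, pass the *Proxy URL* copied from the *Overview* tab and a valid access token to `build_test_request`, then hand the result to `send_test_request`. A 2xx status suggests the credentials and network path are working; any error status points to the troubleshooting section.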
+ +[[anthropic-authorization-passthrough]] +== Anthropic: authorization passthrough + +If you want each client to authenticate against Anthropic with its own subscription (Claude Pro, Max, Team, or enterprise), enable *Authorization passthrough* instead of configuring a server-side API key. In this mode: + +* Leave the *API key reference* empty. +* Clients must send their own Anthropic `Authorization` header with every request. AI Gateway forwards it unchanged. +* Use this when you want to aggregate individual client subscriptions rather than share a single API account. + +== Edit, disable, or delete a provider + +* *Edit*: you can change any field *except* `Name` and `Type`, which are immutable. Model lists, credential references, and the `Enabled` toggle can all change. +* *Disable*: toggle *Enabled* off. The provider remains in the list, but requests to its proxy URL are rejected until you enable it again. Use this when you want to pause traffic without losing configuration. +* *Delete*: removes the provider permanently. In-flight requests fail; downstream clients receive errors until reconfigured. + +[[troubleshooting]] +== Troubleshooting + +[cols="1,2"] +|=== +|Symptom |What to check + +|`Secret not found` +|Confirm the secret exists in the Redpanda Cloud secret store and the reference in the provider configuration is spelled identically (`UPPER_SNAKE_CASE`, no typos). + +|Bedrock returns `AccessDenied` or region errors +|Verify the AWS region field matches the region where your Bedrock models are enabled. Bedrock model availability varies by region. + +|Anthropic returns 401 when passthrough is enabled +|Confirm the client is sending its own `Authorization` header and the API key reference on the provider is empty. + +|Gemini returns 401 +|Gemini uses the `x-goog-api-key` header, not `Authorization`. If you're seeing 401s on Gemini, check that the client is sending the correct header. See xref:ai-agents:ai-gateway/connect-agent.adoc[]. 
+ +|Provider list empty or 403 +|Confirm your account has the `dataplane_adp_llmprovider_*` permissions in ADP. ++ +// TODO: confirm the exact role/permission model once the standalone ADP UI launches. +|=== + +// TODO: add screenshots of common error toasts once captured from the live environment. + +== Out of scope + +AI Gateway does not provide these capabilities. For current status, consult the Redpanda Cloud release notes. + +* *Multi-provider routing, failover, and retries across providers.* A synthetic provider that fans requests to multiple upstreams is not part of AI Gateway. +* *Spend limits.* Per-user, per-org, and global cost caps are not available. +* *Rate limits.* Requests-per-second, per-minute, or per-day limits are not available. +* *Managed MCP aggregation at the gateway.* Register MCP tool servers separately under *ADP* → *MCP Servers*. + +== Next steps + +* xref:ai-agents:ai-gateway/connect-agent.adoc[Connect your agent]. Point your application's SDK at the proxy URL and make requests. diff --git a/modules/ai-agents/pages/ai-gateway/connect-agent.adoc b/modules/ai-agents/pages/ai-gateway/connect-agent.adoc new file mode 100644 index 000000000..58e81dadf --- /dev/null +++ b/modules/ai-agents/pages/ai-gateway/connect-agent.adoc @@ -0,0 +1,364 @@ += Connect Your Agent +:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, OIDC authentication, and SDK examples for OpenAI, Anthropic, Google Gemini, AWS Bedrock, and OpenAI-compatible endpoints. 
+:page-topic-type: how-to +:personas: app_developer +:page-aliases: ai-gateway/builders/connect-your-agent.adoc +:learning-objective-1: Construct the proxy URL for an LLM provider you have configured +:learning-objective-2: Authenticate to AI Gateway with an OIDC service account token +:learning-objective-3: Send requests through the proxy URL with the SDK of your choice + +include::ai-agents:partial$adp-la.adoc[] + +This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You'll construct the proxy URL for a provider you have already created, authenticate with an OIDC service account, and send your first request with the SDK of your choice. + +After completing this guide, you will be able to: + +* [ ] Construct the proxy URL for an LLM provider you have configured +* [ ] Authenticate to AI Gateway with an OIDC service account token +* [ ] Send requests through the proxy URL with the SDK of your choice + +== Prerequisites + +* A configured LLM provider. If you haven't created one yet, see xref:ai-agents:ai-gateway/configure-provider.adoc[]. +* A Redpanda Cloud service account with OIDC client credentials. See xref:security:cloud-authentication.adoc[]. +* A development environment with your chosen programming language. + +== Proxy URL anatomy + +Every provider you create in AI Gateway gets its own proxy URL: + +[source,text] +---- +/llm/v1/providers// +---- + +* ``: the AI Gateway base URL for your ADP environment. Copy it from the provider's *Overview* tab. +* ``: the name you gave the provider when you created it, for example `my-openai` or `prod-anthropic`. +* ``: the upstream provider's native API path (for example, `v1/chat/completions` for OpenAI, `v1/messages` for Anthropic). + +AI Gateway forwards the request to the upstream provider, attaches the configured credentials, and records the request for observability. Your application never sees the upstream API key. 
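The URL anatomy above can be sketched in a few lines of Python. This is an illustrative helper only: `build_proxy_url`, its parameter names, and the `gateway.example.com` host are hypothetical, and the name check simply mirrors the constraints documented for the provider `Name` field. Copy the real base URL from the provider's *Overview* tab.

```python
import re

# Mirrors the documented Name constraint: starts with a lowercase letter,
# then lowercase letters, digits, or hyphens; at most 63 characters.
PROVIDER_NAME = re.compile(r"[a-z][a-z0-9-]*")


def build_proxy_url(base_url: str, provider_name: str, upstream_path: str = "") -> str:
    """Join the gateway base URL, provider name, and upstream API path.

    Hypothetical helper for illustration; the /llm/v1/providers/ layout
    is taken from the URL shape described in this section.
    """
    if len(provider_name) > 63 or not PROVIDER_NAME.fullmatch(provider_name):
        raise ValueError(f"invalid provider name: {provider_name!r}")
    url = f"{base_url.rstrip('/')}/llm/v1/providers/{provider_name}"
    if upstream_path:
        # Tolerate a leading slash on the upstream path.
        url += "/" + upstream_path.lstrip("/")
    return url


if __name__ == "__main__":
    # gateway.example.com is a placeholder host, not a real endpoint.
    print(build_proxy_url("https://gateway.example.com", "my-openai", "v1/chat/completions"))
```

SDKs that take a `base_url` typically want the URL without the upstream path (the SDK appends its own), so in that case call the helper with only the base URL and provider name.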
+ +TIP: The *API Examples* tab on each provider's detail page in the Cloud UI generates ready-to-run curl, Python, Node, and Claude Code snippets pre-filled with the correct proxy URL and paths. When in doubt, copy from there. + +[[authenticate-with-oidc]] +== Authenticate with OIDC + +AI Gateway uses OIDC service accounts with the `client_credentials` grant to mint short-lived access tokens. + +=== Create a service account + +// TODO: confirm where service accounts are managed once ADP is a standalone product (current link points to Redpanda Cloud Organization IAM; ADP may host its own IAM or share Cloud's). + +. In the Redpanda Cloud UI, go to https://cloud.redpanda.com/organization-iam?tab=service-accounts[*Organization IAM* > *Service account*^]. +. Create a new service account and note the *Client ID* and *Client Secret*. + +For details, see xref:security:cloud-authentication.adoc#authenticate-to-the-cloud-api[Authenticate to the Cloud API]. + +=== Fetch an access token + +// TODO: confirm the OIDC discovery URL and audience for ADP once the standalone UI launches. Values below are the current Redpanda Cloud endpoints. + +Use an OIDC client library that supports discovery (such as `openid-client` for Node.js or `authlib` for Python). Your library resolves endpoints automatically from the discovery URL. 
+ +[cols="1,2", options="header"] +|=== +|Parameter |Value + +|Discovery URL +|`\https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration` + +|Token endpoint +|`\https://auth.prd.cloud.redpanda.com/oauth/token` + +|Audience +|`cloudv2-production.redpanda.cloud` + +|Grant type +|`client_credentials` +|=== + +[tabs] +====== +cURL:: ++ +-- +[source,bash] +---- +AUTH_TOKEN=$(curl -s --request POST \ + --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \ + --header 'content-type: application/x-www-form-urlencoded' \ + --data grant_type=client_credentials \ + --data client_id= \ + --data client_secret= \ + --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token) +---- + +Replace `` and `` with your service account credentials. +-- + +Python (authlib):: ++ +-- +[source,python] +---- +from authlib.integrations.requests_client import OAuth2Session +import requests + +# Discover token endpoint from OIDC metadata +metadata = requests.get( + "https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration" +).json() +token_endpoint = metadata["token_endpoint"] + +client = OAuth2Session( + client_id="", + client_secret="", + token_endpoint=token_endpoint, +) + +token = client.fetch_token( + grant_type="client_credentials", + audience="cloudv2-production.redpanda.cloud", +) + +access_token = token["access_token"] +---- + +Passing `token_endpoint` to the `OAuth2Session` constructor lets `authlib` handle renewal automatically. For `client_credentials` grants, it fetches a new token rather than using a refresh token. 
+-- + +Node.js (openid-client):: ++ +[source,javascript] +---- +import { Issuer } from 'openid-client'; + +const issuer = await Issuer.discover( + 'https://auth.prd.cloud.redpanda.com' +); + +const client = new issuer.Client({ + client_id: '', + client_secret: '', +}); + +const tokenSet = await client.grant({ + grant_type: 'client_credentials', + audience: 'cloudv2-production.redpanda.cloud', +}); + +const accessToken = tokenSet.access_token; +---- +====== + +=== Token lifecycle management + +IMPORTANT: Your client is responsible for refreshing tokens before they expire. OIDC access tokens have a limited TTL set by the identity provider and are not automatically renewed by AI Gateway. Check the `expires_in` field in the token response for the exact duration. + +* Proactively refresh at ~80% of the token's TTL to avoid failed requests. +* `authlib` (Python) handles renewal automatically when you pass `token_endpoint` to `OAuth2Session`. +* For other languages, cache the token and its expiry, then request a new token before the current one expires. + +== Send requests with your SDK + +Set these environment variables. The examples in this section use them throughout: + +[source,bash] +---- +export PROXY_URL="/llm/v1/providers/" +export AUTH_TOKEN="" +---- + +[tabs] +====== +OpenAI SDK:: ++ +[source,python] +---- +import os +from openai import OpenAI + +client = OpenAI( + base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-openai + api_key=os.environ["AUTH_TOKEN"], # OIDC access token +) + +response = client.chat.completions.create( + model="gpt-4o", # native OpenAI model ID + messages=[{"role": "user", "content": "Hello from AI Gateway"}], +) +print(response.choices[0].message.content) +---- ++ +The OpenAI SDK calls the proxy's `/v1/chat/completions` path, which AI Gateway forwards to OpenAI unchanged. Use it with any OpenAI provider and, with a different `base_url`, with any OpenAI-compatible provider (vLLM, Ollama, LM Studio, etc.). 
+ +Anthropic SDK:: ++ +[source,python] +---- +import os +from anthropic import Anthropic + +client = Anthropic( + base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-anthropic + auth_token=os.environ["AUTH_TOKEN"], # OIDC access token +) + +message = client.messages.create( + model="claude-sonnet-4-5", + max_tokens=1024, + messages=[{"role": "user", "content": "Hello from AI Gateway"}], +) +print(message.content[0].text) +---- ++ +The Anthropic SDK hits `v1/messages` on the proxy, which AI Gateway forwards to Anthropic. If the provider is configured with *Authorization passthrough*, use your own Anthropic `Authorization` header instead of the OIDC token. AI Gateway forwards it unchanged. + +Google Gemini SDK:: ++ +[source,python] +---- +import os +from google import genai + +client = genai.Client( + api_key=os.environ["AUTH_TOKEN"], # forwarded as x-goog-api-key + http_options={"base_url": os.environ["PROXY_URL"]}, # .../llm/v1/providers/my-google +) + +response = client.models.generate_content( + model="gemini-2.0-flash", + contents="Hello from AI Gateway", +) +print(response.text) +---- ++ +[IMPORTANT] +==== +Gemini authenticates with the `x-goog-api-key` header, not `Authorization: Bearer`. Most Google SDKs set `x-goog-api-key` automatically from the `api_key` parameter. If you hand-roll the request, set the header yourself. +==== + +AWS Bedrock:: ++ +Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an OIDC token. 
++ +[source,python] +---- +import os, httpx + +response = httpx.post( + f"{os.environ['PROXY_URL']}/model/anthropic.claude-3-5-sonnet-20241022-v2:0/invoke", + headers={"Authorization": f"Bearer {os.environ['AUTH_TOKEN']}"}, + json={ + "anthropic_version": "bedrock-2023-05-31", + "messages": [{"role": "user", "content": "Hello"}], + "max_tokens": 1024, + }, +) +print(response.json()) +---- ++ +// TODO: verify Bedrock request shape end-to-end on adp-production once credentials are available; replace placeholder model ID if the provider gates a different set. + +OpenAI-compatible:: ++ +Use the OpenAI SDK with the proxy URL of the OpenAI-compatible provider and whatever model identifier the upstream exposes: ++ +[source,python] +---- +import os +from openai import OpenAI + +client = OpenAI( + base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-vllm + api_key=os.environ["AUTH_TOKEN"], +) + +response = client.chat.completions.create( + model="meta-llama/Llama-3.3-70B-Instruct", # as exposed by your vLLM/Ollama endpoint + messages=[{"role": "user", "content": "Hello"}], +) +---- +====== + +== Streaming responses + +Streaming passes through unchanged. Use the SDK's native streaming API; the proxy forwards the stream byte-for-byte. + +[source,python] +---- +response = client.chat.completions.create( + model="gpt-4o", + messages=[{"role": "user", "content": "Write a short poem"}], + stream=True, +) + +for chunk in response: + if chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end="", flush=True) +---- + +== Handle errors + +AI Gateway returns standard HTTP status codes. The upstream provider's error body passes through, so your existing SDK error handling works: + +[cols="1,3"] +|=== +|Status |Meaning + +|400 +|Bad request. Invalid parameters or malformed JSON. + +|401 +|Authentication failed. OIDC token invalid, expired, or (for Gemini) sent in the wrong header. + +|403 +|Forbidden. 
The service account lacks the required role, or the provider is disabled. + +|404 +|Provider or model not found. Verify the provider name in the URL and the model identifier. + +|429 +|Rate limited. Either by the upstream provider or (when configured) by AI Gateway itself. Respect `Retry-After` if present. + +|5xx +|Upstream or gateway error. Retry with exponential backoff. +|=== + +== Best practices + +* *Use environment variables* for the proxy URL and token; never hard-code them. +* *Implement retry with exponential backoff* for 5xx and timeout conditions. +* *Respect `Retry-After`* on 429 responses. +* *Rotate service account credentials* on a schedule your organization accepts. +* *Observe usage* through the Cloud UI. Requests and token counts appear on the provider's *Overview* tab. + +== Troubleshooting + +=== 401 Unauthorized + +* Check the OIDC token hasn't expired and refresh it. +* Verify the audience is `cloudv2-production.redpanda.cloud`. +* Confirm the `Authorization` header is formatted `Bearer `. +* For Gemini: ensure the token is sent as `x-goog-api-key`, not `Authorization`. +* For Anthropic with passthrough: ensure the client is sending a valid Anthropic `Authorization` header. + +=== 404 Not found + +* Re-check the provider name in the proxy URL. The segment after `/providers/` must match the provider's `Name` exactly. +* For model-not-found: confirm the model identifier is one your upstream actually serves. OpenAI-compatible endpoints accept whatever model IDs the upstream exposes. + +=== 403 Forbidden + +* The service account may lack the required roles. Ask an admin to grant `dataplane_adp_llmprovider_get` at minimum. +* The provider may be disabled. Check the *Enabled* toggle on its detail page. + +=== Connection timeout or reset + +* Verify the proxy URL is correct (copy directly from the provider's *Overview* tab). +* Check that the provider isn't pointing at a private base URL your client can't reach (OpenAI-compatible providers only). 
+* Confirm the upstream provider's status page. + +== Next steps + +* xref:ai-agents:ai-gateway/configure-provider.adoc[Configure an LLM provider]. Add another provider to your ADP environment. diff --git a/modules/ai-agents/pages/ai-gateway/gateway-architecture.adoc b/modules/ai-agents/pages/ai-gateway/gateway-architecture.adoc deleted file mode 100644 index dbf58af39..000000000 --- a/modules/ai-agents/pages/ai-gateway/gateway-architecture.adoc +++ /dev/null @@ -1,221 +0,0 @@ -= AI Gateway Architecture -:description: Technical architecture of Redpanda AI Gateway, including how the control plane, data plane, and observability plane deliver high availability, cost governance, and multi-tenant isolation. -:page-topic-type: concept -:personas: app_developer, platform_admin -:learning-objective-1: Describe the three architectural planes of AI Gateway -:learning-objective-2: Explain the request lifecycle through policy evaluation stages -:learning-objective-3: Identify supported providers, features, and current limitations - -include::ai-agents:partial$adp-la.adoc[] - -This page provides technical details about AI Gateway's architecture, request processing, and capabilities. For an overview of AI Gateway, see xref:ai-agents:ai-gateway/what-is-ai-gateway.adoc[] - -== Architecture overview - -AI Gateway consists of a glossterm:control plane[] for configuration and management, a glossterm:data plane[] for request processing and routing, and an observability plane for monitoring and analytics. - -// PLACEHOLDER: Add architecture diagram showing: -// 1. Control Plane: -// - Workspace management -// - Provider/model configuration -// - Gateway creation and policy definition -// - Admin console -// -// 2. Data Plane: -// - Request ingestion -// - Policy evaluation (rate limits → spend limits → routing → execution) -// - Provider pool selection and failover -// - MCP aggregation layer -// - Response logging and metrics -// -// 3. 
Observability Plane: -// - Request logs storage -// - Metrics aggregation -// - Dashboard UI - -=== Control plane - -The control plane manages gateway configuration and policy definition: - -* **Workspace management**: Multi-tenant isolation with separate namespaces for different teams or environments -* **Provider configuration**: Enable and configure LLM providers (such as OpenAI and Anthropic) -* **Gateway creation**: Define gateways with specific routing rules, budgets, and rate limits -* **Policy definition**: Create CEL-based routing policies, spend limits, and rate limits -* **MCP server registration**: Configure which MCP servers are available to agents - -=== Data plane - -The data plane handles all runtime request processing: - -* **Request ingestion**: Accept requests via OpenAI-compatible API endpoints -* **Authentication**: Validate API keys and gateway access -* **Policy evaluation**: Apply rate limits, spend limits, and routing policies -* **Provider pool management**: Select primary or fallback providers based on availability -* **MCP proxy**: Aggregate tools from multiple MCP servers with deferred loading -* **Response transformation**: Normalize provider-specific responses to OpenAI format -* **Metrics collection**: Record token usage, latency, and cost for every request - -=== Observability plane - -The observability plane provides monitoring and analytics: - -* **Request logs**: Store full request/response history with prompt and completion content -* **Metrics aggregation**: Calculate token usage, costs, latency percentiles, and error rates -* **Dashboard UI**: Display real-time and historical analytics per gateway, model, or provider -* **Cost tracking**: Estimate spend based on provider pricing and token consumption - -== Request lifecycle - -When a request flows through AI Gateway, it passes through several policy and routing stages before reaching the LLM provider. 
Understanding this lifecycle helps you configure policies effectively and troubleshoot issues: - -. Application sends request to gateway endpoint -. Gateway authenticates request -. Rate limit policy evaluates (allow/deny) -. Spend limit policy evaluates (allow/deny) -. Routing policy evaluates (which model/provider to use) -. Provider pool selects backend (primary/fallback) -. Request forwarded to LLM provider -. Response returned to application -. Request logged with tokens, cost, latency, status - -Each policy evaluation happens synchronously in the request path. If rate limits or spend limits reject the request, the gateway returns an error immediately without calling the LLM provider, which helps you control costs. - -=== MCP tool request lifecycle - -For MCP tool requests, the lifecycle differs slightly to support deferred tool loading: - -. Application discovers tools via `/mcp` endpoint -. Gateway aggregates tools from approved MCP servers -. Application receives search + orchestrator tools (deferred loading) -. Application invokes specific tool -. Gateway routes to appropriate MCP server -. Tool execution result returned -. Request logged with execution time, status - -The gateway only loads and exposes specific tools when requested, which dramatically reduces the token overhead compared to loading all tools upfront. - -ifdef::ai-hub-available[] -== AI Hub mode architecture - -AI Gateway supports two modes. In Custom mode, administrators configure all routing rules and backend pools manually. In AI Hub mode, the gateway provides pre-configured intelligent routing. 
- -=== Intelligent router - -AI Hub mode implements an intelligent router with immutable system rules and user-configurable preferences: - -*6 Pre-configured Backend Pools:* - -* OpenAI (standard requests) -* OpenAI Streaming -* Anthropic with OpenAI-compatible transform (standard requests) -* Anthropic with OpenAI-compatible transform (streaming) -* Anthropic Native (direct passthrough for `/v1/messages`) -* Anthropic Native Streaming - -*17 System Routing Rules:* - -Immutable rules that route requests based on: - -* Model prefix: `openai/*`, `anthropic/*` -* Model name patterns: `gpt-*`, `claude-*`, `o1-*` -* Special routing: embeddings, images, audio, content moderation, legacy completions → OpenAI only -* Native SDK detection: `/v1/messages` → Anthropic passthrough -* Streaming detection → Extended timeout backends - -*Automatic Failover:* - -Built-in fallback behavior when primary providers are unavailable (configurable via preference toggles). - -*6 User Preference Toggles:* - -Configurable preferences that influence routing without modifying rules (see xref:ai-gateway/admin/configure-ai-hub.adoc[] for details). - -=== System-managed vs user-configurable resources - -In AI Hub mode, resources are divided into two categories: - -*System-Managed Resources* (immutable): - -* Backend pool definitions -* Core routing rules -* Failover logic -* Provider selection algorithms - -*User-configurable resources:* - -* Provider credentials (OpenAI, Anthropic, Google Gemini) -* 6 preference toggles -* Rate limits (within bounds) -* Spend limits - -This separation ensures consistent, reliable behavior while allowing customization of common preferences. - -=== Ejecting to Custom mode - -Gateways can be ejected from AI Hub mode to Custom mode in a one-way transition.
After ejection: - -* `gateway.mode` changes from `ai_hub` to `custom` -* All previously system-managed resources become user-configurable -* No more automatic AI Hub version updates -* Full control over routing rules, backend pools, and policies - -This allows organizations to start with zero-configuration simplicity and graduate to full control when needed. - -See xref:ai-gateway/admin/eject-to-custom-mode.adoc[] for the ejection process. -endif::[] - -// == Supported features - -// === LLM providers - -// * OpenAI -// * Anthropic -// * // PLACEHOLDER: Google, AWS Bedrock, Azure OpenAI, others? - -// === API compatibility - -// * OpenAI-compatible `/v1/chat/completions` endpoint -// * // PLACEHOLDER: Streaming support? -// * // PLACEHOLDER: Embeddings support? -// * // PLACEHOLDER: Other endpoints? - -// === Policy features - -// * CEL-based routing expressions -// * Rate limiting (// PLACEHOLDER: per-gateway, per-header, per-tenant?) -// * Monthly spend limits (// PLACEHOLDER: per-gateway, per-workspace?) -// * Provider pools with automatic failover -// * // PLACEHOLDER: Caching support? - -// === MCP support - -// * MCP server aggregation -// * Deferred tool loading (often 80-90% token reduction depending on configuration) -// * JavaScript orchestrator for multi-step workflows -// * PLACEHOLDER: Tool execution sandboxing? - -// === Observability - -// * Request logs with full prompt/response history -// * Token usage tracking -// * Estimated cost per request -// * Latency metrics -// * PLACEHOLDER: Metrics export? OpenTelemetry support? 
- -// == Current limitations - -// * // PLACEHOLDER: List current limitations, for example: -// ** // - Custom model deployments (Azure OpenAI BYOK, AWS Bedrock custom models) -// ** // - Response caching -// ** // - Prompt templates/versioning -// ** // - Guardrails (PII detection, content moderation) -// ** // - Multi-region active-active deployment -// ** // - Metrics export to external systems -// ** // - Budget alerts/notifications - -== Next steps - -* xref:ai-agents:ai-gateway/gateway-quickstart.adoc[]: Route your first request through AI Gateway -* xref:ai-agents:ai-gateway/mcp-aggregation-guide.adoc[]: Configure MCP server aggregation for AI agents diff --git a/modules/ai-agents/pages/ai-gateway/gateway-quickstart.adoc b/modules/ai-agents/pages/ai-gateway/gateway-quickstart.adoc deleted file mode 100644 index c126b173c..000000000 --- a/modules/ai-agents/pages/ai-gateway/gateway-quickstart.adoc +++ /dev/null @@ -1,542 +0,0 @@ -= AI Gateway Quickstart -:description: Get started with AI Gateway. Configure providers, create your first gateway with failover and budget controls, and route your first request. -:page-topic-type: quickstart -:personas: evaluator, app_developer, platform_admin -:learning-objective-1: Enable an LLM provider and create your first gateway -:learning-objective-2: Route your first request through AI Gateway and verify it works -:learning-objective-3: Verify request routing and token usage in the gateway overview - -include::ai-agents:partial$adp-la.adoc[] - -Redpanda AI Gateway keeps your AI-powered applications running and your costs under control by routing all LLM and MCP traffic through a single managed layer with automatic failover and budget enforcement. This quickstart walks you through configuring your first gateway and routing requests through it. 
- -== Prerequisites - -Before starting, ensure you have: - -* Access to the AI Gateway UI (provided by your administrator) -* Admin permissions to configure providers and models -* API key for at least one LLM provider (OpenAI, Anthropic, or Google AI) -* Python 3.8+, Node.js 18+, or cURL (for testing) - -== Configure a provider - -Providers represent upstream LLM services and their associated credentials. Providers are disabled by default and must be enabled explicitly. - -. Navigate to *Agentic* > *AI Gateway* > *Providers*. -. Select a provider (for example, OpenAI, Anthropic, Google AI). -. On the Configuration tab, click *Add configuration* and enter your API key. -. Verify the provider status shows "Active". - -== Enable models - -After enabling a provider, enable the specific models you want to make available through your gateways. - -. Navigate to *Agentic* > *AI Gateway* > *Models*. -. Enable the models you want to use (for example, `gpt-5.2-mini`, `claude-sonnet-4.5`, `claude-opus-4.6`). -. Verify the models appear as "Enabled" in the model catalog. - -TIP: Different providers have different reliability and cost characteristics. When choosing models, consider your use case requirements for quality, speed, and cost. - -=== Model naming convention - -Requests through AI Gateway must use the `vendor/model_id` format. For example: - -* OpenAI models: `openai/gpt-5.2`, `openai/gpt-5.2-mini` -* Anthropic models: `anthropic/claude-sonnet-4.5`, `anthropic/claude-opus-4.6` -* Google Gemini models: `google/gemini-2.0-flash`, `google/gemini-2.0-pro` - -This format allows the gateway to route requests to the correct provider. - -== Create a gateway - -A gateway is a logical configuration boundary that defines routing policies, rate limits, spend limits, and observability scope. 
Common gateway patterns include the following: - -* Environment separation: Create separate gateways for staging and production -* Team isolation: One gateway per team for budget tracking -* Customer multi-tenancy: One gateway per customer for isolated policies - -ifdef::ai-hub-available[] -[IMPORTANT] -==== -When creating a gateway, you choose between two modes: - -* *AI Hub Mode*: Zero-configuration with pre-configured routing and backend pools. Just add provider credentials and start routing requests. Ideal for quickstarts and standard use cases. -* *Custom Mode*: Full control over all routing rules, backend pools, and policies. Requires manual configuration. Ideal for custom routing logic and specialized requirements. - -See xref:ai-gateway/gateway-modes.adoc[] to understand which mode fits your needs. This quickstart focuses on Custom mode configuration. -==== -endif::[] - -. Navigate to *Agentic* > *AI Gateway* > *Gateways*. -. Click *Create Gateway*. -+ -ifdef::ai-hub-available[] -. Select the gateway mode: -* *AI Hub*: Choose this for pre-configured intelligent routing (see xref:ai-gateway/admin/configure-ai-hub.adoc[] for setup) -* *Custom*: Choose this for full configuration control -endif::[] -. Configure the gateway: -+ -** Display name: Choose a descriptive name (for example, `my-first-gateway`) -** Workspace: Select a workspace (conceptually similar to a resource group) -** Description: Add context about this gateway's purpose -** Optional metadata for documentation - -After creation, copy the gateway endpoint from the overview page. You'll need this for sending requests. The gateway ID is embedded in the endpoint URL. For example: - -[source,bash] ----- -Endpoint: https://example/gateways/d633lffcc16s73ct95mg/v1 -Gateway ID: d633lffcc16s73ct95mg ----- - -== Send your first request - -Now that you've configured a provider and created a gateway, send a test request to verify everything works. 
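Before sending the request, it can help to sanity-check the endpoint you copied. The following Python sketch assumes the endpoint has the shape shown in the example above, with the gateway ID as the path segment right after `gateways`. The helper name `gateway_id_from_endpoint` is hypothetical and is not part of any Redpanda SDK.

```python
from urllib.parse import urlparse


def gateway_id_from_endpoint(endpoint: str) -> str:
    """Return the path segment that follows 'gateways' in an endpoint URL.

    Assumes the URL shape shown in this quickstart, for example
    https://example/gateways/d633lffcc16s73ct95mg/v1
    """
    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in urlparse(endpoint).path.split("/") if s]
    try:
        return segments[segments.index("gateways") + 1]
    except (ValueError, IndexError):
        # ValueError: no 'gateways' segment; IndexError: nothing after it.
        raise ValueError(f"no gateway ID found in endpoint: {endpoint!r}") from None


if __name__ == "__main__":
    print(gateway_id_from_endpoint("https://example/gateways/d633lffcc16s73ct95mg/v1"))
```

If the helper raises `ValueError`, the copied value is probably not a gateway endpoint, which is also a likely cause of the 404 case in the troubleshooting list.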
- -[tabs] -==== -Python:: -+ --- -[source,python] ----- -from openai import OpenAI - -client = OpenAI( - base_url="", - api_key="", # Or use gateway's auth -) - -response = client.chat.completions.create( - model="openai/gpt-5.2", # Use vendor/model format - messages=[ - {"role": "user", "content": "Hello!"} - ], -) - -print(response.choices[0].message.content) ----- - -Expected output: - -[source,text] ----- -Hello! How can I help you today? ----- --- - -Node.js:: -+ --- -[source,javascript] ----- -import OpenAI from 'openai'; - -const client = new OpenAI({ - baseURL: '', - apiKey: '', // Or use gateway's auth -}); - -const response = await client.chat.completions.create({ - model: 'anthropic/claude-sonnet-4-5-20250929', // Use vendor/model format - messages: [ - { role: 'user', content: 'Hello!' } - ], -}); - -console.log(response.choices[0].message.content); ----- - -Expected output: - -[source,text] ----- -Hello! How can I help you today? ----- --- - -cURL:: -+ --- -[source,bash] ----- -curl /chat/completions \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer " \ - -d '{ - "model": "openai/gpt-5.2", - "messages": [ - {"role": "user", "content": "Hello!"} - ] - }' ----- - -Expected output: - -[source,json] ----- -{ - "id": "chatcmpl-abc123", - "object": "chat.completion", - "model": "openai/gpt-5.2", - "choices": [ - { - "index": 0, - "message": { - "role": "assistant", - "content": "Hello! How can I help you today?" - }, - "finish_reason": "stop" - } - ], - "usage": { - "prompt_tokens": 9, - "completion_tokens": 9, - "total_tokens": 18 - } -} ----- --- -==== - -=== Troubleshooting - -If your request fails, check these common issues: - -* 401 Unauthorized: Verify your API key is valid -* 404 Not Found: Confirm the base URL matches your gateway endpoint -* Model not found: Ensure the model is enabled in the model catalog and that you're using the correct `vendor/model` format. 
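For the model-not-found case, a client-side check can catch a malformed model name before the request leaves your application. This is a minimal Python sketch that assumes only the `vendor/model_id` convention described earlier. The function name `split_model_name` is hypothetical, and the check does not confirm that the model is enabled in your catalog.

```python
from typing import Tuple


def split_model_name(model: str) -> Tuple[str, str]:
    """Split a 'vendor/model_id' string into (vendor, model_id).

    Raises ValueError when the separator, the vendor, or the model ID is missing.
    """
    # partition splits on the first '/', so a model ID may itself contain slashes.
    vendor, separator, model_id = model.partition("/")
    if not separator or not vendor or not model_id:
        raise ValueError(f"expected 'vendor/model_id', got {model!r}")
    return vendor, model_id


if __name__ == "__main__":
    print(split_model_name("openai/gpt-5.2"))
```

Calling it with a bare name such as `gpt-5.2` raises `ValueError`, which points at the missing vendor prefix rather than at the gateway.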
- -== Verify in the gateway overview - -Confirm your request was routed through AI Gateway. - -. On the *Overview* tab, check the aggregate metrics: -+ -* *Total Requests*: Should have incremented -* *Total Tokens*: Shows combined input and output tokens -* *Total Cost*: Estimated spend across all requests -* *Avg Latency*: Average response time in milliseconds - -. Scroll to the *Models* table to see per-model statistics: -+ -The model you used in your request should appear with its request count, token usage (input/output), estimated cost, latency, and error rate. - -== Configure LLM routing (optional) - -Configure rate limits, spend limits, and provider pools with failover. - -On the Gateways page, select the *LLM* tab to configure routing policies. The LLM routing pipeline represents the request lifecycle: - -. *Rate Limit*: Control request throughput (for example, 100 requests/second) -. *Spend Limit*: Set monthly budget caps (for example, $15K/month with blocking enforcement) -. *Provider Pools*: Define primary and fallback providers - -=== Configure provider pool with fallback - -For high availability, configure a fallback provider that activates when the primary fails: - -. Add a second provider (for example, Anthropic). -. In your gateway's *LLM* routing configuration: -+ -* *Primary pool*: OpenAI (preferred for quality) -* *Fallback pool*: Anthropic (activates on rate limits, timeouts, or errors) - -. Save the configuration. - -The gateway automatically routes to the fallback when it detects: - -* Rate limit exceeded -* Request timeout -* 5xx server errors from primary provider - -// Monitor the fallback rate in observability to detect primary provider issues early. - -== Configure MCP tools (optional) - -If you're using glossterm:AI agent[,AI agents], configure glossterm:MCP[,Model Context Protocol (MCP)] tool aggregation. - -On the Gateways page, select the *MCP* tab to configure tool discovery and execution. 
The MCP proxy aggregates multiple glossterm:MCP server[,MCP servers] behind a single endpoint, allowing agents to discover and call glossterm:MCP tool[,tools] through the gateway. - -Configure the MCP settings: - -* *Display name*: Descriptive name for the provider pool -* *Model*: Choose which model handles tool execution -* *Load balancing*: If multiple providers are available, select a strategy (for example, round robin) - -=== Available MCP tools - -The gateway provides these built-in MCP tools: - -* *Data catalog API*: Query your data catalog -* *Memory store*: Persistent storage for agent state -* *Vector search*: Semantic search over embeddings -* *MCP Orchestrator*: Built-in tool for programmatic multi-tool workflows - -The *MCP Orchestrator* enables agents to generate JavaScript code that calls multiple tools in a single orchestrated step, reducing round trips. For example, a workflow requiring 47 file reads can be reduced from 49 round trips to just 1. - -To add external tools (for example, Slack, GitHub), add their MCP server endpoints to your gateway configuration. - -=== Deferred tool loading - -When many tools are aggregated, listing all tools upfront can consume significant tokens. With deferred tool loading, the MCP gateway initially returns only: - -* A tool search capability -* The MCP Orchestrator - -Agents then search for specific tools they need, retrieving only that subset. This can reduce token usage by 80-90% when you have many tools configured. - -== Configure CEL routing rule (optional) - -Use CEL (Common Expression Language) expressions to route requests dynamically based on headers, content, or other request properties. - -The AI Gateway uses CEL for flexible routing without code changes. Use CEL to: - -* Route premium users to better models -* Apply different rate limits based on user tiers -* Enforce policies based on request content - -=== Add a routing rule - -In your gateway's routing configuration: - -. 
Add a CEL expression to route based on user tier: -+ -[source,cel] ----- -// Route based on user tier header -request.headers["x-user-tier"] == "premium" - ? "openai/gpt-5.2" - : "openai/gpt-5.2-mini" ----- - -. Save the rule. - -The gateway editor helps you discover available request fields (headers, path, body, and so on). - -=== Test the routing rule - -Send requests with different headers to verify routing: - -*Premium user request*: - -[source,python] ----- -response = client.chat.completions.create( - model="openai/gpt-5.2", # Will be routed based on CEL rule - messages=[{"role": "user", "content": "Hello"}], - extra_headers={"x-user-tier": "premium"} -) -# Should route to gpt-5.2 (premium model) ----- - -*Free user request*: - -[source,python] ----- -response = client.chat.completions.create( - model="openai/gpt-5.2-mini", - messages=[{"role": "user", "content": "Hello"}], - extra_headers={"x-user-tier": "free"} -) -# Should route to gpt-5.2-mini (cost-effective model) ----- - -// Check the observability dashboard to verify: -// -// * The correct model was selected based on the header value -// * The routing decision explanation shows which CEL rule matched - -=== Common CEL patterns - -Route based on model family: - -[source,cel] ----- -request.body.model.startsWith("anthropic/") ----- - -Apply a rule to all requests: - -[source,cel] ----- -true ----- - -Guard for field existence: - -[source,cel] ----- -has(request.body.max_tokens) && request.body.max_tokens > 1000 ----- - -For more CEL examples, see xref:ai-agents:ai-gateway/cel-routing-cookbook.adoc[]. - -== Connect AI tools to your gateway - -The AI Gateway provides standardized endpoints that work with various AI development tools. This section shows how to configure popular tools.
- -=== MCP endpoint - -If you've configured MCP tools in your gateway, AI agents can connect to the aggregated MCP endpoint: - -* *MCP endpoint URL*: `/mcp` -* *Required headers*: -** `Authorization: Bearer ` - -This endpoint aggregates all MCP servers configured in your gateway. - -=== Environment variables - -For consistent configuration, set these environment variables: - -[source,bash] ----- -export REDPANDA_GATEWAY_URL="" -export REDPANDA_API_KEY="" ----- - -=== Claude Code - -Configure Claude Code using HTTP transport for the MCP connection: - -[source,bash] ----- -claude mcp add --transport http redpanda-aigateway /mcp \ - --header "Authorization: Bearer " ----- - -Alternatively, edit `~/.claude/config.json`: - -[source,json] ----- -{ - "mcpServers": { - "redpanda-ai-gateway": { - "transport": "http", - "url": "/mcp", - "headers": { - "Authorization": "Bearer " - } - } - }, - "apiProviders": { - "redpanda": { - "baseURL": "" - } - } -} ----- - -ifdef::integrations-available[] -For detailed Claude Code setup, see xref:ai-agents:ai-gateway/integrations/claude-code-user.adoc[]. -endif::[] - -=== Continue.dev - -Edit your Continue config file (`~/.continue/config.json`): - -[source,json] ----- -{ - "models": [ - { - "title": "Redpanda AI Gateway - GPT-5.2", - "provider": "openai", - "model": "openai/gpt-5.2", - "apiBase": "", - "apiKey": "" - }, - { - "title": "Redpanda AI Gateway - Claude", - "provider": "anthropic", - "model": "anthropic/claude-sonnet-4.5", - "apiBase": "", - "apiKey": "" - }, - { - "title": "Redpanda AI Gateway - Gemini", - "provider": "google", - "model": "google/gemini-2.0-flash", - "apiBase": "", - "apiKey": "" - } - ] -} ----- - -ifdef::integrations-available[] -For detailed Continue setup, see xref:ai-agents:ai-gateway/integrations/continue-user.adoc[]. 
-endif::[] - -=== Cursor IDE - -Configure Cursor in Settings (*Cursor* → *Settings* or `Cmd+,`): - -[source,json] ----- -{ - "cursor.ai.providers.openai.apiBase": "" -} ----- - -ifdef::integrations-available[] -For detailed Cursor setup, see xref:ai-agents:ai-gateway/integrations/cursor-user.adoc[]. -endif::[] - -=== Custom applications - -For custom applications using OpenAI, Anthropic, or Google Gemini SDKs: - -*Python with OpenAI SDK*: - -[source,python] ----- -from openai import OpenAI - -client = OpenAI( - base_url="", - api_key="", -) ----- - -*Python with Anthropic SDK*: - -[source,python] ----- -from anthropic import Anthropic - -client = Anthropic( - base_url="", - api_key="", -) ----- - -*Node.js with OpenAI SDK*: - -[source,javascript] ----- -import OpenAI from 'openai'; - -const openai = new OpenAI({ - baseURL: '', - apiKey: process.env.REDPANDA_API_KEY, -}); ----- - -== Next steps - -Explore advanced AI Gateway features: - -* xref:ai-agents:ai-gateway/cel-routing-cookbook.adoc[]: Advanced CEL routing patterns for traffic distribution and cost optimization -* xref:ai-agents:ai-gateway/mcp-aggregation-guide.adoc[]: Configure MCP server aggregation and deferred tool loading -ifdef::integrations-available[] -* xref:ai-agents:ai-gateway/integrations/index.adoc[]: Connect more AI development tools -endif::[] - -Learn about the architecture: - -* xref:ai-agents:ai-gateway/gateway-architecture.adoc[]: Technical architecture, request lifecycle, and deployment models -* xref:ai-agents:ai-gateway/what-is-ai-gateway.adoc[]: Problems AI Gateway solves and common use cases diff --git a/modules/ai-agents/pages/ai-gateway/index.adoc b/modules/ai-agents/pages/ai-gateway/index.adoc index 3bbcbd3ba..2217aae84 100644 --- a/modules/ai-agents/pages/ai-gateway/index.adoc +++ b/modules/ai-agents/pages/ai-gateway/index.adoc @@ -1,6 +1,5 @@ = AI Gateway -:description: Keep AI-powered apps running with automatic provider failover, prevent runaway spend with centralized budget 
controls, and govern access across teams, apps, and service accounts. :page-layout: index -:personas: platform_admin, app_developer, evaluator +:description: Redpanda's managed proxy for LLM APIs. Create an LLM provider, and point your applications at a Redpanda-hosted URL with managed secrets, authentication, and observability. -include::ai-agents:partial$adp-la.adoc[] \ No newline at end of file +include::ai-agents:partial$adp-la.adoc[] diff --git a/modules/ai-agents/pages/ai-gateway/mcp-aggregation-guide.adoc b/modules/ai-agents/pages/ai-gateway/mcp-aggregation-guide.adoc deleted file mode 100644 index ef8ffe5b7..000000000 --- a/modules/ai-agents/pages/ai-gateway/mcp-aggregation-guide.adoc +++ /dev/null @@ -1,1005 +0,0 @@ -= MCP Gateway -:description: Learn how to use the MCP Gateway to aggregate MCP servers, configure deferred tool loading, create orchestrator workflows, and manage security. -:page-topic-type: guide -:personas: app_developer, platform_admin -:learning-objective-1: Configure MCP aggregation with deferred tool loading to reduce token costs -:learning-objective-2: Write orchestrator workflows to reduce multi-step interactions -:learning-objective-3: Manage approved MCP servers with security controls and audit trails - -include::ai-agents:partial$adp-la.adoc[] - -The MCP Gateway provides glossterm:MCP[,Model Context Protocol (MCP)] aggregation, allowing glossterm:AI agent[,AI agents] to access glossterm:MCP tool[,tools] from multiple MCP servers through a single unified endpoint. This eliminates the need for agents to manage multiple MCP connections and significantly reduces token costs through deferred tool loading. 
- -MCP Gateway benefits: - -* Single endpoint: One MCP endpoint aggregates all approved MCP servers -* Token reduction: Fewer tokens through deferred tool loading (depending on configuration) -* Centralized governance: Admin-approved MCP servers only -* Orchestration: JavaScript-based orchestrator reduces multi-step round trips -* Security: Controlled tool execution environment - -== What is MCP? - -Model Context Protocol (MCP) is a standard for exposing tools (functions) that AI agents can discover and invoke. MCP servers provide tools like: - -* Database queries -* File system operations -* API integrations (CRM, payment, analytics) -* Search (web, vector, enterprise) -* Code execution -* Workflow automation - -[cols="1,1"] -|=== -| Without AI Gateway | With AI Gateway - -| Agent connects to each MCP server individually -| Agent connects to gateway's unified `/mcp` endpoint - -| Agent loads ALL tools from ALL servers upfront (high token cost) -| Gateway aggregates tools from approved MCP servers - -| No centralized governance or security -| Deferred loading: Only search + orchestrator tools sent initially - -| Complex configuration -| Agent queries for specific tools when needed (token savings) - -| -| Centralized governance and observability -|=== - -== Architecture - -[source,text] ----- -┌─────────────────┐ -│ AI Agent │ -│ (Claude, GPT) │ -└────────┬────────┘ - │ - │ 1. Discover tools with /mcp endpoint - │ 2. 
Invoke specific tool - │ -┌────────▼────────────────────────────────┐ -│ AI Gateway (MCP Aggregator) │ -│ │ -│ ┌─────────────────────────────────┐ │ -│ │ Deferred tool loading │ │ -│ │ (Send search + orchestrator │ │ -│ │ initially, defer others) │ │ -│ └─────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────┐ │ -│ │ Orchestrator (JavaScript) │ │ -│ │ (Reduce round trips for │ │ -│ │ multi-step workflows) │ │ -│ └─────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────┐ │ -│ │ Approved MCP Server Registry │ │ -│ │ (Admin-controlled) │ │ -│ └─────────────────────────────────┘ │ -└────────┬────────────────────────────────┘ - │ - │ Routes to appropriate MCP server - │ - ┌────▼─────┬──────────┬─────────┐ - │ │ │ │ -┌───▼────┐ ┌──▼─────┐ ┌──▼──────┐ ┌▼──────┐ -│ MCP │ │ MCP │ │ MCP │ │ MCP │ -│Database│ │Filesystem│ │ Slack │ │Search │ -│Server │ │ Server │ │ Server │ │Server │ -└────────┘ └────────┘ └─────────┘ └───────┘ ----- - - -== MCP request lifecycle - -=== Tool discovery (initial connection) - -Agent request: - -[source,http] ----- -GET /mcp/tools -Headers: - Authorization: Bearer {TOKEN} - rp-aigw-mcp-deferred: true # Enable deferred loading ----- - - -Gateway response (with deferred loading): - -[source,json] ----- -{ - "tools": [ - { - "name": "search_tools", - "description": "Query available tools by keyword or category", - "input_schema": { - "type": "object", - "properties": { - "query": {"type": "string"}, - "category": {"type": "string"} - } - } - }, - { - "name": "orchestrator", - "description": "Execute multi-step workflows with JavaScript logic", - "input_schema": { - "type": "object", - "properties": { - "workflow": {"type": "string"}, - "context": {"type": "object"} - } - } - } - ] -} ----- - - -Note: Only 2 tools returned initially (search + orchestrator), not all 50+ tools from all MCP servers. 
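To put a rough number on that difference, the following Python sketch applies a plain before/after percentage to the size of the tool definitions. The token counts are illustrative assumptions, not measurements from a real gateway, and `savings_percent` is a hypothetical helper.

```python
def savings_percent(before_tokens: int, after_tokens: int) -> float:
    """Return the percentage reduction from before_tokens to after_tokens."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    # Multiply before dividing so whole-number inputs give exact results.
    return (before_tokens - after_tokens) * 100 / before_tokens


if __name__ == "__main__":
    # Assumed sizes: every tool definition up front vs. two tool definitions.
    print(f"{savings_percent(5000, 500):.0f}% fewer tokens")
```

Actual savings depend on how many tools are registered and how verbose their schemas are, so compare real request logs before relying on an estimate like this.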
- -Token savings: - -* Without deferred loading: ~5,000-10,000 tokens (all tool definitions) -* With deferred loading: ~500-1,000 tokens (2 tool definitions) -* Typically 80-90% reduction - -=== Tool query (when agent needs specific tool) - -Agent request: - -[source,http] ----- -POST /mcp/tools/search_tools -Headers: - Authorization: Bearer {TOKEN} -Body: -{ - "query": "database query" -} ----- - - -Gateway response: - -[source,json] ----- -{ - "tools": [ - { - "name": "execute_sql", - "description": "Execute SQL query against the database", - "mcp_server": "database-server", - "input_schema": { - "type": "object", - "properties": { - "query": {"type": "string"}, - "database": {"type": "string"} - }, - "required": ["query"] - } - }, - { - "name": "list_tables", - "description": "List all tables in the database", - "mcp_server": "database-server", - "input_schema": { - "type": "object", - "properties": { - "database": {"type": "string"} - } - } - } - ] -} ----- - - -Agent receives only relevant tools based on query. - -=== Tool execution - -Agent request: - -[source,http] ----- -POST /mcp/tools/execute_sql -Headers: - Authorization: Bearer {TOKEN} -Body: -{ - "query": "SELECT * FROM users WHERE tier = 'premium' LIMIT 10", - "database": "prod" -} ----- - - -Gateway: - -1. Routes to appropriate MCP server (database-server) -2. Executes tool -3. Returns result - -Gateway response: - -[source,json] ----- -{ - "result": [ - {"id": 1, "name": "Alice", "tier": "premium"}, - {"id": 2, "name": "Bob", "tier": "premium"}, - ... - ] -} ----- - - -Agent receives result and can continue reasoning. - -== Deferred tool loading - -=== How it works - -Traditional MCP (No deferred loading): - -1. Agent connects to MCP endpoint -2. Gateway sends all tools from all MCP servers (50+ tools) -3. Agent includes all tool definitions in every LLM request -4. High token cost: ~5,000-10,000 tokens per request - -Deferred loading (AI Gateway): - -1. 
Agent connects to MCP endpoint with `rp-aigw-mcp-deferred: true` header -2. Gateway sends only 2 tools: `search_tools` + `orchestrator` -3. Agent includes only 2 tool definitions in LLM request (~500-1,000 tokens) -4. When agent needs specific tool: - * Agent calls `search_tools` with query (for example, "database") - * Gateway returns matching tools - * Agent calls specific tool (for example, `execute_sql`) -5. Total token cost: Initial 500-1,000 + per-query ~200-500 - -=== When to use deferred loading - -Use deferred loading when: - -* You have 10+ tools across multiple MCP servers -* Agents don't need all tools for every request -* Token costs are a concern -* Agents can handle multi-step workflows (search → execute) - -Don't use deferred loading when: - -* You have <5 tools total (overhead not worth it) -* Agents need all tools for every request (rare) -* Latency is more important than token costs (deferred adds 1 round trip) - -=== Configure deferred loading - -Deferred loading is configured for each MCP server through the *Defer Loading Override* setting in the Create MCP Server dialog. - -. Navigate to your gateway's *MCP* tab. -. Create or edit an MCP server. -. Under *Server Settings*, set *Defer Loading Override*: -+ -[cols="1,2"] -|=== -|Option |Description - -|Inherit from gateway -|Use the gateway-level deferred loading setting (default) - -|Enabled -|Always defer loading from this server. Agents receive only a search tool initially and query for specific tools when needed. - -|Disabled -|Always load all tools from this server upfront. -|=== - -. Click *Save*. - - -=== Measure token savings - -Compare token usage before/after deferred loading: - -1. Check logs without deferred loading: - - * Filter: Gateway = your-gateway, Model = your-model, Date = before enabling - * Note the average tokens per request - -2. Enable deferred loading - -3. 
Check logs after deferred loading: - - * Filter: Same gateway/model, Date = after enabling - * Note the average tokens per request - -4. Calculate savings: -+ -[source,text] ----- -Savings % = ((Before - After) / Before) × 100 ----- - -Expected results: Typically 80-90% reduction in average tokens per request - -== Orchestrator: multi-step workflows - -=== What is the orchestrator? - -The *orchestrator* is a special tool that executes JavaScript workflows, reducing multi-step interactions from multiple round trips to a single request. - -Without Orchestrator: - -1. Agent: "Search vector database for relevant docs" → Round trip 1 -2. Agent receives results, evaluates: "Results insufficient" -3. Agent: "Fallback to web search" → Round trip 2 -4. Agent receives results, processes → Round trip 3 -5. *Total: 3 round trips* (high latency, 3x token cost) - -With Orchestrator: - -1. Agent: "Execute workflow: Search vector DB → if insufficient, fallback to web search" -2. Gateway executes entire workflow in JavaScript -3. Agent receives final result → *1 round trip* - -Benefits: - -* *Latency Reduction*: 1 round trip vs 3+ -* *Token Reduction*: No intermediate LLM calls needed -* *Reliability*: Workflow logic executes deterministically -* *Cost*: Single LLM call instead of multiple - -=== When to use orchestrator - -Use orchestrator when: - -* Multi-step workflows with conditional logic (if/else) -* Fallback patterns (try A, if fails, try B) -* Sequential tool calls with dependencies -* Loop-based operations (iterate, aggregate) - -Don't use orchestrator when: - -* Single tool call (no benefit) -* Agent needs to reason between steps (orchestrator is deterministic) -* Workflow requires LLM judgment at each step - -=== Orchestrator example: search with fallback - -Scenario: Search vector database; if results insufficient, fallback to web search. 
- -Without Orchestrator (3 round trips): - -[source,python] ----- -# Agent's internal reasoning (3 separate LLM calls) - -# Round trip 1: Search vector DB -vector_results = call_tool("vector_search", {"query": "Redpanda pricing"}) - -# Round trip 2: Agent evaluates results -if len(vector_results) < 3: - # Round trip 3: Fallback to web search - web_results = call_tool("web_search", {"query": "Redpanda pricing"}) - results = web_results -else: - results = vector_results - -# Agent processes final results ----- - - -With Orchestrator (1 round trip): - -[source,python] ----- -# Agent invokes orchestrator once -results = call_tool("orchestrator", { - "workflow": """ - // JavaScript workflow - const vectorResults = await tools.vector_search({ - query: context.query - }); - - if (vectorResults.length < 3) { - // Fallback to web search - const webResults = await tools.web_search({ - query: context.query - }); - return webResults; - } - - return vectorResults; - """, - "context": { - "query": "Redpanda pricing" - } -}) - -# Agent receives final results directly ----- - - -Savings: - -* Latency: ~3-5 seconds (3 round trips) → ~1-2 seconds (1 round trip) -* Tokens: ~1,500 tokens (3 LLM calls) → ~500 tokens (1 LLM call) -* Cost: ~$0.0075 → ~$0.0025 (67% reduction) - -=== Orchestrator API - -// PLACEHOLDER: Confirm orchestrator API details - -Tool name: `orchestrator` - -Input schema: - -[source,json] ----- -{ - "workflow": "string (JavaScript code)", - "context": "object (variables available to workflow)" -} ----- - - -Available in workflow: - -* `tools.{tool_name}(params)`: Call any tool from approved MCP servers -* `context.{variable}`: Access context variables -* Standard JavaScript: `if`, `for`, `while`, `try/catch`, `async/await` - -Security: - -* Sandboxed execution (no file system, network, or system access) -* Timeout and memory limits are system-managed and cannot be modified - -Limitations: - -* Cannot call external APIs directly (must use MCP tools) -* Cannot import 
npm packages (built-in JS only) - -=== Orchestrator example: data aggregation - -Scenario: Fetch user data from database, calculate summary statistics. - -[source,python] ----- -results = call_tool("orchestrator", { - "workflow": """ - // Fetch all premium users - const users = await tools.execute_sql({ - query: "SELECT * FROM users WHERE tier = 'premium'", - database: "prod" - }); - - // Calculate statistics - const stats = { - total: users.length, - by_region: {}, - avg_spend: 0 - }; - - let totalSpend = 0; - for (const user of users) { - // Count by region - if (!stats.by_region[user.region]) { - stats.by_region[user.region] = 0; - } - stats.by_region[user.region]++; - - // Sum spend - totalSpend += user.monthly_spend; - } - - stats.avg_spend = totalSpend / users.length; - - return stats; - """, - "context": {} -}) ----- - - -Output: - -[source,json] ----- -{ - "total": 1250, - "by_region": { - "us-east": 600, - "us-west": 400, - "eu": 250 - }, - "avg_spend": 149.50 -} ----- - - -vs Without Orchestrator: - -* Would require fetching all users to agent → agent processes → 2 round trips -* Orchestrator: All processing in gateway → 1 round trip - -=== Orchestrator best practices - -DO: - -* Use for deterministic workflows (same input → same output) -* Use for sequential operations with dependencies -* Use for fallback patterns -* Handle errors with `try/catch` -* Keep workflows readable (add comments) - -DON'T: - -* Use for workflows requiring LLM reasoning at each step (let agent handle that) -* Execute long-running operations (timeout will hit) -* Access external resources (use MCP tools instead) -* Execute untrusted user input (security risk) - -== MCP server administration - -=== Add MCP servers - -Prerequisites: - -* MCP server URL -* Authentication method (if required) -* List of tools to enable - -Steps: - -1. Navigate to MCP servers: - - * In the sidebar, navigate to *Agentic* > *AI Gateway* > *Gateways*, select your gateway, then select the *MCP* tab. - -2. 
Configure server: -+ -[source,yaml] ----- -# PLACEHOLDER: Actual configuration format -name: database-server -url: https://mcp-database.example.com -authentication: - type: bearer_token - token: ${SECRET_REF} # Reference to secret -enabled_tools: - * execute_sql - * list_tables - * describe_table ----- - -3. Test connection: - - * Gateway attempts connection to MCP server - * Verifies authentication - * Retrieves tool list - -4. Enable server: - - * Server status: Active - * Tools available to agents - -Common MCP servers: - -* Database: PostgreSQL, MySQL, MongoDB query tools -* Filesystem: Read/write/search files -* API integrations: Slack, GitHub, Salesforce, Stripe -* Search: web search, vector search, enterprise search -* Code execution: Python, JavaScript sandboxes -* Workflow: Zapier, n8n integrations - -=== MCP server approval workflow - -Why approval is required: - -* Security: Prevent agents from accessing unauthorized systems -* Governance: Control which tools are available -* Cost: Some tools are expensive (API calls, compute) -* Compliance: Audit trail of approved tools - -Typical approval process: - -1. Request: User/team requests MCP server -2. Review: Admin reviews security, cost, necessity -3. Approval/Rejection: Admin decision -4. Configuration: If approved, admin adds server to gateway - -NOTE: The exact approval workflow may vary by organization. In some cases, admins may directly enable servers without a formal workflow. 
- -Rejected server behavior: - -* Server not listed in tool discovery -* Agent cannot query or invoke tools from this server -* Requests return `403 Forbidden` - -=== Restrict MCP server access - -Per-gateway restrictions: - -[source,yaml] ----- -# PLACEHOLDER: Actual configuration format -gateways: - - name: production-gateway - mcp_servers: - allowed: - - database-server # Only this server allowed - denied: - - filesystem-server # Explicitly denied - - - name: staging-gateway - mcp_servers: - allowed: - - "*" # All approved servers allowed ----- - - -Use cases: - -* Production gateway: Only production-safe tools -* Staging gateway: All tools for testing -* Customer-specific gateway: Only tools relevant to customer - -=== MCP server versioning - -Challenge: MCP server updates may change tool schemas. - -Best practices for version management: - -1. Pin versions (if supported): -+ -[source,yaml] ----- -mcp_servers: - * name: database-server - version: "1.2.3" # Pin to specific version ----- - -2. Test in staging first: - - * Update MCP server in staging gateway - * Test agent workflows - * Promote to production when validated - -3. 
Monitor breaking changes: - - * Subscribe to MCP server changelogs - * Set up alerts for schema changes - -== MCP observability - -=== Logs - -MCP tool invocations appear in request logs with: - -* Tool name -* MCP server -* Input parameters -* Output result -* Execution time -* Errors (if any) - -Filter logs by MCP: - -[source,text] ----- -Filter: request.path.startsWith("/mcp") ----- - - -Common log fields: - -[cols="1,2,2"] -|=== -| Field | Description | Example - -| Tool -| Tool invoked -| `execute_sql` - -| MCP Server -| Which server handled it -| `database-server` - -| Input -| Parameters sent -| `{"query": "SELECT ..."}` - -| Output -| Result returned -| `[{"id": 1, ...}]` - -| Latency -| Tool execution time -| `250ms` - -| Status -| Success/failure -| `200`, `500` -|=== - -=== Metrics - -The following MCP-specific metrics may be available depending on your gateway configuration: - -* MCP requests per second -* Tool invocation count (by tool, by MCP server) -* MCP latency (p50, p95, p99) -* MCP error rate (by server, by tool) -* Orchestrator execution count -* Orchestrator execution time - -Dashboard: MCP Analytics - -* Top tools by usage -* Top MCP servers by latency -* Error rate by MCP server -* Token savings from deferred loading - -=== Debug MCP issues - -Issue: "Tool not found" - -Possible causes: - -1. MCP server not added to gateway -2. Tool not enabled in MCP server configuration -3. Deferred loading enabled but agent didn't query for tool first - -Solution: - -1. Verify MCP server is active in the Redpanda Cloud console -2. Verify tool is in enabled_tools list -3. If deferred loading: Agent must call `search_tools` first - -Issue: "MCP server timeout" - -Possible causes: - -1. MCP server is down/unreachable -2. Tool execution is slow (for example, expensive database query) -3. Gateway timeout too short - -Solution: - -1. Check MCP server health -2. Optimize tool (for example, add database index) -3. 
Contact support if you need to adjust timeout limits - -Issue: "Orchestrator workflow failed" - -Possible causes: - -1. JavaScript syntax error -2. Tool invocation failed inside workflow -3. Timeout exceeded -4. Memory limit exceeded - -Solution: - -1. Test workflow syntax in JavaScript playground -2. Check logs for tool error inside orchestrator -3. Simplify workflow or increase timeout -4. Reduce data processing in workflow - -== Security considerations - -//// -=== Tool execution sandboxing - -// PLACEHOLDER: Confirm sandboxing implementation - -Orchestrator sandbox: - -* No file system access -* No network access (except via MCP tools) -* No system calls -* Memory limit: // PLACEHOLDER: for example, 128MB -* Execution timeout: // PLACEHOLDER: for example, 30s - -MCP tool execution: - -* Tools execute in MCP server's environment (not gateway) -* Gateway does not execute tool code (only proxies requests) -* Security is MCP server's responsibility -//// - -=== Authentication - -Gateway → MCP server: - -* Bearer token (most common) -* API key -* mTLS (for high-security environments) - -Agent → Gateway: - -* Standard gateway authentication (Redpanda Cloud token) -* Gateway endpoint URL identifies the gateway (and its approved MCP servers) - -=== Audit trail - -All MCP operations logged: - -* Who (agent/user) invoked tool -* When (timestamp) -* What tool was invoked -* What parameters were sent -* What result was returned -* Whether it succeeded or failed - -Use case: Compliance, security investigation, debugging - -=== Restrict dangerous tools - -Recommendation: Don't enable destructive tools in production gateways - -Examples of dangerous tools*: - -* File deletion (`delete_file`) -* Database writes without safeguards (`execute_sql` with UPDATE/DELETE) -* Payment operations (`charge_customer`) -* System commands (`execute_bash`) - -Best practice: - -* Read-only tools in production gateway -* Write tools only in staging gateway (with approval workflows) -* Wrap 
dangerous operations in MCP server with safeguards (for example, "require confirmation token") - -== MCP + LLM routing - -=== Combine MCP with CEL routing - -Use case: Route agents to different MCP servers based on customer tier - -CEL expression: - -[source,cel] ----- -request.headers["x-customer-tier"] == "enterprise" - ? "gateway-with-premium-mcp-servers" - : "gateway-with-basic-mcp-servers" ----- - - -Result: - -* Enterprise customers: Access to proprietary data, expensive APIs -* Basic customers: Access to public data, free APIs - -=== MCP with provider pools - -Scenario: Different agents use different models + different tools - -Configuration: - -* Gateway A: GPT-5.2 + database + CRM MCP servers -* Gateway B: Claude Sonnet + web search + analytics MCP servers - -Use case: Optimize model-tool pairing (some models better at certain tools) - -== Integration examples - -[tabs] -==== -Python (OpenAI SDK):: -+ --- -[source,python] ----- -from openai import OpenAI - -# Initialize client with MCP endpoint -client = OpenAI( - base_url=os.getenv("GATEWAY_ENDPOINT"), - api_key=os.getenv("REDPANDA_CLOUD_TOKEN"), - default_headers={ - "rp-aigw-mcp-deferred": "true" # Enable deferred loading - } -) - -# Discover tools -tools_response = requests.get( - f"{os.getenv('GATEWAY_ENDPOINT')}/mcp/tools", - headers={ - "Authorization": f"Bearer {os.getenv('REDPANDA_CLOUD_TOKEN')}", - "rp-aigw-mcp-deferred": "true" - } -) -tools = tools_response.json()["tools"] - -# Agent uses tools -response = client.chat.completions.create( - model="anthropic/claude-sonnet-4.5", - messages=[ - {"role": "user", "content": "Query the database for premium users"} - ], - tools=tools, # Pass MCP tools to agent - tool_choice="auto" -) - -# Handle tool calls -if response.choices[0].message.tool_calls: - for tool_call in response.choices[0].message.tool_calls: - # Execute tool via gateway - tool_result = requests.post( - f"{os.getenv('GATEWAY_ENDPOINT')}/mcp/tools/{tool_call.function.name}", - headers={ - 
"Authorization": f"Bearer {os.getenv('REDPANDA_CLOUD_TOKEN')}", - }, - json=json.loads(tool_call.function.arguments) - ) - - # Continue conversation with tool result - response = client.chat.completions.create( - model="anthropic/claude-sonnet-4.5", - messages=[ - {"role": "user", "content": "Query the database for premium users"}, - response.choices[0].message, - { - "role": "tool", - "tool_call_id": tool_call.id, - "content": json.dumps(tool_result.json()) - } - ] - ) ----- --- - -Claude Code CLI:: -+ --- -[source,bash] ----- -# Configure gateway with MCP -export CLAUDE_API_BASE="https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1" -export ANTHROPIC_API_KEY="your-redpanda-token" - -# Claude Code automatically discovers MCP tools from gateway -claude code - -# Agent can now use aggregated MCP tools ----- --- - -LangChain:: -+ --- -[source,python] ----- -from langchain_openai import ChatOpenAI -from langchain.agents import initialize_agent, Tool - -# Initialize LLM with gateway -llm = ChatOpenAI( - base_url=os.getenv("GATEWAY_ENDPOINT"), - api_key=os.getenv("REDPANDA_CLOUD_TOKEN"), -) - -# Fetch MCP tools from gateway -# PLACEHOLDER: LangChain-specific integration code - -# Create agent with MCP tools -agent = initialize_agent( - tools=mcp_tools, - llm=llm, - agent="openai-tools", - verbose=True -) - -# Agent can now use MCP tools -response = agent.run("Find all premium users in the database") ----- --- -==== diff --git a/modules/ai-agents/pages/ai-gateway/overview.adoc b/modules/ai-agents/pages/ai-gateway/overview.adoc new file mode 100644 index 000000000..0b0e68e4c --- /dev/null +++ b/modules/ai-agents/pages/ai-gateway/overview.adoc @@ -0,0 +1,118 @@ += AI Gateway Overview +:description: AI Gateway is Redpanda's managed proxy for LLM APIs. Create a provider for OpenAI, Anthropic, Google Gemini, AWS Bedrock, or an OpenAI-compatible endpoint, and point your applications at a Redpanda-hosted URL with managed secrets, authentication, and observability. 
+:page-topic-type: overview +:personas: platform_admin, app_developer, evaluator +:page-aliases: ai-gateway/what-is-ai-gateway.adoc, ai-gateway/gateway-architecture.adoc, ai-gateway/cel-routing-cookbook.adoc, ai-gateway/mcp-aggregation-guide.adoc, ai-gateway/builders/discover-gateways.adoc +:learning-objective-1: Describe what AI Gateway is and how a managed proxy differs from direct upstream calls +:learning-objective-2: Explain how LLM providers, secrets, and OIDC authentication fit together in AI Gateway +:learning-objective-3: Identify use cases where AI Gateway fits, and use cases where it does not + +include::ai-agents:partial$adp-la.adoc[] + +AI Gateway is Redpanda's managed proxy for LLM APIs. Instead of giving every application a provider API key and letting it call the upstream directly, you create an *LLM provider* in Redpanda Cloud and point your applications at a Redpanda-hosted proxy URL. Redpanda handles the upstream credentials, forwards the request, and records usage. Your code continues to use the provider's native SDK. + +After reading this page, you will be able to: + +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} +* [ ] {learning-objective-3} + +== The problem AI Gateway solves + +Teams adopting LLMs can quickly hit operational problems: + +* *Credential sprawl:* Every team that touches an LLM gets its own API key. Rotation is manual, offboarding is manual, and it's hard to know who's using what. +* *SDK lock-in and switching cost:* Each provider has its own SDK, auth scheme, and model catalog. Swapping OpenAI for Anthropic means a code change, not a configuration change. +* *No shared view of usage:* Provider dashboards tell you what a single API key spent. They don't tell you what your organization spent, broken down by team or application. + +== What AI Gateway gives you + +AI Gateway consolidates provider access behind the following capabilities. 
+ +=== Centralized secrets + +The upstream API key (or AWS credentials for Bedrock) lives in the Redpanda secret store and is attached to the provider at configuration time. Your application never sees it; rotation happens in one place. + +=== A managed proxy URL per provider + +Every provider you create has its own URL of the form `/llm/v1/providers//`. Your application points its SDK at this URL instead of the upstream, continues to use the provider's native API, and authenticates to Redpanda with a short-lived OIDC access token. + +=== Native SDK compatibility + +Use the provider's own SDK: OpenAI, Anthropic, Google Gemini, AWS Bedrock, or any OpenAI-compatible client (vLLM, Ollama, LM Studio, LocalAI). AI Gateway does not require a single unified SDK; it forwards native requests to the native upstream. + +=== Managed authentication + +Applications authenticate to Redpanda with OIDC service accounts instead of long-lived provider API keys. Service accounts live in Redpanda Cloud IAM, follow the same role and audit model as every other resource, and mint short-lived tokens that are easy to revoke. + +=== Per-provider observability + +The provider's *Overview* tab in the Cloud UI records request and token counts so you can see what each provider is being used for. + +== What's in the UI + +// TODO: confirm the standalone ADP UI entry point (URL and sign-in flow) once it's published. + +The ADP UI has four top-level areas: + +* *LLM Providers*: Create, edit, enable, and delete providers. This is the home of AI Gateway configuration. +* *MCP Servers*: Register glossterm:MCP[] tool servers for agents. Separate from the AI Gateway proxy URL. +* *OAuth Providers*: Register OAuth providers for user-delegated flows (for example, GitHub or Google). +* *My Connections*: Per-user OAuth token management. + +The LLM Providers list is where you spend most of your time. The other three sit alongside and are covered by their own docs. 
+ +== Supported providers + +AI Gateway supports the following provider types: + +[cols="1,3"] +|=== +|Type |Typical upstream + +|OpenAI +|OpenAI's API, including Azure OpenAI and other OpenAI-hosted endpoints through a custom base URL. + +|Anthropic +|Anthropic's API. Optionally forwards the client's `Authorization` header for enterprise and Max-plan subscription passthrough. + +|Google (Gemini) +|Google Generative Language API. + +|AWS Bedrock +|Foundation models hosted in AWS Bedrock, signed with SigV4 server-side. + +|OpenAI-compatible +|Any self-hosted endpoint that implements the OpenAI API surface (vLLM, Ollama, LM Studio, LocalAI, or similar). +|=== + +See xref:ai-agents:ai-gateway/configure-provider.adoc[] for the full form reference for each type. + +== When to use AI Gateway + +AI Gateway is a good fit when you want to: + +* Pull provider API keys out of application code and manage them centrally. +* Authenticate applications to LLMs using the same OIDC identity you use for other Redpanda Cloud resources. +* Run a self-hosted OpenAI-compatible endpoint (vLLM, Ollama, LM Studio) alongside first-party providers behind a single management plane. +* Separate operator and developer roles. Operators configure providers and credentials; developers point at proxy URLs. + +It is not the right fit when you: + +* Only ever call a single provider with a single API key and are happy managing that key inline. +* Need routing, failover, or load balancing across providers. AI Gateway does not provide these capabilities. + +[[out-of-scope]] +== Out of scope + +AI Gateway does not provide the following capabilities. For current status, consult the Redpanda Cloud release notes. + +* *Multi-provider routing, failover, and retries.* A synthetic provider that fans requests to multiple upstreams is not part of AI Gateway. +* *Spend limits.* Per-user, per-org, and global cost caps are not available. 
+* *Rate limits.* Requests-per-second, per-minute, or per-day caps are not available. +* *Managed MCP aggregation at the gateway.* Register MCP tool servers separately under *ADP* → *MCP Servers*. + +== Next steps + +. xref:ai-agents:ai-gateway/configure-provider.adoc[Configure an LLM provider]. Create your first provider and copy its proxy URL. +. xref:ai-agents:ai-gateway/connect-agent.adoc[Connect your agent]. Point your application's SDK at the proxy URL. diff --git a/modules/ai-agents/pages/ai-gateway/what-is-ai-gateway.adoc b/modules/ai-agents/pages/ai-gateway/what-is-ai-gateway.adoc deleted file mode 100644 index 300af5259..000000000 --- a/modules/ai-agents/pages/ai-gateway/what-is-ai-gateway.adoc +++ /dev/null @@ -1,194 +0,0 @@ -= What is an AI Gateway? -:description: Understand how AI Gateway keeps AI-powered apps highly available across providers and prevents runaway AI spend with centralized cost governance. -:page-topic-type: concept -:personas: evaluator, app_developer, platform_admin -:learning-objective-1: Explain how AI Gateway keeps AI-powered apps highly available through governed provider failover -:learning-objective-2: Describe how AI Gateway prevents runaway AI spend with centralized budget controls and tenancy-based governance -:learning-objective-3: Identify when AI Gateway fits your use case based on availability requirements, cost governance needs, and multi-provider or MCP tool usage - -include::ai-agents:partial$adp-la.adoc[] - -Redpanda AI Gateway keeps your AI-powered applications highly available and your AI spend under control. It sits between your applications and the LLM providers and AI tools they depend on. If a provider goes down, the gateway provides automatic failover to keep your apps running. It also offers centralized budget controls to prevent runaway costs. 
For platform teams, it adds governance at the model-fallback level, tenancy modeling for teams, individuals, apps, and service accounts, and a single proxy layer for both LLM models and glossterm:MCP server[,MCP servers]. - -== The problem - -Modern AI applications face two business-critical challenges: staying up and staying on budget. - -First, applications typically hardcode provider-specific SDKs. An application using OpenAI's SDK cannot easily switch to Anthropic or Google without code changes and redeployment. When a provider hits rate limits, suffers an outage, or degrades in performance, your application goes down with it. Your end users don't care which provider you use; they care that the app works. - -Second, costs can spiral without centralized controls. Without a single view of token consumption across teams and applications, it's difficult to attribute costs to specific customers, features, or environments. Testing and debugging can generate unexpected bills, and there's no way to enforce budgets or rate limits per team, application, or service account. The result: runaway spend that finance discovers only after the fact. - -These two challenges are compounded by fragmented observability across provider dashboards, which makes it harder to detect availability issues or cost anomalies in time to act. And as organizations adopt glossterm:AI agent[,AI agents] that call glossterm:MCP tool[,MCP tools], the lack of centralized tool governance adds another dimension of uncontrolled cost and risk. - -== What AI Gateway solves - -Redpanda AI Gateway delivers two core business outcomes, high availability and cost governance, backed by platform-level controls that set it apart from simple proxy layers. - -=== High availability through governed failover - -Your end users don't care whether you use OpenAI, Anthropic, or Google: they care that your app stays up. 
AI Gateway lets you configure provider pools with automatic failover, so when your primary provider hits rate limits, times out, or returns errors, the gateway routes requests to a fallback provider with no code changes and no downtime for your users. - -Unlike simple retry logic, AI Gateway provides governance at the failover level: you define which providers fail over to which, under what conditions, and with what priority. This controlled failover can significantly improve uptime even during extended provider outages. - -=== Cost governance and budget controls - -AI Gateway gives you centralized fiscal control over AI spend. Set monthly budget caps for each gateway, enforce them automatically, and set rate limits per team, environment, or application. No more runaway costs discovered after the fact. - -You can route requests to different models based on user attributes. For example, to direct premium users to a more capable model while routing free tier users to a cost-effective option, use a CEL expression. For example: - -[source,cel] ----- -// Route premium users to best model, free users to cost-effective model -request.headers["x-user-tier"] == "premium" - ? "anthropic/claude-opus-4.6" - : "anthropic/claude-sonnet-4.5" ----- - -You can also set different rate limits and spend limits for each environment to prevent staging or development traffic from consuming production budgets. - -=== Tenancy and access governance - -AI Gateway provides multi-tenant isolation by design. Create separate gateways for teams, individual developers, applications, or service accounts, each with their own budgets, rate limits, routing policies, and observability scope. This tenancy model lets platform teams govern who uses what, how much they spend, and which models and tools they can access, without building custom authorization layers. 
- -=== Unified LLM access (single endpoint for all providers) - -AI Gateway provides a single OpenAI-compatible endpoint that routes requests to multiple LLM providers. Instead of integrating with each provider's SDK separately, you configure your application once and switch providers by changing only the model parameter. - -Without AI Gateway, you need different SDKs and patterns for each provider: - -[source,python] ----- -# OpenAI -from openai import OpenAI -client = OpenAI(api_key="sk-...") -response = client.chat.completions.create( - model="gpt-5.2", - messages=[{"role": "user", "content": "Hello"}] -) - -# Anthropic (different SDK, different patterns) -from anthropic import Anthropic -client = Anthropic(api_key="sk-ant-...") -response = client.messages.create( - model="claude-sonnet-4.5", - max_tokens=1024, - messages=[{"role": "user", "content": "Hello"}] -) ----- - -With AI Gateway, you use the OpenAI SDK for all providers: - -[source,python] ----- -from openai import OpenAI - -# Single configuration, multiple providers -client = OpenAI( - base_url="", - api_key="your-redpanda-token", -) - -# Route to OpenAI -response = client.chat.completions.create( - model="openai/gpt-5.2", - messages=[{"role": "user", "content": "Hello"}] -) - -# Route to Anthropic (same code, different model string) -response = client.chat.completions.create( - model="anthropic/claude-sonnet-4.5", - messages=[{"role": "user", "content": "Hello"}] -) - -# Route to Google Gemini (same code, different model string) -response = client.chat.completions.create( - model="google/gemini-2.0-flash", - messages=[{"role": "user", "content": "Hello"}] -) ----- - -To switch providers, you change only the `model` parameter from `openai/gpt-5.2` to `anthropic/claude-sonnet-4.5`. No code changes or redeployment needed. - -=== Proxy for LLM models and MCP servers - -AI Gateway acts as a single proxy layer for both LLM model requests and MCP servers. For LLM traffic, it provides a unified endpoint. 
For AI agents that use MCP tools, it aggregates multiple MCP servers and provides deferred tool loading, which dramatically reduces token costs. - -Without AI Gateway, agents typically load all available MCP tools from multiple MCP servers at startup. This approach sends 50+ tool definitions with every request, creating high token costs (thousands of tokens per request), slow agent startup times, and no centralized governance over which tools agents can access. - -With AI Gateway, you configure approved MCP servers once, and the gateway loads only search and orchestrator tools initially. Agents query for specific tools only when needed, which often reduces token usage by 80-90% depending on your configuration and the number of tools aggregated. You also gain centralized approval and governance over which MCP servers your agents can access. - -For complex workflows, AI Gateway provides a JavaScript-based orchestrator tool that reduces multi-step workflows from multiple round trips to a single call. For example, you can create a workflow that searches a vector database and, if the results are insufficient, falls back to web search—all in one orchestration step. - -=== Unified observability and cost tracking - -AI Gateway provides a single dashboard that tracks all LLM traffic across providers, eliminating the need to switch between multiple provider dashboards. - -The dashboard tracks request volume for each gateway, model, and provider, along with token usage for both prompt and completion tokens. You can view estimated spend per model with cross-provider comparisons, latency metrics (p50, p95, p99), and errors broken down by type, provider, and model. - -This unified view helps you answer critical questions such as which model is the most cost-effective for your use case, why a specific user request failed, how much your staging environment costs each week, and what the latency difference is between providers for your workload. 
- -ifdef::ai-hub-available[] -== Gateway modes - -AI Gateway supports two modes to accommodate different organizational needs: - -*AI Hub Mode* provides zero-configuration access with pre-configured backend pools and intelligent routing. Platform admins simply add provider credentials (OpenAI, Anthropic, Google Gemini), and all teams immediately benefit from 17 routing rules and 6 backend pools. Users can toggle preferences like vision routing or long-context routing, but the underlying architecture is managed by Redpanda. This mode eliminates the complexity of LLM gateway configuration. IT adds API keys once, and all teams benefit immediately. - -*Custom Mode* provides full control over routing rules, backend pools, rate limits, and policies. Admins configure every aspect of the gateway to meet specific requirements. This mode is ideal when you need custom routing logic based business rules, specific failover behavior, or integration with custom infrastructure like Azure OpenAI or AWS Bedrock. - -To understand which mode fits your use case, see xref:ai-gateway/gateway-modes.adoc[]. -endif::[] - -== Common gateway patterns - -Some common patterns for configuring gateways include: - -* *Team isolation*: When multiple teams share infrastructure but need separate budgets and policies, create one gateway for each team. For example, you might configure Team A's gateway with a $5K/month budget for both staging and production environments, while Team B's gateway has a $10K/month budget with different rate limits. Each team sees only their own traffic in the observability dashboards, providing clear cost attribution and isolation. -* *Environment separation*: To prevent staging traffic from affecting production metrics, create separate gateways for each environment. Configure the staging gateway with lower rate limits, restricted model access, and aggressive cost controls to prevent runaway expenses. 
The production gateway can have higher rate limits, access to all models, and alerting configured to detect anomalies. -* *Primary and fallback for reliability*: To ensure uptime during provider outages, configure provider pools with automatic failover. For example, you can set OpenAI as your primary provider (preferred for quality) and configure Anthropic as the fallback that activates when the gateway detects rate limits or timeouts from OpenAI. Monitor the fallback rate to detect primary provider issues early, before they impact your users. -* *A/B testing models*: To compare model quality and cost without dual integration, route a percentage of traffic to different models. For example, you can send 80% of traffic to `claude-sonnet-4.5` and 20% to `claude-opus-4.6`, then compare quality metrics and costs in the observability dashboard before adjusting the split. -* *Customer-based routing*: For SaaS products with tiered pricing (for example, free, pro, enterprise), use CEL routing based on request headers to match users with appropriate models: - -=== Customer-based routing - -For SaaS products with tiered pricing (for example, free, pro, enterprise), use CEL routing based on request headers to match users with appropriate models: - -[source,cel] ----- -request.headers["x-customer-tier"] == "enterprise" ? "anthropic/claude-opus-4.6" : -request.headers["x-customer-tier"] == "pro" ? 
"anthropic/claude-sonnet-4.5" : -"anthropic/claude-haiku" ----- - -== When to use AI Gateway - -AI Gateway is ideal for organizations that: - -* Use or plan to use multiple LLM providers -* Need centralized cost tracking and budgeting -* Want to experiment with different models without code changes -* Require high availability during provider outages -* Have multiple teams or customers using AI services -* Build AI agents that need MCP tool aggregation -* Need unified observability across all AI traffic - -AI Gateway may not be necessary if: - -* You only use a single provider with simple requirements -* You have minimal AI traffic (< 1000 requests/day) -* You don't need cost tracking or policy enforcement -* Your application doesn't require provider switching - -== Next steps - -* xref:ai-gateway/gateway-quickstart.adoc[Gateway Quickstart] - Get started quickly with a basic gateway setup - -*For Administrators:* - -* xref:ai-gateway/admin/setup-guide.adoc[Setup Guide] - Enable providers, models, and create gateways -* xref:ai-gateway/gateway-architecture.adoc[Architecture Deep Dive] - Technical architecture details - -*For Builders:* - -* xref:ai-gateway/builders/discover-gateways.adoc[Discover Available Gateways] - Find which gateways you can access -* xref:ai-gateway/builders/connect-your-agent.adoc[Connect Your Agent] - Integrate your application