Decide how SmartEM Agents authenticate against the JWT-protected API #284

## Context

`feat/keycloak-jwt-validation` (currently a pushed branch awaiting PR) makes the SmartEM Decisions backend require an `Authorization: Bearer <jwt>` header on every non-exempt endpoint, validated offline against the configured Keycloak realm's JWKS. The SmartEM frontend handles this via the standard OIDC auth-code + PKCE flow, gated by `silent-check-sso.html` so it doesn't redirect-loop when third-party cookies are blocked.

The agent doesn't fit that flow. It runs unattended on Windows EPU workstations, ingests EPU filesystem output, and POSTs to the backend over HTTP/REST. There is no browser and no human, so there is no way to complete an interactive login or consent. Once the backend requires auth in staging/production, the existing agent breaks immediately unless we give it a service-to-service auth path.

We need to pick that path now so the rollout of the JWT PR doesn't get blocked when it reaches an environment that has agents pointing at it.

## Options

| Option | What it is | Pros | Cons |
|---|---|---|---|
| **Client credentials grant** (recommended) | Dedicated Keycloak client (e.g. `SmartEM-agent`) with `serviceAccountsEnabled: true` and a client secret. Agent POSTs to `/protocol/openid-connect/token` with `grant_type=client_credentials`, gets a JWT, uses it. | Standard OAuth2 service-to-service pattern. Zero backend changes - the existing `verify_token` already accepts any RS256 token from the realm. Same JWKS rotation story as user tokens. Easy to revoke or rotate. | Shared secret needs distribution to agent operators. One secret per agent population is the simple version; per-agent secrets would mean realm churn. |
| **JWT client authentication** | Same shape as above but the agent authenticates to Keycloak with a signed JWT (private key on the agent) instead of a shared secret. | Stronger than shared secret. No secret in transit. Per-agent keys are natural. | Agent has to manage a private key. Setup overhead. Likely the right next step *after* client credentials is shipped. |
| **mTLS** | Mutual TLS terminated at the ingress; backend trusts the cert subject. | Strongest. No token mechanics inside the agent. | Needs k3s/ingress config and cert lifecycle management. Doesn't compose with our JWT-validation code path - we'd be running two parallel auth systems. |
| **Static API key** | Backend accepts a long-lived secret in a custom header. | Trivial to implement on both ends. | Reinvents auth. No rotation story. Parallel auth path means more code, more attack surface. |
| **Exempt agent endpoints** | Add `/agents/...` and friends to `EXEMPT_PATHS`; rely on network policies. | No code. | "Internal-only" tends not to stay that way. Hard to enforce on shared k8s clusters. Loses any per-agent attribution. |
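
For reference, the token request described in the client-credentials row can be sketched as below. This is a minimal illustration, not the agent's actual code: `build_token_request` is a hypothetical helper, the `/realms/<realm>` prefix assumes a current Keycloak layout (older deployments add an `/auth` prefix), and the secret is sent as a form field rather than via HTTP Basic.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(keycloak_url: str, realm: str,
                        client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client_credentials token request.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Assumed URL layout; only the /protocol/openid-connect/token suffix
    # is taken from the issue text.
    token_url = (
        f"{keycloak_url.rstrip('/')}/realms/{realm}"
        "/protocol/openid-connect/token"
    )
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` (or the agent's existing `requests` layer) should return a JSON body whose `access_token` and `expires_in` fields are the parts the agent needs.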

## Recommendation

**Client credentials**, with JWT client auth as a planned future hardening once the basic flow is in production.

Rationale:
- Reuses the JWT validation already on the backend; no parallel auth path.
- Tokens carry `azp` (authorized party) = the agent client ID, so the backend can later split agent vs user permissions without changing the auth mechanism.
- Shared secret is good enough for first ship: agents are deployed by DLS infra onto trusted hosts; the secret never leaves a controlled deployment.
- JWT client auth is a straightforward upgrade later - same grant, different `clientAuthenticatorType` in Keycloak (`client-jwt` instead of `client-secret`), with the agent sending a signed assertion in place of the secret.

## Concrete work (sketch, for a future implementation issue)

**Keycloak realm config (both mock and DLS realms):**
- Add a `SmartEM-agent` client with `publicClient: false`, `serviceAccountsEnabled: true`, `directAccessGrantsEnabled: false`, `standardFlowEnabled: false`, `clientAuthenticatorType: "client-secret"`.
- Optional: a `smartem-agent` realm role assigned to the client's service account, so later we can authorize "only agents can write data" if we want it.
- For the mock at `smartem-devtools/keycloak-mock/dls-realm.json`: hard-code a dev secret like `dev-agent-secret` so the agent can self-configure from `.env.local`.
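
As a rough sketch of the client entry above, the snippet below builds the client as a Python dict and merges it into a realm export. The flag names come from the bullet list; `clientId`, `enabled` and `secret` are assumed to follow Keycloak's realm-export field names, and `add_client` is a hypothetical helper, not existing tooling. The optional realm role is left out.

```python
import json

# Client entry for the mock realm; the secret is the dev-only value
# suggested above and must not be reused outside local development.
AGENT_CLIENT = {
    "clientId": "SmartEM-agent",
    "enabled": True,
    "publicClient": False,
    "serviceAccountsEnabled": True,
    "directAccessGrantsEnabled": False,
    "standardFlowEnabled": False,
    "clientAuthenticatorType": "client-secret",
    "secret": "dev-agent-secret",
}


def add_client(realm: dict, client: dict) -> dict:
    """Return a copy of the realm export with `client` added or replaced."""
    others = [
        c for c in realm.get("clients", [])
        if c.get("clientId") != client["clientId"]
    ]
    return {**realm, "clients": others + [client]}


if __name__ == "__main__":
    # Print the fragment so it can be pasted into the realm JSON by hand.
    print(json.dumps(AGENT_CLIENT, indent=2))
```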

**Backend (`smartem-decisions/src/smartem_backend/auth.py`):**
- Already accepts service-account tokens as-is.
- One small hardening: optional `KEYCLOAK_ALLOWED_AZP` env var (comma-separated). When set, `verify_token` checks the `azp` claim is in the allowlist (e.g. `SmartEM,SmartEM-agent`). Default empty -> no check, current behaviour.
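
The `azp` allowlist could look roughly like this. It is a sketch under assumptions: `parse_allowed_azp` and `check_azp` are hypothetical names, and it assumes `verify_token` has the decoded claims available as a dict after signature validation.

```python
import os
from typing import FrozenSet, Mapping, Optional


def parse_allowed_azp(raw: Optional[str]) -> FrozenSet[str]:
    """Parse a comma-separated KEYCLOAK_ALLOWED_AZP value; empty means no check."""
    return frozenset(
        part.strip() for part in (raw or "").split(",") if part.strip()
    )


def check_azp(claims: Mapping[str, object], allowed: FrozenSet[str]) -> None:
    """Raise PermissionError if an allowlist is set and `azp` is not in it."""
    if not allowed:
        return  # default: no allowlist configured, current behaviour
    azp = claims.get("azp")
    if azp not in allowed:
        raise PermissionError(f"azp {azp!r} is not in the allowlist")


# Read once at import time; unset or empty disables the check.
ALLOWED_AZP = parse_allowed_azp(os.environ.get("KEYCLOAK_ALLOWED_AZP"))
```

A token with no `azp` claim is rejected once the allowlist is set, which seems the safer default for this sketch.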

**Agent (`smartem-decisions/src/smartem_agent/...`):**
- A `KeycloakClient` class that reads `KEYCLOAK_URL`, `KEYCLOAK_REALM`, `AGENT_CLIENT_ID`, `AGENT_CLIENT_SECRET` from env/config, calls the token endpoint with `grant_type=client_credentials`, caches the access token, and refreshes when within ~30s of `exp` (no refresh-token flow for client_credentials - just re-request).
- The existing requests-based HTTP layer in the agent gets an `auth` callable that injects `Authorization: Bearer <token>` on every request and, on a 401, forces one token refresh + retry.
- `agent.exe` config schema gains the four keys above (`KEYCLOAK_URL`, `KEYCLOAK_REALM`, `AGENT_CLIENT_ID`, `AGENT_CLIENT_SECRET`).
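
The caching and retry behaviour in the bullets above can be sketched without tying it to a particular HTTP library. Everything here is illustrative: `TokenCache` and `send_with_auth` are hypothetical names, the token fetcher is injected so the sketch stays independent of the network, and expiry is tracked from the token response's `expires_in` rather than by decoding `exp`. There is no locking, so it assumes a single-threaded caller.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# A fetcher performs the client_credentials request and returns
# (access_token, expires_in_seconds).
TokenFetcher = Callable[[], Tuple[str, float]]


class TokenCache:
    """Cache an access token and re-request it shortly before it expires."""

    def __init__(self, fetch_token: TokenFetcher,
                 refresh_margin: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_token
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self, force: bool = False) -> str:
        """Return a usable token, fetching a new one if forced or near expiry."""
        near_expiry = self._clock() >= self._expires_at - self._margin
        if force or self._token is None or near_expiry:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token


def send_with_auth(send: Callable[[Dict[str, str]], object],
                   tokens: TokenCache):
    """Call `send(headers)`; on a 401, force one token refresh and retry once.

    `send` is assumed to return an object with a `status_code` attribute,
    as `requests` responses do.
    """
    response = send({"Authorization": f"Bearer {tokens.get()}"})
    if response.status_code == 401:
        response = send({"Authorization": f"Bearer {tokens.get(force=True)}"})
    return response
```

Limiting the retry to a single attempt keeps a misconfigured secret from turning into a request loop against Keycloak.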

## Open questions for whoever picks this up

- One shared `SmartEM-agent` client for all agent instances, or one client per workstation? Shared is simpler for v1; per-workstation gives finer attribution and revocation but requires realm management tooling.
- Do we want `azp` enforcement on the backend from day one, or ship without it and add later? Adding later is non-breaking.
- Should the agent fall back to "no auth" when `AGENT_CLIENT_SECRET` is unset (for local dev parity with the current behaviour), or hard-fail at startup? Hard-fail is safer; fall-back is more ergonomic.
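
To make the last question concrete, one possible shape is a loader that hard-fails by default and only allows the no-auth fallback behind an explicit opt-in. This is a sketch, not a decision: `load_auth_config`, `AgentAuthConfig` and the `allow_no_auth` flag are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_KEYS = (
    "KEYCLOAK_URL",
    "KEYCLOAK_REALM",
    "AGENT_CLIENT_ID",
    "AGENT_CLIENT_SECRET",
)


@dataclass(frozen=True)
class AgentAuthConfig:
    keycloak_url: str
    realm: str
    client_id: str
    client_secret: str


def load_auth_config(env: Mapping[str, str],
                     allow_no_auth: bool = False) -> Optional[AgentAuthConfig]:
    """Return the auth config, or None when running without auth is allowed.

    The fallback applies only when AGENT_CLIENT_SECRET is unset AND the
    caller opted in; any other missing key is a startup error.
    """
    if not env.get("AGENT_CLIENT_SECRET") and allow_no_auth:
        return None
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing agent auth settings: " + ", ".join(missing))
    return AgentAuthConfig(
        keycloak_url=env["KEYCLOAK_URL"],
        realm=env["KEYCLOAK_REALM"],
        client_id=env["AGENT_CLIENT_ID"],
        client_secret=env["AGENT_CLIENT_SECRET"],
    )
```

Calling `load_auth_config(os.environ)` at startup would give the hard-fail behaviour; local dev could pass `allow_no_auth=True`.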

## Out of scope for this issue

- Deciding whether users get read-only or full access, and whether agents get write access or full access. That's an authorization (RBAC) question, separate from authentication.
- mTLS at the ingress - documented above as an alternative, not pursued.

