Add Keycloak authentication to smartem app#74

Open
vredchenko wants to merge 6 commits into main from 64-keycloak-auth-v2

@vredchenko
Collaborator

@vredchenko vredchenko commented Mar 25, 2026

Summary

End-to-end Keycloak authentication for the smartem app:

  • keycloak-js-based AuthProvider, useAuth() hook, automatic token-refresh scheduling
  • Shared Axios interceptor in @smartem/api attaches Authorization: Bearer <token> to every API call when authenticated
  • AuthGate hard-blocks the dashboard until Keycloak has confirmed authentication — unauthenticated visitors only see a sign-in screen, never the app contents
  • Header swaps the prototyping RoleSwitcher for real auth controls (sign-in icon / account menu with sign-out)
  • Disabled by default in dev/mock mode (VITE_AUTH_ENABLED=false or VITE_ENABLE_MOCKS=true), enabled in production builds
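To illustrate the interceptor bullet above, here is a minimal sketch of the header-attaching step. It is not the code in @smartem/api: `RequestConfig`, `TokenGetter` and `attachBearer` are hypothetical names, and the `/example` URL is a placeholder. In the real package a function like this would presumably be registered through `axios.interceptors.request.use`.

```typescript
// Minimal stand-in for a request config; the real app would use Axios's own types.
interface RequestConfig {
  url: string
  headers: Record<string, string>
}

// Hypothetical token source; an empty string means "not authenticated".
type TokenGetter = () => string

// Returns a copy of the config with a Bearer header when a token is available.
function attachBearer(config: RequestConfig, getToken: TokenGetter): RequestConfig {
  const token = getToken()
  if (!token) return config // unauthenticated: leave the request untouched
  return {
    ...config,
    headers: { ...config.headers, Authorization: `Bearer ${token}` },
  }
}

const signedIn = attachBearer({ url: '/example', headers: {} }, () => 'abc123')
const anonymous = attachBearer({ url: '/example', headers: {} }, () => '')
```

Returning the config unchanged when there is no token keeps mock mode and pre-login requests free of an empty `Authorization` header.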

Closes #64. Supersedes #70.

Configuration

Defaults to the DLS test realm; apps/smartem/.env.example:

VITE_KEYCLOAK_URL=https://identity-test.diamond.ac.uk
VITE_KEYCLOAK_REALM=dls
VITE_KEYCLOAK_CLIENT_ID=SmartEM
VITE_AUTH_ENABLED=false

The in-code fallback in config.ts points at production (identity.diamond.ac.uk) so a build without env vars set doesn't silently point at the test realm. Helpdesk UASHD-4189 registered the SmartEM client against the dls realm on both identity.diamond.ac.uk and identity-test.diamond.ac.uk.
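A rough sketch of how the fallback described above could be resolved. The function and type names are hypothetical, and the `dls` / `SmartEM` fallbacks for realm and client ID are an assumption based on the helpdesk registration; only the production URL fallback is stated explicitly. In a Vite app the `env` argument would typically be `import.meta.env`.

```typescript
// Env vars from .env.example; all optional because a build may omit them.
interface KeycloakEnv {
  VITE_KEYCLOAK_URL?: string
  VITE_KEYCLOAK_REALM?: string
  VITE_KEYCLOAK_CLIENT_ID?: string
}

interface KeycloakConfig {
  url: string
  realm: string
  clientId: string
}

// Production fallbacks (realm and clientId values are assumed, see lead-in).
const PROD_DEFAULTS: KeycloakConfig = {
  url: 'https://identity.diamond.ac.uk',
  realm: 'dls',
  clientId: 'SmartEM',
}

// Prefer explicit env values; fall back to production when unset or empty.
function resolveKeycloakConfig(env: KeycloakEnv): KeycloakConfig {
  return {
    url: env.VITE_KEYCLOAK_URL || PROD_DEFAULTS.url,
    realm: env.VITE_KEYCLOAK_REALM || PROD_DEFAULTS.realm,
    clientId: env.VITE_KEYCLOAK_CLIENT_ID || PROD_DEFAULTS.clientId,
  }
}

const fromEnvExample = resolveKeycloakConfig({
  VITE_KEYCLOAK_URL: 'https://identity-test.diamond.ac.uk',
})
const withoutEnv = resolveKeycloakConfig({})
```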

For local development without DLS identity access, see DiamondLightSource/smartem-devtools#198 — a self-contained Keycloak mock with the same realm/client/PKCE config.

Fixes that landed during E2E shakeout

While exercising the flow end-to-end against the local mock from #198, four issues surfaced and got fixed on this branch:

  1. Silent SSO via hidden iframe (557edf6) — added silentCheckSsoRedirectUri + a public silent-check-sso.html that posts the parsed URL back to the parent. Without it, check-sso did a top-level redirect on every load.
  2. Init-failure path no-op login (557edf6) — the catch handler was building Auth from defaultAuth, so auth.login() did nothing if init failed. Now built from the live keycloak instance.
  3. 5-second redirect loop (6b02a4a) — keycloak-js defaults checkLoginIframe: true, polling Keycloak every 5s via a hidden iframe. Modern browsers block third-party cookies, so the iframe can never read Keycloak's session cookie; it reports "session changed" each tick, and keycloak-js translates that into a top-level redirect. The companion option silentCheckSsoFallback (default true) caused the initial check to fall back to a top-level redirect when the iframe couldn't see the session. Both disabled.
  4. Hard auth gate (2ee26bd) — AuthGate previously rendered children regardless of auth state, so unauthenticated users saw the dashboard with a sign-in button in the corner. Now wraps children in an AuthBoundary that renders nothing while init is in flight, a centred sign-in screen when unauthenticated, and the app when authenticated. No-op when VITE_AUTH_ENABLED=false.
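For reference, the init options implied by fixes 1 and 3 can be sketched as a single object. The option names are real keycloak-js init options; `InitOptionsSketch` and `buildInitOptions` are hypothetical names, and including `pkceMethod` assumes the review suggestion further down this thread was adopted.

```typescript
// Subset of keycloak-js init options relevant here; the library ships a fuller type.
interface InitOptionsSketch {
  onLoad: 'check-sso' | 'login-required'
  silentCheckSsoRedirectUri: string
  silentCheckSsoFallback: boolean
  checkLoginIframe: boolean
  pkceMethod: 'S256'
}

function buildInitOptions(origin: string): InitOptionsSketch {
  return {
    onLoad: 'check-sso',
    // Fix 1: run the SSO check in a hidden iframe served from the app's origin.
    silentCheckSsoRedirectUri: `${origin}/silent-check-sso.html`,
    // Fix 3: never fall back to a top-level redirect for the initial check.
    silentCheckSsoFallback: false,
    // Fix 3: no 5-second session polling through a third-party-cookie iframe.
    checkLoginIframe: false,
    // Assumption: made explicit as suggested in review.
    pkceMethod: 'S256',
  }
}

const initOptions = buildInitOptions('http://localhost:5173')
```

In the app the origin would come from `window.location.origin`, and the result would be passed to `keycloak.init(...)`.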

Scope

Frontend auth ceremony only. The SPA authenticates with Keycloak directly and attaches tokens to API requests. Backend validation is a separate change in DiamondLightSource/smartem-decisions (the JWT-validation PR is in flight).

Route-level RBAC and per-feature permission gating are deferred to a follow-up.

Test plan

  • npm run dev:smartem:mock — app loads normally, no auth UI, mock data visible
  • VITE_AUTH_ENABLED=true npm run dev:smartem against the local Keycloak mock — sign-in screen renders pre-login, full app renders after login
  • Header shows sign-in icon pre-auth, account menu (with user identity) post-auth, sign-out returns to the sign-in screen
  • No #error=login_required redirect loop on idle pages
  • Bearer token is attached to outgoing API requests once authenticated
  • npm run typecheck passes (CI green on every push)
  • npm run build:smartem succeeds (CI green on every push)

@vredchenko vredchenko added the development New features or functionality implementation label Mar 25, 2026
@vredchenko vredchenko marked this pull request as draft March 25, 2026 21:37
@vredchenko vredchenko force-pushed the 64-keycloak-auth-v2 branch 2 times, most recently from ceeb4cb to 98776aa Compare April 1, 2026 12:28
@vredchenko vredchenko force-pushed the 64-keycloak-auth-v2 branch from 98776aa to 6f5a425 Compare April 20, 2026 09:06
@vredchenko vredchenko force-pushed the 64-keycloak-auth-v2 branch 2 times, most recently from e24e6b3 to 8d42d2c Compare April 29, 2026 14:33
Add keycloak-js auth infrastructure with AuthProvider, useAuth() hook,
and automatic token refresh. Wire tokens into the shared Axios
interceptor so all API calls include Bearer headers when authenticated.
Replace the prototyping RoleSwitcher in the Header with real auth
controls (sign in / account menu / sign out). Auth is disabled by
default in dev/mock mode.

Content recycled to smartem-frontend #39 (tier 3 code examples and
re-evaluation comment) and smartem-devtools PR #160 (NFR suggestions).
The file analysed the legacy app and is superseded by the curated
roadmap in issue #39.
@vredchenko vredchenko force-pushed the 64-keycloak-auth-v2 branch from 8d42d2c to f95be88 Compare May 12, 2026 11:07
Helpdesk UASHD-4189 registered the SmartEM app against the dls realm
on identity.diamond.ac.uk (prod) and identity-test.diamond.ac.uk
(test) with client ID SmartEM. Replaces the placeholder master /
smartem-frontend values from initial scaffolding.

.env.example defaults to the test environment for local dev; the
in-code fallback in config.ts stays on prod so a production build
without env vars set doesn't silently point at the test realm.
@vredchenko
Collaborator Author

Review notes — local dev exercise of the auth flow

I took the branch for a spin against the DLS identity-test realm and then against a locally-run Keycloak. Three things came up that are worth addressing before this merges.

1. Bug: init failure permanently bricks the login button

In apps/smartem/src/auth/AuthProvider.tsx, the init error path spreads defaultAuth, whose login and logout are no-ops:

const defaultAuth: Auth = {
  initialised: false,
  authenticated: false,
  login: () => {},      // no-op
  logout: () => {},     // no-op
  getToken: () => '',
}

// …

keycloak
  .init({ onLoad: 'check-sso' })
  .then(() => setAuth(buildAuth(keycloak)))
  .catch((err) => {
    console.error('Keycloak init failed:', err)
    setAuth({ ...defaultAuth, initialised: true, error: 'Failed to connect to Keycloak' })
  })

The keycloak instance is alive in the closure with working .login() / .logout() methods, but the context never receives them. Any transient init failure — iframe blocked by CORS, network blip, Keycloak slow to respond, misconfigured Web Origins — leaves the user with a Sign in button that does nothing until they reload (and reload doesn't help if the failure is persistent).

I hit this against identity-test: the silent-SSO iframe returned 403 because http://localhost:5173 isn't in the SmartEM client's Web Origins. The catch ran, the button rendered, clicking it called the no-op login().

Suggested fix — use the live keycloak instance so login can still redirect:

.catch((err) => {
  console.error('Keycloak init failed:', err)
  setAuth({
    ...buildAuth(keycloak),
    initialised: true,
    error: 'Failed to connect to Keycloak',
  })
})

keycloak.login() does a full-page redirect and doesn't depend on the silent-SSO iframe, so it works even when the init handshake failed.
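To make the difference concrete, here is a self-contained sketch with a fake Keycloak instance. `KeycloakLike` is a hypothetical stand-in for the parts of keycloak-js used here, and this `buildAuth` is a guess at the helper's shape, not the code in AuthProvider.tsx.

```typescript
// Minimal stand-in for a keycloak-js instance (hypothetical, for illustration).
interface KeycloakLike {
  authenticated?: boolean
  token?: string
  login(): void
  logout(): void
}

interface Auth {
  initialised: boolean
  authenticated: boolean
  login: () => void
  logout: () => void
  getToken: () => string
  error?: string
}

// Assumed shape of buildAuth: delegates to the live instance.
function buildAuth(keycloak: KeycloakLike): Auth {
  return {
    initialised: true,
    authenticated: keycloak.authenticated ?? false,
    login: () => keycloak.login(),
    logout: () => keycloak.logout(),
    getToken: () => keycloak.token ?? '',
  }
}

// State published from the catch handler: error set, login still wired up.
function authAfterInitFailure(keycloak: KeycloakLike): Auth {
  return { ...buildAuth(keycloak), initialised: true, error: 'Failed to connect to Keycloak' }
}

let loginCalls = 0
const fakeKeycloak: KeycloakLike = {
  login: () => { loginCalls += 1 },
  logout: () => {},
}

const failedAuth = authAfterInitFailure(fakeKeycloak)
failedAuth.login() // reaches the instance instead of a no-op
```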

2. Design issue: onLoad: 'check-sso' without a silent-SSO HTML page

keycloak.init({ onLoad: 'check-sso' }) is called with no silentCheckSsoRedirectUri. Per the keycloak-js docs, this means the silent-SSO check falls back to a top-level redirect to Keycloak when the iframe approach isn't viable.

Combined with React.StrictMode (which double-mounts effects in dev), the result is a redirect storm: page → Keycloak /auth?… → page (with #error=login_required) → effect re-runs → page → Keycloak → page → … The app never gets a chance to render the header, so the user can't reach the Sign in button at all.

I observed this against the local Keycloak in this exercise — three full reload cycles in 5s, header never appears.

Suggested fix — add a static apps/smartem/public/silent-check-sso.html:

<!doctype html>
<html><body><script>
  parent.postMessage(location.href, location.origin)
</script></body></html>

and pass its URL in init:

keycloak.init({
  onLoad: 'check-sso',
  silentCheckSsoRedirectUri: `${window.location.origin}/silent-check-sso.html`,
  pkceMethod: 'S256',
})

This keeps the silent-SSO check inside an iframe (no top-level redirect), so the StrictMode double-mount is harmless. Worth also adding pkceMethod: 'S256' explicitly — the request currently uses it but only because the library defaults are kind.

3. Documentation gap: no local-dev setup

The README doesn't mention Keycloak. apps/smartem/.env.example points at identity-test.diamond.ac.uk, but the SmartEM client there doesn't have http://localhost:5173 in its Web Origins or Valid Redirect URIs, so out-of-the-box npm run dev:smartem fails immediately with a 403 on the silent-SSO iframe.

smartem-devtools/docs/architecture/keycloak-spa-authentication.md describes the design but predates this PR and refers to a smartem-frontend client; the implementation uses SmartEM. Worth aligning the names.

To unblock local dev I drafted a self-contained Keycloak mock in smartem-devtools/keycloak-mock/:

  • docker-compose.yml — Keycloak 26 in start-dev --import-realm mode, port 8080
  • realm/dls-realm.json — realm dls, public client SmartEM with PKCE, redirect URIs and Web Origins for localhost:5173 and :5174, a custom fedId claim mapper, two seeded users

Devs set VITE_KEYCLOAK_URL=http://localhost:8080 in .env.local, run docker compose up -d, and they're done.

This pairs with a one-line README section. Happy to fold it into this PR or land it as a follow-up.

Verifying the fixes

I patched issues 1 and 2 locally (reverted in current state) and ran the full flow against the local Keycloak mock: app loads → Sign in → redirect to localhost:8080/realms/dls/... → submit credentials → redirect back with auth code → tokens exchanged → header shows account menu with the logged-in user. End-to-end works once both fixes are applied.

Add silent-check-sso.html so keycloak-js performs the session check in a
hidden iframe instead of a top-level redirect, preventing the redirect
storm caused by React StrictMode double-mounting the init effect.

Use the live keycloak instance in the init error path so login() still
triggers a full-page redirect even when the silent-SSO handshake fails.

Adding `silentCheckSsoRedirectUri` covered the initial check-sso, but
keycloak-js still defaults `checkLoginIframe: true` - a hidden iframe that
polls Keycloak every 5 seconds for session state. In modern browsers third-
party cookies are blocked, so that iframe can never read Keycloak's session
cookie; it reports "session changed" each tick, which keycloak-js translates
into a top-level redirect. Result: the SPA bounces between `/` and
`/#error=login_required` every five seconds.

The companion option `silentCheckSsoFallback` (default true) made the same
thing happen for the initial check: if the silent iframe couldn't see the
session cookie, keycloak-js fell back to a top-level redirect. Combined
with React StrictMode's double-mount of the AuthProvider effect, the
fallback produced an immediate two-redirect storm even before the polling
iframe kicked in.

Disable both:
  - `silentCheckSsoFallback: false` keeps the initial check confined to
    the iframe; if that fails, the user just sees the sign-in button and
    has to click it (rather than getting redirect-stormed).
  - `checkLoginIframe: false` stops the post-init polling entirely. We
    lose cross-tab logout detection from this path, but the token-refresh
    timer already detects an invalidated session on its next tick.

Verified end-to-end against the local Keycloak mock: 0 spurious redirects
during settle, click → KC login → return → authenticated state with the
account menu visible.

`AuthGate` was previously a pure pass-through wrapper around `AuthProvider`
- it set up the auth context but always rendered children, regardless of
auth state. Unauthenticated users could see the entire dashboard with just
a "Sign in" button in the header.

Add an `AuthBoundary` inner component that gates rendering on the resolved
auth state:

  - `!auth.initialised`: render nothing. Keycloak init is in flight; better
    to show a brief blank than flash either the sign-in screen or the app.
  - `!auth.authenticated`: render `SignInScreen` - a centred MUI Box with
    the app title, a one-line prompt, and a contained "Sign in" button
    that calls `auth.login()`. If `auth.error` is populated (init failed,
    refresh failed, etc.) it is surfaced beneath the button in `error.main`
    colour.
  - authenticated: render children as before.

When `VITE_AUTH_ENABLED=false` the gate is a no-op (the early return at the
top of `AuthGate` is unchanged), so mock-mode UI work is unaffected.
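
The gating rules above reduce to a small decision function. This is a sketch only: `GateView`, `AuthState` and `resolveGateView` are hypothetical names, and the real `AuthBoundary` renders React elements rather than returning strings.

```typescript
// What the gate shows: nothing, the sign-in screen, or the wrapped children.
type GateView = 'blank' | 'sign-in' | 'app'

interface AuthState {
  initialised: boolean
  authenticated: boolean
}

function resolveGateView(authEnabled: boolean, auth: AuthState): GateView {
  if (!authEnabled) return 'app' // VITE_AUTH_ENABLED=false: gate is a no-op
  if (!auth.initialised) return 'blank' // Keycloak init still in flight
  if (!auth.authenticated) return 'sign-in' // resolved, but no session
  return 'app'
}

const duringInit = resolveGateView(true, { initialised: false, authenticated: false })
const signedOut = resolveGateView(true, { initialised: true, authenticated: false })
const loggedIn = resolveGateView(true, { initialised: true, authenticated: true })
const mockMode = resolveGateView(false, { initialised: false, authenticated: false })
```

Checking `initialised` before `authenticated` is what avoids flashing the sign-in screen while init is still pending.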

Verified end-to-end against the local Keycloak mock: pre-login the only
visible content is the sign-in screen; clicking through to Keycloak and
back unlocks the dashboard.
Labels

development New features or functionality implementation

Development

Successfully merging this pull request may close these issues.

Implement user authentication against Keycloak

1 participant