+
+
+
+
+ Scope
+
+ In scope
+
+ - One outbound POST to a Teams Workflows webhook URL per relevant event, fired at the end of the existing hourly Anthropic cost sync (/api/sync/anthropic-api-costs).
+ - Three Adaptive Card v1.4 alert templates: hourly digest, threshold breach (80% / 100% / 120%), and forecast at-risk (state transition on_track → at_risk or back), plus a stale-data warning card.
+ - New table anthropic_alert_state for idempotency: threshold alerts fire once per workspace per billing month, and forecast alerts are edge-triggered, not level-triggered.
+ - Stale-data guard: if the sync is stale, post a single stale-data warning instead of the digest.
+ - Env-driven enable/disable. No UI changes in v1.
+ - Tests: unit for the renderer, integration for the evaluator + state machine, manual test for an end-to-end webhook POST against a real channel.
+
+
+ Out of scope (deferred)
+
+ - Bot Framework / Universal Action refresh / auto-refreshing cards.
+ - Per-user Activity Feed pings.
+ - Teams personal tab embedding the dashboard.
+ - Settings UI to toggle Teams posting, mute card types, or test a webhook (Phase 1.5).
+ - Cards beyond the three above: see specs/030-claude-spend-teams-alerts/mockup.html for the visual contract and the "explore additional information" thread for backlog items.
+ - Cross-product cards (Copilot, invoices, licenses) — deferred to a later spec.
+
+
+
+ Freshness ceiling
+ The sync runs hourly. "Real-time" here means "seconds after the hourly cron lands." Anthropic does not expose
+ webhooks for billing/usage events, so polling is the only architecture available.
+
+
+
+ Architecture
+
+ The integration is an additive post-step on the existing sync. No new cron, no new route, no new auth.
+
+ ┌──────────────────────────────────────────────────────────────────────────────────┐
+│ GET /api/sync/anthropic-api-costs (Vercel Cron, hourly) │
+│ └─ makeCronSyncRoute(run, "Anthropic API costs") [existing] │
+│ └─ requireCronSecret(req) [existing] │
+│ └─ run() [existing, +1 step] │
+│ ├─ withSyncLock(...) → fetch Anthropic → upsert anthropic_workspace_costs │
+│ ├─ on success → stamp anthropic_sync_status [existing] │
+│ └─ on success → evaluateAndPostTeamsAlerts() [NEW, try/catch] │
+│ ├─ if !env.TEAMS_WEBHOOK_URL → return [no-op] │
+│ ├─ loadSyncStatus() [NEW · queries.ts] │
+│ ├─ loadDashboardKpis(month) [NEW · queries.ts] │
+│ ├─ loadWorkspaceList() [NEW · queries.ts] │
+│ ├─ forecastWorkspaceMonth(ws, month) [NEW · per workspace] │
+│ ├─ diff against anthropic_alert_state [NEW table] │
+│ ├─ build card payload(s): │
+│ │ • hourly digest (always, unless stale) │
+│ │ • threshold breach (per workspace × threshold, 1×/month) │
+│ │ • forecast at-risk (per workspace, on edge transition) │
+│ │ • stale-data warning (if isStale) │
+│ └─ for each payload: postCard(env.TEAMS_WEBHOOK_URL, payload) │
+│ └─ retryWithBackoff() — handle 429 / 5xx │
+└──────────────────────────────────────────────────────────────────────────────────┘
+ │
+ ▼
+ https://prod-XX..logic.azure.com/...
+ (Workflows incoming webhook)
+ │
+ ▼
+ Microsoft Teams channel: Claude Spend Alerts
+
+──── Data layer (rev 2) ──────────────────────────────────────────────────────────
+ src/lib/anthropic/queries.ts ← auth-free pure DB queries (NEW)
+ │
+ ├──→ called directly by the evaluator (no session, no admin gate)
+ └──→ wrapped by src/actions/anthropic-global.ts for admin UI
+ (requireAdmin + unstable_cache stay there, but no DB code below them)
+
+ Design choices
+
+ - Piggyback the sync, don't add a cron. Same freshness, one less moving part, no new auth.
+ - Alert evaluation is non-blocking. Wrapped in try/catch; a failure logs but does not flip the sync to partial/failed. The sync's job is data integrity; alerts are a side effect.
+ - No repeat alerts. A workspace stuck at 110% does not spam the channel every hour: each threshold posts once per billing month when it is first crossed, and a later card appears only when a higher threshold is crossed. A drop below 80% followed by a second rise does not re-fire (rev 2). Forecast cards are edge-triggered on state transitions.
+ - Workflows webhook only: no bot, no Entra app. Confirmed: Action.Execute, Action.Submit, and refresh all require a bot. We use Action.OpenUrl exclusively.
+
+
+
+ Data model
+
+ One new table. No new enums (the original tier enum was dropped — see rev 2 below). No changes to existing tables.
+
+ New table: anthropic_alert_state rev 2
+
+ One row per (workspace_id, billing_month) — the idempotency ledger. Each threshold has a nullable
+ *_fired_at timestamp: NULL means "never fired this month", non-NULL means "already fired, don't
+ fire again". Forecast tracking is stored as a boolean + timestamp pair on the same row (it's edge-triggered;
+ thresholds are not).
+
+
+src/lib/db/schema.ts (edit)
+// No new enum: the previous threshold enum is replaced by three nullable timestamp columns.
+
+export const anthropicAlertState = pgTable("anthropic_alert_state", {
+ id: serial("id").primaryKey(),
+
+ // workspaceId is varchar(64) NOT NULL. For the special "default" workspace
+ // (where Anthropic returns workspace_id = null), we coalesce to "__default__"
+ // before insert — see queries.ts. This sidesteps Postgres's NULL-is-not-equal-NULL
+ // gotcha on the unique index.
+ workspaceId: varchar("workspace_id", { length: 64 }).notNull(),
+ billingMonth: varchar("billing_month", { length: 7 }).notNull(), // YYYY-MM
+
+ // Threshold alerts: fire-once-per-month, never re-fire.
+ threshold80FiredAt: timestamp("threshold_80_fired_at"),
+ threshold100FiredAt: timestamp("threshold_100_fired_at"),
+ threshold120FiredAt: timestamp("threshold_120_fired_at"),
+
+ // Forecast: edge-triggered. Fires when forecastAtRisk transitions false → true.
+ forecastAtRisk: boolean("forecast_at_risk").notNull().default(false),
+ forecastChangedAt: timestamp("forecast_changed_at"),
+
+ createdAt: timestamp("created_at").notNull().defaultNow(),
+ updatedAt: timestamp("updated_at").notNull().defaultNow(),
+}, (t) => [
+ uniqueIndex("alert_state_workspace_month_idx").on(t.workspaceId, t.billingMonth),
+]);
+
+
+ | Column | Purpose |
+ | threshold_80_fired_at / _100_ / _120_ | NULL until that threshold is first crossed this month. Once set, never re-fires for the same (workspace, month). |
+ | forecast_at_risk | Current state of the forecast. Used to detect false → true edges (and optionally back). |
+ | forecast_changed_at | Timestamp of the last state flip. Lets the card show "status changed N minutes ago". |
+ | composite unique | (workspace_id, billing_month): exactly one row per workspace per month. |
+
+
+
+ Why fire-once-per-month for thresholds
+ A workspace stuck at 110% does not need a fresh card every hour, every day, or every time it dips to 79% and pops
+ back. The breach event is "crossed for the first time this month" — once an operator has seen it, more cards add
+ noise, not signal. Forecast cards remain edge-triggered because forecast flips are themselves rare and meaningful.
+
+
+ Migration command
+ pnpm db:generate # produces .migrations/XXXX_*.sql
+pnpm db:migrate # applies to DATABASE_URL
+
+
+ Migration review
+ Run the drizzle-migration-reviewer agent on the generated SQL before applying to production —
+ this is a brand-new table with no FK references, so risk is minimal, but the policy applies.
+
+
+
+ Module layout
+
+
+ Two new namespaces (rev 2): src/lib/anthropic/ for the auth-free data layer
+ (used by both the existing server actions and the new evaluator) and src/lib/teams/ for the
+ Teams-specific code, plus one schema edit and one source-handler edit.
+
+
+src/
+├── actions/
+│ └── anthropic-global.ts refactor: actions delegate to queries.ts (no behavior change for UI)
+├── lib/
+│ ├── db/
+│ │ └── schema.ts + anthropicAlertState (rev 2 shape)
+│ ├── env.ts + TEAMS_WEBHOOK_URL, TEAMS_DASHBOARD_BASE_URL
+│ ├── sync/
+│ │ └── sources/
+│ │ └── anthropic-workspace.ts + call evaluateAndPostTeamsAlerts() on success
+│ ├── anthropic/ NEW namespace — auth-free data layer
+│ │ ├── queries.ts loadSyncStatus(), loadDashboardKpis(month), loadWorkspaceList()
+│ │ └── forecast-workspace.ts forecastWorkspaceMonth(ws, month, today) — trailing-rate, not OLS
+│ └── teams/
+│ ├── webhook.ts postCard() — fetch wrapper w/ retry
+│ ├── cards.ts renderDigestCard(), renderBreachCard(), renderForecastCard(), renderStaleCard()
+│ ├── evaluator.ts evaluateAndPostTeamsAlerts() — pure orchestration
+│ ├── state.ts readAlertState(month), upsertAlertState(diff)
+│ ├── format.ts tiny helpers: $, %, ago()
+│ └── types.ts card-input types, AlertEvaluation, CardEnvelope
+
+tests/
+├── unit/
+│ └── teams/
+│ ├── cards.test.ts snapshot tests for each card variant
+│ └── format.test.ts $, %, ago()
+└── integration/
+ ├── anthropic/
+ │ └── forecast-workspace.test.ts trailing-rate math against seeded daily costs
+ └── teams/
+ ├── evaluator.test.ts fire-once-per-month invariants + forecast edges
+ └── webhook.test.ts postCard against a mocked fetch
+
+ Server-action refactor (no UI behavior change)
+
+ The existing getSyncStatus(), getDashboardKpis(), getWorkspaceList()
+ keep their signatures and their requireAdmin() + unstable_cache() wrappers. Their
+ bodies shrink to a single call into src/lib/anthropic/queries.ts. The DB queries
+ themselves move into the queries module. Both surfaces (UI server actions and the cron-time evaluator) read from
+ the same source — actions enforce the admin gate, the evaluator skips it because it runs under
+ CRON_SECRET.
+
+
+ Security hardening for the auth-free data layer
+
+ Hoisting org-financial queries above the auth boundary creates a footgun: a future route handler that imports a
+ load* function and forgets to gate would publicly expose spend KPIs. The mitigations below make
+ misuse hard to commit and easy to catch in review.
+
+
+ - import "server-only" at the top of src/lib/anthropic/queries.ts. Next.js throws a build error if any client-side code tries to import the module, preventing accidental browser bundling.
+ - Top-of-file contract comment naming the only two allowed callers (server actions, cron-time evaluator) and stating "callers must enforce authorization themselves."
+ - Naming convention: queries are load*() (not get*()) to flag "auth-free, caller's responsibility", distinct from action-style getters.
+ - Mandatory review by the nextauth-security-reviewer sub-agent before merge, explicitly added to the rollout step list.
+ - No middleware import: the queries module must not be imported from middleware.ts or any edge runtime path.
+
+
+ Residual risk after mitigations: a hand-written new route that explicitly imports a load function and skips auth.
+ That requires an intentional act and is the kind of regression code review catches. Net assessment: small
+ incremental risk vs. the original "auth at the boundary" pattern, materially lower than the cost of leaving the
+ cron caller broken.
+
+
+
+ CRON_SECRET blast radius
+ A leaked CRON_SECRET previously gave an attacker the ability to trigger syncs. After this change it
+ also gives them the ability to fire Teams posts to whatever channel TEAMS_WEBHOOK_URL points at. No
+ data exfil (the attacker would need both secrets to read responses), low real value, but worth noting.
+
+
+ Why a workspace-monthly forecast module separate from forecastBudget()
+
+ src/lib/forecast.ts's forecastBudget() is designed for the annual budget tracker
+ (input: MonthlySpend[] over many months; output: annual projection via OLS). The Teams use case
+ needs monthly projection from daily data over a partial month. Different domain.
+ forecastWorkspaceMonth() uses a 7-day trailing rate scaled to days-remaining-in-month — well-suited
+ for a 30-day cycle and produces the "Crosses 100% on YYYY-MM-DD" date the forecast card needs.
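+
+ A minimal sketch of the trailing-rate projection described above, assuming the daily costs have already been
+ loaded and that days are counted in UTC. The function name projectMonth, its parameters, and the MonthProjection
+ shape are hypothetical; the real forecastWorkspaceMonth() takes (ws, month, today) and reads from the database.

```typescript
// Hypothetical shapes for illustration only.
type MonthProjection = {
  runRate7dCents: number;         // average daily spend over the trailing window
  projectedMonthEndCents: number; // MTD spend + run rate x days remaining
  crossesCapOn: Date | null;      // first UTC day the cap is projected to be reached
};

function projectMonth(
  trailingDailyCents: number[], // daily costs, oldest first; only the last 7 are used
  mtdCents: number,
  limitCents: number | null,
  today: Date,                  // interpreted as a UTC calendar day
): MonthProjection {
  const recent = trailingDailyCents.slice(-7);
  const runRate =
    recent.length === 0 ? 0 : recent.reduce((a, b) => a + b, 0) / recent.length;

  // Days left after today in the current UTC month.
  const y = today.getUTCFullYear();
  const m = today.getUTCMonth();
  const daysInMonth = new Date(Date.UTC(y, m + 1, 0)).getUTCDate();
  const daysRemaining = daysInMonth - today.getUTCDate();

  const projected = Math.round(mtdCents + runRate * daysRemaining);

  // The cap date only exists when there is a cap, spend is still below it,
  // and the current rate reaches it before the month ends.
  let crossesCapOn: Date | null = null;
  if (limitCents !== null && runRate > 0 && mtdCents < limitCents) {
    const daysToCap = Math.ceil((limitCents - mtdCents) / runRate);
    if (daysToCap <= daysRemaining) {
      crossesCapOn = new Date(Date.UTC(y, m, today.getUTCDate() + daysToCap));
    }
  }

  return {
    runRate7dCents: Math.round(runRate),
    projectedMonthEndCents: projected,
    crossesCapOn,
  };
}
```

+ With a flat $210/day run rate, $4,200 spent by 21 May, and a $5,000 cap, this sketch projects $6,300 at month end
+ and a cap crossing on 25 May. A workspace without a limit still gets a run rate and a projection, only no cap date.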
+
+
+
+ Teams webhook client
+
+
+ A thin wrapper around fetch that POSTs an Adaptive Card envelope, honors 429 Retry-After,
+ and retries on 412/502/504 per Microsoft's guidance. Reuses retryWithBackoff() from
+ src/lib/sync/framework.ts.
+
+
+src/lib/teams/webhook.ts (new)
+// RetriableError is assumed to be exported by the sync framework next to retryWithBackoff.
+import { retryWithBackoff, RetriableError } from "@/lib/sync/framework";
+import type { CardEnvelope } from "./types";
+
+const RETRIABLE = new Set([412, 429, 502, 504]);
+
+export async function postCard(webhookUrl: string, envelope: CardEnvelope): Promise<void> {
+  await retryWithBackoff(async () => {
+    const res = await fetch(webhookUrl, {
+      method: "POST",
+      headers: { "content-type": "application/json" },
+      body: JSON.stringify(envelope),
+    });
+    if (res.ok) return;
+
+    if (RETRIABLE.has(res.status)) {
+      // Retry-After may be delta-seconds or an HTTP-date; fall back to 0 when it is not a plain number.
+      const retryAfter = Number(res.headers.get("retry-after") ?? 0);
+      const delayMs = (Number.isFinite(retryAfter) ? retryAfter : 0) * 1000;
+      throw new RetriableError(`Teams webhook ${res.status}`, delayMs);
+    }
+    // Non-retriable: report the status and a bounded slice of the body, never the URL.
+    const body = await res.text();
+    throw new Error(`Teams webhook ${res.status}: ${body.slice(0, 500)}`);
+  }, { maxAttempts: 3, baseDelayMs: 2000, maxDelayMs: 20000, jitterPct: 0.2 });
+}
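+
+ The HTTP Retry-After header can carry either a number of seconds or an HTTP-date. A small parser that accepts both
+ forms might look like the sketch below. The helper name retryAfterMs is hypothetical and not part of the sync
+ framework; the caller passes in the clock so the function stays pure.

```typescript
// Returns the delay in milliseconds requested by a Retry-After header value,
// or 0 when the header is missing or cannot be parsed.
function retryAfterMs(header: string | null, now: Date): number {
  if (header === null) return 0;
  const trimmed = header.trim();

  // Form 1: delta-seconds, e.g. "30".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2026 07:28:00 GMT".
  const at = Date.parse(trimmed);
  if (Number.isNaN(at)) return 0;

  // A date in the past means "retry now".
  return Math.max(0, at - now.getTime());
}
```

+ The result would still be clamped by the backoff settings (maxDelayMs) before sleeping.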
+
+
+ Limits we're designing under
+ Per-webhook throttle is 4 req/sec, per-channel cap is 1800 messages/hour,
+ max payload 28 KB. Hourly cadence with ≤ 5 cards per run sits at < 0.3% of the per-hour cap.
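+
+ The limits above can be checked mechanically before posting. The following is a rough sketch, assuming the
+ 28 KB ceiling is measured on the UTF-8 encoded JSON body; the helper names are hypothetical.

```typescript
const MAX_PAYLOAD_BYTES = 28 * 1024; // payload ceiling quoted above

// Size of the JSON body as it would go over the wire (UTF-8 bytes, not characters).
function payloadBytes(envelope: unknown): number {
  return new TextEncoder().encode(JSON.stringify(envelope)).length;
}

function fitsTeamsLimit(envelope: unknown): boolean {
  return payloadBytes(envelope) <= MAX_PAYLOAD_BYTES;
}

// Share of the per-channel hourly cap consumed by one run, in percent.
function hourlyCapSharePct(cardsPerRun: number, capPerHour: number = 1800): number {
  return (cardsPerRun / capPerHour) * 100;
}
```

+ Five cards in one hourly run work out to 5 / 1800 ≈ 0.28% of the cap, which is where the "< 0.3%" figure comes from.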
+
+
+
+ Card renderer
+
+
+ Pure functions: (input) => CardEnvelope. No DB access, no side effects, no clock reads — the caller
+ passes in now. This makes snapshot tests trivial and makes the cards reproducible.
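+
+ To illustrate the "no clock reads" rule, here is a sketch of what the format.ts helpers ($, %, ago()) could look
+ like when now is always an argument. The exact names, signatures, and output wording are assumptions.

```typescript
// Cents to a USD string, e.g. 512040 -> "$5,120.40".
function formatUsd(cents: number): string {
  return new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" }).format(cents / 100);
}

// Ratio to a whole-number percentage, e.g. 1.024 -> "102%".
function formatPct(ratio: number): string {
  return `${Math.round(ratio * 100)}%`;
}

// Relative age. The caller supplies `now`, so the same inputs always give the same output.
function ago(then: Date, now: Date): string {
  const minutes = Math.max(0, Math.floor((now.getTime() - then.getTime()) / 60000));
  if (minutes < 60) return `${minutes} min ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours} h ago`;
  return `${Math.floor(hours / 24)} d ago`;
}
```

+ Because nothing here touches Date.now(), a snapshot test can pin now to a fixed instant and the rendered card
+ text stays byte-for-byte stable across runs.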
+
+
+ Renderer module shape
+src/lib/teams/cards.ts (new)
+export function renderDigestCard(input: DigestInput): CardEnvelope;
+export function renderBreachCard(input: BreachInput): CardEnvelope;
+export function renderForecastCard(input: ForecastInput): CardEnvelope;
+export function renderStaleCard(input: StaleInput): CardEnvelope;
+
+ Input types
+src/lib/teams/types.ts (new)
+export type DigestInput = {
+ kpis: DashboardKpis; // from loadDashboardKpis()
+ topWorkspaces: WorkspaceListItem[]; // top 5 by utilization, from loadWorkspaceList()
+ sync: SyncStatus; // from loadSyncStatus()
+ month: string; // "2026-05"
+ dashboardUrl: string;
+};
+
+export type BreachInput = {
+ workspace: WorkspaceListItem;
+ threshold: "threshold_80" | "threshold_100" | "threshold_120";
+ projectedMonthEndCents: number;
+ topModel: { name: string; sharePct: number } | null;
+ workspaceUrl: string;
+ raiseLimitUrl: string;
+};
+
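+export type ForecastInput = {
+ workspace: WorkspaceListItem;
+ forecast: BudgetForecast; // produced by forecastWorkspaceMonth(), not forecastBudget()
+ crossesCapOn: Date | null;
+ runRate7dCents: number;
+ runRateWoWPct: number;
+ workspaceUrl: string;
+};
+
+// Input for renderStaleCard(); fields match the evaluator's call site.
+export type StaleInput = {
+ sync: SyncStatus;
+ month: string;
+};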
+
+export type CardEnvelope = {
+ type: "message";
+ attachments: [{
+ contentType: "application/vnd.microsoft.card.adaptive";
+ contentUrl: null;
+ content: AdaptiveCard;
+ }];
+};
+
+ Visual contract
+
+ Cards must match the mockup at specs/030-claude-spend-teams-alerts/mockup.html. Notable rendering
+ constraints from the Teams contract investigation:
+
+
+ - Use Adaptive Card version: "1.4" for mobile rendering compatibility (1.5 is desktop-only).
+ - Only Action.OpenUrl is supported via Workflows webhook. No Submit/Execute, no refresh.
+ - SVG and animated GIF are not supported by Adaptive Card Image. Render utilization bars as nested ColumnSet with weighted widths and tinted Container.style (default / good / warning / attention).
+ - Payload ceiling is 28 KB, well above the ~3-5 KB our cards land at.
+ - Action buttons in the mockup that imply server-side actions (Pause workspace, Raise limit, Snooze) become Action.OpenUrl deep-links into the in-app dashboard. The button label stays the same; the click flow is "open the page where you can do it".
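+
+ The utilization-bar constraint can be sketched as a pure helper that returns a ColumnSet with two weighted
+ columns: a tinted "filled" part and a neutral remainder. The helper names and the style cut-offs
+ (80% → warning, 100% → attention) are assumptions for illustration; the mockup is the visual authority.

```typescript
type ContainerStyle = "default" | "good" | "warning" | "attention";

// Assumed mapping from utilization to tint; adjust to match the mockup.
function barStyle(pct: number): ContainerStyle {
  if (pct >= 100) return "attention";
  if (pct >= 80) return "warning";
  return "good";
}

// One bar segment: a Column with a numeric (weighted) width holding a thin tinted Container.
function barCell(width: number, style: ContainerStyle) {
  return {
    type: "Column",
    width,
    items: [
      { type: "Container", style, minHeight: "6px", items: [{ type: "TextBlock", text: " " }] },
    ],
  };
}

// Builds the bar. Utilization above 100% is clamped so the filled part never exceeds the row.
function utilizationBar(pct: number) {
  const filled = Math.min(100, Math.max(0, Math.round(pct)));
  const columns: ReturnType<typeof barCell>[] = [];
  if (filled > 0) columns.push(barCell(filled, barStyle(pct)));
  if (filled < 100) columns.push(barCell(100 - filled, "default"));
  return { type: "ColumnSet", spacing: "None", columns };
}
```

+ A workspace at 85% yields two columns weighted 85 and 15; one at 102% yields a single full-width column in the
+ attention style.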
+
+
+
+ Don't leak secrets
+ The webhook URL is the secret. The HMAC sig= in the query string authenticates anyone holding it.
+ Never log the URL, never echo it back in error messages, never include it in sync_events.errorMessage.
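+
+ One way to enforce this is to scrub the URL from any error text before it reaches a log line or
+ sync_events.errorMessage. A minimal sketch, with hypothetical helper names:

```typescript
const REDACTED = "[redacted webhook URL]";

// Replaces every occurrence of the secret URL in a message.
function redactSecret(message: string, secretUrl: string): string {
  if (secretUrl === "") return message;
  return message.split(secretUrl).join(REDACTED);
}

// Turns an unknown thrown value into a log-safe string.
function safeErrorMessage(err: unknown, secretUrl: string): string {
  const raw = err instanceof Error ? err.message : String(err);
  return redactSecret(raw, secretUrl);
}
```

+ This only catches verbatim copies of the URL. It is a backstop, not a substitute for keeping the URL out of
+ error messages in the first place.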
+
+
+
+ Alert evaluator
+
+
+ The orchestration layer (rev 2). Pulls data via the auth-free queries module, diffs against the alert-state ledger,
+ emits zero or more cards, persists the new state.
+
+
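+src/lib/teams/evaluator.ts (new)
+import { loadSyncStatus, loadDashboardKpis, loadWorkspaceList } from "@/lib/anthropic/queries";
+import { forecastWorkspaceMonth } from "@/lib/anthropic/forecast-workspace";
+import { renderDigestCard, renderBreachCard, renderForecastCard, renderStaleCard } from "./cards";
+import { readAlertState, upsertAlertState } from "./state";
+import { postCard } from "./webhook";
+import type { CardEnvelope } from "./types";
+import { env } from "@/lib/env";
+// getCurrentMonth, computeAlertDiff, top5, dashUrl, toBreachInput and toForecastInput are local helpers (not shown).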
+
+export async function evaluateAndPostTeamsAlerts(opts?: {
+ now?: Date;
+}): Promise<{ posted: number; skipped: string[] }> {
+ if (!env.TEAMS_WEBHOOK_URL) return { posted: 0, skipped: ["webhook_disabled"] };
+
+ const now = opts?.now ?? new Date();
+ const month = getCurrentMonth(now);
+ const sync = await loadSyncStatus();
+
+ // Stale guard — short-circuit before doing any other work.
+ if (sync.isStale) {
+ await postCard(env.TEAMS_WEBHOOK_URL, renderStaleCard({ sync, month }));
+ return { posted: 1, skipped: ["digest_skipped_stale"] };
+ }
+
+ const [kpis, workspaces, state] = await Promise.all([
+ loadDashboardKpis(month),
+ loadWorkspaceList(),
+ readAlertState(month),
+ ]);
+
+ // Per-workspace forecasts run in parallel; the call is cheap (single SQL aggregation).
+ // Workspaces without a limit are still forecast; only threshold evaluation skips them.
+ const forecasts = await Promise.all(
+ workspaces.filter(w => w.workspaceId !== null).map(w =>
+ forecastWorkspaceMonth(w.workspaceId!, month, now).then(f => ({ workspace: w, forecast: f }))
+ ),
+ );
+
+ const diff = computeAlertDiff({ workspaces, forecasts, state, now });
+ const envelopes: CardEnvelope[] = [];
+
+ // 1) Hourly digest — always (unless stale).
+ envelopes.push(renderDigestCard({ kpis, topWorkspaces: top5(workspaces), sync, month, dashboardUrl: dashUrl() }));
+
+ // 2) Threshold breaches — only thresholds with no firedAt timestamp yet this month.
+ for (const b of diff.thresholdsToFire) {
+ envelopes.push(renderBreachCard(toBreachInput(b)));
+ }
+
+ // 3) Forecast edges — false → true (and optionally true → false).
+ for (const f of diff.forecastEdges) {
+ envelopes.push(renderForecastCard(toForecastInput(f)));
+ }
+
+ // Post serially to stay under 4 req/sec.
+ for (const envelope of envelopes) {
+ await postCard(env.TEAMS_WEBHOOK_URL, envelope);
+ }
+
+ // Persist new state only after all cards have posted successfully.
+ await upsertAlertState(diff, now);
+ return { posted: envelopes.length, skipped: [] };
+}
+
+ State rules (rev 2)
+ For each (workspaceId, billingMonth) row, the evaluator decides per-column:
+
+ | Column | Condition | Action | Card? |
+ | threshold_80_fired_at | NULL AND pct ≥ 80 | set to now | yes |
+ | threshold_80_fired_at | non-NULL (any pct) | no-op | no |
+ | threshold_100_fired_at | NULL AND pct ≥ 100 | set to now | yes |
+ | threshold_120_fired_at | NULL AND pct ≥ 120 | set to now | yes |
+ | forecast_at_risk | was false, now true | flip + stamp forecast_changed_at | yes |
+ | forecast_at_risk | was true, now false | flip + stamp forecast_changed_at | see Q1 |
+
+
+ Threshold semantics
+
+ - Each threshold fires exactly once per (workspace, billing month). No fall-then-rise re-fires.
+ - A workspace going 0% → 105% in one hour fires both threshold_80 and threshold_100 in the same evaluation: two cards.
+ - A workspace at 110% for 720 hours produces zero new cards after the initial breach.
+ - New billing month = new row = thresholds re-armed.
+ - Workspaces with limitCents === null are skipped for thresholds. Forecast still runs (no cap = no overshoot, but the run-rate / projection is still computed).
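+
+ The state rules and threshold semantics above reduce to two small pure functions. The sketch below assumes a
+ ledger row shaped like the anthropic_alert_state columns; the function names and return shapes are hypothetical
+ and are not the actual computeAlertDiff() signature.

```typescript
type ThresholdKey = "threshold_80" | "threshold_100" | "threshold_120";
type FiredColumn = "threshold80FiredAt" | "threshold100FiredAt" | "threshold120FiredAt";

type AlertRow = {
  threshold80FiredAt: Date | null;
  threshold100FiredAt: Date | null;
  threshold120FiredAt: Date | null;
  forecastAtRisk: boolean;
};

const THRESHOLDS: { key: ThresholdKey; pct: number; column: FiredColumn }[] = [
  { key: "threshold_80", pct: 80, column: "threshold80FiredAt" },
  { key: "threshold_100", pct: 100, column: "threshold100FiredAt" },
  { key: "threshold_120", pct: 120, column: "threshold120FiredAt" },
];

// Fire-once-per-month: a threshold fires only if its timestamp is still NULL
// and utilization is at or above it. A null utilization means "no limit set".
function thresholdsToFire(row: AlertRow, utilizationPct: number | null): ThresholdKey[] {
  if (utilizationPct === null) return [];
  return THRESHOLDS
    .filter((t) => row[t.column] === null && utilizationPct >= t.pct)
    .map((t) => t.key);
}

// Edge-triggered: report a transition only when the stored flag differs from the new value.
function forecastEdge(row: AlertRow, atRiskNow: boolean): "to_at_risk" | "to_on_track" | null {
  if (row.forecastAtRisk === atRiskNow) return null;
  return atRiskNow ? "to_at_risk" : "to_on_track";
}
```

+ A fresh row at 105% returns both threshold_80 and threshold_100. Once those timestamps are stamped, the same row
+ at 110% returns nothing, and at 125% it returns only threshold_120.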
+
+
+
+ Cron integration
+
+
+ The only change to the existing cron path is a single try-wrapped call at the end of run(), after the
+ sync-status sentinel is stamped.
+
+
+ export async function run(triggeredBy?: number, opts?: RunOpts): Promise<{ eventId: number }> {
+ const result = await withSyncLock(
+ { sourceType: "anthropic_api_costs", triggeredBy, operationType: opts?.operationType },
+ async (eventId) => { /* ...existing... */ return counts; }
+ );
+ if (counts.errorCount === 0) {
+ await db.insert(anthropicSyncStatus).values({ /* sentinel */ }).onConflictDoUpdate(...);
++ try {
++ const { posted, skipped } = await evaluateAndPostTeamsAlerts();
++ console.log(`[teams] posted=${posted} skipped=${skipped.join(",") || "-"}`);
++ } catch (err) {
++ console.error("[teams] evaluation failed (non-fatal):", err);
++ }
+ }
+ return result;
+ }
+
+
+ - Runs only when counts.errorCount === 0: partial-failure syncs don't post (data may be incomplete).
+ - Wrapped in try/catch: Teams posting can never fail the sync.
+ - Logs to console per house convention. No new logging infrastructure.
+ - No change to makeCronSyncRoute, withSyncLock, or sync_events.
+
+
+ Manual trigger
+
+ A dev or admin can re-fire the evaluator without re-running the upstream sync by calling
+ evaluateAndPostTeamsAlerts() from a one-off script (e.g., scripts/teams-test.ts). This is
+ not exposed as an API route in v1.
+
+
+
+ Configuration
+
+ New env vars
+
+ | Name | Type | Required | Purpose |
+ | TEAMS_WEBHOOK_URL | URL | No | If unset, the evaluator no-ops. This is the kill switch. |
+ | TEAMS_DASHBOARD_BASE_URL | URL | No (defaults to NEXTAUTH_URL) | Base URL used to build deep links in card buttons. |
+
+
+
+ Zod additions in src/lib/env.ts
+src/lib/env.ts (edit)
+const envSchema = z.object({
+ /* ...existing... */
+ TEAMS_WEBHOOK_URL: z.string().url().optional(),
+ TEAMS_DASHBOARD_BASE_URL: z.string().url().optional(),
+});
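+
+ The fallback to NEXTAUTH_URL and the deep-link construction could be sketched as below. The TeamsEnv type and
+ helper names are hypothetical, and the /anthropic/workspaces/:id path is the one used in the sample payloads,
+ which is still unconfirmed (see Q6).

```typescript
// Only the fields this sketch needs; the real env object comes from src/lib/env.ts.
type TeamsEnv = {
  TEAMS_DASHBOARD_BASE_URL?: string;
  NEXTAUTH_URL: string;
};

// Resolves the base URL, preferring the explicit override, and strips trailing slashes.
function dashboardBase(env: TeamsEnv): string {
  const base = env.TEAMS_DASHBOARD_BASE_URL ?? env.NEXTAUTH_URL;
  return base.replace(/\/+$/, "");
}

// Builds a per-workspace deep link; the id is encoded so unusual names cannot break the path.
function workspaceUrl(env: TeamsEnv, workspaceId: string): string {
  return `${dashboardBase(env)}/anthropic/workspaces/${encodeURIComponent(workspaceId)}`;
}
```

+ Keeping link construction in one place means a later routing change (for example, falling back to the global
+ dashboard with an anchor) touches a single function rather than every card renderer.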
+
+
+ Vercel env
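+ Add TEAMS_WEBHOOK_URL in Vercel Project Settings → Environment Variables, scoped to one environment at a time:
+ Preview for the staging webhook, then Production after merge (see Rollout). vercel env pull reads the Development
+ environment by default, so it picks the variable up locally only if it is also set there or pulled with an explicit
+ --environment. Treat the URL as a secret: it is an HMAC-signed Logic Apps trigger and is auth by possession.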
+
+
+
+ Card payloads
+
+
+ Three Adaptive Card v1.4 payloads, wrapped in the Workflows envelope. These are the contract — the renderer must
+ produce JSON that matches this shape.
+
+
+ (a) Hourly digest
+
+ POST $TEAMS_WEBHOOK_URL — digest payload
+{
+ "type": "message",
+ "attachments": [
+ {
+ "contentType": "application/vnd.microsoft.card.adaptive",
+ "contentUrl": null,
+ "content": {
+ "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
+ "type": "AdaptiveCard",
+ "version": "1.4",
+ "body": [
+ { "type": "TextBlock", "text": "Claude API spend · hourly digest", "weight": "Bolder", "size": "Large", "wrap": true },
+ { "type": "TextBlock", "text": "May 2026 MTD · synced 14:01 CET · 1 min old", "isSubtle": true, "spacing": "None", "wrap": true },
+ { "type": "ColumnSet", "spacing": "Medium", "columns": [
+ { "type": "Column", "width": "stretch", "items": [
+ { "type": "TextBlock", "text": "MTD spend", "isSubtle": true, "size": "Small" },
+ { "type": "TextBlock", "text": "$18,420", "weight": "Bolder", "size": "ExtraLarge", "spacing": "None" },
+ { "type": "TextBlock", "text": "▲ 12% vs same day last month", "color": "Attention", "size": "Small", "spacing": "None" }
+ ]},
+ { "type": "Column", "width": "stretch", "items": [
+ { "type": "TextBlock", "text": "Workspaces > 80%", "isSubtle": true, "size": "Small" },
+ { "type": "TextBlock", "text": "3 of 11", "weight": "Bolder", "size": "ExtraLarge", "spacing": "None" }
+ ]},
+ { "type": "Column", "width": "stretch", "items": [
+ { "type": "TextBlock", "text": "Forecast", "isSubtle": true, "size": "Small" },
+ { "type": "TextBlock", "text": "At risk", "weight": "Bolder", "size": "ExtraLarge", "color": "Attention", "spacing": "None" }
+ ]}
+ ]},
+ { "type": "TextBlock", "text": "Top utilization", "weight": "Bolder", "spacing": "Medium" },
+
+ /* repeat per workspace: name+pct, then a two-column "bar" */
+ { "type": "TextBlock", "text": "research-claude · 102% · $5,120 / $5,000", "spacing": "Small", "wrap": true },
+ { "type": "ColumnSet", "spacing": "None", "columns": [
+ { "type": "Column", "width": 100, "items": [{ "type": "Container", "style": "attention", "minHeight": "6px", "items": [{ "type": "TextBlock", "text": " " }]}]}
+ ]}
+ /* ...repeat for the remaining 4 workspaces... */
+ ],
+ "actions": [
+ { "type": "Action.OpenUrl", "title": "Open dashboard", "url": "https://hub.unic.com/anthropic" },
+ { "type": "Action.OpenUrl", "title": "View all workspaces", "url": "https://hub.unic.com/anthropic/workspaces" }
+ ]
+ }
+ }
+ ]
+}
+
+
+ (b) Threshold breach
+
+ POST $TEAMS_WEBHOOK_URL — breach payload (truncated for brevity)
+{
+ "type": "message",
+ "attachments": [{
+ "contentType": "application/vnd.microsoft.card.adaptive",
+ "contentUrl": null,
+ "content": {
+ "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
+ "type": "AdaptiveCard",
+ "version": "1.4",
+ "body": [
+ { "type": "Container", "style": "attention", "bleed": true, "items": [
+ { "type": "TextBlock", "text": "⚠ Workspace over budget · research-claude", "weight": "Bolder", "size": "Large", "color": "Attention", "wrap": true },
+ { "type": "TextBlock", "text": "Crossed 100% at 13:58 CET · first breach this month", "isSubtle": true, "spacing": "None", "wrap": true }
+ ]},
+ { "type": "FactSet", "facts": [
+ { "title": "Workspace", "value": "research-claude" },
+ { "title": "Monthly limit", "value": "$5,000.00" },
+ { "title": "Spend MTD", "value": "$5,120.40 · 102%" },
+ { "title": "7-day run rate", "value": "$612/day · 3.4× the 30-day avg" },
+ { "title": "Projected EOM", "value": "$8,940 · +$3,940 over" }
+ /* "Top model" intentionally omitted in v1 — no per-workspace × model join
+ available. Deferred to Phase 1.5 once we wire anthropic_usage_metrics
+ into a workspace-attributed aggregation. */
+ ]}
+ ],
+ "actions": [
+ { "type": "Action.OpenUrl", "title": "Open workspace", "url": "https://hub.unic.com/anthropic/workspaces/research-claude" },
+ { "type": "Action.OpenUrl", "title": "Adjust limit", "url": "https://hub.unic.com/anthropic/workspaces/research-claude/limit" }
+ ]
+ }
+ }]
+}
+
+
+ (c) Forecast at-risk
+
+ POST $TEAMS_WEBHOOK_URL — forecast payload (truncated)
+{
+ "type": "message",
+ "attachments": [{
+ "contentType": "application/vnd.microsoft.card.adaptive",
+ "contentUrl": null,
+ "content": {
+ "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
+ "type": "AdaptiveCard",
+ "version": "1.4",
+ "body": [
+ { "type": "Container", "style": "warning", "bleed": true, "items": [
+ { "type": "TextBlock", "text": "Forecast: product-ai projected to overshoot", "weight": "Bolder", "size": "Medium", "color": "Warning", "wrap": true },
+ { "type": "TextBlock", "text": "7-day trailing rate · status flipped to at_risk", "isSubtle": true, "spacing": "None" }
+ ]},
+ { "type": "FactSet", "facts": [
+ { "title": "Spend MTD", "value": "$4,200 · 84%" },
+ { "title": "7-day run rate", "value": "$210/day · ▲ 28% WoW" },
+ { "title": "Projected EOM", "value": "$5,890 · +$890 over" },
+ { "title": "Crosses 100% on", "value": "28 May 2026 · in 7 days" }
+ ]}
+ ],
+ "actions": [
+ { "type": "Action.OpenUrl", "title": "Open forecast", "url": "https://hub.unic.com/anthropic/workspaces/product-ai/forecast" }
+ ]
+ }
+ }]
+}
+
+
+
+ Failure modes & edge cases
+
+
+ | Scenario | Behavior |
+ | Webhook returns 429 (or 412 / 502 / 504) | retryWithBackoff retries, honoring Retry-After when present, up to 3 attempts, max 20s delay. |
+ | Webhook returns 4xx (other than 412 / 429) | No retry. Log error (without URL). Sync still marked success: alerts are best-effort. |
+ | Workflow URL revoked / O365 connector retired | All POSTs fail. Operator removes TEAMS_WEBHOOK_URL until a new Workflows URL is provisioned. |
+ | Sync stale (> 70 min) | Digest skipped. Stale-data card posted instead. Breach/forecast cards skipped entirely. |
+ | Anthropic workspace removed upstream | Workspace drops out of loadWorkspaceList(). Its alert-state row stays in the ledger untouched and is no longer evaluated. |
+ | New billing month | Eval queries WHERE billing_month = '2026-06'. No rows exist; everything is treated as fresh. May → June rollover emits a fresh set of breach cards if conditions persist. |
+ | Limit increased mid-month (utilization drops below threshold) | No state change: the *_fired_at timestamps stay set (rev 2 has no active flag). If utilization crosses the same threshold again this month, no new card fires. |
+ | Limit removed entirely | Workspace has no limitCents. Excluded from threshold evaluation. Forecast still runs if monthly history is sufficient. |
+ | Card > 28 KB | Renderer caps the digest at top-5 workspaces; theoretical max is well under 6 KB. If somehow exceeded, the POST fails with 413 and the operator sees the error in cron logs. |
+ | Concurrent sync runs | withSyncLock prevents concurrency. Evaluator runs only after the lock releases. |
+ | Partial sync (errorCount > 0) | Evaluator skipped. Sync data may be incomplete; better silence than misleading alerts. |
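+
+ The "Sync stale (> 70 min)" row implies a simple age check on the last successful sync. A sketch, assuming the
+ sync status exposes the timestamp of the last success; the helper name and the null handling are assumptions,
+ and the real flag is sync.isStale from loadSyncStatus().

```typescript
const STALE_AFTER_MINUTES = 70; // hourly cron plus a 10-minute grace period

// True when there has never been a successful sync, or the last one is older than the cut-off.
function isStale(
  lastSuccessAt: Date | null,
  now: Date,
  maxAgeMinutes: number = STALE_AFTER_MINUTES,
): boolean {
  if (lastSuccessAt === null) return true;
  const ageMs = now.getTime() - lastSuccessAt.getTime();
  return ageMs > maxAgeMinutes * 60000;
}
```

+ The comparison is strict, so a sync that is exactly 70 minutes old still counts as fresh.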
+
+
+
+
+ Test plan
+
+ Unit (Vitest)
+
+ - tests/unit/teams/cards.test.ts: one snapshot per card variant. Catches accidental schema breakage.
+ - tests/unit/teams/format.test.ts: money/percent/relative-time helpers.
+ - Validate the snapshot JSON against an Adaptive Card schema validator (one-time check; not in CI loop).
+
+
+ Integration (Vitest + real Neon branch)
+
+ tests/integration/teams/evaluator.test.ts (rev 2):
+
+ - Empty state, no workspaces over threshold → exactly 1 envelope (digest).
+ - One workspace newly at 85% → digest + 1 breach card; threshold_80_fired_at is set.
+ - Same workspace next hour, still 85% → digest only, no re-fire.
+ - Same workspace, now 110% → digest + 1 new threshold_100 card; threshold_100_fired_at set.
+ - Drops to 60% → digest only; no recovery card; timestamps remain set.
+ - Back up to 85% → digest only; threshold_80_fired_at already non-NULL, so no re-fire.
+ - Month rollover → new row inserted with all timestamps NULL; same 85% breach emits a fresh card.
+ - Forecast flips on_track → at_risk → forecast card posted; forecast_at_risk = true, forecast_changed_at stamped.
+ - Forecast flips back at_risk → on_track → optional recovery card per Q1 default (no card).
+ - Stale sync → only stale card posted; thresholds and forecasts not evaluated.
+ - Workspace with limitCents = null → thresholds skipped, forecast still computed.
+ - Default workspace (workspaceId = null) → coalesced to "__default__" in the ledger, unique index satisfied.
+
+
+ tests/integration/anthropic/forecast-workspace.test.ts (new): seed 14 days of anthropic_workspace_costs, assert: 7-day run rate, week-over-week %, projected EOM cents, crosses-cap date (or NULL when on-track).
+ tests/integration/teams/webhook.test.ts: mock fetch to assert: envelope shape, retry on 429 with Retry-After, no retry on 400, never logs the URL.
+
+
+ Manual
+
+ - Provision a real Workflows webhook in a staging channel.
+ - Seed a workspace with a synthetic 110% utilization in staging DB.
+ - Run the script pnpm tsx scripts/teams-test.ts → the digest plus breach cards for the 80% and 100% thresholds appear in the channel within seconds.
+ - Re-run the script → only the digest re-appears (idempotency proof).
+ - Trigger the cron via curl -H "Authorization: Bearer $CRON_SECRET" .../api/sync/anthropic-api-costs.
+
+
+
+ Rollout
+
+
+ 1. Spec freeze: review this plan with the AI-FinOps stakeholder. Confirm channel + audience.
+ 2. Branch & Neon branch: create feature branch, spin a Neon worktree branch for schema work.
+ 3. Schema migration: add the anthropic_alert_state table only (rev 2 introduces no enum). Generate, review with drizzle-migration-reviewer, apply.
+ 4. Extract data layer: move DB queries from src/actions/anthropic-global.ts into new src/lib/anthropic/queries.ts. Server actions stay as thin admin-gated + cached wrappers. UI behavior unchanged; verify by running the existing dashboard against staging.
+ 5. Add forecastWorkspaceMonth() in src/lib/anthropic/forecast-workspace.ts with an integration test.
+ 6. Implement Teams modules in order: types.ts → format.ts → cards.ts → webhook.ts → state.ts → evaluator.ts. Each lands with unit tests.
+ 7. Wire into sync: edit src/lib/sync/sources/anthropic-workspace.ts with the try-wrapped call.
+ 8. Integration tests: run against the Neon branch; assert state-machine behavior.
+ 9. Staging webhook: provision in a private staging Teams channel. Push branch to a Vercel preview. Set TEAMS_WEBHOOK_URL on the preview only.
+ 10. Smoke test: seed a synthetic over-limit workspace, trigger the cron, observe the channel.
+ 11. Security review: run the nextauth-security-reviewer sub-agent against the diff. Focus on src/lib/anthropic/queries.ts (server-only marker, no client imports), the cron handler, and any newly-touched session-handling code.
+ 12. Code review, then merge to main.
+ 13. Production webhook: channel owner provisions a Workflows webhook in the production channel. Add TEAMS_WEBHOOK_URL to Vercel Production.
+ 14. Watch for 24h: confirm one digest per hour, no duplicate breach cards on cron re-runs, no errors in cron logs.
+ 15. Close spec: link the merged PRs from specs/030-claude-spend-teams-alerts/README.html (to be added later).
+
+
+
+ Backout
+
+ Remove or unset TEAMS_WEBHOOK_URL in Vercel. No code rollback needed — the evaluator early-returns when the var is missing.
+ Schema migration is additive (a new table) and safe to keep even when the feature is off.
+
+
+
+ Open questions
+
+
+ | # | Question | Default if no answer |
+ | Q1 | Post a recovery card when forecast flips at_risk → on_track? (Thresholds are fire-once-per-month by design and don't apply.) | No in v1, to reduce noise. The flip is recorded in the ledger and visible in the next digest. Reconsider after a month of real usage. |
+ | Q2 | One channel for everything, or split "digest" vs "alerts"? | One channel (TEAMS_WEBHOOK_URL). Add a second var (TEAMS_WEBHOOK_URL_ALERTS) later if signal-to-noise becomes an issue. |
+ | Q3 | Should the digest post even when nothing has changed? | Yes: the digest is the heartbeat. If it disappears, ops knows the sync is broken. |
+ | Q4 | Currency and time-zone formatting: USD + CET, or per-tenant config? | USD + CET hard-coded in v1; pull from a settings module when there's more than one tenant. |
+ | Q5 | Hourly cadence too noisy? | Keep hourly. The digest is one message; only edge-triggered breach/forecast cards add extra posts, and they're by definition uncommon. |
+ | Q6 | Where do the deep-link buttons point? Are there per-workspace pages today (/anthropic/workspaces/:id)? | Confirm against the routing under src/app/(authed)/anthropic/ during implementation. If the per-workspace page doesn't exist, link to the global dashboard with an anchor. |
+
+
+
+