feat: scheduled (cron) triggers for agents — closes #186 #196
ishaan-berri wants to merge 13 commits into
Adds nullable cron_schedule (standard 5-field), cron_timezone (IANA),
cron_enabled, cron_overlap_policy, plus server-managed cron_last_fired_at
and cron_next_fire_at. Indexed on (cron_enabled, cron_next_fire_at) so
the scheduler hot path is a single index seek.
Session gains a trigger column ("api" default, "cron" for scheduled
runs) so the UI can badge cron-driven sessions.
Used by src/server/cron.ts to compute next-fire instants. Standard library — handles 5-field crons, IANA timezones, and DST correctly.
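To make the next-fire computation concrete, here is a minimal sketch. It is not the library this commit adds (which is not named here). It is a hypothetical UTC-only stand-in, so it ignores IANA timezones and DST entirely. The names nextFireUtc, parseField and toInt are invented for illustration.

```typescript
// Hypothetical UTC-only stand-in for a next-fire calculator.
// Each of the 5 fields supports "*", "a", "a-b", "a,b" and "/step".

function toInt(text: string): number {
  if (!/^\d+$/.test(text)) throw new Error(`invalid cron number "${text}"`);
  return Number(text);
}

function parseField(text: string, min: number, max: number): Set<number> {
  const values = new Set<number>();
  for (const part of text.split(",")) {
    const pieces = part.split("/");
    const rangeText = pieces[0] ?? "";
    const stepText = pieces[1];
    const step = stepText === undefined ? 1 : toInt(stepText);
    let lo = min;
    let hi = max;
    if (rangeText !== "*") {
      const bounds = rangeText.split("-");
      lo = toInt(bounds[0] ?? "");
      const upper = bounds[1];
      // "5/15" is read as "from 5 up to the field maximum, every 15".
      hi = upper !== undefined ? toInt(upper) : stepText !== undefined ? max : lo;
    }
    if (step < 1 || lo < min || hi > max || lo > hi) {
      throw new Error(`invalid cron field "${text}"`);
    }
    for (let v = lo; v <= hi; v += step) values.add(v);
  }
  return values;
}

export function nextFireUtc(expr: string, after: Date): Date {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${parts.length}`);
  }
  const [minF, hourF, domF, monF, dowF] = parts as [string, string, string, string, string];
  const minutes = parseField(minF, 0, 59);
  const hours = parseField(hourF, 0, 23);
  const doms = parseField(domF, 1, 31);
  const months = parseField(monF, 1, 12);
  const dows = parseField(dowF, 0, 6); // 0 = Sunday
  // Classic cron rule: when both day fields are restricted, either may match.
  const eitherDay = domF !== "*" && dowF !== "*";

  const MINUTE = 60_000;
  // Start at the first whole minute strictly after `after`.
  const start = Math.floor(after.getTime() / MINUTE) * MINUTE + MINUTE;
  const limit = start + 4 * 366 * 24 * 60 * MINUTE; // give up after about 4 years
  const probe = new Date(start);
  for (let t = start; t < limit; t += MINUTE) {
    probe.setTime(t);
    const domOk = doms.has(probe.getUTCDate());
    const dowOk = dows.has(probe.getUTCDay());
    const dayOk = eitherDay ? domOk || dowOk : domOk && dowOk;
    if (
      dayOk &&
      months.has(probe.getUTCMonth() + 1) &&
      hours.has(probe.getUTCHours()) &&
      minutes.has(probe.getUTCMinutes())
    ) {
      return new Date(t);
    }
  }
  throw new Error(`no fire time found for "${expr}"`);
}
```

Under these assumptions, nextFireUtc("*/5 * * * *", new Date("2024-01-01T00:02:30Z")) returns 2024-01-01T00:05:00.000Z, and "0 9 * * 1-5" asked on Friday 2024-01-05 at 09:00 UTC moves to Monday 2024-01-08 at 09:00 UTC. Minute-by-minute probing is slow but easy to verify; a real library would jump between candidate fields and handle timezones.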
The session POST route inlined ~250 lines of warm/cold bring-up orchestration. Moves runBringUp + helpers into a standalone module so non-HTTP callers (the worker's cron tick, future integrations) can reuse exactly the same dance. Nothing inside reads request-scoped state — only prisma, k8s, harness primitives.
Drops ~370 lines of inline bring-up logic. Route is now a thin shell: auth + body parse + warm claim + Session row create + delegate to runBringUp. Behavior unchanged.
Implements parseCronSpec, computeNextFireAt, and tickCron.
Multi-pod safety: tickCron claims due agents via raw
"SELECT … FOR UPDATE SKIP LOCKED" inside a transaction, advances each
agent's cron_next_fire_at within the same tx, then fires bring-up
*outside* the transaction so the lock window stays small. Two pods
racing the same tick: one wins each row, the loser sees it skipped.
Overlap policy: when a previous cron-tagged session is still in
{creating, ready}, the new fire is skipped. "queue" and "parallel" are
reserved for later via the Agent.cron_overlap_policy column.
Bounded at 50 due agents per tick so a stuck run can't swamp the
worker; anything beyond gets picked up next tick (default 30s).
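For readers unfamiliar with the claim pattern, the query might look roughly like the sketch below. The table name "agents", the selected columns, the RawQuery type and the claimDueAgents helper are all assumptions made for illustration; the actual statement in src/server/cron.ts is not shown in this PR text.

```typescript
// Sketch only: table name, column list and helper names are assumptions.
export interface DueAgentRow {
  agent_id: string;
  cron_schedule: string;
  cron_timezone: string;
}

// Stand-in for whatever raw-SQL helper the transaction client exposes.
export type RawQuery = (sql: string, params: unknown[]) => Promise<DueAgentRow[]>;

// SKIP LOCKED makes a second pod silently pass over rows the first pod
// has already locked, instead of blocking on them.
export const CLAIM_DUE_AGENTS_SQL = `
  SELECT agent_id, cron_schedule, cron_timezone
  FROM agents
  WHERE cron_enabled = true
    AND cron_schedule IS NOT NULL
    AND cron_next_fire_at <= $1
  ORDER BY cron_next_fire_at
  LIMIT $2
  FOR UPDATE SKIP LOCKED
`;

// Mirrors the "50 due agents per tick" bound described above.
export const MAX_DUE_PER_TICK = 50;

// Must run inside a transaction: the row locks are held until commit,
// which is why cron_next_fire_at has to be advanced in the same tx.
export function claimDueAgents(
  query: RawQuery,
  now: Date,
  limit: number = MAX_DUE_PER_TICK,
): Promise<DueAgentRow[]> {
  return query(CLAIM_DUE_AGENTS_SQL, [now, limit]);
}
```

The WHERE clause is shaped so the (cron_enabled, cron_next_fire_at) index can serve it directly, and the LIMIT keeps the lock window bounded even when many agents are due at once.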
Lives in its own module so cron.ts → cron-bringup.ts →
session-bringup.ts doesn't pull API-route code into the worker bundle.
Also keeps the synthetic-prompt format for scheduled runs in one
place ("[cron] scheduled run at <iso>").
One more line in the existing tick loop. When no agent has a schedule the (cron_enabled, cron_next_fire_at) index makes the lookup essentially free, so the cost on a no-schedule deploy is ~one index seek per worker tick. Counters surface in the heartbeat log so operators can see cron_considered / cron_fired / cron_skipped_overlap without a separate dashboard.
UpdateAgentBody accepts cron_schedule, cron_timezone, cron_enabled, cron_overlap_policy. ApiAgent / ApiSession surface the server-managed cron timestamps and the trigger discriminator. Read-only fields are omitted from the request body so clients can't lie about cron_next_fire_at.
When cron_schedule or cron_timezone changes (validated together so the 400 names the bad pair, not the field evaluation order), recompute cron_next_fire_at server-side. Empty string clears the schedule; the scheduler's WHERE cron_schedule IS NOT NULL then masks the row. Flipping cron_enabled doesn't touch cron_next_fire_at — toggling off and back on resumes from the existing cadence without losing it.
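The PATCH rules above can be sketched as a pure function. This is an illustration under assumptions, not the route's code: applyCronPatch, CronState, CronPatch and the injected ComputeNext callback are hypothetical names, and "changes" is approximated as "the field is present in the request body".

```typescript
// Hypothetical shapes; the real Prisma model and request body may differ.
interface CronState {
  cron_schedule: string | null;
  cron_timezone: string;
  cron_enabled: boolean;
  cron_next_fire_at: Date | null;
}

interface CronPatch {
  cron_schedule?: string | null;
  cron_timezone?: string;
  cron_enabled?: boolean;
}

// Expected to validate the (schedule, timezone) pair together and throw
// on a bad combination, so the caller can answer with a single 400.
type ComputeNext = (schedule: string, timezone: string, now: Date) => Date;

export function applyCronPatch(
  current: CronState,
  patch: CronPatch,
  now: Date,
  computeNext: ComputeNext,
): CronState {
  const updated: CronState = { ...current };
  if (patch.cron_enabled !== undefined) updated.cron_enabled = patch.cron_enabled;

  const scheduleTouched = patch.cron_schedule !== undefined;
  const timezoneTouched = patch.cron_timezone !== undefined;
  // Toggle-only patch: leave cron_next_fire_at alone so the cadence survives.
  if (!scheduleTouched && !timezoneTouched) return updated;

  // Empty string and null both clear the schedule.
  const schedule = scheduleTouched ? patch.cron_schedule || null : current.cron_schedule;
  const timezone = patch.cron_timezone ?? current.cron_timezone;
  updated.cron_schedule = schedule;
  updated.cron_timezone = timezone;
  updated.cron_next_fire_at = schedule === null ? null : computeNext(schedule, timezone, now);
  return updated;
}
```

Keeping the rules in one pure function would make them easy to unit test without a database, since the next-fire calculation is passed in rather than imported.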
Client-side types now mirror the server shapes. cron_schedule accepts null on the request side so the UI can clear via PATCH without ambiguity. SessionRow.trigger is optional + nullable since older rows predate the column.
Drop-in editor for the agent settings page's <dl> grid. Reads the
agent's cron config, lets the user edit cron expression / timezone /
enabled, and submits via the shared updateAgent API. Renders the
common shapes ("Weekdays at 9am", "Every 5 minutes") as a live
preview; falls back to "Custom schedule" for anything novel.
Preset chips cover the cases the original issue called out — daily
stargazer outreach, weekly digests, hourly checks. Server is the
ultimate validator: client preview is cosmetic and never blocks save.
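A cosmetic preview of this kind could be as small as the sketch below. The function name describeCron and the exact set of recognized shapes are assumptions; only the "Weekdays at 9am", "Every 5 minutes" and "Custom schedule" labels come from the commit message, and "Daily at ..." is an invented extra case.

```typescript
// Hypothetical preview helper: recognizes a few common shapes and falls
// back to "Custom schedule" for everything else. It never validates.
function formatTime(hour: number, minute: number): string {
  const suffix = hour < 12 ? "am" : "pm";
  const h12 = hour % 12 === 0 ? 12 : hour % 12;
  if (minute === 0) return `${h12}${suffix}`;
  const mm = minute < 10 ? `0${minute}` : `${minute}`;
  return `${h12}:${mm}${suffix}`;
}

export function describeCron(expr: string): string {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) return "Custom schedule";
  const [minute, hour, dom, month, dow] = parts as [string, string, string, string, string];

  // "*/N * * * *" -> "Every N minutes"
  const everyN = /^\*\/(\d+)$/.exec(minute);
  if (everyN && hour === "*" && dom === "*" && month === "*" && dow === "*") {
    const n = Number(everyN[1]);
    return n === 1 ? "Every minute" : `Every ${n} minutes`;
  }

  // "M H * * *" and "M H * * 1-5" -> daily / weekday labels
  if (/^\d+$/.test(minute) && /^\d+$/.test(hour) && dom === "*" && month === "*") {
    const h = Number(hour);
    const m = Number(minute);
    if (h > 23 || m > 59) return "Custom schedule";
    if (dow === "*") return `Daily at ${formatTime(h, m)}`;
    if (dow === "1-5") return `Weekdays at ${formatTime(h, m)}`;
  }
  return "Custom schedule";
}
```

With this sketch, describeCron("0 9 * * 1-5") gives "Weekdays at 9am" and describeCron("*/5 * * * *") gives "Every 5 minutes". Because unknown input maps to a neutral label rather than an error, the preview cannot block a save, which matches the server-is-validator stance above.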
Adds a Schedule row to the Configuration block and a small "cron" badge alongside the status pill on cron-triggered sessions in the Sessions list so the user can tell at a glance which runs were auto-started.
Runs with "node --import tsx --test scripts/cron-parse.test.mjs". 9 assertions covering: empty/null schedules → next=null; daily, every-N-minute, weekday-only cadence advances correctly; LA timezone produces expected UTC offset; invalid cron + invalid tz throw with actionable messages.
Greptile Summary
This PR adds native scheduled (cron) triggers for agents: new schema columns on
Confidence Score: 3/5
The core scheduler has two logic gaps in its Phase 1 / Phase 2 handoff that produce incorrect DB state and unintended session creation; safe to merge only after those are addressed. Both bugs live in the hot path of the new cron tick. The first means every overlap-skipped firing writes a stale 'last fired' timestamp users can see in the UI. The second means an agent that the code explicitly logs as 'disabling' due to a bad schedule still gets a session spawned for it in the same tick. In src/server/cron.ts, the Phase 1 transaction block (lines ~204-222) and the Phase 2 loop (lines ~238-268) need attention before merging.
| Filename | Overview |
|---|---|
| src/server/cron.ts | New scheduler with two logic bugs: cron_last_fired_at is committed before the overlap check (skipped agents get a false timestamp), and agents disabled mid-tick for invalid schedules still fire once in Phase 2. |
| prisma/migrations/0006_agent_cron_triggers/migration.sql | Additive-only migration (new columns with safe defaults + one index). No column deletions, no data loss risk. |
| src/server/session-bringup.ts | Logic extracted from the session POST route into a shared module — functionally identical, just moved to enable reuse by the cron tick. |
| src/app/api/v1/managed_agents/agents/[agent_id]/route.ts | PATCH handler extended with cron field validation; parseCronSpec called server-side for immediate 400 on bad cron/tz pair, and cron_next_fire_at recomputed on schedule/timezone change. |
| src/components/cron-editor.tsx | New UI component for inline schedule editing; validation is server-side only with errors surfaced via onError prop. Clean implementation. |
| src/worker/index.ts | Worker tick wired to call tickCron() alongside existing reconcile and warm-pool ticks; result logged in heartbeat. |
| src/server/cron-bringup.ts | Thin shim that sets a cron-specific title and delegates to runBringUp; keeps cron.ts free of API-route imports. |
| prisma/schema.prisma | Six new nullable/defaulted cron columns on Agent plus trigger on Session; composite index on (cron_enabled, cron_next_fire_at) matches the scheduler query exactly. |
| src/server/types.ts | ApiAgent and ApiSession extended with cron fields; extensive cast-through-unknown pattern due to pre-generate Prisma client compatibility. |
Reviews (1): Last reviewed commit: "test(cron): unit cover parser, next-fire..."
        for (const a of claimed) {
          try {
            const next = computeNextFireAt(a.cron_schedule, a.cron_timezone, now);
            await tx.agent.update({
              where: { agent_id: a.agent_id },
              data: { cron_next_fire_at: next, cron_last_fired_at: now },
            });
          } catch (e) {
            console.error(
              `cron: disabling agent ${a.agent_id} — invalid schedule at fire time: ${
                e instanceof Error ? e.message : String(e)
              }`,
            );
            await tx.agent.update({
              where: { agent_id: a.agent_id },
              data: { cron_enabled: false, cron_next_fire_at: null },
            });
          }
        }
        return claimed;
      });
    } catch (e) {
      // Transaction-level failure — log and bail. Next tick will retry.
      console.error(
        `cron: claim transaction failed: ${e instanceof Error ? e.message : String(e)}`,
      );
      result.errors += 1;
      return result;
    }

    result.considered = dueAgents.length;

    // Phase 2: fire each agent outside the transaction. Failures are
    // per-agent — one bad agent must not block the rest of the tick.
    for (const due of dueAgents) {
      try {
        const agent = await prisma.agent.findUnique({
          where: { agent_id: due.agent_id },
        });
        if (agent === null) continue;

        // Overlap policy. "skip" is the only supported value in v1; the
        // column is keyed for future "queue" / "parallel" semantics.
        const policy =
          (agent as unknown as { cron_overlap_policy?: string })
            .cron_overlap_policy ?? "skip";
        if (policy === "skip" && (await hasActiveCronRun(agent.agent_id))) {
          result.skipped_overlap += 1;
          console.log(
            `cron: skipping agent_id=${agent.agent_id} — previous cron run still active`,
          );
          continue;
cron_last_fired_at set before overlap check, so skipped runs show a false "last fired" timestamp
cron_last_fired_at = now is committed inside the claim transaction (line 209) for every claimed agent. The overlap check (hasActiveCronRun) runs after the transaction commits (line 250). When the check fires and a previous run is still active, the agent is skipped via continue — no session is created — but cron_last_fired_at already reflects now in the database. The UI then shows "last: X minutes ago" for a run that never happened, with no matching Session row users can inspect.
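One possible shape for a fix, offered as a sketch rather than the author's intended change: let Phase 1 advance only cron_next_fire_at, and write cron_last_fired_at in Phase 2 after a run has actually started. The firePhase function, the FireDeps interface and the markFired helper below are hypothetical; hasActiveCronRun and fireCronRun are modelled on the names in the hunk above but with simplified signatures.

```typescript
interface DueAgent {
  agent_id: string;
}

// Injected so the sketch stays self-contained; signatures are assumptions.
interface FireDeps {
  hasActiveCronRun(agentId: string): Promise<boolean>;
  fireCronRun(agent: DueAgent, now: Date): Promise<void>;
  // Hypothetical helper: sets cron_last_fired_at for one agent.
  markFired(agentId: string, firedAt: Date): Promise<void>;
}

interface FireResult {
  fired: number;
  skipped_overlap: number;
  errors: number;
}

export async function firePhase(
  due: DueAgent[],
  now: Date,
  deps: FireDeps,
): Promise<FireResult> {
  const result: FireResult = { fired: 0, skipped_overlap: 0, errors: 0 };
  for (const agent of due) {
    try {
      if (await deps.hasActiveCronRun(agent.agent_id)) {
        // No session is created, so no "last fired" timestamp is written.
        result.skipped_overlap += 1;
        continue;
      }
      await deps.fireCronRun(agent, now);
      await deps.markFired(agent.agent_id, now);
      result.fired += 1;
    } catch {
      // Per-agent failure: count it and keep going with the rest.
      result.errors += 1;
    }
  }
  return result;
}
```

The trade-off is one extra write per fired agent outside the claim transaction. In exchange, cron_last_fired_at only ever points at a tick that produced a Session row.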
        for (const a of claimed) {
          try {
            const next = computeNextFireAt(a.cron_schedule, a.cron_timezone, now);
            await tx.agent.update({
              where: { agent_id: a.agent_id },
              data: { cron_next_fire_at: next, cron_last_fired_at: now },
            });
          } catch (e) {
            console.error(
              `cron: disabling agent ${a.agent_id} — invalid schedule at fire time: ${
                e instanceof Error ? e.message : String(e)
              }`,
            );
            await tx.agent.update({
              where: { agent_id: a.agent_id },
              data: { cron_enabled: false, cron_next_fire_at: null },
            });
          }
        }
        return claimed;
      });
    } catch (e) {
      // Transaction-level failure — log and bail. Next tick will retry.
      console.error(
        `cron: claim transaction failed: ${e instanceof Error ? e.message : String(e)}`,
      );
      result.errors += 1;
      return result;
    }

    result.considered = dueAgents.length;

    // Phase 2: fire each agent outside the transaction. Failures are
    // per-agent — one bad agent must not block the rest of the tick.
    for (const due of dueAgents) {
      try {
        const agent = await prisma.agent.findUnique({
          where: { agent_id: due.agent_id },
        });
        if (agent === null) continue;

        // Overlap policy. "skip" is the only supported value in v1; the
        // column is keyed for future "queue" / "parallel" semantics.
        const policy =
          (agent as unknown as { cron_overlap_policy?: string })
            .cron_overlap_policy ?? "skip";
        if (policy === "skip" && (await hasActiveCronRun(agent.agent_id))) {
          result.skipped_overlap += 1;
          console.log(
            `cron: skipping agent_id=${agent.agent_id} — previous cron run still active`,
          );
          continue;
        }

        await fireCronRun(agent as AgentRow, now);
Agents disabled for invalid schedule are still fired in Phase 2
When computeNextFireAt throws for an agent (invalid schedule mutated between last fire and this tick), the catch block sets cron_enabled = false and cron_next_fire_at = null inside the transaction — correctly preventing future fires. However, the agent is still present in the claimed array returned from the transaction, and Phase 2 iterates over dueAgents without checking cron_enabled. The result is that the "now disabled" agent still gets fireCronRun called, spawning one unintended session. The error log says "disabling" but the code fires anyway.
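A hedged sketch of one way to close this gap: split the claimed rows into "advance and fire" and "disable" before leaving the transaction, and hand only the first group to Phase 2. The planAdvance function and its types are hypothetical, and the next-fire calculator is injected so the example runs on its own.

```typescript
interface ClaimedAgent {
  agent_id: string;
  cron_schedule: string;
  cron_timezone: string;
}

// Assumed to throw when the stored schedule or timezone is invalid.
type ComputeNextFire = (schedule: string, timezone: string, now: Date) => Date;

interface AdvancePlan {
  toFire: { agent: ClaimedAgent; next: Date }[];
  toDisable: { agent: ClaimedAgent; reason: string }[];
}

export function planAdvance(
  claimed: ClaimedAgent[],
  now: Date,
  computeNext: ComputeNextFire,
): AdvancePlan {
  const plan: AdvancePlan = { toFire: [], toDisable: [] };
  for (const agent of claimed) {
    try {
      const next = computeNext(agent.cron_schedule, agent.cron_timezone, now);
      plan.toFire.push({ agent, next });
    } catch (e) {
      // Invalid schedule at fire time: disable it, and never fire it.
      plan.toDisable.push({
        agent,
        reason: e instanceof Error ? e.message : String(e),
      });
    }
  }
  return plan;
}
```

Inside the transaction, the caller would apply the cron_next_fire_at updates for toFire and the cron_enabled = false updates for toDisable, then return plan.toFire.map((p) => p.agent) instead of the full claimed array. An alternative with a smaller diff would be to re-check agent.cron_enabled after the findUnique in Phase 2.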
Why
Closes #186. Today every LAP session is started by an external trigger (human chat, Slack/Linear webhook, POST /agents/{id}/session). There's no native way to say "run this agent every day at 9am PT on its own." This blocks daily stargazer outreach, weekly digests, hourly health checks, periodic data syncs: every one of them today has to be poked by an external GitHub Action / Zapier / k8s CronJob.
What
Native scheduled triggers. Add a 5-field cron + IANA timezone to Agent; the worker fires a Session at each scheduled instant.
Schema
- Agent.cron_schedule (nullable 5-field cron)
- Agent.cron_timezone (IANA, default UTC)
- Agent.cron_enabled (toggle, default true)
- Agent.cron_overlap_policy ("skip" only in v1; reserved for "queue" / "parallel")
- Agent.cron_last_fired_at, Agent.cron_next_fire_at (server-managed)
- Index on (cron_enabled, cron_next_fire_at) so the scheduler is a single index seek
- Session.trigger ("api" default, "cron" for scheduled runs) so the UI can badge them

Scheduler
src/server/cron.ts → tickCron() runs alongside the existing reconcile + warm-pool ticks in the worker. Each tick:
1. Claims due agents via SELECT … FOR UPDATE SKIP LOCKED
2. Advances cron_next_fire_at inside the same tx (so a sibling pod won't re-claim on its next tick)
3. Fires a Session with trigger="cron"

Bring-up is the same runBringUp the HTTP route already uses, extracted to src/server/session-bringup.ts so both callers share one implementation.
Answering "how does this work with multiple pods?" (the comment on the issue)
This was the open question. State lives in the DB (cron_next_fire_at column), not in any in-memory APScheduler instance. Concurrency is enforced by Postgres row locks:
1. Pod A and Pod B both run SELECT … FOR UPDATE SKIP LOCKED
2. Pod A locks Agent X; Pod B skips it (that is what SKIP LOCKED does: return rows that aren't currently locked)
3. Pod A advances cron_next_fire_at and commits → Pod B's next tick won't see Agent X because the next-fire instant is now in the future

No leader election, no Redis, no Zookeeper. Just Postgres. Verified locally with a race test (Promise.all([tickCron(), tickCron()]) against a due row): exactly one pod fires, the other returns fired: 0.
API
PATCH /api/v1/managed_agents/agents/{id} accepts cron_schedule, cron_timezone, cron_enabled, cron_overlap_policy.
- Empty cron_schedule clears the schedule (sets DB to NULL)
- Changing the schedule or timezone recomputes cron_next_fire_at server-side so the new cadence takes effect immediately

UI
Inline editor on the agent settings page: a Schedule row next to Env vars.
Collapsed (read-only):
Expanded (editing):
Sessions list badges cron-driven runs with a cron pill alongside the status.
(Screenshots saved locally at /tmp/cron-collapsed.png and /tmp/cron-expanded.png; happy to drag-and-drop them onto the PR after merge or via a follow-up comment.)
Test plan
- Unit tests (node --import tsx --test scripts/cron-parse.test.mjs): 9 assertions, all pass
- npx tsc --noEmit clean
- npx eslint clean on changed files
- npx next build succeeds (no breaking changes to existing routes)
- … trigger="cron"

Out of scope for v1
- … cron_enabled = false → true resume
- cron_overlap_policy = "queue" | "parallel" (column exists, only "skip" accepted)