feat: scheduled (cron) triggers for agents — closes #186 by ishaan-berri · Pull Request #196 · BerriAI/litellm-agent-platform

ishaan-berri · 2026-05-19T04:33:18Z

Why

Closes #186. Today every LAP session is started by an external trigger (human chat, Slack/Linear webhook, POST /agents/{id}/session). There's no native way to say "run this agent every day at 9am PT on its own." This blocks daily stargazer outreach, weekly digests, hourly health checks, periodic data syncs — every one of them today has to be poked by an external GitHub Action / Zapier / k8s CronJob.

What

Native scheduled triggers. Add a 5-field cron + IANA timezone to Agent; the worker fires a Session at each scheduled instant.

Schema

Agent.cron_schedule (nullable 5-field cron)
Agent.cron_timezone (IANA, default UTC)
Agent.cron_enabled (toggle, default true)
Agent.cron_overlap_policy ("skip" only in v1; reserved for "queue" / "parallel")
Agent.cron_last_fired_at, Agent.cron_next_fire_at (server-managed)
Index (cron_enabled, cron_next_fire_at) so the scheduler is a single index seek
Session.trigger ("api" default, "cron" for scheduled runs) so the UI can badge them
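
For readers who think in types, the columns above could be viewed roughly as follows. This is only a sketch: the field names come from the list, but the TypeScript types, the `AgentCronFields` name, and the `defaultCronFields` helper are assumptions made for illustration, not the generated Prisma types.

```typescript
// Hypothetical TypeScript view of the new columns. Field names follow the
// schema list; the type and helper names are illustrative assumptions.
type CronOverlapPolicy = "skip"; // "queue" / "parallel" are reserved, not accepted in v1
type SessionTrigger = "api" | "cron";

interface AgentCronFields {
  cron_schedule: string | null; // 5-field cron; null means "no schedule"
  cron_timezone: string; // IANA name, default "UTC"
  cron_enabled: boolean; // toggle, default true
  cron_overlap_policy: CronOverlapPolicy;
  cron_last_fired_at: Date | null; // server-managed
  cron_next_fire_at: Date | null; // server-managed
}

// Default trigger for sessions that are not started by the scheduler.
const DEFAULT_SESSION_TRIGGER: SessionTrigger = "api";

// Defaults as stated in the schema list: no schedule, UTC, enabled, "skip".
function defaultCronFields(): AgentCronFields {
  return {
    cron_schedule: null,
    cron_timezone: "UTC",
    cron_enabled: true,
    cron_overlap_policy: "skip",
    cron_last_fired_at: null,
    cron_next_fire_at: null,
  };
}
```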

Scheduler

src/server/cron.ts → tickCron() runs alongside the existing reconcile + warm-pool ticks in the worker. Each tick:

Inside a transaction, claim due agents via raw SELECT … FOR UPDATE SKIP LOCKED
Advance each agent's cron_next_fire_at inside the same tx (so a sibling pod won't re-claim on its next tick)
Commit the tx — release locks fast
Outside the tx, fire each agent's bring-up (warm claim → fall back to cold), tagged trigger="cron"

Bring-up is the same runBringUp the HTTP route already uses — extracted to src/server/session-bringup.ts so both callers share one implementation.
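
The shape of one tick, as described in the steps above, can be sketched like this. It is not the code in src/server/cron.ts: `CronStore`, `claimAndAdvance`, and `tickSketch` are hypothetical names, and the store method stands in for the whole claim transaction (the raw SELECT … FOR UPDATE SKIP LOCKED plus advancing cron_next_fire_at).

```typescript
interface DueAgent {
  agent_id: string;
  cron_schedule: string;
  cron_timezone: string;
}

// Hypothetical abstraction over the database. claimAndAdvance stands in for
// the transaction: claim due rows, advance cron_next_fire_at, commit.
interface CronStore {
  claimAndAdvance(now: Date, limit: number): Promise<DueAgent[]>;
}

interface TickResult {
  considered: number;
  fired: number;
  errors: number;
}

type FireFn = (agent: DueAgent, now: Date) => Promise<void>;

async function tickSketch(
  store: CronStore,
  fire: FireFn,
  now: Date,
  limit = 50, // per-tick bound mentioned in the PR
): Promise<TickResult> {
  const result: TickResult = { considered: 0, fired: 0, errors: 0 };

  // Phase 1: short transaction, locks are released when it returns.
  let due: DueAgent[];
  try {
    due = await store.claimAndAdvance(now, limit);
  } catch {
    result.errors += 1; // the next tick retries
    return result;
  }
  result.considered = due.length;

  // Phase 2: bring-up happens outside the transaction, one agent at a time,
  // so a failing agent does not block the others.
  for (const agent of due) {
    try {
      await fire(agent, now);
      result.fired += 1;
    } catch {
      result.errors += 1;
    }
  }
  return result;
}
```

Keeping bring-up outside the claim step is what keeps the lock window short: the slow part (warm claim or cold start) never holds a row lock.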

Answering "how does this work with multiple pods?" (the comment on the issue)

This was the open question. State lives in the DB (cron_next_fire_at column), not in any in-memory APScheduler instance. Concurrency is enforced by Postgres row locks:

Two pods wake up at the same instant → both run SELECT … FOR UPDATE SKIP LOCKED
Pod A grabs Agent X's lock; Pod B sees Agent X is locked and skips it (that's what SKIP LOCKED does — return rows that aren't currently locked)
Pod A advances cron_next_fire_at and commits → Pod B's next tick won't see Agent X because the next-fire instant is now in the future

No leader election, no Redis, no Zookeeper. Just Postgres. Verified locally with a race test (Promise.all([tickCron(), tickCron()]) against a due row): exactly one pod fires, the other returns fired: 0.
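
To make the locking argument concrete, here is a toy in-memory model of the race. It does not talk to Postgres and is not the race test mentioned above; `FakeTable` and `podTick` are hypothetical, and the lock set only imitates what SKIP LOCKED does (return due rows that nobody else currently holds).

```typescript
interface Row {
  agent_id: string;
  next_fire_at: number; // epoch millis
}

class FakeTable {
  private locked = new Set<string>();
  constructor(public rows: Row[]) {}

  // Imitates SELECT ... FOR UPDATE SKIP LOCKED: due rows that are not
  // locked are returned and locked; locked rows are silently skipped.
  claimDue(now: number): Row[] {
    const claimed = this.rows.filter(
      (r) => r.next_fire_at <= now && !this.locked.has(r.agent_id),
    );
    for (const r of claimed) this.locked.add(r.agent_id);
    return claimed;
  }

  // Imitates COMMIT: locks are released.
  release(rows: Row[]): void {
    for (const r of rows) this.locked.delete(r.agent_id);
  }
}

// One pod's claim phase. Returns how many agents this pod would fire.
async function podTick(
  table: FakeTable,
  now: number,
  intervalMs: number,
): Promise<number> {
  const claimed = table.claimDue(now);
  await Promise.resolve(); // yield so the sibling pod runs while locks are held
  for (const r of claimed) r.next_fire_at = now + intervalMs; // advance inside the "tx"
  table.release(claimed);
  return claimed.length;
}
```

Running `Promise.all([podTick(t, now, 60_000), podTick(t, now, 60_000)])` against a single due row gives one pod a count of 1 and the other 0, and a later tick at the same `now` claims nothing because the row's next-fire instant has moved into the future.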

API

PATCH /api/v1/managed_agents/agents/{id} accepts cron_schedule, cron_timezone, cron_enabled, cron_overlap_policy.

Empty string cron_schedule clears the schedule (sets DB to NULL)
Invalid cron string → 400 with the parser's error message
Invalid timezone → 400 with a hint about IANA names
Schedule or timezone change recomputes cron_next_fire_at server-side so the new cadence takes effect immediately
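
The validation rules above could look roughly like the sketch below. It is an approximation, not the route's code: the PR validates the cron string with the cron-parser dependency, whereas this sketch only counts fields, and the function names and error wording are assumptions. The timezone check relies on `Intl.DateTimeFormat` throwing a `RangeError` for unknown zone names.

```typescript
type Validation =
  | { ok: true; cron_schedule: string | null }
  | { ok: false; status: 400; message: string };

// Intl.DateTimeFormat throws a RangeError for an unknown timeZone.
function isValidIanaTimezone(tz: string): boolean {
  try {
    new Intl.DateTimeFormat("en-US", { timeZone: tz });
    return true;
  } catch {
    return false;
  }
}

// Stand-in for the server-side check. A real parser would also validate
// each field's range; here only the field count is checked.
function validateCronPatch(
  schedule: string | null,
  timezone: string,
): Validation {
  if (!isValidIanaTimezone(timezone)) {
    return {
      ok: false,
      status: 400,
      message: `invalid timezone "${timezone}"; use an IANA name such as America/Los_Angeles`,
    };
  }
  // Empty string (or null) clears the schedule, stored as NULL.
  if (schedule === null || schedule.trim() === "") {
    return { ok: true, cron_schedule: null };
  }
  const fields = schedule.trim().split(/\s+/);
  if (fields.length !== 5) {
    return {
      ok: false,
      status: 400,
      message: `expected 5 cron fields, got ${fields.length}`,
    };
  }
  return { ok: true, cron_schedule: fields.join(" ") };
}
```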

UI

Inline editor on the agent settings page — Schedule row next to Env vars.

Collapsed (read-only):

Schedule    Weekdays at 9am (America/Los_Angeles)  [enabled]   ✏ Edit    last: 5/18/2026, 9:24:50 AM
                                                                          next: 5/19/2026, 9:24:50 AM

Expanded (editing):

Schedule  ┌──────────────────────────────────────────────────────────┐
          │ Cron expression                                           │
          │ [ 0 9 * * 1-5                                          ]  │
          │ Weekdays at 9am                                           │
          │                                                           │
          │ [Every 5 minutes] [Hourly] [Daily at 9am]                 │
          │ [Weekdays at 9am] [Weekly (Monday 9am)]                   │
          │                                                           │
          │ Timezone                                                  │
          │ [ America/Los_Angeles                                  ▾] │
          │                                                           │
          │ [✓] Enabled                                               │
          │                                                           │
          │ [ Save ]   ✕ Cancel                                       │
          └──────────────────────────────────────────────────────────┘

Sessions list badges cron-driven runs with a cron pill alongside the status.

(Screenshots saved locally at /tmp/cron-collapsed.png and /tmp/cron-expanded.png — happy to drag-and-drop them onto the PR after merge or via a follow-up comment.)

Test plan

Out of scope for v1

Per-run retry policy on bring-up failure
Backfill on cron_enabled = false → true resume
Manual "run now" button (separate issue)
cron_overlap_policy = "queue" | "parallel" (column exists, only "skip" accepted)

Adds nullable cron_schedule (standard 5-field), cron_timezone (IANA), cron_enabled, cron_overlap_policy, plus server-managed cron_last_fired_at and cron_next_fire_at. Indexed on (cron_enabled, cron_next_fire_at) so the scheduler hot path is a single index seek. Session gains a trigger column ("api" default, "cron" for scheduled runs) so the UI can badge cron-driven sessions.

Used by src/server/cron.ts to compute next-fire instants. Standard library — handles 5-field crons, IANA timezones, and DST correctly.

The session POST route inlined ~250 lines of warm/cold bring-up orchestration. Moves runBringUp + helpers into a standalone module so non-HTTP callers (the worker's cron tick, future integrations) can reuse exactly the same dance. Nothing inside reads request-scoped state — only prisma, k8s, harness primitives.

Drops ~370 lines of inline bring-up logic. Route is now a thin shell: auth + body parse + warm claim + Session row create + delegate to runBringUp. Behavior unchanged.

Implements parseCronSpec, computeNextFireAt, and tickCron. Multi-pod safety: tickCron claims due agents via raw "SELECT … FOR UPDATE SKIP LOCKED" inside a transaction, advances each agent's cron_next_fire_at within the same tx, then fires bring-up *outside* the transaction so the lock window stays small. When two pods race the same tick, one pod wins each row and the other skips it. Overlap policy: when a previous cron-tagged session is still in {creating, ready}, the new fire is skipped. The Agent.cron_overlap_policy column reserves "queue" and "parallel" for later versions. Bounded at 50 due agents per tick so a stuck run can't swamp the worker; anything beyond gets picked up next tick (default 30s).

Lives in its own module so cron.ts → cron-bringup.ts → session-bringup.ts doesn't pull API-route code into the worker bundle. Also keeps the synthetic-prompt format for scheduled runs in one place ("[cron] scheduled run at <iso>").

One more line in the existing tick loop. When no agent has a schedule the (cron_enabled, cron_next_fire_at) index makes the lookup essentially free, so the cost on a no-schedule deploy is ~one index seek per worker tick. Counters surface in the heartbeat log so operators can see cron_considered / cron_fired / cron_skipped_overlap without a separate dashboard.

UpdateAgentBody accepts cron_schedule, cron_timezone, cron_enabled, cron_overlap_policy. ApiAgent / ApiSession surface the server-managed cron timestamps and the trigger discriminator. Read-only fields are omitted from the request body so clients can't lie about cron_next_fire_at.

When cron_schedule or cron_timezone changes (validated together so the 400 names the bad pair, not the field evaluation order), recompute cron_next_fire_at server-side. Empty string clears the schedule; the scheduler's WHERE cron_schedule IS NOT NULL then masks the row. Flipping cron_enabled doesn't touch cron_next_fire_at — toggling off and back on resumes from the existing cadence without losing it.
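
The recompute rule can be sketched as a pure function. This is an illustration under assumptions: `applyCronPatch`, `CronState`, and `CronPatch` are hypothetical names, and `nextFire` is injected in place of the real computeNextFireAt so the sketch stays self-contained.

```typescript
interface CronState {
  cron_schedule: string | null;
  cron_timezone: string;
  cron_enabled: boolean;
  cron_next_fire_at: Date | null;
}

type CronPatch = Partial<
  Pick<CronState, "cron_schedule" | "cron_timezone" | "cron_enabled">
>;

// Stand-in for computeNextFireAt; injected so no cron library is needed here.
type NextFire = (schedule: string, timezone: string, from: Date) => Date;

function applyCronPatch(
  current: CronState,
  patch: CronPatch,
  now: Date,
  nextFire: NextFire,
): CronState {
  // Empty string clears the schedule (stored as null).
  const schedule =
    patch.cron_schedule === undefined
      ? current.cron_schedule
      : patch.cron_schedule === ""
        ? null
        : patch.cron_schedule;
  const timezone = patch.cron_timezone ?? current.cron_timezone;
  const enabled = patch.cron_enabled ?? current.cron_enabled;

  // Only a schedule or timezone change recomputes the next-fire instant;
  // flipping cron_enabled alone leaves it untouched.
  const cadenceChanged =
    schedule !== current.cron_schedule || timezone !== current.cron_timezone;
  const next = !cadenceChanged
    ? current.cron_next_fire_at
    : schedule === null
      ? null
      : nextFire(schedule, timezone, now);

  return {
    cron_schedule: schedule,
    cron_timezone: timezone,
    cron_enabled: enabled,
    cron_next_fire_at: next,
  };
}
```

Setting cron_next_fire_at to null when the schedule is cleared is an assumption of this sketch; the description above only guarantees that the scheduler's cron_schedule IS NOT NULL filter hides the row.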

Client-side types now mirror the server shapes. cron_schedule accepts null on the request side so the UI can clear via PATCH without ambiguity. SessionRow.trigger is optional + nullable since older rows predate the column.

Drop-in editor for the agent settings page's <dl> grid. Reads the agent's cron config, lets the user edit cron expression / timezone / enabled, and submits via the shared updateAgent API. Renders the common shapes ("Weekdays at 9am", "Every 5 minutes") as a live preview; falls back to "Custom schedule" for anything novel. Preset chips cover the cases the original issue called out — daily stargazer outreach, weekly digests, hourly checks. Server is the ultimate validator: client preview is cosmetic and never blocks save.
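
The live preview could be as simple as a lookup over the preset chips. In this sketch only "0 9 * * 1-5" (Weekdays at 9am) is taken from the editor mock-up in the PR; the other cron expressions, the `PRESETS` table, and `describeCron` are assumptions chosen to match the chip labels.

```typescript
// Labels come from the editor's preset chips. Only "0 9 * * 1-5" appears in
// the PR; the remaining expressions are assumed for illustration.
const PRESETS: ReadonlyArray<{ label: string; cron: string }> = [
  { label: "Every 5 minutes", cron: "*/5 * * * *" },
  { label: "Hourly", cron: "0 * * * *" },
  { label: "Daily at 9am", cron: "0 9 * * *" },
  { label: "Weekdays at 9am", cron: "0 9 * * 1-5" },
  { label: "Weekly (Monday 9am)", cron: "0 9 * * 1" },
];

// Cosmetic preview only: unknown expressions fall back to a generic label
// and never block saving, since the server is the validator.
function describeCron(expr: string): string {
  const normalized = expr.trim().split(/\s+/).join(" ");
  const hit = PRESETS.find((p) => p.cron === normalized);
  return hit ? hit.label : "Custom schedule";
}
```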

Adds a Schedule row to the Configuration block and a small "cron" badge alongside the status pill on cron-triggered sessions in the Sessions list so the user can tell at a glance which runs were auto-started.

Runs with "node --import tsx --test scripts/cron-parse.test.mjs". 9 assertions covering: empty/null schedules → next=null; daily, every-N-minute, and weekday-only cadences advance correctly; LA timezone produces expected UTC offset; invalid cron + invalid tz throw with actionable messages.

greptile-apps · 2026-05-19T04:37:52Z

Greptile Summary

This PR adds native scheduled (cron) triggers for agents: new schema columns on Agent and Session, a Postgres-based multi-pod-safe scheduler (tickCron with SELECT … FOR UPDATE SKIP LOCKED), bring-up logic extracted to a shared module, and a UI inline editor. The architecture is sound (state lives in the DB, no leader election required), but there are two logic bugs in the Phase 1 / Phase 2 handoff inside tickCron.

cron_last_fired_at is committed inside the claim transaction before the overlap check runs outside it; when a run is skipped due to cron_overlap_policy = "skip", the column still shows now with no corresponding Session row, misleading the UI "last fired" display.
When computeNextFireAt throws for an agent with a corrupted schedule (the catch branch sets cron_enabled = false), the agent remains in the claimed list returned from the transaction and is still passed to fireCronRun in Phase 2, spawning one unintended session despite the explicit "disabling" log message.

Confidence Score: 3/5

The core scheduler has two logic gaps in its Phase 1 / Phase 2 handoff that produce incorrect DB state and unintended session creation; safe to merge only after those are addressed.

Both bugs live in the hot path of the new cron tick. The first means every overlap-skipped firing writes a stale 'last fired' timestamp users can see in the UI. The second means an agent that the code explicitly logs as 'disabling' due to a bad schedule still gets a session spawned for it in the same tick.

src/server/cron.ts — specifically the Phase 1 transaction block (lines ~204-222) and the Phase 2 loop (lines ~238-268) need attention before merging.

Important Files Changed

Filename	Overview
src/server/cron.ts	New scheduler with two logic bugs: cron_last_fired_at is committed before the overlap check (skipped agents get a false timestamp), and agents disabled mid-tick for invalid schedules still fire once in Phase 2.
prisma/migrations/0006_agent_cron_triggers/migration.sql	Additive-only migration (new columns with safe defaults + one index). No column deletions, no data loss risk.
src/server/session-bringup.ts	Logic extracted from the session POST route into a shared module — functionally identical, just moved to enable reuse by the cron tick.
src/app/api/v1/managed_agents/agents/[agent_id]/route.ts	PATCH handler extended with cron field validation; parseCronSpec called server-side for immediate 400 on bad cron/tz pair, and cron_next_fire_at recomputed on schedule/timezone change.
src/components/cron-editor.tsx	New UI component for inline schedule editing; validation is server-side only with errors surfaced via onError prop. Clean implementation.
src/worker/index.ts	Worker tick wired to call tickCron() alongside existing reconcile and warm-pool ticks; result logged in heartbeat.
src/server/cron-bringup.ts	Thin shim that sets a cron-specific title and delegates to runBringUp; keeps cron.ts free of API-route imports.
prisma/schema.prisma	Six new nullable/defaulted cron columns on Agent plus trigger on Session; composite index on (cron_enabled, cron_next_fire_at) matches the scheduler query exactly.
src/server/types.ts	ApiAgent and ApiSession extended with cron fields; extensive cast-through-unknown pattern due to pre-generate Prisma client compatibility.

Reviews (1): Last reviewed commit: "test(cron): unit cover parser, next-fire..."

greptile-apps · 2026-05-19T04:37:56Z

+      for (const a of claimed) {
+        try {
+          const next = computeNextFireAt(a.cron_schedule, a.cron_timezone, now);
+          await tx.agent.update({
+            where: { agent_id: a.agent_id },
+            data: { cron_next_fire_at: next, cron_last_fired_at: now },
+          });
+        } catch (e) {
+          console.error(
+            `cron: disabling agent ${a.agent_id} — invalid schedule at fire time: ${
+              e instanceof Error ? e.message : String(e)
+            }`,
+          );
+          await tx.agent.update({
+            where: { agent_id: a.agent_id },
+            data: { cron_enabled: false, cron_next_fire_at: null },
+          });
+        }
+      }
+      return claimed;
+    });
+  } catch (e) {
+    // Transaction-level failure — log and bail. Next tick will retry.
+    console.error(
+      `cron: claim transaction failed: ${e instanceof Error ? e.message : String(e)}`,
+    );
+    result.errors += 1;
+    return result;
+  }
+
+  result.considered = dueAgents.length;
+
+  // Phase 2: fire each agent outside the transaction. Failures are
+  // per-agent — one bad agent must not block the rest of the tick.
+  for (const due of dueAgents) {
+    try {
+      const agent = await prisma.agent.findUnique({
+        where: { agent_id: due.agent_id },
+      });
+      if (agent === null) continue;
+
+      // Overlap policy. "skip" is the only supported value in v1; the
+      // column is keyed for future "queue" / "parallel" semantics.
+      const policy =
+        (agent as unknown as { cron_overlap_policy?: string })
+          .cron_overlap_policy ?? "skip";
+      if (policy === "skip" && (await hasActiveCronRun(agent.agent_id))) {
+        result.skipped_overlap += 1;
+        console.log(
+          `cron: skipping agent_id=${agent.agent_id} — previous cron run still active`,
+        );
+        continue;


cron_last_fired_at set before overlap check, so skipped runs show a false "last fired" timestamp

cron_last_fired_at = now is committed inside the claim transaction (line 209) for every claimed agent. The overlap check (hasActiveCronRun) runs after the transaction commits (line 250). When the check fires and a previous run is still active, the agent is skipped via continue — no session is created — but cron_last_fired_at already reflects now in the database. The UI then shows "last: X minutes ago" for a run that never happened, with no matching Session row users can inspect.
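
One possible direction, sketched here rather than prescribed: have Phase 1 write only cron_next_fire_at, and record cron_last_fired_at in Phase 2 after a session has actually been started. The `fireIfIdle` helper and its three callbacks are hypothetical stand-ins for hasActiveCronRun, fireCronRun, and a DB update.

```typescript
// Hypothetical Phase 2 dependencies; each stands in for a DB or bring-up call.
interface FireDeps {
  hasActiveRun(agentId: string): Promise<boolean>;
  fire(agentId: string, now: Date): Promise<void>;
  markFired(agentId: string, at: Date): Promise<void>; // sets cron_last_fired_at
}

type FireOutcome = "fired" | "skipped_overlap";

async function fireIfIdle(
  agentId: string,
  now: Date,
  deps: FireDeps,
): Promise<FireOutcome> {
  // Overlap policy "skip": leave cron_last_fired_at untouched on a skip.
  if (await deps.hasActiveRun(agentId)) return "skipped_overlap";

  await deps.fire(agentId, now);
  // Recorded only after the fire call returned without throwing.
  await deps.markFired(agentId, now);
  return "fired";
}
```

The trade-off is that the timestamp write is no longer atomic with the claim, so a crash between `fire` and `markFired` would leave a Session row without an updated "last fired" value.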

greptile-apps · 2026-05-19T04:37:57Z

+      for (const a of claimed) {
+        try {
+          const next = computeNextFireAt(a.cron_schedule, a.cron_timezone, now);
+          await tx.agent.update({
+            where: { agent_id: a.agent_id },
+            data: { cron_next_fire_at: next, cron_last_fired_at: now },
+          });
+        } catch (e) {
+          console.error(
+            `cron: disabling agent ${a.agent_id} — invalid schedule at fire time: ${
+              e instanceof Error ? e.message : String(e)
+            }`,
+          );
+          await tx.agent.update({
+            where: { agent_id: a.agent_id },
+            data: { cron_enabled: false, cron_next_fire_at: null },
+          });
+        }
+      }
+      return claimed;
+    });
+  } catch (e) {
+    // Transaction-level failure — log and bail. Next tick will retry.
+    console.error(
+      `cron: claim transaction failed: ${e instanceof Error ? e.message : String(e)}`,
+    );
+    result.errors += 1;
+    return result;
+  }
+
+  result.considered = dueAgents.length;
+
+  // Phase 2: fire each agent outside the transaction. Failures are
+  // per-agent — one bad agent must not block the rest of the tick.
+  for (const due of dueAgents) {
+    try {
+      const agent = await prisma.agent.findUnique({
+        where: { agent_id: due.agent_id },
+      });
+      if (agent === null) continue;
+
+      // Overlap policy. "skip" is the only supported value in v1; the
+      // column is keyed for future "queue" / "parallel" semantics.
+      const policy =
+        (agent as unknown as { cron_overlap_policy?: string })
+          .cron_overlap_policy ?? "skip";
+      if (policy === "skip" && (await hasActiveCronRun(agent.agent_id))) {
+        result.skipped_overlap += 1;
+        console.log(
+          `cron: skipping agent_id=${agent.agent_id} — previous cron run still active`,
+        );
+        continue;
+      }
+
+      await fireCronRun(agent as AgentRow, now);


Agents disabled for invalid schedule are still fired in Phase 2

When computeNextFireAt throws for an agent (invalid schedule mutated between last fire and this tick), the catch block sets cron_enabled = false and cron_next_fire_at = null inside the transaction — correctly preventing future fires. However, the agent is still present in the claimed array returned from the transaction, and Phase 2 iterates over dueAgents without checking cron_enabled. The result is that the "now disabled" agent still gets fireCronRun called, spawning one unintended session. The error log says "disabling" but the code fires anyway.
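
A minimal way to close this gap, offered as a sketch: have the Phase 1 loop return only the agents whose schedule advanced, so disabled agents never reach Phase 2. `advanceClaimed`, `UpdateFn`, and the injected `nextFire` are hypothetical names standing in for the transaction body, tx.agent.update, and computeNextFireAt.

```typescript
interface Claimed {
  agent_id: string;
  cron_schedule: string;
  cron_timezone: string;
}

type NextFireFn = (schedule: string, timezone: string, from: Date) => Date;

// Stand-in for tx.agent.update({ where: { agent_id }, data }).
type UpdateFn = (
  agentId: string,
  data: Record<string, unknown>,
) => Promise<void>;

// Returns only the agents that should be fired in Phase 2.
async function advanceClaimed(
  claimed: Claimed[],
  now: Date,
  nextFire: NextFireFn,
  update: UpdateFn,
): Promise<Claimed[]> {
  const fireable: Claimed[] = [];
  for (const a of claimed) {
    let next: Date;
    try {
      next = nextFire(a.cron_schedule, a.cron_timezone, now);
    } catch {
      // Invalid schedule at fire time: disable and exclude from Phase 2.
      await update(a.agent_id, { cron_enabled: false, cron_next_fire_at: null });
      continue;
    }
    await update(a.agent_id, { cron_next_fire_at: next, cron_last_fired_at: now });
    fireable.push(a);
  }
  return fireable;
}
```

An alternative with the same effect would be to re-check cron_enabled on the row that Phase 2 already reloads via findUnique.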

ishaan-berri added 13 commits May 18, 2026 21:30

deps: add cron-parser for schedule evaluation

61ce76f

Used by src/server/cron.ts to compute next-fire instants. Standard library — handles 5-field crons, IANA timezones, and DST correctly.

session route: use shared runBringUp from session-bringup.ts

058f96a

Drops ~370 lines of inline bring-up logic. Route is now a thin shell: auth + body parse + warm claim + Session row create + delegate to runBringUp. Behavior unchanged.

ui(agent page): wire CronEditor + badge cron-driven sessions

4167f31

Adds a Schedule row to the Configuration block and a small "cron" badge alongside the status pill on cron-triggered sessions in the Sessions list so the user can tell at a glance which runs were auto-started.

ishaan-berri mentioned this pull request May 19, 2026

Add scheduled / cron triggers for agents #186

Closed

greptile-apps Bot reviewed May 19, 2026

ishaan-berri wants to merge 13 commits into main from litellm_add-cron-triggers
