feat: add GithubTrigger and refactor trigger interface around create_pending_run#119
feat: add GithubTrigger and refactor trigger interface around create_pending_run#119tofarr wants to merge 4 commits into
Conversation
…pending_run
Adds a new poll-based trigger type that fires when configured GitHub
repositories receive new events. Refactors the trigger abstraction so each
subclass owns the decision to fire AND the resulting AutomationRun, which
lets GithubTrigger attach the triggering events to run.event_payload.
Trigger interface (openhands/automation/schemas.py)
---------------------------------------------------
- New _TriggerBase abstract class with async create_pending_run(session,
automation, now=None) -> AutomationRun | None. None means 'not due'.
- CronTrigger delegates to the existing cron evaluator, then calls
utils.run.create_pending_run on a fire.
- EventTrigger always returns None (webhook-driven, never polled).
- GithubTrigger:
* github_access_token (SecretStr) + repositories (list[str]) + optional
event_types allow-list.
* Polls /repos/{owner}/{repo}/events for every configured repo
concurrently via wait_all.
* Collects every matching event newer than last_triggered_at (or
created_at for the first poll) and attaches them to
run.event_payload as {source: 'github_trigger', events: [...]}.
* Each event is tagged with its source _repository.
- Added @field_serializer for github_access_token so SecretStr round-trips
through the JSON column (fixes a latent persistence crash).
- Added TriggerAdapter (TypeAdapter[Trigger]) so callers can validate the
discriminated union directly.
Concurrency helper (openhands/automation/utils/async_utils.py)
--------------------------------------------------------------
- New wait_all() that mirrors openhands SDK's async_utils.wait_all:
runs an iterable of coroutines concurrently, preserves input order,
aggregates exceptions, supports optional timeout.
Scheduler (openhands/automation/scheduler.py)
---------------------------------------------
- Replaced the inline cron-only is_automation_due filter at L120 with
_create_pending_runs(session, automations, now), which:
* Parses each automation.trigger JSON into a typed trigger model
(ValidationError -> log and skip).
* Awaits trigger.create_pending_run sequentially (shared AsyncSession
is not safe for concurrent use; cross-trigger parallelism still
happens inside each trigger).
* Skips per-trigger exceptions so one bad trigger cannot starve the
batch.
- Generalised the firing log line (no longer assumes cron-only trigger
shape).
Tests
-----
- tests/test_async_utils.py: ordering, concurrency, exception aggregation,
timeout cancellation.
- tests/test_triggers.py: cron/event/github create_pending_run paths,
GithubTrigger validation, event-type filtering, never-triggered cutoff,
disabled short-circuit, non-200 handling, multi-repo event collection.
Uses real SQLite-in-memory sessions; only the GitHub HTTP transport is
replaced via httpx.MockTransport.
- tests/test_scheduler_create_pending_runs.py: mixed-trigger batches,
invalid trigger config skipped, one failing trigger does not block the
rest.
- tests/conftest.py: new sqlite_session / sqlite_session_factory /
sqlite_engine / patch_github_transport fixtures.
499 unit tests pass; the only errors in this sandbox are testcontainer
setups that need a Docker daemon.
Co-authored-by: openhands <openhands@all-hands.dev>
|
🚀 Deploy Preview PR Created/Updated A deploy preview has been created/updated for this PR. Deploy PR: https://github.com/OpenHands/deploy/pull/4311 Once the deploy PR's CI passes, the automation service will be deployed to the feature environment. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Solid architecture and excellent test coverage. The trigger abstraction refactor is clean and the GitHub polling implementation is well-structured. Found some operational concerns around rate limiting and unbounded concurrency that should be addressed for production robustness.
| description=( | ||
| "Repositories to poll, each as 'owner/name' " | ||
| "(e.g. 'All-Hands-AI/OpenHands')." | ||
| ), |
There was a problem hiding this comment.
🟠 Important: No max_length constraint on repositories list.
With a 60-second polling interval and GitHub's 5000/hour rate limit, users can configure ~83 repos max before hitting rate limits (5000 requests/hour ÷ 60 polls/hour = 83 repos). Consider:
repositories: list[str] = Field(
...,
min_length=1,
max_length=50, # Conservative limit to avoid rate limits
description=...
)Or document the practical limit in the field description.
| "GitHub poll for %s returned status %s", | ||
| repo, | ||
| resp.status_code, | ||
| ) |
There was a problem hiding this comment.
🟡 Suggestion: Distinguish rate limit errors from other failures.
GitHub returns 403 with X-RateLimit-Remaining: 0 when rate limited, and 429 for abuse detection. These are operationally different from network errors or invalid repos. Consider:
if resp.status_code == 403:
remaining = resp.headers.get('X-RateLimit-Remaining')
if remaining == '0':
reset_time = resp.headers.get('X-RateLimit-Reset')
logger.warning(
"GitHub rate limit exceeded for %s (resets at %s)",
repo, reset_time
)
else:
logger.warning("GitHub access denied for %s", repo)
return []
elif resp.status_code == 429:
logger.warning("GitHub abuse detection triggered for %s", repo)
return []
elif resp.status_code != 200:
logger.warning("GitHub poll for %s returned status %s", repo, resp.status_code)
return []This helps operators diagnose issues when automations stop firing.
| self._fetch_new_events(client, repo, cutoff) | ||
| for repo in self.repositories | ||
| ], | ||
| timeout=None, |
There was a problem hiding this comment.
🟠 Important: No timeout on wait_all could block scheduler.
With timeout=None and a 30-second httpx timeout per request, if someone configures 100 repos, this blocks the scheduler for up to 30 seconds (all requests run concurrently, so it's max(all request times), not sum). If one repo is slow or GitHub is having issues, this delays polling for ALL automations.
Consider:
per_repo: list[list[dict[str, Any]]] = await wait_all(
[
self._fetch_new_events(client, repo, cutoff)
for repo in self.repositories
],
timeout=45.0, # Slightly more than httpx timeout
)Or make it configurable if different users need different tradeoffs.
| automation config) — anyone with read access to the database can | ||
| see it. | ||
| """ | ||
| return v.get_secret_value() |
There was a problem hiding this comment.
🟡 Suggestion: Document the security model explicitly.
The comment explains why plain text storage is necessary, but consider adding to the class docstring:
"""Poll-based trigger that fires when new events appear on GitHub repos.
**Security Note**: The GitHub access token is stored in plain text in the
automation's trigger JSON column. Anyone with database read access can view
it. Treat the automation config with the same security controls as other
sensitive credentials (API keys, webhook secrets). The token should have
minimal scopes (read:org, read:repo) and be rotated periodically.
...
"""
Code Review AnalysisTaste Rating: 🟡 Acceptable - Solid architecture with clean abstractions, but operational concerns around rate limiting and unbounded concurrency need attention for production robustness. Architecture Strengths
[IMPROVEMENT OPPORTUNITIES]Beyond the inline comments, some broader observations: Concurrency Scaling: With
GitHub API Pagination: Currently fetches Token Scopes: The docstring mentions the token but doesn't specify required scopes. GitHub needs Retry Strategy: Transient GitHub errors (503, network blips) are treated as "no events." For critical automations, consider a retry-with-backoff for HTTP failures (but not for 404/403, which are permanent). [TESTING GAPS]The tests are excellent, but two scenarios worth adding:
[RISK ASSESSMENT]Overall PR: Risk Factors:
Recommendation:
VERDICT✅ Worth merging (after addressing the timeout + repo limit issues) Core logic is sound: The trigger abstraction is clean, error handling is robust, and tests prove the behavior. The operational concerns are fixable with small schema changes (add KEY INSIGHTThe architecture makes a pragmatic tradeoff: GitHub polling is simple (no webhook setup) but requires storing credentials and managing rate limits. The
|
… event_filter
Swaps the coarse event_types: list[str] allow-list for a single
event_filter: str | None JMESPath expression. The expression is
evaluated against each GitHub event JSON; an event is kept when the
expression returns a truthy value, and the trigger fires only if at
least one event matches.
This is much more expressive — callers can filter on nested fields
such as payload.action, payload.ref, actor.login, etc.,
without us having to expose each as a dedicated field. Examples:
type == 'PushEvent'
type == 'PullRequestEvent' && payload.action == 'opened'
type == 'PushEvent' && payload.ref == 'refs/heads/main'
Implementation notes:
- The expression is compiled at field-validation time so syntax errors
surface immediately as 422s when the trigger is POSTed, not at the
next poll cycle.
- The compiled parser is cached lazily on the instance (`_compiled_filter`
on `__dict__`) so each poll only pays the parse cost once.
- The persisted JSON keeps the raw string — round-trip through the
AutomationsTrigger JSON column is unchanged.
- JMESPath evaluation errors on a single event are logged and that
event is dropped (treated as non-matching); they cannot kill the
poll.
Tests
-----
- Removed the old `event_types` test.
- Added five new tests: keep/drop based on filter result, nested
`payload.action` matching, valid expression accepted, invalid
expression rejected at validation, whitespace-only filter normalised
to None.
- All 29 trigger / async-util tests pass; broader 124-test unit sweep
is green.
Co-authored-by: openhands <openhands@all-hands.dev>
…riggerBase
Adds a new poll-based trigger that fires when configured Slack channels
receive new messages. Like GithubTrigger, it polls the upstream service
in parallel and exposes a JMESPath event_filter.
Refactor
--------
Introduces _PollingTriggerBase, a private subclass of _TriggerBase shared
by GithubTrigger and SlackTrigger. The base class owns the bits that are
identical across both:
- The event_filter field + JMESPath compile-at-validation logic.
- The cached _compiled_event_filter property.
- _item_matches_filter() — runtime JMESPath evaluation with per-item
error handling.
- create_pending_run() — enabled/deleted/cutoff guards, async client
lifecycle, and persisting the run with the right event_payload envelope.
Concrete subclasses now only implement:
- _PAYLOAD_SOURCE / _PAYLOAD_KEY (ClassVars for the payload envelope).
- _build_client() — auth headers, base URL.
- _fetch_all_new(client, cutoff) — typically a wait_all() fan-out.
This drops GithubTrigger from ~250 lines to ~110 with no behaviour change
(all 24 existing trigger tests still pass).
SlackTrigger
------------
- type='slack', slack_token (SecretStr, xoxp-/xoxb-), channels (list of
Slack channel IDs, validated against ^[A-Z][A-Z0-9]+$).
- Polls /conversations.history with oldest=<unix-seconds> as the
high-water mark, derived from automation.last_triggered_at (or
created_at on first poll — no backfill).
- Honours Slack's quirky error model: HTTP 200 always for app errors, so
we check body['ok'] and log body['error']. HTTP 429 logs Retry-After
and treats the cycle as 'no messages'.
- Sorts newest-first responses into chronological order before delivery.
- Does defense-in-depth client-side cutoff comparison (ts_unix > cutoff)
even though Slack's 'oldest' is server-side exclusive.
- Tags each message with '_channel' so downstream code knows the origin.
- Payload: {source: 'slack_trigger', messages: [...]}.
Reference: https://github.com/tofarr/polling-experiment/blob/main/scripts/poll_slack_messages.py
Tests (16 new, 45 total in test_triggers.py)
--------------------------------------------
- TestSlackTriggerValidation x7: channel id regex (accept/reject lower
case, accept various prefixes, reject channel names like '#general',
reject empty list), invalid JMESPath rejected, SecretStr round-trip,
discriminator dispatch.
- TestSlackTriggerCreatesRun x9: happy path with payload + 'oldest'
assertion, no-new-messages, ok=false treated as no messages, 429 rate
limit treated as no messages, cutoff-edge message excluded, JMESPath
keep/exclude, multi-channel parallel fan-out, disabled short-circuit
(zero HTTP calls).
- conftest.py: new patch_slack_transport fixture mirroring
patch_github_transport.
All 140 unit tests pass (45 trigger, 3 scheduler create-pending-run,
6 async_utils, plus broader scheduler/event/config sweeps).
ruff format / ruff check clean.
Co-authored-by: openhands <openhands@all-hands.dev>
No, no it wasn't. |
That's weird - this is my first PR from Agent Canvas. I wonder is your name hardcoded in there somewhere? |
Yeah, very weird. I can't see anything obvious. I've also not heard of Agent Canvas prior to this PR. 🤷♀️ |
Summary
Adds a new
GithubTriggerpoll-based trigger type that fires when configured GitHub repositories receive new events, and refactors the trigger abstraction so that each trigger subclass owns both the decision to fire and the resultingAutomationRun. This letsGithubTriggerattach the triggering events torun.event_payloadfor the automation script to consume.Trigger interface (
openhands/automation/schemas.py)_TriggerBaseabstract class withasync create_pending_run(session, automation, now=None) -> AutomationRun | None. ReturningNonemeans "not due"; otherwise the trigger creates and returns the PENDING run.CronTriggerdelegates to the existing cron evaluator and, when due, calls the existingutils.run.create_pending_runhelper.EventTriggeralways returnsNone(webhook-driven, never polled).GithubTrigger(new):github_access_token: SecretStr,repositories: list[str](owner/name), optionalevent_typesallow-list./repos/{owner}/{repo}/eventsfor every configured repo concurrently viawait_all.last_triggered_at(orcreated_atfor the first poll, mirroringCronTrigger's no-backfill semantics) and attaches them torun.event_payloadas:{"source": "github_trigger", "events": [<github event>, ...]}_repository.@field_serializerforgithub_access_tokenso theSecretStrround-trips through the JSON column (this was a latent persistence crash —model_dump()on aSecretStris not JSON-serializable).TriggerAdapter(TypeAdapter[Trigger]) so callers can validate the discriminated union directly.Concurrency helper (
openhands/automation/utils/async_utils.py)New
wait_all()that mirrorsopenhands-sdk'sasync_utils.wait_all: runs an iterable of coroutines concurrently, preserves input order, aggregates exceptions, supports an optional timeout.Scheduler (
openhands/automation/scheduler.py)Replaced the inline cron-only
is_automation_duefilter at L120 with_create_pending_runs(session, automations, now), which:automation.triggerJSON into a typed trigger model (ValidationError→ log and skip).trigger.create_pending_runsequentially (the sharedAsyncSessionis not safe for concurrent use; cross-resource parallelism still happens inside each trigger, e.g. across repos inGithubTrigger).The firing log line was generalised — it no longer assumes a cron-only trigger shape.
Tests (24 new)
All against real code paths; only the GitHub HTTP transport is replaced (via
httpx.MockTransport) and the DB is SQLite-in-memory so we don't need Docker for these unit tests.tests/test_async_utils.py— ordering, concurrency, exception aggregation, timeout cancellation.tests/test_triggers.py— cron/event/githubcreate_pending_runpaths,GithubTriggervalidation, event-type filtering, never-triggered cutoff fallback, disabled short-circuit, non-200 handling, multi-repo event collection.tests/test_scheduler_create_pending_runs.py— mixed-trigger batches, invalid trigger config skipped, one failing trigger does not block the rest.tests/conftest.py— newsqlite_engine/sqlite_session_factory/sqlite_session/patch_github_transportfixtures (reusable by future trigger tests).Test results
testcontainers-based tests failing because no Docker daemon is available in the sandbox — unrelated to these changes.ruff format/ruff checkclean.Example trigger config
{ "type": "github", "github_access_token": "ghp_xxx", "repositories": ["All-Hands-AI/OpenHands", "openhands/automation"], "event_types": ["PushEvent", "PullRequestEvent"] }