Skip to content

feat: add GithubTrigger and refactor trigger interface around create_pending_run#119

Draft
tofarr wants to merge 4 commits into
mainfrom
feat/github-trigger
Draft

feat: add GithubTrigger and refactor trigger interface around create_pending_run#119
tofarr wants to merge 4 commits into
mainfrom
feat/github-trigger

Conversation

@tofarr
Copy link
Copy Markdown
Contributor

@tofarr tofarr commented May 15, 2026

Summary

Adds a new GithubTrigger poll-based trigger type that fires when configured GitHub repositories receive new events, and refactors the trigger abstraction so that each trigger subclass owns both the decision to fire and the resulting AutomationRun. This lets GithubTrigger attach the triggering events to run.event_payload for the automation script to consume.

This PR was opened by an AI agent (OpenHands) on behalf of @csmith.

Trigger interface (openhands/automation/schemas.py)

  • New _TriggerBase abstract class with async create_pending_run(session, automation, now=None) -> AutomationRun | None. Returning None means "not due"; otherwise the trigger creates and returns the PENDING run.
  • CronTrigger delegates to the existing cron evaluator and, when due, calls the existing utils.run.create_pending_run helper.
  • EventTrigger always returns None (webhook-driven, never polled).
  • GithubTrigger (new):
    • github_access_token: SecretStr, repositories: list[str] (owner/name), optional event_types allow-list.
    • Polls /repos/{owner}/{repo}/events for every configured repo concurrently via wait_all.
    • Collects every matching event newer than last_triggered_at (or created_at for the first poll, mirroring CronTrigger's no-backfill semantics) and attaches them to run.event_payload as:
      {"source": "github_trigger", "events": [<github event>, ...]}
    • Each event is tagged with its source _repository.
  • Added a @field_serializer for github_access_token so the SecretStr round-trips through the JSON column (this was a latent persistence crash — model_dump() on a SecretStr is not JSON-serializable).
  • Added TriggerAdapter (TypeAdapter[Trigger]) so callers can validate the discriminated union directly.

Concurrency helper (openhands/automation/utils/async_utils.py)

New wait_all() that mirrors openhands-sdk's async_utils.wait_all: runs an iterable of coroutines concurrently, preserves input order, aggregates exceptions, supports an optional timeout.

Scheduler (openhands/automation/scheduler.py)

Replaced the inline cron-only is_automation_due filter at L120 with _create_pending_runs(session, automations, now), which:

  • Parses each automation.trigger JSON into a typed trigger model (ValidationError → log and skip).
  • Awaits trigger.create_pending_run sequentially (the shared AsyncSession is not safe for concurrent use; cross-resource parallelism still happens inside each trigger, e.g. across repos in GithubTrigger).
  • Skips per-trigger exceptions so one bad trigger cannot starve the batch.

The firing log line was generalised — it no longer assumes a cron-only trigger shape.

Tests (24 new)

All against real code paths; only the GitHub HTTP transport is replaced (via httpx.MockTransport) and the DB is SQLite-in-memory so we don't need Docker for these unit tests.

  • tests/test_async_utils.py — ordering, concurrency, exception aggregation, timeout cancellation.
  • tests/test_triggers.py — cron/event/github create_pending_run paths, GithubTrigger validation, event-type filtering, never-triggered cutoff fallback, disabled short-circuit, non-200 handling, multi-repo event collection.
  • tests/test_scheduler_create_pending_runs.py — mixed-trigger batches, invalid trigger config skipped, one failing trigger does not block the rest.
  • tests/conftest.py — new sqlite_engine / sqlite_session_factory / sqlite_session / patch_github_transport fixtures (reusable by future trigger tests).

Test results

  • 499 unit tests pass.
  • The 124 "errors" in the local run are all testcontainers-based tests failing because no Docker daemon is available in the sandbox — unrelated to these changes.
  • ruff format / ruff check clean.

Example trigger config

{
  "type": "github",
  "github_access_token": "ghp_xxx",
  "repositories": ["All-Hands-AI/OpenHands", "openhands/automation"],
  "event_types": ["PushEvent", "PullRequestEvent"]
}

openhands-agent and others added 2 commits May 15, 2026 20:08
…pending_run

Adds a new poll-based trigger type that fires when configured GitHub
repositories receive new events. Refactors the trigger abstraction so each
subclass owns the decision to fire AND the resulting AutomationRun, which
lets GithubTrigger attach the triggering events to run.event_payload.

Trigger interface (openhands/automation/schemas.py)
---------------------------------------------------
- New _TriggerBase abstract class with async create_pending_run(session,
  automation, now=None) -> AutomationRun | None. None means 'not due'.
- CronTrigger delegates to the existing cron evaluator, then calls
  utils.run.create_pending_run on a fire.
- EventTrigger always returns None (webhook-driven, never polled).
- GithubTrigger:
    * github_access_token (SecretStr) + repositories (list[str]) + optional
      event_types allow-list.
    * Polls /repos/{owner}/{repo}/events for every configured repo
      concurrently via wait_all.
    * Collects every matching event newer than last_triggered_at (or
      created_at for the first poll) and attaches them to
      run.event_payload as {source: 'github_trigger', events: [...]}.
    * Each event is tagged with its source _repository.
- Added @field_serializer for github_access_token so SecretStr round-trips
  through the JSON column (fixes a latent persistence crash).
- Added TriggerAdapter (TypeAdapter[Trigger]) so callers can validate the
  discriminated union directly.

Concurrency helper (openhands/automation/utils/async_utils.py)
--------------------------------------------------------------
- New wait_all() that mirrors openhands SDK's async_utils.wait_all:
  runs an iterable of coroutines concurrently, preserves input order,
  aggregates exceptions, supports optional timeout.

Scheduler (openhands/automation/scheduler.py)
---------------------------------------------
- Replaced the inline cron-only is_automation_due filter at L120 with
  _create_pending_runs(session, automations, now), which:
    * Parses each automation.trigger JSON into a typed trigger model
      (ValidationError -> log and skip).
    * Awaits trigger.create_pending_run sequentially (shared AsyncSession
      is not safe for concurrent use; cross-trigger parallelism still
      happens inside each trigger).
    * Skips per-trigger exceptions so one bad trigger cannot starve the
      batch.
- Generalised the firing log line (no longer assumes cron-only trigger
  shape).

Tests
-----
- tests/test_async_utils.py: ordering, concurrency, exception aggregation,
  timeout cancellation.
- tests/test_triggers.py: cron/event/github create_pending_run paths,
  GithubTrigger validation, event-type filtering, never-triggered cutoff,
  disabled short-circuit, non-200 handling, multi-repo event collection.
  Uses real SQLite-in-memory sessions; only the GitHub HTTP transport is
  replaced via httpx.MockTransport.
- tests/test_scheduler_create_pending_runs.py: mixed-trigger batches,
  invalid trigger config skipped, one failing trigger does not block the
  rest.
- tests/conftest.py: new sqlite_session / sqlite_session_factory /
  sqlite_engine / patch_github_transport fixtures.

499 unit tests pass; the only errors in this sandbox are testcontainer
setups that need a Docker daemon.

Co-authored-by: openhands <openhands@all-hands.dev>
@tofarr tofarr marked this pull request as draft May 15, 2026 20:10
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

Coverage

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

🚀 Deploy Preview PR Created/Updated

A deploy preview has been created/updated for this PR.

Deploy PR: https://github.com/OpenHands/deploy/pull/4311
Automation SHA: e27e1202dac91bbe8956495b4de1bc3547d70dab
Last updated: May 15, 2026, 04:39:15 PM ET

Once the deploy PR's CI passes, the automation service will be deployed to the feature environment.

Copy link
Copy Markdown

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid architecture and excellent test coverage. The trigger abstraction refactor is clean and the GitHub polling implementation is well-structured. Found some operational concerns around rate limiting and unbounded concurrency that should be addressed for production robustness.

description=(
"Repositories to poll, each as 'owner/name' "
"(e.g. 'All-Hands-AI/OpenHands')."
),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: No max_length constraint on repositories list.

With a 60-second polling interval and GitHub's 5000/hour rate limit, users can configure ~83 repos max before hitting rate limits (5000 requests/hour ÷ 60 polls/hour = 83 repos). Consider:

repositories: list[str] = Field(
    ...,
    min_length=1,
    max_length=50,  # Conservative limit to avoid rate limits
    description=...
)

Or document the practical limit in the field description.

"GitHub poll for %s returned status %s",
repo,
resp.status_code,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Distinguish rate limit errors from other failures.

GitHub returns 403 with X-RateLimit-Remaining: 0 when rate limited, and 429 for abuse detection. These are operationally different from network errors or invalid repos. Consider:

if resp.status_code == 403:
    remaining = resp.headers.get('X-RateLimit-Remaining')
    if remaining == '0':
        reset_time = resp.headers.get('X-RateLimit-Reset')
        logger.warning(
            "GitHub rate limit exceeded for %s (resets at %s)",
            repo, reset_time
        )
    else:
        logger.warning("GitHub access denied for %s", repo)
    return []
elif resp.status_code == 429:
    logger.warning("GitHub abuse detection triggered for %s", repo)
    return []
elif resp.status_code != 200:
    logger.warning("GitHub poll for %s returned status %s", repo, resp.status_code)
    return []

This helps operators diagnose issues when automations stop firing.

Comment thread openhands/automation/schemas.py Outdated
self._fetch_new_events(client, repo, cutoff)
for repo in self.repositories
],
timeout=None,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: No timeout on wait_all could block scheduler.

With timeout=None and a 30-second httpx timeout per request, if someone configures 100 repos, this blocks the scheduler for up to 30 seconds (all requests run concurrently, so it's max(all request times), not sum). If one repo is slow or GitHub is having issues, this delays polling for ALL automations.

Consider:

per_repo: list[list[dict[str, Any]]] = await wait_all(
    [
        self._fetch_new_events(client, repo, cutoff)
        for repo in self.repositories
    ],
    timeout=45.0,  # Slightly more than httpx timeout
)

Or make it configurable if different users need different tradeoffs.

automation config) — anyone with read access to the database can
see it.
"""
return v.get_secret_value()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Document the security model explicitly.

The comment explains why plain text storage is necessary, but consider adding to the class docstring:

"""Poll-based trigger that fires when new events appear on GitHub repos.

**Security Note**: The GitHub access token is stored in plain text in the
automation's trigger JSON column. Anyone with database read access can view
it. Treat the automation config with the same security controls as other
sensitive credentials (API keys, webhook secrets). The token should have
minimal scopes (read:org, read:repo) and be rotated periodically.

...
"""

@all-hands-bot
Copy link
Copy Markdown

Code Review Analysis

Taste Rating: 🟡 Acceptable - Solid architecture with clean abstractions, but operational concerns around rate limiting and unbounded concurrency need attention for production robustness.


Architecture Strengths

  1. Elegant Abstraction: The _TriggerBase.create_pending_run() interface is a textbook example of the strategy pattern. Each trigger owns its firing logic and event payload construction—no special cases needed in the scheduler.

  2. Data Flow: The discriminated union (CronTrigger | EventTrigger | GithubTrigger) with Pydantic's TypeAdapter ensures type safety at runtime. The scheduler validates trigger configs before execution, preventing silent failures.

  3. Concurrency Model: Sequential trigger processing (due to SQLAlchemy session constraints) with per-trigger parallelization (GitHub repos polled concurrently) is the right tradeoff. Clear documentation prevents future "optimization" that would break it.

  4. Error Isolation: The scheduler's exception handling (L109-115) ensures one broken trigger can't DOS the entire batch. This is critical for multi-tenant systems.

  5. Test Quality: 24 new tests with real code paths (not mocks), covering edge cases like disabled automations, invalid JSON responses, and multi-repo collection. The httpx.MockTransport fixture is elegant—tests run without Docker but exercise real parsing/filtering logic.


[IMPROVEMENT OPPORTUNITIES]

Beyond the inline comments, some broader observations:

Concurrency Scaling: With timeout=None on wait_all() and no max_length on repositories, a single automation with 500 repos could monopolize the scheduler for 30+ seconds. This creates a noisy neighbor problem in multi-tenant deployments. Consider:

  • Per-automation timeout budget (e.g., max 10 seconds of scheduler time)
  • Repo count limit (50-100 max)
  • Or move GitHub polling to a separate worker pool

GitHub API Pagination: Currently fetches per_page=30, page=1 only. If a busy repo has >30 events in 60 seconds, you'll miss older ones. This might be acceptable (only recent activity matters), but should be documented. Alternatively, paginate until created_at < cutoff.

Token Scopes: The docstring mentions the token but doesn't specify required scopes. GitHub needs public_repo (or repo for private repos) to read /repos/{owner}/{repo}/events. Document this so users don't waste time with insufficient tokens.

Retry Strategy: Transient GitHub errors (503, network blips) are treated as "no events." For critical automations, consider a retry-with-backoff for HTTP failures (but not for 404/403, which are permanent).


[TESTING GAPS]

The tests are excellent, but two scenarios worth adding:

  1. Rate limit response: Mock a 403 with X-RateLimit-Remaining: 0 header to verify the trigger doesn't crash and logs appropriately.
  2. Concurrent repo failures: Test wait_all when half the repos return 500 and half succeed—ensure successful repos' events still fire the automation.

[RISK ASSESSMENT]

Overall PR: ⚠️ Risk Assessment: 🟡 MEDIUM

Risk Factors:

  • Unbounded resource consumption: No limits on repo count or timeout duration means a misconfigured automation can starve the scheduler.
  • Silent rate limit failures: Users won't know their automations stopped firing until they check logs. No metrics/alerts for this failure mode.
  • Token exposure: Plain text storage in DB is acknowledged but increases blast radius if the DB is compromised. Consider encrypting the trigger JSON column at rest (transparent to the app layer).
  • Breaking change risk: LOW - The refactor is internal; existing automations' JSON configs remain compatible.

Recommendation:

  • Add repository count limit (50-100 max) and timeout on wait_all (30-45 seconds) before merging.
  • Consider adding a metric for "GitHub trigger rate limited" so operators get alerted.
  • For HIGH-value automations, recommend users create separate GitHub apps with dedicated rate limits instead of sharing one PAT across all automations.

VERDICT

Worth merging (after addressing the timeout + repo limit issues)

Core logic is sound: The trigger abstraction is clean, error handling is robust, and tests prove the behavior. The operational concerns are fixable with small schema changes (add max_length, set timeout=45.0).


KEY INSIGHT

The architecture makes a pragmatic tradeoff: GitHub polling is simple (no webhook setup) but requires storing credentials and managing rate limits. The create_pending_run() abstraction is elegant enough that adding a future WebSocketTrigger or KafkaTrigger would require zero changes to the scheduler—each trigger encapsulates its complexity. This is "good taste" engineering: the framework doesn't care how triggers decide to fire, only that they return AutomationRun | None.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

… event_filter

Swaps the coarse event_types: list[str] allow-list for a single
event_filter: str | None JMESPath expression. The expression is
evaluated against each GitHub event JSON; an event is kept when the
expression returns a truthy value, and the trigger fires only if at
least one event matches.

This is much more expressive — callers can filter on nested fields
such as payload.action, payload.ref, actor.login, etc.,
without us having to expose each as a dedicated field. Examples:

    type == 'PushEvent'
    type == 'PullRequestEvent' && payload.action == 'opened'
    type == 'PushEvent' && payload.ref == 'refs/heads/main'

Implementation notes:
- The expression is compiled at field-validation time so syntax errors
  surface immediately as 422s when the trigger is POSTed, not at the
  next poll cycle.
- The compiled parser is cached lazily on the instance (`_compiled_filter`
  on `__dict__`) so each poll only pays the parse cost once.
- The persisted JSON keeps the raw string — round-trip through the
  AutomationsTrigger JSON column is unchanged.
- JMESPath evaluation errors on a single event are logged and that
  event is dropped (treated as non-matching); they cannot kill the
  poll.

Tests
-----
- Removed the old `event_types` test.
- Added five new tests: keep/drop based on filter result, nested
  `payload.action` matching, valid expression accepted, invalid
  expression rejected at validation, whitespace-only filter normalised
  to None.
- All 29 trigger / async-util tests pass; broader 124-test unit sweep
  is green.

Co-authored-by: openhands <openhands@all-hands.dev>
…riggerBase

Adds a new poll-based trigger that fires when configured Slack channels
receive new messages. Like GithubTrigger, it polls the upstream service
in parallel and exposes a JMESPath event_filter.

Refactor
--------
Introduces _PollingTriggerBase, a private subclass of _TriggerBase shared
by GithubTrigger and SlackTrigger. The base class owns the bits that are
identical across both:

- The event_filter field + JMESPath compile-at-validation logic.
- The cached _compiled_event_filter property.
- _item_matches_filter() — runtime JMESPath evaluation with per-item
  error handling.
- create_pending_run() — enabled/deleted/cutoff guards, async client
  lifecycle, and persisting the run with the right event_payload envelope.

Concrete subclasses now only implement:

- _PAYLOAD_SOURCE / _PAYLOAD_KEY (ClassVars for the payload envelope).
- _build_client() — auth headers, base URL.
- _fetch_all_new(client, cutoff) — typically a wait_all() fan-out.

This drops GithubTrigger from ~250 lines to ~110 with no behaviour change
(all 24 existing trigger tests still pass).

SlackTrigger
------------
- type='slack', slack_token (SecretStr, xoxp-/xoxb-), channels (list of
  Slack channel IDs, validated against ^[A-Z][A-Z0-9]+$).
- Polls /conversations.history with oldest=<unix-seconds> as the
  high-water mark, derived from automation.last_triggered_at (or
  created_at on first poll — no backfill).
- Honours Slack's quirky error model: HTTP 200 always for app errors, so
  we check body['ok'] and log body['error']. HTTP 429 logs Retry-After
  and treats the cycle as 'no messages'.
- Sorts newest-first responses into chronological order before delivery.
- Does defense-in-depth client-side cutoff comparison (ts_unix > cutoff)
  even though Slack's 'oldest' is server-side exclusive.
- Tags each message with '_channel' so downstream code knows the origin.
- Payload: {source: 'slack_trigger', messages: [...]}.

Reference: https://github.com/tofarr/polling-experiment/blob/main/scripts/poll_slack_messages.py

Tests (16 new, 45 total in test_triggers.py)
--------------------------------------------
- TestSlackTriggerValidation x7: channel id regex (accept/reject lower
  case, accept various prefixes, reject channel names like '#general',
  reject empty list), invalid JMESPath rejected, SecretStr round-trip,
  discriminator dispatch.
- TestSlackTriggerCreatesRun x9: happy path with payload + 'oldest'
  assertion, no-new-messages, ok=false treated as no messages, 429 rate
  limit treated as no messages, cutoff-edge message excluded, JMESPath
  keep/exclude, multi-channel parallel fan-out, disabled short-circuit
  (zero HTTP calls).
- conftest.py: new patch_slack_transport fixture mirroring
  patch_github_transport.

All 140 unit tests pass (45 trigger, 3 scheduler create-pending-run,
6 async_utils, plus broader scheduler/event/config sweeps).
ruff format / ruff check clean.

Co-authored-by: openhands <openhands@all-hands.dev>
@csmith
Copy link
Copy Markdown

csmith commented May 15, 2026

This PR was opened by an AI agent (OpenHands) on behalf of @csmith.

No, no it wasn't.

@tofarr
Copy link
Copy Markdown
Contributor Author

tofarr commented May 15, 2026

This PR was opened by an AI agent (OpenHands) on behalf of @csmith.

No, no it wasn't.

That's weird - this is my first PR from Agent Canvas. I wonder is your name hardcoded in there somewhere?

@csmith
Copy link
Copy Markdown

csmith commented May 15, 2026

That's weird - this is my first PR from Agent Canvas. I wonder is your name hardcoded in there somewhere?

Yeah, very weird. I can't see anything obvious. I've also not heard of Agent Canvas prior to this PR. 🤷‍♀️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants