feat(taskspawner): add filePatterns filtering for github based tasks #891

Open
knechtionscoding wants to merge 1 commit into kelos-dev:main from datagravity-ai:feat/support-filesChanged-github

@knechtionscoding
Contributor

@knechtionscoding knechtionscoding commented Apr 3, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds filePatterns filtering to GitHub pull request and webhook TaskSpawner sources, allowing users to filter work items by changed file paths using doublestar glob patterns.

This enables use cases like:

  • Only reviewing PRs that touch Go source files (include: ["**/*.go"])
  • Skipping documentation-only PRs (exclude: ["docs/**", "*.md"] with excludeOnly: true)
  • Triggering security reviews when auth code changes (include: ["internal/auth/**"])

Key changes:

  • New API type FilePatternFilter with include, exclude, and excludeOnly fields. excludeOnly inverts exclude logic so items are only rejected when all changed files match exclude patterns (e.g., skip docs-only PRs).
  • filePatterns field on githubPullRequests — file lists are fetched from the GitHub API after cheap label/author/draft filters but before expensive per-PR review and comment fetches.
  • filePatterns field on githubWebhook filters — for push events, changed files are extracted directly from the payload (no API call). For pull_request events, files are lazily fetched from the GitHub API using the workspace's secretRef.
  • {{.ChangedFiles}} template variable — available in promptTemplate, branch, and metadata templates for both polling and webhook sources. File lists are automatically fetched when the template references ChangedFiles, even without filePatterns configured.
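
The include, exclude, and excludeOnly semantics described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `source.MatchesFilePatterns`: the function name `matchesFilePatterns`, the field types, and the behavior of plain `exclude` (reject when any file matches) are assumptions, and `globMatch` is a simplified stdlib stand-in for doublestar that only handles `**` as a whole path segment.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// FilePatternFilter mirrors the three fields named in the PR description.
// The Go field types here are assumptions.
type FilePatternFilter struct {
	Include     []string
	Exclude     []string
	ExcludeOnly bool
}

// globMatch is a simplified stand-in for doublestar matching: a "**" segment
// matches zero or more whole path segments; other segments use path.Match.
func globMatch(pattern, name string) bool {
	return matchSegments(strings.Split(pattern, "/"), strings.Split(name, "/"))
}

func matchSegments(pat, segs []string) bool {
	if len(pat) == 0 {
		return len(segs) == 0
	}
	if pat[0] == "**" {
		// Let "**" consume 0..len(segs) leading segments.
		for i := 0; i <= len(segs); i++ {
			if matchSegments(pat[1:], segs[i:]) {
				return true
			}
		}
		return false
	}
	if len(segs) == 0 {
		return false
	}
	ok, err := path.Match(pat[0], segs[0])
	if err != nil || !ok {
		return false
	}
	return matchSegments(pat[1:], segs[1:])
}

func matchesAny(patterns []string, file string) bool {
	for _, p := range patterns {
		if globMatch(p, file) {
			return true
		}
	}
	return false
}

// matchesFilePatterns (hypothetical name) reports whether a work item with
// the given changed files passes the filter.
func matchesFilePatterns(f FilePatternFilter, files []string) bool {
	if len(f.Include) > 0 {
		included := false
		for _, file := range files {
			if matchesAny(f.Include, file) {
				included = true
				break
			}
		}
		if !included {
			return false
		}
	}
	excluded := 0
	for _, file := range files {
		if matchesAny(f.Exclude, file) {
			excluded++
		}
	}
	if f.ExcludeOnly {
		// Reject only when every changed file matches an exclude pattern.
		return !(len(files) > 0 && excluded == len(files))
	}
	// Assumption: without excludeOnly, any excluded file rejects the item.
	return excluded == 0
}

func main() {
	docsOnly := FilePatternFilter{Exclude: []string{"docs/**", "*.md"}, ExcludeOnly: true}
	fmt.Println(matchesFilePatterns(docsOnly, []string{"docs/intro.md", "README.md"}))
	fmt.Println(matchesFilePatterns(docsOnly, []string{"docs/intro.md", "cmd/main.go"}))
}
```

With excludeOnly set, a PR that touches only documentation is rejected, while a PR that mixes documentation and code still passes.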

Which issue(s) this PR is related to:

fixes: #778

Special notes for your reviewer:

  • The doublestar/v4 dependency was already present as an indirect dependency; this PR promotes it to a direct dependency.
  • File pattern matching uses source.MatchesFilePatterns as a shared implementation used by both the polling source and webhook filter paths.
  • The webhook handler lazily enriches ChangedFiles only when a spawner actually needs them (has filePatterns in a filter or references ChangedFiles in a template), avoiding unnecessary API calls.

Does this PR introduce a user-facing change?

Add filePatterns filtering to githubPullRequests and githubWebhook sources, allowing TaskSpawners to filter by changed file paths using glob patterns. A new {{.ChangedFiles}} template variable exposes the list of changed files in prompts.

Summary by cubic

Adds file-based filtering for GitHub PRs and webhooks using doublestar globs, and exposes {{.ChangedFiles}} in templates. This lets spawners run only when relevant files change and pass file context to agents.

  • New Features

    • FilePatternFilter with include, exclude, and excludeOnly.
    • filePatterns on githubPullRequests and githubWebhook filters.
    • {{.ChangedFiles}} available in prompt, branch, and metadata templates (newline-joined).
    • PR polling: fetch PR files after cheap filters; also auto-fetch when templates reference ChangedFiles.
    • Webhooks: push events collect files from payload; PR events fetch via GitHub API using the workspace secretRef; auto-fetch when filters use filePatterns or templates reference ChangedFiles.
    • Shared matching via source.MatchesFilePatterns.
    • Examples under examples/12-taskspawner-file-patterns.
  • Dependencies

    • Promote github.com/bmatcuk/doublestar/v4 to a direct dependency.

Written for commit 978d083. Summary will update on new commits.

@knechtionscoding knechtionscoding marked this pull request as ready for review April 3, 2026 07:49

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 18 files

@knechtionscoding
Contributor Author

@gjkim42 the only thing of note is that ChangedFiles is newline-separated rather than a list that templates can range over. I'm ambivalent about this, so I wanted to leave it up to you.
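
To make the trade-off concrete, here is a small sketch using Go's `text/template` that renders the same file list both ways. The `render` helper, the template strings, and the data maps are illustrative assumptions and not code from this PR.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render parses tmpl and executes it against data (hypothetical helper).
func render(tmpl string, data any) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	files := []string{"cmd/main.go", "docs/README.md"}

	// Newline-joined string: the template can only print it as a block.
	joined, err := render("Changed files:\n{{.ChangedFiles}}\n",
		map[string]any{"ChangedFiles": strings.Join(files, "\n")})
	if err != nil {
		panic(err)
	}

	// Slice: the template can range over it and format each entry.
	ranged, err := render("Changed files:\n{{range .ChangedFiles}}- {{.}}\n{{end}}",
		map[string]any{"ChangedFiles": files})
	if err != nil {
		panic(err)
	}

	fmt.Print(joined)
	fmt.Print(ranged)
}
```

The joined form is simpler to drop into a prompt; the slice form gives template authors control over per-file formatting.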

@github-actions github-actions bot added the needs-triage, needs-kind (Indicates an issue or PR lacks a kind/* label), needs-priority, and needs-actor labels Apr 3, 2026
@knechtionscoding
Contributor Author

@gjkim42 I think this is ready for your review

@gjkim42 gjkim42 added the priority/important-longterm, triage-accepted, kind/api (Categorizes issue or PR as related to API changes), and kind/feature (Categorizes issue or PR as related to a new feature) labels Apr 7, 2026
@gjkim42
Collaborator

gjkim42 commented Apr 7, 2026

/kelos review

@github-actions github-actions bot added release-note and removed the needs-triage, needs-kind (Indicates an issue or PR lacks a kind/* label), and needs-priority labels Apr 7, 2026
@gjkim42 gjkim42 self-assigned this Apr 7, 2026
@kelos-bot

kelos-bot bot commented Apr 7, 2026

🤖 Kelos Task Status

Task kelos-reviewer-891 has succeeded. ✅

@knechtionscoding knechtionscoding force-pushed the feat/support-filesChanged-github branch from 94f10ba to e24fc22 April 7, 2026 13:35

@kelos-bot kelos-bot bot left a comment

🤖 Kelos Reviewer Agent @gjkim42

Review Summary

Verdict: COMMENT
Scope: Adds filePatterns glob filtering for GitHub PR and webhook TaskSpawner sources with a new {{.ChangedFiles}} template variable.

Overall this is a well-structured feature with thorough tests, good lazy-enrichment patterns, and clear API design. I have two findings I'd like maintainer input on before approving.

Findings

Correctness

  • spawnerNeedsChangedFiles missing metadata check: internal/webhook/handler.go:493-508 only checks PromptTemplate and Branch for "ChangedFiles", but the equivalent templateReferencesChangedFiles in cmd/kelos-spawner/main.go:737-756 also checks Metadata.Labels and Metadata.Annotations. If a webhook spawner references {{.ChangedFiles}} only in a metadata template, the webhook handler will not fetch the file list and the template variable will render empty. These two functions should have the same detection logic. Consider extracting a shared helper or mirroring the metadata checks in spawnerNeedsChangedFiles.

  • Hardcoded defaultGitHubAPIBaseURL in webhook file fetching: internal/webhook/github_files.go:19 uses https://api.github.com. The polling source (internal/source/github_pr.go) respects a configurable BaseURL, which can be a GHProxy or GitHub Enterprise URL. Webhook file fetching will not work correctly for GitHub Enterprise or GHProxy deployments. Consider plumbing the API base URL through (e.g., from the spawner's configuration or from the webhook handler's config), similar to how the polling path does it.
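
For the first finding, a shared helper might look roughly like the sketch below. The `templateFields` struct and the `referencesChangedFiles` name are hypothetical; the real spawner types are not shown in this review, so the field layout is an assumption based on the fields named above.

```go
package main

import (
	"fmt"
	"strings"
)

// templateFields is a hypothetical, trimmed-down view of the template-bearing
// fields named in the finding (prompt, branch, metadata labels/annotations).
type templateFields struct {
	PromptTemplate string
	Branch         string
	Labels         map[string]string
	Annotations    map[string]string
}

// referencesChangedFiles reports whether any template field mentions
// ChangedFiles, so the polling and webhook paths can share one check.
func referencesChangedFiles(t templateFields) bool {
	const needle = "ChangedFiles"
	if strings.Contains(t.PromptTemplate, needle) || strings.Contains(t.Branch, needle) {
		return true
	}
	for _, v := range t.Labels {
		if strings.Contains(v, needle) {
			return true
		}
	}
	for _, v := range t.Annotations {
		if strings.Contains(v, needle) {
			return true
		}
	}
	return false
}

func main() {
	onlyInAnnotation := templateFields{
		PromptTemplate: "Review this PR",
		Annotations:    map[string]string{"files": "{{.ChangedFiles}}"},
	}
	fmt.Println(referencesChangedFiles(onlyInAnnotation))
}
```

Calling one function from both code paths removes the chance that a metadata-only reference is detected in one path and missed in the other.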

Tests

  • Test coverage is comprehensive: unit tests for MatchesFilePatterns (13 cases), integration tests for Discover with file patterns (4 scenarios), push event file extraction with deduplication, webhook filter matching, template rendering, and templateReferencesChangedFiles. All tests pass.

Conventions

  • Generated files (zz_generated.deepcopy.go, CRD YAMLs) are in sync — make verify passes.
  • go vet is clean, no lint issues.
  • PR description follows the template format correctly.

Code Quality

  • The doublestar.Match error is silently discarded (match, _ := doublestar.Match(...)) in multiple places in MatchesFilePatterns (internal/source/github_pr.go:50,57,69). While doublestar.Match only returns errors for invalid patterns (not runtime data issues), logging or validating patterns at config load time would be more user-friendly — a typo in a pattern would silently not match rather than producing a clear error.
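
Up-front pattern validation could be sketched as below. This uses the standard library's `path.Match` as a stand-in validator, which is an assumption made to keep the example dependency-free; doublestar has its own syntax, so the real code would presumably validate with doublestar itself. The `validatePatterns` name is hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// validatePatterns (hypothetical) returns an error for the first malformed
// pattern. path.Match reports path.ErrBadPattern for bad syntax even when
// the name does not match, so matching against "" is enough to check syntax.
func validatePatterns(patterns []string) error {
	for _, p := range patterns {
		if _, err := path.Match(p, ""); err != nil {
			return fmt.Errorf("invalid file pattern %q: %w", p, err)
		}
	}
	return nil
}

func main() {
	fmt.Println(validatePatterns([]string{"**/*.go", "internal/auth/**"}))
	fmt.Println(validatePatterns([]string{"docs/["}))
}
```

Running a check like this when the configuration is loaded would turn an unclosed character class into a visible error instead of a pattern that never matches.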

Suggestions (optional)

  • Consider validating filePatterns glob syntax at admission/reconcile time so users get immediate feedback on malformed patterns rather than silent non-matching behavior.
  • The contributor noted ChangedFiles is newline-separated. This works well for prompt templates but could be limiting for metadata labels (which typically cannot contain newlines). If metadata use-cases are anticipated, consider whether a different separator or a list type would be more appropriate there.
  • Minor: extractPushEventFiles (internal/webhook/github_filter.go:449-476) iterates commit.Added, commit.Removed, commit.Modified separately with duplicated dedup logic. A small helper or iterating a combined slice would reduce repetition, but this is purely cosmetic.
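
The last suggestion could look something like this sketch, which iterates the three per-commit lists through one loop with a single dedup set. The `pushCommit` struct and `collectPushFiles` name are hypothetical; only the Added, Removed, and Modified field names come from the review above.

```go
package main

import "fmt"

// pushCommit is a hypothetical stand-in for the push payload's commit entry.
type pushCommit struct {
	Added    []string
	Removed  []string
	Modified []string
}

// collectPushFiles returns the unique changed paths across all commits,
// preserving first-seen order.
func collectPushFiles(commits []pushCommit) []string {
	seen := make(map[string]struct{})
	var files []string
	for _, c := range commits {
		for _, group := range [][]string{c.Added, c.Removed, c.Modified} {
			for _, f := range group {
				if _, dup := seen[f]; dup {
					continue
				}
				seen[f] = struct{}{}
				files = append(files, f)
			}
		}
	}
	return files
}

func main() {
	commits := []pushCommit{
		{Added: []string{"a.go"}, Modified: []string{"b.go"}},
		{Modified: []string{"a.go"}, Removed: []string{"c.go"}},
	}
	fmt.Println(collectPushFiles(commits))
}
```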

/kelos needs-input

@knechtionscoding knechtionscoding force-pushed the feat/support-filesChanged-github branch from 70d2fcd to de8d70b April 7, 2026 13:43
…s and webhooks

Allow TaskSpawners to filter GitHub pull requests and webhook events by
changed file paths using doublestar glob patterns. This enables
use cases like reviewing only PRs that touch Go source files, skipping
documentation-only PRs, or triggering security reviews when auth code
changes.

- Add FilePatternFilter type with include, exclude, and excludeOnly
  semantics to the TaskSpawner API
- Add filePatterns field to githubPullRequests and githubWebhook filters
- Expose {{.ChangedFiles}} template variable for prompts and branches
- For polling sources, fetch PR file lists from the GitHub API after
  cheap label/author/draft filters but before expensive review fetches
- For webhook push events, extract changed files from the payload
- For webhook PR events, lazily fetch files from the GitHub API when
  a filter or template requires them

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@knechtionscoding knechtionscoding force-pushed the feat/support-filesChanged-github branch from de8d70b to 978d083 April 7, 2026 13:53
@knechtionscoding
Contributor Author

@gjkim42 comments/issues resolved.

kelos-bot bot pushed a commit that referenced this pull request Apr 7, 2026
Add "consistent parallel implementation paths" convention to project
config and agent configs, based on recurring review feedback in PRs
#880 and #891 where polling and webhook code paths diverged in data
structures and field checks.

Fix gh pr edit in kelos-workers step 7a: {{.Number}} is the issue
number in the workers context, not the PR number. Remove the explicit
number so gh pr edit defaults to the current branch's PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Labels

kind/api (Categorizes issue or PR as related to API changes), kind/feature (Categorizes issue or PR as related to a new feature), needs-actor, priority/important-longterm, release-note, triage-accepted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API: Add filePatterns filter and ChangedFiles enrichment to githubPullRequests for content-aware task routing

2 participants