feat(events): add count, timeseries, and field-value-discovery tools #83
Open
ivanlysiuk-sysdig wants to merge 1 commit into
Adds three new MCP tools so that end-to-end runtime-event investigations
can be done in a few tool calls instead of paginating event bodies:
- count_runtime_events: returns a 16-category × 8-severity histogram for
any filter and time window in a single call. No pagination, no
truncation. Backed by GET /api/v1/secureEvents/count.
- runtime_events_timeseries: buckets event counts over time, grouped by
a categorical field (default "severity"). Server picks the coarsest
bucket size that fits the rows cap; minimum bucket is 1 minute. Lets
the model find when a burst started/ended in two calls (coarse pass +
zoom). Backed by GET /api/v1/secureEvents/timeseriesBy.
- discover_runtime_event_field_values: enumerates the distinct values
of a runtime-events field present in a window, split into "suggested"
(active in window) and "other" (known but inactive). Lets the model
learn real cluster/rule/image names before writing a filter instead of
guessing. Backed by GET /secure/events/v2/eventFields/{field}.
Also:
- Extracts the runtime-events baseline filter ("not originator in
(benchmarks, compliance, cloudsec, scanning, hostscanning)") into a
shared helper used by all four runtime-event tools.
- Shares the filter-expression DSL documentation across the four tools
so the LLM applies identical filter intuition everywhere.
- Fixes two filter-DSL examples in list_runtime_events whose syntax was
rejected by the live API: 'host.hostName startsWith "web-"' is not
accepted (correct form: 'host.hostName starts with "web-"'), and
'container.imageName' is not a valid field (correct forms:
'container.image.repo' and 'container.image.tag').
All three new tools require policy-events.read, the same permission as
list_runtime_events and get_event_info.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Today the MCP server can list runtime events (`list_runtime_events`) and fetch single events by ID (`get_event_info`, `get_event_process_tree`), but it cannot answer aggregate questions efficiently. A few common investigation questions force the LLM to pull the underlying event bodies even when the answer is purely numeric:

- … → today: paginate `list_runtime_events` and count the array locally. With the documented 200-event-per-call cap, this costs O(N/200) round trips and O(N · payload-size) tokens for a question whose answer is one integer per severity bucket.
- … window with successively narrower `list_runtime_events` calls.
- "… right now?" → today: guess values, filter, see if anything comes back, and iterate. Most failed filters are typos against names the model cannot know up front.
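To make the round-trip cost concrete, here is a minimal sketch of the arithmetic, assuming every `list_runtime_events` page is full at the documented 200-event cap. `pagesNeeded` is a hypothetical helper, not part of this PR.

```go
package main

import "fmt"

// pagesNeeded is a hypothetical helper: the number of list calls needed to
// read n events when each call returns at most pageSize events (ceiling division).
func pagesNeeded(n, pageSize int) int {
	if n <= 0 {
		return 0
	}
	return (n + pageSize - 1) / pageSize
}

func main() {
	// 10,000 matching events at the 200-event-per-call cap.
	fmt.Println(pagesNeeded(10000, 200)) // 50
	// One extra event costs one extra round trip.
	fmt.Println(pagesNeeded(10001, 200)) // 51
}
```

A single `count_runtime_events` call replaces all of those pages, and its payload size does not grow with N.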
This PR adds three tools that answer those three questions in a single
call each, using existing Sysdig public APIs.
New tools
| Tool | Backing API | What it does |
|---|---|---|
| `count_runtime_events` | `GET /api/v1/secureEvents/count` | "… `<filter>` in the last N hours?" Returns a histogram across 16 event categories × 8 severity codes in one call. No pagination, no truncation. |
| `runtime_events_timeseries` | `GET /api/v1/secureEvents/timeseriesBy` | Buckets event counts over time, grouped by a categorical field (default `severity`). Server picks the coarsest bucket size that fits the `rows` upper bound; minimum bucket is 1 minute. Lets the model find a burst boundary in two calls (coarse pass + zoom). |
| `discover_runtime_event_field_values` | `GET /secure/events/v2/eventFields/{field}` | Enumerates the distinct values of a runtime-events field in the window, split into `suggested` (values active in the window) and `other` (values known to the tenant but inactive). Lets the model learn real names before writing a filter instead of guessing. |

All three require `policy-events.read`, the same permission as `list_runtime_events` and `get_event_info`. They're permission-gated by the same `RequiredPermissionsFromTool` helper, so the existing permission-based filtering keeps working.
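A minimal sketch of what a caller can do with the `count_runtime_events` histogram. The `histogram` type below is hypothetical: it assumes the 16 × 8 counts have already been decoded into a fixed-size array indexed by category and severity code, and it does not reflect the endpoint's actual JSON shape or what each code means.

```go
package main

import "fmt"

const (
	numCategories = 16 // event categories in the count histogram
	numSeverities = 8  // severity codes in the count histogram
)

// histogram is a hypothetical decoded view of a count_runtime_events result:
// h[c][s] is the number of events in category c with severity code s.
type histogram [numCategories][numSeverities]int64

// totalsBySeverity collapses the category axis, giving one integer per severity code.
func (h *histogram) totalsBySeverity() [numSeverities]int64 {
	var out [numSeverities]int64
	for c := 0; c < numCategories; c++ {
		for s := 0; s < numSeverities; s++ {
			out[s] += h[c][s]
		}
	}
	return out
}

// total sums every cell of the histogram.
func (h *histogram) total() int64 {
	var sum int64
	for _, v := range h.totalsBySeverity() {
		sum += v
	}
	return sum
}

func main() {
	var h histogram
	h[0][7] = 3 // three events in category 0 with severity code 7
	h[5][7] = 2
	h[5][0] = 1
	fmt.Println(h.totalsBySeverity()) // [1 0 0 0 0 0 0 5]
	fmt.Println(h.total())            // 6
}
```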
Shared baseline + DSL fixes for `list_runtime_events`

The runtime-events baseline filter (`not originator in ("benchmarks","compliance","cloudsec","scanning","hostscanning")`) is extracted into `secure_events_common.go` and reused by all four runtime-event tools, so they surface a consistent view of "runtime activity" regardless of which one the model picks.
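One way a shared baseline can be combined with a tool's `filter_expr` is sketched below. `withBaseline` is a hypothetical name (the real helper in `secure_events_common.go` may look different), and wrapping the user filter in parentheses assumes the DSL accepts parentheses for grouping.

```go
package main

import (
	"fmt"
	"strings"
)

// baselineFilter mirrors the runtime-events baseline quoted in this PR.
const baselineFilter = `not originator in ("benchmarks","compliance","cloudsec","scanning","hostscanning")`

// withBaseline is a hypothetical helper: it ANDs the baseline with an optional
// user filter. The user filter is parenthesised (assuming the DSL supports
// grouping) so its own operators cannot bind across the baseline.
func withBaseline(userFilter string) string {
	userFilter = strings.TrimSpace(userFilter)
	if userFilter == "" {
		return baselineFilter
	}
	return baselineFilter + " and (" + userFilter + ")"
}

func main() {
	fmt.Println(withBaseline(""))
	fmt.Println(withBaseline(`ruleName = "<picked-rule>"`))
}
```

Routing every tool through one such function is what keeps list, count, timeseries, and discover looking at the same event population.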
The same file holds the filter-expression DSL prose, also shared across
the four tools — keeping the LLM's filter intuition identical between
list / count / timeseries / discover.
While touching `list_runtime_events` to share the baseline, two examples in its `filter_expr` description are fixed:

- `host.hostName startsWith "web-"` → `host.hostName starts with "web-"` (`startsWith` as one word is rejected by the backend with HTTP 400).
- `container.imageName = "nginx:latest"` → `container.image.repo = "nginx" and container.image.tag = "latest"` (`container.imageName` is rejected with HTTP 422 "unsupported metric"; the descriptors that exist are `container.image.repo`, `container.image.tag`, `container.image.digest`, and `container.image.id`).

These examples currently render in the tool description and may have been propagating into model-generated filters as syntax errors.
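A hypothetical regression guard for the two spellings above: a naive, case-sensitive substring check that could run over tool-description examples in a unit test. It is not part of this PR and only knows about the two forms reported as rejected; a quoted value containing one of these strings would also be flagged.

```go
package main

import (
	"fmt"
	"strings"
)

// rejectedForms lists the two spellings this PR reports the live API rejecting,
// paired with the replacement the PR uses.
var rejectedForms = []struct {
	bad, hint string
}{
	{"startsWith", `use "starts with" (two words)`},
	{"container.imageName", "use container.image.repo / container.image.tag"},
}

// lintFilterExample returns one message per rejected spelling found in expr.
func lintFilterExample(expr string) []string {
	var issues []string
	for _, f := range rejectedForms {
		if strings.Contains(expr, f.bad) {
			issues = append(issues, fmt.Sprintf("%q: %s", f.bad, f.hint))
		}
	}
	return issues
}

func main() {
	for _, expr := range []string{
		`host.hostName startsWith "web-"`,
		`host.hostName starts with "web-"`,
		`container.imageName = "nginx:latest"`,
		`container.image.repo = "nginx" and container.image.tag = "latest"`,
	} {
		fmt.Println(expr, lintFilterExample(expr))
	}
}
```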
Worked example — "Investigate the most recent burst"
With these tools, a typical investigation can be:

1. `discover_runtime_event_field_values(field: "ruleName", scope_hours: 24)` → lists rule names actively producing events (the `suggested` bucket).
2. `runtime_events_timeseries(scope_hours: 24, field: "severity", filter_expr: 'ruleName = "<picked-rule>"', rows: 1000)` → coarse pass; identifies which 15-minute / 1-hour buckets contain the activity.
3. `runtime_events_timeseries(scope_hours: <narrowed>, field: "severity", filter_expr: 'ruleName = "<picked-rule>"', rows: 3600)` → forces 1-minute buckets across the narrowed range; pinpoints the start and end of the burst.
4. `count_runtime_events(scope_hours: <narrowed>, filter_expr: 'ruleName = "<picked-rule>"')` → exact total.
5. `list_runtime_events(scope_hours: <narrowed>, filter_expr: 'ruleName = "<picked-rule>"', limit: 5)` → a few representative events to read in detail.

Five calls instead of dozens of paginated reads, and the model never needs to count event-array lengths to answer "how many".
Test plan
- `go build ./...` clean.
- `go vet ./...` clean.
- `go test ./internal/infra/mcp/tools/...` passes (the existing `list_runtime_events` test is still green after the shared-baseline refactor; three new test files cover happy-path / defaults / client-error / non-2xx for each new tool).
- `go generate ./internal/infra/sysdig/` cleanly regenerates `mocks/client_extension.go` with the three new mock methods.
- … hand-written client extensions following the existing `client_process_tree.go` pattern.
- `cmd/server/main.go` keeps the existing tools in place and appends the three new ones.
- … `policy-events.read`, so they're filtered out for tokens that lack it.
- … from real customer-investigation shapes, and the DSL prose lists ML / severity / engine recipes.
Notes
- … `/secure/events/v1/*` to match the existing event endpoints, but the `count`, `timeseriesBy`, and `eventFields/*` endpoints don't exist on that family today; they live under `/api/v1/secureEvents*` and `/secure/events/v2/eventFields/*`. If the backend later exposes them under `/secure/events/v1/*`, the hand-written clients here are easy to migrate.
- … are noted in the tool descriptions so the model can reason about them up front.
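Keeping the three endpoint paths in one place is what makes that migration cheap. A minimal sketch, assuming a plain string base URL and that the tool's `field` argument maps directly onto the `{field}` path segment; the constant and function names are hypothetical, and only the paths themselves come from this PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Endpoint paths as listed in this PR. A later move to /secure/events/v1/*
// would touch only these constants.
const (
	countPath       = "/api/v1/secureEvents/count"
	timeseriesPath  = "/api/v1/secureEvents/timeseriesBy"
	eventFieldsPath = "/secure/events/v2/eventFields/"
)

// eventFieldsURL joins a base URL with the eventFields path for one field,
// escaping the field name as a single path segment (so "/" becomes "%2F").
func eventFieldsURL(base, field string) string {
	return strings.TrimRight(base, "/") + eventFieldsPath + url.PathEscape(field)
}

func main() {
	fmt.Println(countPath)
	fmt.Println(timeseriesPath)
	fmt.Println(eventFieldsURL("https://example.invalid/", "ruleName"))
}
```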
🤖 Generated with Claude Code