feat(events): add count, timeseries, and field-value-discovery tools by ivanlysiuk-sysdig · Pull Request #83 · sysdiglabs/sysdig-mcp-server

ivanlysiuk-sysdig · 2026-05-15T21:47:27Z

Summary

Today the MCP server can list runtime events (list_runtime_events) and
fetch single events by id (get_event_info, get_event_process_tree),
but cannot answer aggregate questions efficiently. A few common
investigation questions force the LLM to pull the underlying event bodies
even when the answer is purely numeric:

- "How many high-severity events fired in cluster X over the last 24h?"
  → today: paginate list_runtime_events, count the array locally. With
  the documented 200-event-per-call cap, this is O(N/200) round trips
  and O(N · payload-size) tokens for a question whose answer is one
  integer per severity bucket.
- "When did this burst start / stop?" → today: binary-search the
  window with successively narrower list_runtime_events calls.
- "What clusters / rules / image names are actually producing events
  right now?" → today: guess values, filter, see if anything comes
  back, iterate. Most failed filters are typos against names the model
  cannot know up front.

This PR adds three tools that answer those three questions in a single
call each, using existing Sysdig public APIs.
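
To make the pagination cost above concrete, here is a minimal sketch of the round-trip arithmetic. The function name `pagesNeeded` is hypothetical and is not part of this PR; only the 200-event-per-call cap comes from the description.

```go
package main

import "fmt"

// pagesNeeded returns how many paginated calls are required to read n events
// when each call returns at most perCall events (ceiling division).
// Hypothetical helper for illustration; it is not part of the PR.
func pagesNeeded(n, perCall int) int {
	if n <= 0 || perCall <= 0 {
		return 0
	}
	return (n + perCall - 1) / perCall
}

func main() {
	const perCall = 200 // the documented per-call cap quoted in the summary
	for _, n := range []int{150, 200, 12000} {
		fmt.Printf("%d events: %d list calls vs. 1 count call\n", n, pagesNeeded(n, perCall))
	}
}
```

The count tool replaces all of those list calls with a single request, whatever the value of N.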

New tools

| Tool | Endpoint | Question it answers |
| --- | --- | --- |
| `count_runtime_events` | `GET /api/v1/secureEvents/count` | "How many events match `<filter>` in the last N hours?" Returns a histogram across 16 event categories × 8 severity codes in one call. No pagination, no truncation. |
| `runtime_events_timeseries` | `GET /api/v1/secureEvents/timeseriesBy` | "When did this burst start / stop?" Returns per-bucket counts grouped by a categorical field (default `severity`). Server picks the coarsest bucket size that fits the `rows` upper bound; minimum bucket is 1 minute. Lets the model find a burst boundary in two calls (coarse pass + zoom). |
| `discover_runtime_event_field_values` | `GET /secure/events/v2/eventFields/{field}` | "What clusters / rules / image names are firing in this window?" Returns `suggested` (values active in the window) and `other` (values known to the tenant but inactive). Lets the model learn real names before writing a filter instead of guessing. |
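
As an illustration of how a category × severity histogram answers "how many events per severity" without reading event bodies, here is a rough sketch. The `Histogram` type and `totalsBySeverity` are assumptions made for this example; the actual response schema of the count endpoint is not shown in this PR description and may differ.

```go
package main

import "fmt"

// Histogram is a hypothetical decoded shape: category -> severity code -> count.
// The real response schema of the count endpoint may differ.
type Histogram map[string]map[int]int

// totalsBySeverity collapses the category dimension, leaving one total
// per severity code.
func totalsBySeverity(h Histogram) map[int]int {
	out := make(map[int]int)
	for _, bySeverity := range h {
		for sev, n := range bySeverity {
			out[sev] += n
		}
	}
	return out
}

func main() {
	// Made-up category names and counts, for illustration only.
	h := Histogram{
		"categoryA": {0: 3, 4: 10},
		"categoryB": {0: 2, 7: 1},
	}
	for sev, n := range totalsBySeverity(h) {
		fmt.Printf("severity %d: %d events\n", sev, n)
	}
}
```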

All three require policy-events.read — the same permission as
list_runtime_events and get_event_info. They're permission-gated by
the same RequiredPermissionsFromTool helper, so the existing
permission-based filtering keeps working.
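
The gating idea can be sketched as follows. This is not the repository's RequiredPermissionsFromTool helper, whose signature is not shown here; the `tool` type and `visibleTools` function are hypothetical stand-ins that only illustrate the filtering behaviour described above.

```go
package main

import "fmt"

// tool is a hypothetical stand-in for a registered MCP tool and the
// permissions it declares.
type tool struct {
	name     string
	required []string
}

// newTools lists the three tools added by this PR with the permission
// the description says they require.
var newTools = []tool{
	{"count_runtime_events", []string{"policy-events.read"}},
	{"runtime_events_timeseries", []string{"policy-events.read"}},
	{"discover_runtime_event_field_values", []string{"policy-events.read"}},
}

// visibleTools keeps only the tools whose required permissions are all granted.
func visibleTools(tools []tool, granted map[string]bool) []string {
	var names []string
	for _, t := range tools {
		allowed := true
		for _, p := range t.required {
			if !granted[p] {
				allowed = false
				break
			}
		}
		if allowed {
			names = append(names, t.name)
		}
	}
	return names
}

func main() {
	withRead := map[string]bool{"policy-events.read": true}
	fmt.Println("token with policy-events.read:", visibleTools(newTools, withRead))
	fmt.Println("token without it:", visibleTools(newTools, map[string]bool{}))
}
```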

Shared baseline + DSL fixes for `list_runtime_events`

The runtime-events baseline filter (not originator in ("benchmarks","compliance","cloudsec","scanning","hostscanning")) is
extracted into secure_events_common.go and reused by all four
runtime-event tools, so the four tools surface a consistent view of
"runtime activity" regardless of which one the model picks.

The same file holds the filter-expression DSL prose, also shared across
the four tools — keeping the LLM's filter intuition identical between
list / count / timeseries / discover.
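
A minimal sketch of how a shared baseline could be combined with a caller-supplied filter. The baseline string is quoted from this description; the helper name `withBaseline`, and the assumption that the DSL accepts parentheses around the user expression, are mine and may not match secure_events_common.go.

```go
package main

import (
	"fmt"
	"strings"
)

// baselineFilter is the runtime-events baseline quoted in the PR description.
const baselineFilter = `not originator in ("benchmarks","compliance","cloudsec","scanning","hostscanning")`

// withBaseline prepends the baseline to a user filter. Wrapping the user
// expression in parentheses is an assumption about the DSL, made so that an
// "or" inside the user filter cannot escape the baseline.
func withBaseline(userExpr string) string {
	userExpr = strings.TrimSpace(userExpr)
	if userExpr == "" {
		return baselineFilter
	}
	return baselineFilter + " and (" + userExpr + ")"
}

func main() {
	fmt.Println(withBaseline(""))
	fmt.Println(withBaseline(`ruleName = "<picked-rule>"`))
}
```

Keeping the composition in one place is what lets the four tools agree on what counts as "runtime activity".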

While touching list_runtime_events to share the baseline, two
examples in its filter_expr description are fixed:

- host.hostName startsWith "web-" → host.hostName starts with "web-"
  (startsWith as one word is rejected by the backend with HTTP 400).
- container.imageName = "nginx:latest" → container.image.repo = "nginx" and container.image.tag = "latest"
  (container.imageName is rejected with HTTP 422 "unsupported metric"; the
  descriptors that exist are container.image.repo, container.image.tag,
  container.image.digest, container.image.id).

These examples currently render in the tool description and may have
been propagating into model-generated filters as syntax errors.
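
For illustration, the two rejected forms could be caught before a request is sent with a small check like the one below. `lintFilter` is a hypothetical helper, not something this PR adds, and it uses plain substring matching rather than parsing the DSL, so it would also flag these tokens inside quoted values.

```go
package main

import (
	"fmt"
	"strings"
)

// knownBad pairs each token the backend rejects (per the PR description)
// with the accepted replacement.
var knownBad = []struct{ bad, hint string }{
	{"startsWith", `use "starts with" (two words)`},
	{"container.imageName", "use container.image.repo and container.image.tag"},
}

// lintFilter returns one message per known-bad token found in expr.
// It is a naive substring check, not a DSL parser.
func lintFilter(expr string) []string {
	var problems []string
	for _, k := range knownBad {
		if strings.Contains(expr, k.bad) {
			problems = append(problems, k.bad+": "+k.hint)
		}
	}
	return problems
}

func main() {
	for _, expr := range []string{
		`host.hostName startsWith "web-"`,
		`host.hostName starts with "web-"`,
		`container.imageName = "nginx:latest"`,
	} {
		fmt.Printf("%s -> %v\n", expr, lintFilter(expr))
	}
}
```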

Worked example — "Investigate the most recent burst"

With these tools, a typical investigation can be:

1. discover_runtime_event_field_values(field: "ruleName", scope_hours: 24)
   → lists rule names actively producing events (the suggested bucket).
2. runtime_events_timeseries(scope_hours: 24, field: "severity", filter_expr: 'ruleName = "<picked-rule>"', rows: 1000)
   → coarse pass; identifies which 15-minute / 1-hour buckets contain the
   activity.
3. runtime_events_timeseries(scope_hours: <narrowed>, field: "severity", filter_expr: 'ruleName = "<picked-rule>"', rows: 3600)
   → forces 1-minute buckets across the narrowed range; pinpoints the
   start and end of the burst.
4. count_runtime_events(scope_hours: <narrowed>, filter_expr: 'ruleName = "<picked-rule>"')
   → exact total.
5. list_runtime_events(scope_hours: <narrowed>, filter_expr: 'ruleName = "<picked-rule>"', limit: 5)
   → a few representative events to read in detail.

Five calls instead of dozens of paginating reads, and the model never
needs to count event-array lengths to answer "how many".
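
The "find the burst boundary" step of the timeseries passes can be sketched client-side as below. The `bucket` type, the sample data, and `burstBounds` are assumptions for illustration; the real timeseries response shape is not shown in this description.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a hypothetical decoded timeseries row: bucket start and event count.
type bucket struct {
	start time.Time
	count int
}

// width is the bucket size assumed for the sample data below.
const width = 15 * time.Minute

// sampleBuckets returns made-up coarse-pass data with activity in the
// second and third buckets.
func sampleBuckets() []bucket {
	t0 := time.Date(2026, 5, 15, 12, 0, 0, 0, time.UTC)
	return []bucket{
		{t0, 0},
		{t0.Add(width), 40},
		{t0.Add(2 * width), 55},
		{t0.Add(3 * width), 0},
	}
}

// burstBounds returns the start of the first bucket and the end of the last
// bucket whose count reaches threshold. Buckets must be sorted by start time.
func burstBounds(bs []bucket, w time.Duration, threshold int) (from, to time.Time, ok bool) {
	for _, b := range bs {
		if b.count < threshold {
			continue
		}
		if !ok {
			from, ok = b.start, true
		}
		to = b.start.Add(w)
	}
	return from, to, ok
}

func main() {
	if from, to, ok := burstBounds(sampleBuckets(), width, 1); ok {
		fmt.Println("zoom into:", from.Format(time.RFC3339), "to", to.Format(time.RFC3339))
	}
}
```

The returned range is what the second, finer-grained timeseries call would be scoped to.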

Test plan

- go build ./... clean.
- go vet ./... clean.
- go test ./internal/infra/mcp/tools/... passes (existing
  list_runtime_events test still green after the shared-baseline
  refactor; three new test files cover happy-path / defaults /
  client-error / non-2xx for each new tool).
- go generate ./internal/infra/sysdig/ cleanly regenerates
  mocks/client_extension.go with the three new mock methods.
- No changes to the OpenAPI spec: new endpoints are added as
  hand-written client extensions following the existing
  client_process_tree.go pattern.
- No breaking changes: additive only. Tool registration in
  cmd/server/main.go keeps the existing tools in place and appends
  the three new ones.
- Permission gating: all three new tools declare
  policy-events.read so they're filtered out for tokens that lack it.
- Each new tool's description includes 4–8 filter examples drawn
  from real customer-investigation shapes, and the DSL prose lists ML
  / severity / engine recipes.

Notes

- I considered exposing these capabilities under /secure/events/v1/*
  to match the existing event endpoints, but the count,
  timeseriesBy, and eventFields/* endpoints don't exist on that
  family today; they live under /api/v1/secureEvents* and
  /secure/events/v2/eventFields/*. If the backend later exposes them
  under /secure/events/v1/*, the hand-written clients here are easy
  to migrate.
- The runtime-events 1-minute bucket floor and the 14-day window cap
  are noted in the tool descriptions so the model can reason about
  them up front.
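
As a rough sketch of what reasoning about those two limits looks like, the snippet below checks a window against the 14-day cap and computes how many buckets it would span at the 1-minute floor. `validateScopeHours` and `minuteBuckets` are hypothetical names; whether the tools reject or clamp an oversized window, and whether the cap is inclusive, is not stated here, so rejection with an inclusive cap is an assumption.

```go
package main

import "fmt"

// maxScopeHours is the 14-day window cap expressed in hours (336).
const maxScopeHours = 14 * 24

// validateScopeHours rejects windows outside (0, 336] hours. Rejecting
// (rather than clamping) and treating the cap as inclusive are assumptions
// made for this sketch.
func validateScopeHours(h int) error {
	if h <= 0 {
		return fmt.Errorf("scope_hours must be positive, got %d", h)
	}
	if h > maxScopeHours {
		return fmt.Errorf("scope_hours %d exceeds the 14-day cap (%d hours)", h, maxScopeHours)
	}
	return nil
}

// minuteBuckets is the number of buckets a window spans at the 1-minute
// floor, the finest resolution mentioned for the timeseries tool.
func minuteBuckets(h int) int { return h * 60 }

func main() {
	for _, h := range []int{24, 336, 400} {
		if err := validateScopeHours(h); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%d h ok, spans %d one-minute buckets\n", h, minuteBuckets(h))
	}
}
```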

🤖 Generated with Claude Code

Adds three new MCP tools so that end-to-end runtime-event investigations can be done in a few tool calls instead of paginating event bodies:

- count_runtime_events: returns a 16-category × 8-severity histogram for any filter and time window in a single call. No pagination, no truncation. Backed by GET /api/v1/secureEvents/count.
- runtime_events_timeseries: buckets event counts over time, grouped by a categorical field (default "severity"). Server picks the coarsest bucket size that fits the rows cap; minimum bucket is 1 minute. Lets the model find when a burst started/ended in two calls (coarse pass + zoom). Backed by GET /api/v1/secureEvents/timeseriesBy.
- discover_runtime_event_field_values: enumerates the distinct values of a runtime-events field present in a window, split into "suggested" (active in window) and "other" (known but inactive). Lets the model learn real cluster/rule/image names before writing a filter instead of guessing. Backed by GET /secure/events/v2/eventFields/{field}.

Also:

- Extracts the runtime-events baseline filter ("not originator in (benchmarks, compliance, cloudsec, scanning, hostscanning)") into a shared helper used by all four runtime-event tools.
- Shares the filter-expression DSL documentation across the four tools so the LLM applies identical filter intuition everywhere.
- Fixes two filter-DSL examples in list_runtime_events whose syntax was rejected by the live API: 'host.hostName startsWith "web-"' is not accepted (correct form: 'host.hostName starts with "web-"'), and 'container.imageName' is not a valid field (correct forms: 'container.image.repo' and 'container.image.tag').

All three new tools require policy-events.read, the same permission as list_runtime_events and get_event_info.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings May 15, 2026 21:47

ivanlysiuk-sysdig requested a review from a team as a code owner May 15, 2026 21:47

Copilot started reviewing on behalf of ivanlysiuk-sysdig May 15, 2026 21:47

ivanlysiuk-sysdig wants to merge 1 commit into sysdiglabs:main from ivanlysiuk-sysdig:feat/event-investigation-tools
