feat(cloudtrail-alerts): per-detector exclusions + 8 new detectors + SSO MFA fix by Cre-eD · Pull Request #277 · simple-container-com/api

Cre-eD · 2026-05-19T17:07:42Z

Why

Production deployment in everworker exposed the alert-fatigue profile the plugin produces today: ~16 fires/day, ~89% false positives. Over 14 days of CloudWatch alarm history:

Detector	Fires	Events	Source
`unauthorizedApiCalls`	87	1,829	1,205 AWS service self-probes (PutObject from s3.amazonaws.com, StartQuery from cloudtrail.amazonaws.com, BatchGetImage from lambda.amazonaws.com), 439 prowler-readonly, 126 AWSServiceRoleFor* (ResourceExplorer, Config), 14 Drata. 399 anonymous Azure-IP probes for `GetFunctionUrlConfig` (only real signal in the bucket).
`iamPolicyChanges`	54	999	100% `integrail-deployer-bot` (Pulumi CI from Hetzner runner IPs)
`securityGroupChanges`	51	291	100% `integrail-deployer-bot` (Pulumi CI deploy churn)
`s3BucketPolicyChanges`	30	151	94% `integrail-deployer-bot` + 6% `github-actions-pulumi-github`
`consoleLoginWithoutMfa`	8	8	AWS Identity Center sessions by 2 admins — well-documented CIS CloudWatch.3 false positive (MFAUsed=No because MFA is enforced upstream at the IdP)
9 others (root, tampering, kms, config, vpc, nacl, gateway, route, failed-login)	0	0	silent — built-in detectors offered no knobs to tune the noisy ones

The built-in detectors matched the CIS CloudWatch.1-14 set verbatim and exposed only bool toggles, so the only tuning options were "leave the noise on" or "disable a CIS control entirely." This PR adds knobs.

What

1. Per-detector overrides

New CloudTrailAlertOverride struct, keyed by detector name in selectors.overrides:

alerts:
  iamPolicyChanges: true
  overrides:
    iamPolicyChanges:
      excludeUserNames: ["integrail-deployer-bot", "github-actions-pulumi-github"]
    unauthorizedApiCalls:
      threshold: 10
      excludeUserTypes: ["AWSService"]
      excludeUserNames: ["prowler-readonly"]
      excludeInvokedBy: ["s3.amazonaws.com", "lambda.amazonaws.com"]
      excludeUserArnGlobs: ["arn:aws:sts::*:assumed-role/AWSServiceRoleFor*/*"]
    consoleLoginWithoutMfa:
      excludeUserArnGlobs: ["arn:aws:sts::*:assumed-role/AWSReservedSSO_*"]  # belt-and-braces

Exclusion clauses are baked into the CloudWatch metric filter pattern at provision time (suppression at source — excluded events never increment the metric, the alarm never trips, the Lambda is never invoked, the CW alarm dashboard reflects real signal). Threshold / period / evaluationPeriods overrides are exposed too.

Architecture choice rationale (cross-checked with codex + gemini): source-side suppression over runtime suppression because:

CW alarm dashboards stay clean under source-side suppression; under runtime suppression, a suppressed false positive and a real breach look the same at a glance.
Auditors (SOC2 CC6.1) want documented, peer-reviewed exception lists in YAML, not opaque env vars on a Lambda.
Runtime suppression still costs CW metric storage + Lambda invocations.

2. SSO MFA false positive fix

consoleLoginWithoutMfa is now scoped to userIdentity.type = "IAMUser". AWS Identity Center / federated console sessions always emit ConsoleLogin with additionalEventData.MFAUsed = "No" because MFA is enforced at the IdP, not at the AWS console step. Without this scope every SSO admin login triggered a CloudWatch.3 alert. Identity Center coverage belongs in a separate detector against signin.amazonaws.com / UserAuthentication events (not added here — different signal, different threshold model).

References:

AWS CloudTrail console sign-in events — confirms federated ConsoleLogin can show MFAUsed=No.
IAM Identity Center sign-in events — UserAuthentication is the canonical SSO sign-in event.
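
As a rough illustration of the scoping described above, the sketch below narrows a metric filter pattern to a set of `userIdentity.type` values. The function name `scopeToIdentityTypes` and the CIS-style base pattern are assumptions made for the example; the plugin's real pattern and helper may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// scopeToIdentityTypes is a hypothetical helper: it strips the outer braces
// from a CloudWatch metric filter pattern, wraps the body in parentheses and
// ANDs it with a userIdentity.type restriction.
func scopeToIdentityTypes(base string, types ...string) string {
	body := strings.TrimSpace(base)
	body = strings.TrimPrefix(body, "{")
	body = strings.TrimSuffix(body, "}")
	body = strings.TrimSpace(body)
	if len(types) == 0 {
		// Nothing to scope to: return the pattern unchanged in meaning.
		return "{ " + body + " }"
	}
	clauses := make([]string, len(types))
	for i, t := range types {
		clauses[i] = fmt.Sprintf("($.userIdentity.type = %q)", t)
	}
	scope := clauses[0]
	if len(clauses) > 1 {
		scope = "(" + strings.Join(clauses, " || ") + ")"
	}
	return fmt.Sprintf("{ (%s) && %s }", body, scope)
}

func main() {
	// Assumed CIS-style base pattern, for illustration only.
	base := `{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed != "Yes") }`
	fmt.Println(scopeToIdentityTypes(base, "IAMUser"))
}
```

Wrapping the base body in its own parentheses keeps any OR-groups inside it from changing meaning once the scope clause is appended.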

3. Eight beyond-CIS detectors

Default off (opt-in). Cover attacker-blinding moves and exposure paths CIS CloudWatch.1-14 does not address:

Selector	Catches	Why CIS misses it
`guardDutyDisabled`	`DeleteDetector`, `UpdateDetector`, `Disassociate/Delete/StopMonitoringMembers`	CIS.9 only covers AWS Config recorder
`securityHubDisabled`	`DisableSecurityHub`, `BatchDisableStandards`, `DisableImportFindingsForProduct`, `DeleteActionTarget`, `DeleteInsight`	Same blinding-move category
`accessKeyCreation`	`CreateAccessKey`	Long-lived creds are higher-risk than STS; rotation tracking
`s3PublicAccessChanges`	`Put/DeleteAccountPublicAccessBlock`, `Put/DeleteBucketPublicAccessBlock`	CIS.8 only covers bucket policy edits; BPA is the higher-leverage gate
`lambdaUrlPublic`	`CreateFunctionUrlConfig` / `UpdateFunctionUrlConfig` with `authType = "NONE"`	Single click of misconfiguration exposes a function's IAM role as a public HTTPS endpoint
`kmsKeyPolicyChanges`	`PutKeyPolicy`, `PutResourcePolicy`, `CreateGrant`, `Retire/RevokeGrant`	CIS.7 only covers key deletion; grants quietly hand decrypt rights
`organizationsChanges`	SCP CRUD + `Enable/DisablePolicyType` + `LeaveOrganization` + `RemoveAccountFromOrganization`	SCPs are the strongest preventative control in a multi-account org
`anonymousProbes`	`userIdentity.type=AWSAccount` AccessDenied probes, threshold 10/5min	The 399 GetFunctionUrlConfig probes in the production sample were buried in the noise of generic CIS.2

Backwards compatibility

Plain-bool selector form is unchanged. Existing consumers see identical filter patterns + alarm parameters.
New overrides map is optional; zero-value CloudTrailAlertOverride is a no-op.
New 8 detectors are default off — no consumer gains alerts on plugin upgrade.
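
The "zero value is a no-op" guarantee can be sketched as below. The struct and field names (`alarmParams`, `alarmOverride`, `applyAlarmOverride`) are hypothetical stand-ins, not the plugin's actual `CloudTrailAlertOverride` definition; only the behavior (non-zero fields win, zero fields leave defaults alone) is taken from the description above.

```go
package main

import "fmt"

// alarmParams holds a detector's default alarm settings (hypothetical type).
type alarmParams struct {
	Threshold         float64
	PeriodSeconds     int
	EvaluationPeriods int
}

// alarmOverride mirrors the threshold / period / evaluationPeriods knobs
// (hypothetical field names). A zero field means "not set".
type alarmOverride struct {
	Threshold         float64
	PeriodSeconds     int
	EvaluationPeriods int
}

// applyAlarmOverride copies only the non-zero override fields onto the
// defaults, so an empty override returns the defaults unchanged.
func applyAlarmOverride(base alarmParams, o alarmOverride) alarmParams {
	if o.Threshold != 0 {
		base.Threshold = o.Threshold
	}
	if o.PeriodSeconds != 0 {
		base.PeriodSeconds = o.PeriodSeconds
	}
	if o.EvaluationPeriods != 0 {
		base.EvaluationPeriods = o.EvaluationPeriods
	}
	return base
}

func main() {
	defaults := alarmParams{Threshold: 1, PeriodSeconds: 300, EvaluationPeriods: 1}
	fmt.Printf("%+v\n", applyAlarmOverride(defaults, alarmOverride{}))
	fmt.Printf("%+v\n", applyAlarmOverride(defaults, alarmOverride{Threshold: 10}))
}
```

One consequence of treating zero as "not set" is that an override cannot express an explicit threshold of 0; a pointer field would be needed for that.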

Tests

Existing tests updated for totalDetectors = 22. 9 new test cases:

TestApplyOverride_Exclusions — single exclusion field
TestApplyOverride_MultipleExclusionFields — all 6 exclusion fields together
TestApplyOverride_Deterministic — YAML re-ordering produces identical filter pattern (no Pulumi churn)
TestApplyOverride_ThresholdAndPeriod — non-zero overrides applied
TestApplyOverride_EmptyOverrideIsNoop — zero value is no-op
TestApplyOverride_EmptyStringsSkipped — ["", "real-bot", ""] produces one clause, not three
TestEnabledAlerts_OverrideApplied — overrides reach enabledAlerts
TestConsoleLoginWithoutMfa_RestrictedToIAMUser — locks down the SSO FP fix
TestAnonymousProbes_DefaultThreshold — locks down the threshold=10 default

ok  	github.com/simple-container-com/api/pkg/clouds/aws	0.098s
ok  	github.com/simple-container-com/api/pkg/clouds/aws/helpers	0.103s
ok  	github.com/simple-container-com/api/pkg/clouds/pulumi/aws	0.222s

Test plan

go test ./pkg/clouds/aws/... ./pkg/clouds/pulumi/aws/... passes
go vet ./pkg/clouds/aws/... ./pkg/clouds/pulumi/aws/... clean
Downstream integrail/devops PR uses the new fields against a live preview environment; verify alarms tripping → metric filter no longer matches Pulumi CI events → alarm history shows the gap
After SC release, devops PR applies exclusions in everworker prod; re-measure 14d fire rate, target ≤2/week

…SSO MFA fix

Why: production deployment exposed alert-fatigue patterns: ~16 fires/day with ~89% false positives from governed automation (Pulumi CI bot, Prowler scanner, AWS Identity Center sessions, AWS service-linked roles). Built-in detectors matched the CIS CloudWatch.1-14 set verbatim, leaving no knob to tune.

Plugin extensions:

- `CloudTrailAlertOverride` schema with optional per-detector exclusions (userName / principalId / arn / arn-glob / userType / invokedBy) baked into the CloudWatch metric filter pattern at provision time. Suppression happens at the metric layer rather than in the Lambda enrichment step so excluded events never increment the metric, the alarm never trips, and the CW alarm dashboard reflects real signal (preserves SOC2 CC6.1 / ISO 27001 audit clarity). Threshold / period / evaluationPeriods overrides too.
- `consoleLoginWithoutMfa` now scoped to `userIdentity.type = "IAMUser"`. AWS Identity Center / federated console sessions always emit `MFAUsed = "No"` because MFA happens upstream at the IdP; without the scope, every SSO console session triggered a CloudWatch.3 false positive. Identity Center coverage belongs in a separate detector against `signin.amazonaws.com / UserAuthentication` (not added here).
- 8 beyond-CIS detectors covering attacker-blinding moves and exposure paths that CIS does not address: GuardDuty disable, Security Hub disable, IAM access key creation, S3 Block Public Access toggle, Lambda Function URL with AuthType=NONE, KMS key policy / grant changes, AWS Organizations SCP churn, anonymous external probes (`userIdentity.type=AWSAccount` AccessDenied with default threshold=10 to require sustained reconnaissance, not page on one-off probes).

Backwards compat: plain-`bool` selector form is unchanged. Existing consumers who declare `iamPolicyChanges: true` see identical filter patterns and alarm parameters. New struct fields and `overrides` map are opt-in.

Tests: 14 new test cases covering exclusion clause generation, deterministic ordering across YAML re-orderings, threshold / period override application, empty-string skipping, and SSO MFA scope. Existing tests updated for totalDetectors = 22 (14 CIS + 8 additions).

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>

github-actions · 2026-05-19T17:08:49Z

Semgrep Scan Results

Repository: api | Commit: 4f17037

Check	Status	Details
⚠️ Semgrep	Warning	10 warning(s), 10 total

Scanned at 2026-05-19 17:36 UTC

github-actions · 2026-05-19T17:09:09Z

Security Scan Results

Repository: api | Commit: 4f17037

Check	Status	Details
✅ Secret Scan	Pass	No secrets detected
✅ Dependencies (Trivy)	Pass	0 total (no critical/high)
✅ Dependencies (Grype)	Pass	0 total (no critical/high)
📦 SBOM	Generated	509 components (CycloneDX)

Scanned at 2026-05-19 17:36 UTC

Cross-model review of PR #277 (codex + gemini, both fact-checked against AWS primary docs) surfaced 6 correctness bugs that would have shipped to prod. Each is fixed here with a test that asserts the desired behavior.

1. NOT EXISTS guard on every exclusion clause.

CloudWatch metric-filter `$.field != "x"` returns FALSE when the field is absent, not TRUE. Without a NOT EXISTS guard, an unguarded exclusion like `$.userIdentity.userName != "bot"` would silently drop every event whose userIdentity lacks the field at the top level. AssumedRole events in particular carry userName at $.userIdentity.sessionContext.sessionIssuer.userName, NOT at $.userIdentity.userName. The unguarded form would have dropped the ENTIRE detector for every assumed-role principal, not just the bot we meant to exclude.

buildExclusionClauses now generates:

    (($.field NOT EXISTS) || ($.field != "v1"))                          # single value
    (($.field NOT EXISTS) || (($.field != "v1") && ($.field != "v2")))   # multi-value

Multi-value uses inner-AND (De Morgan'd from "NOT (v1 OR v2)").

New tests: TestApplyOverride_Exclusions, TestApplyOverride_NotExistsGuard_MultipleValues, TestApplyOverride_MultipleExclusionFields, TestApplyOverride_DeDupesValues.

Ref: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html
Ref: https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-user-identity.html

2. Remove PutResourcePolicy from kmsKeyPolicyChanges.

PutResourcePolicy is not a KMS API. (It exists on CloudTrail Lake and a few other services, but never under eventSource=kms.amazonaws.com, so the clause was permanent dead code.) KMS uses PutKeyPolicy for resource policy edits. Comment updated to record why this is excluded to prevent re-introduction.

Ref: https://docs.aws.amazon.com/kms/latest/APIReference/API_Operations.html

3. Expand organizationsChanges to cover OU + delegated admin moves.

Added MoveAccount, RemoveAccountFromOrganization, RegisterDelegatedAdministrator, DeregisterDelegatedAdministrator, EnableAWSServiceAccess, DisableAWSServiceAccess. RegisterDelegatedAdministrator is the canonical "blind the management account" move: delegate GuardDuty/Security Hub admin to a compromised member, then suppress findings there.

Ref: https://docs.aws.amazon.com/organizations/latest/APIReference/API_Operations.html

4. Add Root to consoleLoginWithoutMfa scope.

Earlier fix scoped to userIdentity.type=IAMUser to silence the AWS Identity Center false positive. That silently dropped Root console logins (Root type is "Root", not "IAMUser"). rootAccountUsage detector catches ALL Root activity including MFA-protected sessions; this one specifically surfaces Root-without-MFA for triage. Scope expanded to ($.userIdentity.type = "IAMUser" || = "Root").

5. Remove DeleteInsight from securityHubDisabled.

Insights are saved dashboard searches, not detectors. Deleting one is a UI/housekeeping action, not a "visibility lost" event: pure noise.

6. Fail-fast validation of overrides keys + reflection test for wireup.

- New `validateOverrides()` checked at provision time: rejects map keys that don't correspond to any detector. Catches YAML typos like `overrides: { unauthorizedApiCall: ... }` (missing trailing s) at deploy time with a list of valid keys, instead of silently dropping the override and leaving the operator wondering why their exclusion didn't take.
- selectorChecks extracted to a single source of truth so reflection test can assert bidirectional consistency: every selector bool maps to a securityAlerts entry and vice versa, count == totalDetectors. Catches the regression where a contributor adds a detector but forgets the wireup.

New tests: TestValidateOverrides_Empty, TestValidateOverrides_KnownKey, TestValidateOverrides_UnknownKeyIsLoudError, TestSelectorChecksWireUpAllDetectors, TestApplyOverride_WorstCaseBasePattern (covers leading whitespace + internal OR-groups in base filter pattern under the trim-and-rewrap path).

Test summary: 17 tests now pass (was 14 in the first push of PR #277).

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
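
A minimal sketch of the guarded clause shape described in item 1, assuming a hypothetical helper `exclusionClause` (the real `buildExclusionClauses` signature is not shown in this PR text). It skips empty strings, de-duplicates, and sorts values so that re-ordered YAML yields the same pattern.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// exclusionClause builds one guarded exclusion for a JSON selector such as
// "$.userIdentity.userName". Hypothetical helper, for illustration only.
func exclusionClause(field string, values []string) string {
	// Drop empty strings and duplicates, then sort for deterministic output.
	seen := map[string]bool{}
	var vals []string
	for _, v := range values {
		if v == "" || seen[v] {
			continue
		}
		seen[v] = true
		vals = append(vals, v)
	}
	if len(vals) == 0 {
		return "" // nothing to exclude: caller leaves the pattern untouched
	}
	sort.Strings(vals)

	// NOT (v1 OR v2) becomes (!= v1) AND (!= v2) by De Morgan.
	// %q uses Go quoting, which is adequate for plain names and ARNs.
	parts := make([]string, len(vals))
	for i, v := range vals {
		parts[i] = fmt.Sprintf("(%s != %q)", field, v)
	}
	inner := parts[0]
	if len(parts) > 1 {
		inner = "(" + strings.Join(parts, " && ") + ")"
	}
	// The NOT EXISTS guard keeps events that lack the field entirely.
	return fmt.Sprintf("((%s NOT EXISTS) || %s)", field, inner)
}

func main() {
	fmt.Println(exclusionClause("$.userIdentity.userName", []string{"bot"}))
	fmt.Println(exclusionClause("$.userIdentity.userName", []string{"", "b", "a", "b"}))
}
```

Sorting before joining is what makes the output independent of input order, which matters here because a changed pattern string would show up as a spurious Pulumi diff.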

Cre-eD · 2026-05-19T17:36:31Z

Round 2: review-driven correctness fixes

Followup commit 4cc1a03 addresses 6 findings surfaced by a parallel codex + gemini review (both fact-checked against AWS primary docs). Triage table below — wontfix items are deferred follow-ups, not unaddressed concerns.

Fixed in this PR

#	Finding	Source	Fix
1	`!= "value"` returns FALSE on missing field → unguarded exclusions silently dropped every AssumedRole event from the detector (userName lives at `$.userIdentity.sessionContext.sessionIssuer.userName`, not at top level).	codex (fact-checked)	`buildExclusionClauses` now wraps each clause as `(($.field NOT EXISTS) || inner)`. Multi-value uses De Morgan'd inner-AND. 4 new tests including AssumedRole semantics.
2	`PutResourcePolicy` is not a KMS API — dead clause.	codex + gemini	Removed. Comment records why to prevent re-introduction.
3	`organizationsChanges` missing `MoveAccount`, `RegisterDelegatedAdministrator` (canonical "blind the management account" move), `DeregisterDelegatedAdministrator`, `Enable/DisableAWSServiceAccess`.	codex + gemini	Added.
4	`consoleLoginWithoutMfa` IAMUser-only scope silently dropped Root coverage (Root type = "Root", not "IAMUser").	gemini	Scope expanded to `IAMUser || Root`.
5	`securityHubDisabled` included `DeleteInsight` (saved dashboard search, not detector blinding) — pure noise.	gemini	Removed.
6	YAML typos in `overrides:` map were silently ignored. Reflection gap: new detector could be added without wiring through `enabledAlerts`.	codex + gemini	`validateOverrides()` rejects unknown keys at provision time with `(known: [...])` hint. `selectorChecks` extracted as single source of truth; reflection test asserts bidirectional consistency with `securityAlerts` map.

Wontfix (filed in head — recorded as follow-ups, not blockers)

Finding	Source	Why deferred
"CI bot exclusion is a security bypass"	codex/gemini	Real concern but the alert layer is the wrong defense. Canonical fix: SCP / permission boundary on the bot principal. No usable "expected event names" subset exists because the bot legitimately uses the full IAM/SG/S3 event set during deploys. Recorded as a KNOWN TRADE-OFF block in the devops PR.
"`lambdaUrlPublic` overstates exposure — `AuthType=NONE` not public without resource policy"	codex	Technically true but >95% of `AuthType=NONE` Function URLs in practice are public (AWS console auto-adds the policy). Detector still useful as a strong signal. `lambda:AddPermission` with `Principal: *` is a separate detector to add later.
"`anonymousProbes` will misfire on legitimate cross-account access from partners"	codex	No partner integrations documented. If one appears, override schema already supports `excludePrincipalIds`.
"GuardDuty `UpdateDetector` is broad"	codex	Constraining to `requestParameters.enable=false` adds filter complexity for marginal benefit. Follow-up if it proves noisy in prod.
"Period validation missing"	codex	Pulumi apply errors loudly on invalid period. Nice-to-have.
"`Unauthenticated` userIdentity.type"	gemini	Not in AWS documented type values (`Root`, `IAMUser`, `AssumedRole`, `FederatedUser`, `AWSAccount`, `AWSService`, `IdentityCenterUser`, `Unknown`). Skipped as unverified.

Tests

17 tests pass (was 14). go vet clean. Build green.

ok      github.com/simple-container-com/api/pkg/clouds/aws         0.133s
ok      github.com/simple-container-com/api/pkg/clouds/aws/helpers 0.139s
ok      github.com/simple-container-com/api/pkg/clouds/pulumi/aws  0.150s

Cre-eD requested a review from smecsia as a code owner May 19, 2026 17:07

Cre-eD mentioned this pull request May 19, 2026

feat(sc.sh): opt-in install of branch-preview tarballs via SIMPLE_CONTAINER_ALLOW_PREVIEW #278

Open

5 tasks

Cre-eD wants to merge 2 commits into main from feat/cloudtrail-alerts-exclusions-and-new-detectors
