feat(cloudtrail-alerts): per-detector exclusions + 8 new detectors + SSO MFA fix#277
Open
Cre-eD wants to merge 2 commits into
Open
feat(cloudtrail-alerts): per-detector exclusions + 8 new detectors + SSO MFA fix#277Cre-eD wants to merge 2 commits into
Cre-eD wants to merge 2 commits into
Conversation
…SSO MFA fix Why: production deployment exposed alert-fatigue patterns — ~16 fires/day with ~89% false positives from governed automation (Pulumi CI bot, Prowler scanner, AWS Identity Center sessions, AWS service-linked roles). Built-in detectors matched the CIS CloudWatch.1-14 set verbatim, leaving no knob to tune. Plugin extensions: - `CloudTrailAlertOverride` schema with optional per-detector exclusions (userName / principalId / arn / arn-glob / userType / invokedBy) baked into the CloudWatch metric filter pattern at provision time. Suppression happens at the metric layer rather than in the Lambda enrichment step so excluded events never increment the metric, the alarm never trips, and the CW alarm dashboard reflects real signal (preserves SOC2 CC6.1 / ISO 27001 audit clarity). Threshold / period / evaluationPeriods overrides too. - `consoleLoginWithoutMfa` now scoped to `userIdentity.type = "IAMUser"`. AWS Identity Center / federated console sessions always emit `MFAUsed = "No"` because MFA happens upstream at the IdP — without the scope, every SSO console session triggered a CloudWatch.3 false positive. Identity Center coverage belongs in a separate detector against `signin.amazonaws.com / UserAuthentication` (not added here). - 8 beyond-CIS detectors covering attacker-blinding moves and exposure paths that CIS does not address: GuardDuty disable, Security Hub disable, IAM access key creation, S3 Block Public Access toggle, Lambda Function URL with AuthType=NONE, KMS key policy / grant changes, AWS Organizations SCP churn, anonymous external probes (`userIdentity.type=AWSAccount` AccessDenied with default threshold=10 to require sustained reconnaissance, not page on one-off probes). Backwards compat: plain-`bool` selector form is unchanged. Existing consumers who declare `iamPolicyChanges: true` see identical filter patterns and alarm parameters. New struct fields and `overrides` map are opt-in. Tests: 14 new test cases covering exclusion clause generation, deterministic ordering across YAML re-orderings, threshold / period override application, empty-string skipping, and SSO MFA scope. Existing tests updated for totalDetectors = 22 (14 CIS + 8 additions). Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
Semgrep Scan ResultsRepository:
Scanned at 2026-05-19 17:36 UTC |
Security Scan ResultsRepository:
Scanned at 2026-05-19 17:36 UTC |
Cross-model review of PR #277 (codex + gemini, both fact-checked against AWS primary docs) surfaced 6 correctness bugs that would have shipped to prod. Each is fixed here with a test that asserts the desired behavior. 1. NOT EXISTS guard on every exclusion clause. CloudWatch metric-filter `$.field != "x"` returns FALSE when the field is absent, not TRUE. Without a NOT EXISTS guard, an unguarded exclusion like `$.userIdentity.userName != "bot"` would silently drop every event whose userIdentity lacks the field at the top level — AssumedRole events in particular carry userName at $.userIdentity.sessionContext.sessionIssuer .userName, NOT at $.userIdentity.userName. The unguarded form would have dropped the ENTIRE detector for every assumed-role principal, not just the bot we meant to exclude. buildExclusionClauses now generates: (($.field NOT EXISTS) || ($.field != "v1")) # single value (($.field NOT EXISTS) || (($.field != "v1") && ($.field != "v2"))) # multi-value Multi-value uses inner-AND (De Morgan'd from "NOT (v1 OR v2)"). New tests: TestApplyOverride_Exclusions, TestApplyOverride_NotExistsGuard_MultipleValues, TestApplyOverride_MultipleExclusionFields, TestApplyOverride_DeDupesValues. Ref: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html Ref: https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-user-identity.html 2. Remove PutResourcePolicy from kmsKeyPolicyChanges. PutResourcePolicy is not a KMS API. (It exists on CloudTrail Lake and a few other services, but never under eventSource=kms.amazonaws.com, so the clause was permanent dead code.) KMS uses PutKeyPolicy for resource policy edits. Comment updated to record why this is excluded to prevent re-introduction. Ref: https://docs.aws.amazon.com/kms/latest/APIReference/API_Operations.html 3. Expand organizationsChanges to cover OU + delegated admin moves. Added MoveAccount, RemoveAccountFromOrganization, RegisterDelegatedAdministrator, DeregisterDelegatedAdministrator, EnableAWSServiceAccess, DisableAWSServiceAccess. RegisterDelegatedAdministrator is the canonical "blind the management account" move: delegate GuardDuty/Security Hub admin to a compromised member, then suppress findings there. Ref: https://docs.aws.amazon.com/organizations/latest/APIReference/API_Operations.html 4. Add Root to consoleLoginWithoutMfa scope. Earlier fix scoped to userIdentity.type=IAMUser to silence the AWS Identity Center false positive. That silently dropped Root console logins (Root type is "Root", not "IAMUser"). rootAccountUsage detector catches ALL Root activity including MFA-protected sessions; this one specifically surfaces Root-without-MFA for triage. Scope expanded to ($.userIdentity.type = "IAMUser" || = "Root"). 5. Remove DeleteInsight from securityHubDisabled. Insights are saved dashboard searches, not detectors. Deleting one is a UI/ housekeeping action, not a "visibility lost" event — pure noise. 6. Fail-fast validation of overrides keys + reflection test for wireup. - New `validateOverrides()` checked at provision time: rejects map keys that don't correspond to any detector. Catches YAML typos like `overrides: { unauthorizedApiCall: ... }` (missing trailing s) at deploy time with a list of valid keys, instead of silently dropping the override and leaving the operator wondering why their exclusion didn't take. - selectorChecks extracted to a single source of truth so reflection test can assert bidirectional consistency: every selector bool maps to a securityAlerts entry and vice versa, count == totalDetectors. Catches the regression where a contributor adds a detector but forgets the wireup. New tests: TestValidateOverrides_Empty, TestValidateOverrides_KnownKey, TestValidateOverrides_UnknownKeyIsLoudError, TestSelectorChecksWireUpAllDetectors, TestApplyOverride_WorstCaseBasePattern (covers leading whitespace + internal OR-groups in base filter pattern under the trim-and-rewrap path). Test summary: 17 tests now pass (was 14 in the first push of PR #277). Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
Contributor
Author
Round 2: review-driven correctness fixesFollowup commit Fixed in this PR
Wontfix (filed in head — recorded as follow-ups, not blockers)
Tests17 tests pass (was 14). |
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Production deployment in
everworkerexposed the alert-fatigue profile the plugin produces today: ~16 fires/day, ~89% false positives. Over 14 days of CloudWatch alarm history:unauthorizedApiCallsGetFunctionUrlConfig(only real signal in the bucket).iamPolicyChangesintegrail-deployer-bot(Pulumi CI from Hetzner runner IPs)securityGroupChangesintegrail-deployer-bot(Pulumi CI deploy churn)s3BucketPolicyChangesintegrail-deployer-bot+ 6%github-actions-pulumi-githubconsoleLoginWithoutMfaThe built-in detectors matched the CIS CloudWatch.1-14 set verbatim and exposed only
booltoggles, so the only tuning options were "leave the noise on" or "disable a CIS control entirely." This PR adds knobs.What
1. Per-detector overrides
New
CloudTrailAlertOverridestruct, keyed by detector name inselectors.overrides:Exclusion clauses are baked into the CloudWatch metric filter pattern at provision time (suppression at source — excluded events never increment the metric, the alarm never trips, the Lambda is never invoked, the CW alarm dashboard reflects real signal). Threshold / period / evaluationPeriods overrides are exposed too.
Architecture choice rationale (cross-checked with codex + gemini): source-side suppression over runtime suppression because:
2. SSO MFA false positive fix
consoleLoginWithoutMfais now scoped touserIdentity.type = "IAMUser". AWS Identity Center / federated console sessions always emitConsoleLoginwithadditionalEventData.MFAUsed = "No"because MFA is enforced at the IdP, not at the AWS console step. Without this scope every SSO admin login triggered a CloudWatch.3 alert. Identity Center coverage belongs in a separate detector againstsignin.amazonaws.com / UserAuthenticationevents (not added here — different signal, different threshold model).References:
ConsoleLogincan showMFAUsed=No.UserAuthenticationis the canonical SSO sign-in event.3. Eight beyond-CIS detectors
Default off (opt-in). Cover attacker-blinding moves and exposure paths CIS CloudWatch.1-14 does not address:
guardDutyDisabledDeleteDetector,UpdateDetector,Disassociate/Delete/StopMonitoringMemberssecurityHubDisabledDisableSecurityHub,BatchDisableStandards,DisableImportFindingsForProduct,DeleteActionTarget,DeleteInsightaccessKeyCreationCreateAccessKeys3PublicAccessChangesPut/DeleteAccountPublicAccessBlock,Put/DeleteBucketPublicAccessBlocklambdaUrlPublicCreateFunctionUrlConfig/UpdateFunctionUrlConfigwithauthType = "NONE"kmsKeyPolicyChangesPutKeyPolicy,PutResourcePolicy,CreateGrant,Retire/RevokeGrantorganizationsChangesEnable/DisablePolicyType+LeaveOrganization+RemoveAccountFromOrganizationanonymousProbesuserIdentity.type=AWSAccountAccessDenied probes, threshold 10/5minBackwards compatibility
boolselector form is unchanged. Existing consumers see identical filter patterns + alarm parameters.overridesmap is optional; zero-valueCloudTrailAlertOverrideis a no-op.Tests
Existing tests updated for
totalDetectors = 22. 9 new test cases:TestApplyOverride_Exclusions— single exclusion fieldTestApplyOverride_MultipleExclusionFields— all 6 exclusion fields togetherTestApplyOverride_Deterministic— YAML re-ordering produces identical filter pattern (no Pulumi churn)TestApplyOverride_ThresholdAndPeriod— non-zero overrides appliedTestApplyOverride_EmptyOverrideIsNoop— zero value is no-opTestApplyOverride_EmptyStringsSkipped—["", "real-bot", ""]produces one clause, not threeTestEnabledAlerts_OverrideApplied— overrides reachenabledAlertsTestConsoleLoginWithoutMfa_RestrictedToIAMUser— locks down the SSO FP fixTestAnonymousProbes_DefaultThreshold— locks down the threshold=10 defaultTest plan
go test ./pkg/clouds/aws/... ./pkg/clouds/pulumi/aws/...passesgo vet ./pkg/clouds/aws/... ./pkg/clouds/pulumi/aws/...cleanintegrail/devopsPR uses the new fields against a live preview environment; verify alarms tripping → metric filter no longer matches Pulumi CI events → alarm history shows the gapeverworkerprod; re-measure 14d fire rate, target ≤2/week