A Kubernetes operator for monitoring AWS CloudWatch metrics with tag-based compliance validation.
Monitoring large-scale AWS environments often leads to Alert Fatigue or Blind Spots because static configurations can't keep up with dynamic infrastructure. Tagemon transforms your monitoring from static configuration files into a tag-driven ecosystem.
- Manual Overhead: Manually updating exporter configuration every time a new RDS instance or Auto Scaling Group is launched
- Inconsistent Alerting: Maintaining separate alert rules for each resource instead of a single alert rule that applies to all resources with tag-based dynamic thresholds
- Monitoring Gaps: Resources launched but not added to monitoring configuration become invisible, creating blind spots until someone manually discovers and configures them
- Infrastructure as Code via Tags: Stop editing YAML to monitor new resources. Simply tag an AWS resource, and Tagemon automatically discovers it and starts exporting its metrics to Prometheus
- Decentralized Thresholds: Empower developers to define their own alerting thresholds directly on AWS resource tags. Tagemon exports these as metrics, allowing for a single, universal Prometheus alert rule that respects per-resource limits
- Enforced Compliance: Automatically identify "Non-Compliant" resources that lack mandatory tags and view these gaps as Prometheus metrics to drive better cloud governance
- Reduce TTR (Time to Response): Monitoring is active the second a resource is tagged
- FinOps & Governance: Ensure 100% of your infrastructure is tagged and accounted for by tracking the tagemon_resources_non_compliant_count metric
- Centralized Management: Single point of control for all CloudWatch monitoring across your organization, eliminating configuration drift and ensuring uniform metric collection standards
- Tag-Based Monitoring: Monitor AWS resources based on tags
- CloudWatch Integration: Export CloudWatch metrics via YACE (Yet Another CloudWatch Exporter)
- Compliance Checking: Validate resources against tag policies
- Dynamic Configuration: Automatic deployment updates based on CRD changes
- Prometheus Compatible: Native Prometheus metrics exposition
- kubectl configured
- Helm 3.x (for Helm installation)
- AWS credentials with appropriate permissions
- AWS Resource Explorer configured with a view ARN
Create a values.yaml file:
controller:
  serviceAccountName: tagemon-controller-manager
  config:
    yaceServiceAccountName: tagemon-yace
    tagsHandler:
      viewArn: "arn:aws:resource-explorer-2:us-west-2:ACCOUNT:view/NAME/ID"
      region: "us-west-2"
      interval: "60m"

Install the chart:
helm install tagemon oci://docker.io/nextinsurancedevops/tagemon \
  --namespace tagemon \
  --create-namespace \
  -f values.yaml

For all available configuration options, see chart/README.md
# Install CRDs
kubectl apply -f config/crd/bases/
# Deploy the operator
kubectl apply -f config/default/

| Field | Type | Description | Example |
|---|---|---|---|
| type | string | AWS service namespace (format: AWS/SERVICE) | AWS/EC2, AWS/RDS, AWS/ELB |
| regions | []string | List of AWS regions to monitor | [us-east-1, us-west-2] |
| awsRoles | []object | IAM roles for CloudWatch access | See AWS Roles |
| statistics | []string | Default statistics for metrics | [Average, Maximum, Sum] |
| period | int32 | Default CloudWatch period in seconds (min: 1) | 300 (5 minutes) |
| metrics | []object | Metrics to collect | See Metrics |

| Field | Type | Default | Description |
|---|---|---|---|
| scrapingInterval | int32 | - | Global scraping interval in seconds (min: 60) |
| nilToZero | bool | true | Convert nil metric values to zero |
| addCloudwatchTimestamp | bool | false | Include CloudWatch timestamp in metrics |
| searchTags | []object | - | Filter resources by tags |
| exportedTagsOnMetrics | []object | - | Tags to export as Prometheus labels |
| dimensionNameRequirements | []string | - | Required CloudWatch dimension names for metrics |
| podResources | object | - | Resource requests/limits for YACE pods |
| namePrefix | string | - | Prefix for generated resource names (max 200 chars) |
| resourceExplorerService | string | - | Override AWS Resource Explorer service type (optional) |

This example demonstrates all features of Tagemon, including required fields and optional configurations:
apiVersion: tagemon.io/v1alpha1
kind: Tagemon
metadata:
  name: tagemon-rds-monitoring
  namespace: tagemon
spec:
  # Required fields
  type: AWS/RDS
  regions:
    - us-east-1
    - us-west-2
  awsRoles:
    - roleArn: arn:aws:iam::123456789012:role/TagemonRole
  statistics: [Maximum]
  period: 60
  metrics:
    - name: CPUUtilization        # name is required
      statistics: [Average]       # optional: overrides spec.statistics
      period: 60                  # optional: overrides spec.period
      thresholdTags:              # optional: tag-based compliance
        - type: int
          key: max_cpu_utilization_percent_threshold
          resourceType: db
          required: true          # optional: defaults to true
    - name: DBLoad
      statistics: [Maximum]       # optional
      period: 60                  # optional
      thresholdTags:              # optional
        - type: int
          key: max_db_load_threshold
          resourceType: db
          required: true          # optional
    - name: FreeStorageSpace
      statistics: [Minimum]       # optional
      period: 60                  # optional
      thresholdTags:              # optional
        - type: int
          key: allocated_storage_threshold
          resourceType: db
          required: false         # optional
        - type: int
          key: low_free_storage_percent_threshold
          resourceType: db
          required: false         # optional
        - type: int
          key: lowest_free_storage_percent_threshold
          resourceType: db
          required: false         # optional
        - type: int
          key: storage_drop_mb_threshold
          resourceType: db
          required: false         # optional
    - name: ReplicaLag
      statistics: [Minimum, Maximum]  # optional
      period: 60                  # optional
      thresholdTags:              # optional
        - type: int
          key: high_replica_lag_threshold
          resourceType: db
          required: false         # optional
  # Optional fields
  scrapingInterval: 240
  nilToZero: true
  # Optional: Filter resources by tags - only monitor resources with these tags
  searchTags:
    - key: monitor
      value: "true"
  # Optional: Dimension requirements for CloudWatch metrics
  dimensionNameRequirements:
    - DBInstanceIdentifier
  # Optional: Export tags as Prometheus labels
  exportedTagsOnMetrics:
    - key: Environment
      required: true
    - key: DatabaseName
      required: false
  # Optional: Resource limits for YACE pods
  podResources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi

Configure IAM roles for CloudWatch access:
awsRoles:
  - roleArn: arn:aws:iam::123456789012:role/TagemonRole

Fields:
- roleArn (required): IAM role ARN (pattern: arn:aws:iam::ACCOUNT:role/NAME)
Define which CloudWatch metrics to collect. Each metric can override global settings and optionally define threshold tags for alerting.
Fields:
- name (required): CloudWatch metric name
- statistics (optional): Override default statistics (Sum, Average, Maximum, Minimum, SampleCount)
- period (optional): Override default period (min: 2 seconds)
- nilToZero (optional): Override default nil-to-zero behavior
- addCloudwatchTimestamp (optional): Override timestamp behavior
- thresholdTags (optional): Define tag-based thresholds for compliance validation (see Threshold Tags below)
See the complete example above for usage examples.
Validate resources against tag-based thresholds:
thresholdTags:
  - type: percentage        # int, bool, or percentage
    key: cpu-threshold      # Tag key on AWS resource
    resourceType: instance  # AWS resource type (e.g., instance, database)
    required: true          # Whether tag is mandatory (default: true)

Threshold Types:
- percentage: Value is a percentage (0-100)
- int: Value is an integer
- bool: Value is a boolean (true/false)
Example Use Case:
Tag an EC2 instance with cpu-threshold: 80, and Tagemon will validate that the instance has this tag and can use it for alerting thresholds.
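The three threshold types can be illustrated with a minimal Python sketch. This is not Tagemon's implementation (the operator is written in Go), and the function name `is_valid_threshold` is hypothetical. The exact parsing rules are assumptions: the sketch accepts decimals for percentages and treats booleans case-insensitively, which the documentation above does not specify.

```python
def is_valid_threshold(threshold_type: str, raw_value: str) -> bool:
    """Return True if raw_value (AWS tag values are strings) fits threshold_type.

    Assumed rules, for illustration only:
      - bool: "true" or "false", case-insensitive
      - int: anything Python's int() accepts
      - percentage: a number in the inclusive range 0-100
    """
    value = raw_value.strip()
    if threshold_type == "bool":
        return value.lower() in ("true", "false")
    if threshold_type == "int":
        try:
            int(value)
        except ValueError:
            return False
        return True
    if threshold_type == "percentage":
        try:
            number = float(value)
        except ValueError:
            return False
        # NaN and infinities fail this range check and are rejected.
        return 0 <= number <= 100
    raise ValueError(f"unknown threshold type: {threshold_type!r}")
```

Under these assumptions, a tag `cpu-threshold: 80` passes as a `percentage`, while `cpu-threshold: 120` would be reported as an invalid value.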
Alerting with Threshold Tags:
Tagemon exposes threshold values as Prometheus metrics that can be used in alerting rules. The metric name follows the pattern tagemon_{threshold_tag_key} with labels tag_Name and account_id.
Example Prometheus alert rule that applies to all RDS instances using threshold tags:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rds-high-cpu-alert
  namespace: tagemon
spec:
  groups:
    - name: aws-rds
      rules:
        - alert: RDSHighCPUUtilization
          annotations:
            description: DB {{ $labels.tag_Name }} has a high CPU utilization
            summary: 'DB {{ $labels.tag_Name }} has a high CPU Utilization, Utilization: {{ $value | printf "%.0f%%" }}'
          expr: aws_rds_cpuutilization_average > on(tag_Name, account_id) tagemon_max_cpu_utilization_percent_threshold
          for: 10m
          labels:
            severity: warning

This alert rule automatically applies to all RDS instances monitored by Tagemon. Each RDS instance can have its own threshold value defined via the max_cpu_utilization_percent_threshold tag, allowing per-resource alerting thresholds. In this example:
- aws_rds_cpuutilization_average is the CloudWatch metric exported by YACE for all RDS instances
- tagemon_max_cpu_utilization_percent_threshold is the threshold metric created by Tagemon from the tag on each RDS instance
- The on(tag_Name, account_id) clause ensures the comparison is done per-resource, matching each RDS instance's CPU utilization against its own threshold value
- The alert fires for any RDS instance when its CPU utilization exceeds the threshold value defined in its tag
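To make the per-resource matching in the alert expression above concrete, here is a small Python sketch that mimics what the `on(tag_Name, account_id)` comparison does: each CPU sample is paired with the threshold sample that shares the same two labels, and only samples exceeding their own threshold are kept. The function name and the sample data are illustrative assumptions, not output from Tagemon or Prometheus.

```python
def firing_resources(cpu_samples, threshold_samples):
    """Emulate `cpu > on(tag_Name, account_id) threshold`.

    Both arguments are lists of (labels: dict, value: float) pairs.
    Returns (tag_Name, account_id, cpu_value) for every CPU sample that
    exceeds the threshold sharing the same tag_Name and account_id.
    """
    def match_key(labels):
        return (labels.get("tag_Name"), labels.get("account_id"))

    thresholds = {match_key(labels): value for labels, value in threshold_samples}

    firing = []
    for labels, value in cpu_samples:
        limit = thresholds.get(match_key(labels))
        # A sample with no matching threshold is dropped, as in PromQL
        # one-to-one vector matching.
        if limit is not None and value > limit:
            firing.append((*match_key(labels), value))
    return firing


# Illustrative data only.
cpu = [
    ({"tag_Name": "orders-db", "account_id": "111111111111"}, 91.0),
    ({"tag_Name": "users-db", "account_id": "111111111111"}, 40.0),
    ({"tag_Name": "untagged-db", "account_id": "111111111111"}, 99.0),
]
limits = [
    ({"tag_Name": "orders-db", "account_id": "111111111111"}, 80.0),
    ({"tag_Name": "users-db", "account_id": "111111111111"}, 70.0),
]
alerts = firing_resources(cpu, limits)
```

One consequence worth noting: a resource that has no threshold metric never fires under this rule, regardless of its CPU value, because there is nothing on the right-hand side to match against.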
Filter which AWS resources to monitor based on tags:
searchTags:
  - key: Environment
    value: production
  - key: Team
    value: platform

Only resources matching all specified tags will be monitored.
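The "match all" rule can be sketched in a few lines of Python. This assumes exact, case-sensitive string equality on tag values; the real matcher may behave differently (for example, by treating values as patterns), so treat it as an illustration of the AND semantics only. `matches_search_tags` is a hypothetical name.

```python
def matches_search_tags(resource_tags: dict, search_tags: list) -> bool:
    """Return True only if every search tag is present with the expected value.

    resource_tags: the AWS resource's tags as {key: value}.
    search_tags: entries shaped like the searchTags list, e.g.
                 [{"key": "Environment", "value": "production"}].
    Assumption: values are compared with exact string equality.
    """
    return all(
        resource_tags.get(tag["key"]) == tag["value"] for tag in search_tags
    )
```

With the two search tags above, a resource tagged `Environment=production` but owned by another team is skipped, because both conditions must hold.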
Export AWS resource tags as Prometheus labels:
exportedTagsOnMetrics:
  - key: Environment
    required: false  # If true, resources without this tag will be filtered out
  - key: Application
    required: true

Note: This allows you to query metrics by tag values in Prometheus (e.g., {Environment="production"}).
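The effect of the `required` flag can be sketched as follows. This is an illustrative Python model, not Tagemon's code: the function name is hypothetical, and it assumes `required` defaults to false when omitted, which the documentation above does not state.

```python
def select_exported_tags(resource_tags: dict, exported_tags_on_metrics: list):
    """Return the tags to export for a resource, or None if it is filtered out.

    A resource is filtered out when it lacks a tag marked required: true.
    Optional tags that are absent are simply not exported.
    Assumption: `required` defaults to False when not set.
    """
    selected = {}
    for entry in exported_tags_on_metrics:
        key = entry["key"]
        if key in resource_tags:
            selected[key] = resource_tags[key]
        elif entry.get("required", False):
            return None
    return selected
```

Applied to the configuration above, a resource without an `Application` tag is dropped, while one that only lacks `Environment` is kept and exported without that tag.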
Configure resource requests and limits for YACE pods. See the complete example above for usage.
podResources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

- AWS Resource Explorer: Enable and create a view for tag-based compliance validation
  - Example ARN: arn:aws:resource-explorer-2:us-west-2:123456789012:view/MainView/abc123
  - Setup Guide
- IAM Permissions: Configure IAM roles for both service accounts using IRSA (IAM Roles for Service Accounts)
The controller service account needs permissions for tag compliance validation:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "resource-explorer-2:Search"
    ],
    "Resource": "*"
  }]
}

The YACE service account needs permissions to assume YACE roles and collect CloudWatch metrics. See YACE Authentication documentation for details.
Minimum required permissions:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "tag:GetResources",
      "cloudwatch:GetMetricData",
      "cloudwatch:GetMetricStatistics",
      "cloudwatch:ListMetrics",
      "sts:AssumeRole"
    ],
    "Resource": "*"
  }]
}

Note: Additional permissions may be required depending on the AWS services you're monitoring (e.g., apigateway:GET for API Gateway, autoscaling:DescribeAutoScalingGroups for Auto Scaling). See the complete YACE permissions list.
The operator is configured via the Helm chart values. Key settings:
controller:
  serviceAccountName: tagemon-controller-manager
  config:
    tagsHandler:
      viewArn: "arn:aws:resource-explorer-2:REGION:ACCOUNT:view/NAME/ID"  # REQUIRED
      region: "us-west-2"  # REQUIRED
      interval: "1m"
    yaceServiceAccountName: tagemon-yace  # Service account for YACE pods

For manual deployments, see config/controller-config.example.yaml.
Tagemon automatically validates AWS resources against tagging policies defined in your Tagemon CRDs. The tagging policy is built from:
- Name Tag: Tagemon automatically collects and expects the Name tag to be present on all resources. This tag is used to identify resources in metrics and alerts (exposed as the tag_Name label in Prometheus metrics).
- Required Tags: Tags specified in exportedTagsOnMetrics with required: true must be present on resources
- Threshold Tags: Tags defined in thresholdTags must exist and have valid values based on their type (int, bool, percentage)
- Search Tags: Resources must match all searchTags to be considered for monitoring
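The way these rules combine into one policy can be sketched in Python. This is a simplified model under stated assumptions, not the operator's implementation: `policy_violations` is a hypothetical name, the check covers tag presence only (value validation by type and filtering by resourceType are left out), and it is meant to run only on resources already selected by searchTags.

```python
def policy_violations(resource_tags: dict,
                      exported_tags_on_metrics: list,
                      threshold_tags: list) -> list:
    """List the tagging-policy violations for one in-scope resource.

    Checks, in order: the Name tag, exported tags marked required: true,
    and threshold tags (required defaults to true, per the thresholdTags
    field description). An empty list means the resource is compliant.
    """
    violations = []
    if "Name" not in resource_tags:
        violations.append("missing Name tag")
    for entry in exported_tags_on_metrics:
        # Assumption: `required` defaults to False for exported tags.
        if entry.get("required", False) and entry["key"] not in resource_tags:
            violations.append(f"missing required tag {entry['key']}")
    for entry in threshold_tags:
        if entry.get("required", True) and entry["key"] not in resource_tags:
            violations.append(f"missing threshold tag {entry['key']}")
    return violations
```

Returning the full list of violations, rather than a single boolean, makes it easier to log exactly which tags a resource owner needs to add.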
Tagemon exposes the following Prometheus metrics for tagging policy compliance:
- tagemon_resources_non_compliant_count: A gauge metric tracking the number of non-compliant resources, labeled by:
  - resource_type: The AWS resource type (e.g., rds/db, ec2/instance)
  - account_id: The AWS account ID
  - Custom labels: Any labels configured via nonCompliantMetricCustomLabels in the controller config
This metric helps you monitor and alert on tagging policy violations across your AWS infrastructure.
The Tags Handler periodically scans AWS resources using AWS Resource Explorer and validates them against the tagging policy. Resources that don't meet the policy requirements are:
- Logged as non-compliant
- Tracked in the tagemon_resources_non_compliant_count metric
- Excluded from CloudWatch metric collection until they become compliant
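As a rough picture of how scan results could turn into the gauge values, the sketch below groups non-compliant resources by the two built-in labels. The input record shape and the function name are assumptions made for illustration; custom labels are ignored here.

```python
from collections import Counter


def non_compliant_counts(resources) -> dict:
    """Count non-compliant resources per (resource_type, account_id).

    resources: iterable of dicts with the hypothetical keys
               "resource_type", "account_id" and "compliant" (bool).
    Each key of the result corresponds to one label combination of
    tagemon_resources_non_compliant_count.
    """
    counts = Counter()
    for resource in resources:
        if not resource["compliant"]:
            counts[(resource["resource_type"], resource["account_id"])] += 1
    return dict(counts)
```

Label combinations with no violations do not appear in the result of this sketch; whether the real metric reports an explicit zero for them is not stated above.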
Tagemon consists of three main components:
- YACE Handler: Manages YACE deployments for CloudWatch metrics
- Tags Handler: Validates resource compliance using AWS Resource Explorer and exposes compliance metrics
- Config Handler: Manages controller configuration
- Go 1.24+
- Docker
- kubectl
- kustomize
# Generate CRD manifests, RBAC, etc.
make manifests
# Generate code (DeepCopy methods)
make generate
# Build the binary
make build
# Run tests
make test
# Run linter
make lint

# Install CRDs into your cluster
make install
# Run controller locally (outside cluster)
make run
# Uninstall CRDs
make uninstall

# List all Tagemon resources
kubectl get tagemon -A
# View Tagemon details
kubectl describe tagemon <name> -n tagemon
# Check created YACE deployments
kubectl get deployments -n tagemon -l app.kubernetes.io/created-by=tagemon
# View operator logs
kubectl logs -n tagemon deployment/tagemon-controller-manager -f

# Check operator status
kubectl get pods -n tagemon -l app.kubernetes.io/name=tagemon
# View YACE pod logs
kubectl logs -n tagemon -l app.kubernetes.io/created-by=tagemon
# Check controller metrics
kubectl port-forward -n tagemon svc/tagemon-controller-manager-metrics 8080:8080
curl http://localhost:8080/metrics

- Namespace Isolation: Operator watches only the namespace where it's deployed
- Service Accounts: Two separate service accounts are created:
  - tagemon-controller-manager: For the operator (cluster-scoped)
  - tagemon-yace: For YACE pods
- Resource Management: Each Tagemon CR creates a dedicated YACE deployment
- Monitoring: Requires Prometheus Operator for ServiceMonitor support
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
For information about how we collect, use, and protect your data, please see our Privacy Policy.
- YACE - CloudWatch metrics exporter
- tag-patrol - AWS tag compliance validation
- Kubebuilder - Kubernetes operator framework
For issues and questions:
- GitHub Issues
- Documentation