fix: change keystore pool monitor to OTLP metric #220
Conversation
@coderabbitai summary

✅ Actions performed: Summary regeneration triggered.
📝 Walkthrough

Moves keystore pool monitoring from the API server into a new internal daemon using OpenTelemetry metrics and a cached refresh loop; adds the daemon implementation and tests, and removes the prior Prometheus polling and its test/export helper.
Sequence Diagram

```mermaid
sequenceDiagram
    participant API as API Server
    participant Daemon as internal/daemon
    participant OTel as OpenTelemetry Meter
    participant DB as Database
    API->>Daemon: MonitorKeystorePoolSize(ctx, cfg)
    activate Daemon
    Daemon->>DB: Start DB connection & build SQL repo/pool
    Daemon->>OTel: Create meter & register gauge callback (keystore_pool_available)
    deactivate Daemon
    loop Collection / Ticker cycle
        OTel->>Daemon: invoke callback(ctx)
        activate Daemon
        alt cached value missing or stale
            Daemon->>DB: pool.Count(ctx)
            DB-->>Daemon: count
            Daemon->>Daemon: update cachedCount, lastRead
        end
        Daemon->>OTel: Observe(cachedCount)
        deactivate Daemon
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
45cd2d6 to 25e0a4b (Compare)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/api-server/main.go`:
- Line 128: The call to daemon.MonitorKeystorePoolSize(ctx, cfg) starts the
keystore monitor (which via NewKeystorePoolMonitorFromCfg opens its own DB
connection) even when metrics are disabled; move this invocation behind the
service metrics-enabled gate in startStatusServer or make
MonitorKeystorePoolSize lazy so it does not call
NewKeystorePoolMonitorFromCfg/allocate a DB pool until the gauge is actually
collected. Specifically: check the service metrics flag in startStatusServer
before calling daemon.MonitorKeystorePoolSize(ctx, cfg), or modify
daemon.MonitorKeystorePoolSize / NewKeystorePoolMonitorFromCfg to defer DB
initialization until the first metrics scrape (e.g., lazy-init inside the
monitor/gauge creation), ensuring no extra DB pool is opened when metrics are
off.
In `@internal/daemon/keystore_pool_monitor_test.go`:
- Around line 72-75: This test mutates global OTel state by calling
otel.SetMeterProvider(provider); save the current global with prev :=
otel.GetMeterProvider() before setting the test provider and register a
t.Cleanup that restores it via otel.SetMeterProvider(prev) (and if needed shuts
down or closes the test provider/reader created by
metric.NewMeterProvider/metric.NewManualReader) so the global MeterProvider and
registered callbacks are not leaked into other tests; update the setup around
metric.NewManualReader, metric.NewMeterProvider, and otel.SetMeterProvider to
perform this save-and-restore in t.Cleanup.
In `@internal/daemon/keystore_pool_monitor.go`:
- Around line 24-36: The monitor currently drops the DB connection created in
NewKeystorePoolMonitorFromCfg and the registration handle returned by
meter.RegisterCallback() in MonitorKeystorePoolSize; add fields to
KeystorePoolMonitor (e.g., dbConn of the DB connection type and reg
metric.Registration or callback handle) to retain both resources, update
NewKeystorePoolMonitorFromCfg to assign the created DB connection to
monitor.dbConn before passing it to manager.Pool, and change
MonitorKeystorePoolSize to store the return value of meter.RegisterCallback() on
monitor.reg so it can be unregistered; also add a Close/Shutdown method on
KeystorePoolMonitor that closes monitor.dbConn and calls
monitor.reg.Unregister() (or equivalent) to clean up.
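The retain-and-close shape this prompt describes might look like the following sketch. The `io.Closer` connection, the fake registration handle, and the field names are stand-ins for illustration, not the repository's actual types:

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// unregisterer mimics the registration handle returned by
// meter.RegisterCallback; Unregister removes the callback.
type unregisterer interface {
	Unregister() error
}

type fakeReg struct{ unregistered bool }

func (r *fakeReg) Unregister() error { r.unregistered = true; return nil }

// keystorePoolMonitor retains both resources it creates so they can be
// released later instead of being dropped at construction time.
type keystorePoolMonitor struct {
	dbConn io.Closer    // the DB connection opened in the constructor
	reg    unregisterer // the gauge callback registration handle
}

// Close unregisters the gauge callback and closes the DB connection,
// joining the errors so neither failure is silently lost.
func (m *keystorePoolMonitor) Close() error {
	var errs []error
	if m.reg != nil {
		errs = append(errs, m.reg.Unregister())
	}
	if m.dbConn != nil {
		errs = append(errs, m.dbConn.Close())
	}
	return errors.Join(errs...)
}

type fakeConn struct{ closed bool }

func (c *fakeConn) Close() error { c.closed = true; return nil }

func main() {
	conn, reg := &fakeConn{}, &fakeReg{}
	m := &keystorePoolMonitor{dbConn: conn, reg: reg}
	fmt.Println(m.Close() == nil, conn.closed, reg.unregistered) // → true true true
}
```

Unregistering before closing the connection avoids a window where a collection callback fires against an already-closed pool.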
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 85869ccd-db05-493b-9770-329935e88940
📒 Files selected for processing (5)
- cmd/api-server/export_test.go
- cmd/api-server/main.go
- cmd/api-server/main_test.go
- internal/daemon/keystore_pool_monitor.go
- internal/daemon/keystore_pool_monitor_test.go
💤 Files with no reviewable changes (1)
- cmd/api-server/main_test.go
```go
reader := metric.NewManualReader()
provider := metric.NewMeterProvider(metric.WithReader(reader))
otel.SetMeterProvider(provider)
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's read the test file to see the full context around lines 72-75
cd /tmp && git ls-files | grep -i keystore_pool_monitor_test && head -1
```

Repository: openkcm/cmk
Length of output: 123

🏁 Script executed:

```shell
# Find the test file
fd keystore_pool_monitor_test
```

Repository: openkcm/cmk
Length of output: 100

🏁 Script executed:

```shell
# Read the test file with context around the flagged lines
cat -n internal/daemon/keystore_pool_monitor_test.go | sed -n '60,90p'
```

Repository: openkcm/cmk
Length of output: 958

🏁 Script executed:

```shell
# Check go.mod for Go version requirement
cat go.mod | head -20
```

Repository: openkcm/cmk
Length of output: 696

🏁 Script executed:

```shell
# Look for other tests that might register the same keystore_pool_available instrument
rg "keystore_pool_available" -A 3 -B 3 internal/daemon/
```

Repository: openkcm/cmk
Length of output: 3324

🏁 Script executed:

```shell
# Read the full test function
cat -n internal/daemon/keystore_pool_monitor_test.go | sed -n '66,150p'
```

Repository: openkcm/cmk
Length of output: 2811

🏁 Script executed:

```shell
# Check for any cleanup patterns in the test file
rg -i "cleanup|shutdown|setmeterprovider" internal/daemon/keystore_pool_monitor_test.go -B 2 -A 2
```

Repository: openkcm/cmk
Length of output: 249

🏁 Script executed:

```shell
# Check if there are other test files that might also use OTel
fd -e go internal/daemon/ | xargs rg -l "SetMeterProvider|otel\." | head -10
```

Repository: openkcm/cmk
Length of output: 777
Restore the global MeterProvider in t.Cleanup.
otel.SetMeterProvider(provider) mutates process-global OTel state. Leaving it installed after this test can leak the registered callback into later tests and produce duplicate keystore_pool_available observations once another test also registers the same instrument.
Suggested cleanup

```diff
 import (
+	"context"
 	"encoding/json"
 	"testing"
 	"time"
@@
 	reader := metric.NewManualReader()
 	provider := metric.NewMeterProvider(metric.WithReader(reader))
+	prevProvider := otel.GetMeterProvider()
 	otel.SetMeterProvider(provider)
+	t.Cleanup(func() {
+		_ = provider.Shutdown(context.Background())
+		otel.SetMeterProvider(prevProvider)
+	})
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
import (
	"context"
	"encoding/json"
	"testing"
	"time"
	// ...
)

// ...

reader := metric.NewManualReader()
provider := metric.NewMeterProvider(metric.WithReader(reader))
prevProvider := otel.GetMeterProvider()
otel.SetMeterProvider(provider)
t.Cleanup(func() {
	_ = provider.Shutdown(context.Background())
	otel.SetMeterProvider(prevProvider)
})
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/daemon/keystore_pool_monitor_test.go` around lines 72 - 75, This
test mutates global OTel state by calling otel.SetMeterProvider(provider); save
the current global with prev := otel.GetMeterProvider() before setting the test
provider and register a t.Cleanup that restores it via
otel.SetMeterProvider(prev) (and if needed shuts down or closes the test
provider/reader created by metric.NewMeterProvider/metric.NewManualReader) so
the global MeterProvider and registered callbacks are not leaked into other
tests; update the setup around metric.NewManualReader, metric.NewMeterProvider,
and otel.SetMeterProvider to perform this save-and-restore in t.Cleanup.
```go
now := time.Now()

m.mu.Lock()
shouldRefresh := m.cachedCount == nil || now.Sub(*m.lastRead) >= m.interval
```
The cache feels odd here. Maybe the period of the metrics push to the OTel collector should rather be configurable in common-sdk.
Yeah, I agree it doesn't look good.
The caching was there to make it work with the global periodic reader callback; I'm not sure we can configure the interval per metric.
I've changed it back to the previous approach: the monitor keeps its own ticker and pushes the record on each interval, specific to this metric only.
```go
log.Debug(ctx, "Registering keystore pool size gauge metric")

_, err = monitor.meter.RegisterCallback(monitor.callback, monitor.gauge)
```
This breaks encapsulation. Couldn't this be done inside the constructor method (as a parameter to meter.Int64ObservableGauge)?
0bc7b1e to 3893dbd (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (1)
internal/daemon/keystore_pool_monitor.go (1)
52-57: ⚠️ Potential issue | 🟠 Major

Don't create a standalone DB pool in monitor init.

`NewKeystorePoolMonitorFromCfg` is called from status-server startup, before the main application DB is initialized, so opening `dbCon` here creates a second pool dedicated to this metric path. Because the monitor is started unconditionally, that extra pool is allocated even when no metrics reader/exporter is active. Please reuse the application DB/pool or defer this initialization until the metric is actually collected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/daemon/keystore_pool_monitor.go` around lines 52 - 57, The monitor currently opens its own DB connection via db.StartDBConnection and creates a separate pool with manager.NewPool inside NewKeystorePoolMonitorFromCfg; change this so the function does not create a standalone DB/pool at init — instead accept the application's repository/pool (e.g., pass an existing sql.Repository or manager.Pool into NewKeystorePoolMonitorFromCfg) or store cfg and defer calling db.StartDBConnection/manager.NewPool until the metric collection runs; update NewKeystorePoolMonitorFromCfg signature and internal use of dbCon/NewPool (and the caller in status-server startup) to reuse the main app DB/pool or lazily initialize only when metrics are enabled.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/daemon/keystore_pool_monitor.go`:
- Around line 82-85: The current error path in the monitor around
m.pool.Count(ctx) logs context cancellation as a failure; update the error
handling in the keystore pool monitor so that if err is context.Canceled or
context.DeadlineExceeded (use errors.Is(err, context.Canceled) || errors.Is(err,
context.DeadlineExceeded)) you return without logging, otherwise keep the
existing log.Error(ctx, "failed to get keystore pool size", err) and return;
this change should be applied around the m.pool.Count(ctx) call in the monitor
loop.
- Around line 67-76: Start currently waits for the first ticker tick so the
keystore_pool_available gauge can be missing after startup; call m.record(ctx)
once before entering the for/select loop in KeystorePoolMonitor.Start to prime
the metric (i.e., invoke record immediately, then create/start the ticker and
continue with the existing loop), referencing the Start method and the
m.record(ctx) helper to locate where to add the initial call.
---
Duplicate comments:
In `@internal/daemon/keystore_pool_monitor.go`:
- Around line 52-57: The monitor currently opens its own DB connection via
db.StartDBConnection and creates a separate pool with manager.NewPool inside
NewKeystorePoolMonitorFromCfg; change this so the function does not create a
standalone DB/pool at init — instead accept the application's repository/pool
(e.g., pass an existing sql.Repository or manager.Pool into
NewKeystorePoolMonitorFromCfg) or store cfg and defer calling
db.StartDBConnection/manager.NewPool until the metric collection runs; update
NewKeystorePoolMonitorFromCfg signature and internal use of dbCon/NewPool (and
the caller in status-server startup) to reuse the main app DB/pool or lazily
initialize only when metrics are enabled.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4c3ba778-faa2-4cf2-b129-16e88d91f9e3
📒 Files selected for processing (5)
- cmd/api-server/export_test.go
- cmd/api-server/main.go
- cmd/api-server/main_test.go
- internal/daemon/keystore_pool_monitor.go
- internal/daemon/keystore_pool_monitor_test.go
💤 Files with no reviewable changes (1)
- cmd/api-server/main_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- cmd/api-server/export_test.go
- internal/daemon/keystore_pool_monitor_test.go
3893dbd to 4325008 (Compare)
♻️ Duplicate comments (2)
internal/daemon/keystore_pool_monitor.go (1)
78-90: ⚠️ Potential issue | 🟠 Major

DB connection created in `getPool` is not retained for cleanup.

The DB connection created at line 83 is passed to `manager.NewPool` but not stored on the monitor struct. When the application shuts down or the monitor is stopped, there's no way to close this connection. Consider storing `dbCon` on the monitor and adding a `Close()` method to properly clean up resources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/daemon/keystore_pool_monitor.go` around lines 78 - 90, The DB connection returned by db.StartDBConnection in KeystorePoolMonitor.getPool is not retained for later cleanup; modify the KeystorePoolMonitor struct to add a field (e.g., dbCon *sql.DB or appropriate connection type) and assign m.dbCon = dbCon before creating the pool (manager.NewPool), then implement a Close() method on KeystorePoolMonitor that closes m.dbCon (and any pool cleanup) and ensure callers invoke Close() when stopping the monitor or shutting down; update references to m.pool and manager.NewPool accordingly so the connection is owned by the monitor and can be closed.

internal/daemon/keystore_pool_monitor_test.go (1)
72-75: ⚠️ Potential issue | 🟡 Minor

Restore global MeterProvider in `t.Cleanup`.

`otel.SetMeterProvider(provider)` mutates process-global state. Without cleanup, this can leak the registered callback into subsequent tests. Save the previous provider and restore it in `t.Cleanup`.
💡 Suggested fix
```diff
 reader := metric.NewManualReader()
 provider := metric.NewMeterProvider(metric.WithReader(reader))
+prevProvider := otel.GetMeterProvider()
 otel.SetMeterProvider(provider)
+t.Cleanup(func() {
+	_ = provider.Shutdown(t.Context())
+	otel.SetMeterProvider(prevProvider)
+})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/daemon/keystore_pool_monitor_test.go` around lines 72 - 75, Save the existing global MeterProvider before calling otel.SetMeterProvider and restore it in t.Cleanup to avoid leaking state between tests: capture prev := otel.GetMeterProvider() before creating/setting the test provider (the block with metric.NewManualReader, metric.NewMeterProvider and otel.SetMeterProvider) and add t.Cleanup(func() { otel.SetMeterProvider(prev) }) so the original provider is reinstated after the test.
🧹 Nitpick comments (2)
internal/daemon/keystore_pool_monitor.go (1)
113-122: Consider returning a shutdown signal for graceful termination.
`MonitorKeystorePoolSize` spawns a goroutine but provides no way for the caller to wait for it to finish. If graceful shutdown requires ensuring the monitor has stopped before proceeding, consider returning a `done` channel or accepting a `sync.WaitGroup`.

💡 Example signature

```go
func MonitorKeystorePoolSize(ctx context.Context, cfg *config.Config) <-chan struct{} {
	// ...
	done := make(chan struct{})
	go func() {
		defer close(done)
		monitor.Start(ctx)
	}()
	return done
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/daemon/keystore_pool_monitor.go` around lines 113 - 122, Change MonitorKeystorePoolSize to return a shutdown/done signal so callers can wait for the goroutine to finish: modify the signature (e.g., return <-chan struct{}), create a done channel after NewKeystorePoolMonitorFromCfg succeeds, spawn the goroutine to call monitor.Start(ctx) and defer close(done) so the channel closes when Start returns, and return the done channel to the caller; keep the existing error handling when NewKeystorePoolMonitorFromCfg fails (return a closed or nil channel as appropriate).

internal/daemon/keystore_pool_monitor_test.go (1)
86-129: Consider shorter intervals to reduce test duration.

The test uses a 5-second interval with multiple sleeps, totaling ~8 seconds of wall-clock time. For faster CI feedback, consider reducing the interval (e.g., 100ms) and adjusting sleep durations proportionally.

💡 Example with shorter intervals

```diff
 KeystorePool: config.KeystorePool{
-	Interval: 5 * time.Second,
+	Interval: 100 * time.Millisecond,
 },
 })
 t.Run("initial read", func(t *testing.T) {
-	time.Sleep(1 * time.Second)
+	time.Sleep(50 * time.Millisecond)
 	// ...
 })
 t.Run("previous value is returned within interval", func(t *testing.T) {
-	time.Sleep(2 * time.Second)
+	time.Sleep(40 * time.Millisecond)
 	// ...
 })
 t.Run("value is refreshed after interval", func(t *testing.T) {
-	time.Sleep(5 * time.Second)
+	time.Sleep(100 * time.Millisecond)
 	// ...
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/daemon/keystore_pool_monitor_test.go` around lines 86 - 129, Reduce the test timing to speed CI by shrinking the KeystorePool interval and corresponding sleeps: set config.KeystorePool.Interval from 5*s to something like 100*time.Millisecond, change the initial sleep to ~50-100ms to allow commit before monitoring, in the "previous value" case sleep for less than the interval (e.g., 50ms) before calling reader.Collect and addToKeystorePool, and in the "value is refreshed" case sleep for slightly more than the interval (e.g., 150-200ms) before collecting; update references in the test to use the KeystorePool.Interval change and keep assertions using getKeystorePoolMetric and reader unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@internal/daemon/keystore_pool_monitor_test.go`:
- Around line 72-75: Save the existing global MeterProvider before calling
otel.SetMeterProvider and restore it in t.Cleanup to avoid leaking state between
tests: capture prev := otel.GetMeterProvider() before creating/setting the test
provider (the block with metric.NewManualReader, metric.NewMeterProvider and
otel.SetMeterProvider) and add t.Cleanup(func() { otel.SetMeterProvider(prev) })
so the original provider is reinstated after the test.
In `@internal/daemon/keystore_pool_monitor.go`:
- Around line 78-90: The DB connection returned by db.StartDBConnection in
KeystorePoolMonitor.getPool is not retained for later cleanup; modify the
KeystorePoolMonitor struct to add a field (e.g., dbCon *sql.DB or appropriate
connection type) and assign m.dbCon = dbCon before creating the pool
(manager.NewPool), then implement a Close() method on KeystorePoolMonitor that
closes m.dbCon (and any pool cleanup) and ensure callers invoke Close() when
stopping the monitor or shutting down; update references to m.pool and
manager.NewPool accordingly so the connection is owned by the monitor and can be
closed.
---
Nitpick comments:
In `@internal/daemon/keystore_pool_monitor_test.go`:
- Around line 86-129: Reduce the test timing to speed CI by shrinking the
KeystorePool interval and corresponding sleeps: set config.KeystorePool.Interval
from 5*s to something like 100*time.Millisecond, change the initial sleep to
~50-100ms to allow commit before monitoring, in the "previous value" case sleep
for less than the interval (e.g., 50ms) before calling reader.Collect and
addToKeystorePool, and in the "value is refreshed" case sleep for slightly more
than the interval (e.g., 150-200ms) before collecting; update references in the
test to use the KeystorePool.Interval change and keep assertions using
getKeystorePoolMetric and reader unchanged.
In `@internal/daemon/keystore_pool_monitor.go`:
- Around line 113-122: Change MonitorKeystorePoolSize to return a shutdown/done
signal so callers can wait for the goroutine to finish: modify the signature
(e.g., return <-chan struct{}), create a done channel after
NewKeystorePoolMonitorFromCfg succeeds, spawn the goroutine to call
monitor.Start(ctx) and defer close(done) so the channel closes when Start
returns, and return the done channel to the caller; keep the existing error
handling when NewKeystorePoolMonitorFromCfg fails (return a closed or nil
channel as appropriate).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e27ec83d-c312-4079-9dde-c02536b9a384
📒 Files selected for processing (5)
- cmd/api-server/export_test.go
- cmd/api-server/main.go
- cmd/api-server/main_test.go
- internal/daemon/keystore_pool_monitor.go
- internal/daemon/keystore_pool_monitor_test.go
💤 Files with no reviewable changes (1)
- cmd/api-server/main_test.go
✅ Files skipped from review due to trivial changes (1)
- cmd/api-server/main.go
4325008 to d7f5a05 (Compare)
Summary by CodeRabbit
Refactor
Tests