Problem
The systemplane currently allows runtime mutation of infrastructure connection settings (postgres.*, redis.*, rabbitmq.*) via the /v1/system/configs API. This creates a critical chicken-and-egg problem that can permanently brick the application.
The Circular Dependency
Pod starts → Needs PostgreSQL connection → Reads config from systemplane →
Systemplane is stored IN PostgreSQL → But postgres.primary_host has wrong value →
Cannot connect to PostgreSQL → Pod crashes → No way to fix the configuration
Because the systemplane stores its configuration in the same PostgreSQL database it configures, changing postgres.primary_host via the API to an incorrect value has the following consequences:
- The current pod may continue working (existing connection)
- All new pods will fail to start - they cannot reach the systemplane store
- The configuration cannot be fixed via API - the API is unreachable
- Manual database intervention is required to recover
Affected Keys
The following keys are currently marked as MutableAtRuntime: true but should be false:
PostgreSQL (self-locking - systemplane lives here):
postgres.primary_host
postgres.primary_port
postgres.primary_user
postgres.primary_db
postgres.replica_host
postgres.replica_port
postgres.replica_user
postgres.replica_db
Redis (often used for caching/sessions):
redis.host
redis.master_name
RabbitMQ (core messaging infrastructure):
rabbitmq.host
rabbitmq.port
rabbitmq.user
Additional Issue: Configuration Sharing Between Pods
Because systemplane persists non-default values to PostgreSQL via SeedStore(), these values become shared across all pods. If the first pod resolves a DNS hostname to an IP address and stores it, or if infrastructure changes, new pods receive stale values from the store instead of fresh values from environment variables.
This was observed in production when:
- Old pods were healthy with existing connections
- New pods (after restart) failed to connect
- Investigation revealed pods were using IP addresses (21.26.7.96) from the systemplane store instead of DNS hostnames (postgresql.dev.firmino.lerian.net) from environment variables
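The stale-value behavior can be reproduced in miniature. The sketch below is not the systemplane code: KeyDef, its EnvVar and BootstrapOnly fields, the resolve function, and the POSTGRES_HOST variable name are all hypothetical stand-ins. It assumes that a stored value normally overrides the environment, and shows how skipping the store for bootstrap-only keys keeps the environment variable authoritative.

```go
package main

import "fmt"

// KeyDef is a hypothetical, trimmed-down stand-in for a systemplane key definition.
type KeyDef struct {
	Key           string
	EnvVar        string // assumed name of the environment variable backing this key
	BootstrapOnly bool   // when true, the persisted store is never consulted
}

// resolve returns the effective value for a key.
// Runtime-mutable keys prefer the shared store and fall back to the environment;
// bootstrap-only keys read the environment and ignore the store entirely.
func resolve(def KeyDef, env, store map[string]string) string {
	if def.BootstrapOnly {
		return env[def.EnvVar]
	}
	if v, ok := store[def.Key]; ok {
		return v
	}
	return env[def.EnvVar]
}

func main() {
	env := map[string]string{"POSTGRES_HOST": "postgresql.dev.firmino.lerian.net"}
	store := map[string]string{"postgres.primary_host": "21.26.7.96"} // stale value seeded by an earlier pod

	mutable := KeyDef{Key: "postgres.primary_host", EnvVar: "POSTGRES_HOST"}
	bootstrap := KeyDef{Key: "postgres.primary_host", EnvVar: "POSTGRES_HOST", BootstrapOnly: true}

	fmt.Println(resolve(mutable, env, store))   // the stale IP from the store wins
	fmt.Println(resolve(bootstrap, env, store)) // the DNS hostname from the environment wins
}
```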
Proposed Solution
Change infrastructure connection keys to be bootstrap-only:
{
	Key:              "postgres.primary_host",
	Kind:             domain.KindConfig,
	AllowedScopes:    []domain.Scope{domain.ScopeGlobal},
	DefaultValue:     defaultPGHost,
	ValueType:        domain.ValueTypeString,
	Validator:        validateNonEmptyString,
	ApplyBehavior:    domain.ApplyBootstrapOnly, // <- Change from ApplyBundleRebuild
	MutableAtRuntime: false,                     // <- Change from true
	Description:      "PostgreSQL primary host address",
	// ...
}
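With the flag flipped, a write to /v1/system/configs for one of these keys should be rejected before anything is persisted. The following is a minimal sketch of such a guard, assuming the API layer can look a key up in a registry. The KeyDef type here is a simplified stand-in for the real definition, and checkRuntimeWrite and ErrImmutableAtRuntime are hypothetical names, not existing systemplane identifiers.

```go
package main

import (
	"errors"
	"fmt"
)

// KeyDef mirrors only the fields this sketch needs from a key definition.
type KeyDef struct {
	Key              string
	MutableAtRuntime bool
}

// ErrImmutableAtRuntime is a hypothetical sentinel error for rejected writes.
var ErrImmutableAtRuntime = errors.New("key is not mutable at runtime")

// checkRuntimeWrite reports whether a runtime write to key is allowed.
// Unknown keys and keys with MutableAtRuntime == false are both rejected.
func checkRuntimeWrite(registry map[string]KeyDef, key string) error {
	def, ok := registry[key]
	if !ok {
		return fmt.Errorf("unknown key %q", key)
	}
	if !def.MutableAtRuntime {
		return fmt.Errorf("%q: %w", key, ErrImmutableAtRuntime)
	}
	return nil
}

func main() {
	registry := map[string]KeyDef{
		"postgres.primary_host":   {Key: "postgres.primary_host", MutableAtRuntime: false},
		"postgres.max_open_conns": {Key: "postgres.max_open_conns", MutableAtRuntime: true},
	}

	// The host write is refused; the pool-size write passes the guard.
	fmt.Println(checkRuntimeWrite(registry, "postgres.primary_host"))
	fmt.Println(checkRuntimeWrite(registry, "postgres.max_open_conns"))
}
```

Rejecting the write at the API boundary means a mistyped host can never reach the store, so the circular dependency cannot be triggered through this path.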
What Should Remain Mutable
Settings that are safe for runtime tuning should stay mutable:
- Connection pool sizes (postgres.max_open_conns, redis.pool_size)
- Timeouts (postgres.query_timeout_sec, redis.read_timeout_ms)
- Application-level settings (rate limits, feature flags, etc.)
Migration Consideration
Existing deployments may already have these values stored in the systemplane table. A migration script or a documented cleanup procedure should be provided to:
- Remove stored infrastructure host/port values from the systemplane table
- Ensure environment variables are the sole source of truth for these settings
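As a starting point for the cleanup step, the sketch below selects which stored entries a migration would remove. It assumes the stored configuration can be loaded as a key/value map. The key list is copied from the Affected Keys section, while staleInfraKeys and the map shape are hypothetical; the actual table layout and the delete statement are left out because they are not described here.

```go
package main

import (
	"fmt"
	"sort"
)

// infraKeys lists the connection keys named under "Affected Keys".
var infraKeys = map[string]bool{
	"postgres.primary_host": true, "postgres.primary_port": true,
	"postgres.primary_user": true, "postgres.primary_db": true,
	"postgres.replica_host": true, "postgres.replica_port": true,
	"postgres.replica_user": true, "postgres.replica_db": true,
	"redis.host": true, "redis.master_name": true,
	"rabbitmq.host": true, "rabbitmq.port": true, "rabbitmq.user": true,
}

// staleInfraKeys returns, in sorted order, the stored keys that should be
// removed so that environment variables become the only source for them.
func staleInfraKeys(stored map[string]string) []string {
	var out []string
	for k := range stored {
		if infraKeys[k] {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	stored := map[string]string{
		"postgres.primary_host":   "21.26.7.96",
		"postgres.max_open_conns": "50",
		"redis.host":              "10.0.0.7",
	}
	// Only the two connection keys are selected; the pool size is kept.
	fmt.Println(staleInfraKeys(stored))
}
```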
Impact
- Severity: High - can cause permanent application lockout
- Affected versions: All versions using systemplane with current key definitions
- Workaround: Manual PostgreSQL intervention to delete/fix systemplane entries