
Infrastructure connection configs should be bootstrap-only, not mutable at runtime #97

@ferr3ira-gabriel

Description

Problem

The systemplane currently allows runtime mutation of infrastructure connection settings (postgres.*, redis.*, rabbitmq.*) via the /v1/system/configs API. This creates a critical chicken-and-egg problem that can permanently brick the application.

The Circular Dependency

Pod starts → Needs PostgreSQL connection → Reads config from systemplane →
Systemplane is stored IN PostgreSQL → But postgres.primary_host has wrong value →
Cannot connect to PostgreSQL → Pod crashes → No way to fix the configuration
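One way to break this cycle is to resolve connection settings only from the environment, before the systemplane store is opened. That way the store can never be a prerequisite for reaching itself. The sketch below is illustrative only. It assumes hypothetical environment variable names (`POSTGRES_PRIMARY_HOST`, `POSTGRES_PRIMARY_PORT`) and a hypothetical `BootstrapConfig` type, and it is not the systemplane API.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// BootstrapConfig holds the settings needed before the systemplane store can
// be opened. Hypothetical type for illustration.
type BootstrapConfig struct {
	PrimaryHost string
	PrimaryPort int
}

// loadBootstrapConfig reads connection settings from the environment only.
// getenv is injected (normally os.Getenv) so the function is easy to test.
// It never consults the systemplane store, which sits behind this connection.
func loadBootstrapConfig(getenv func(string) string) (BootstrapConfig, error) {
	host := getenv("POSTGRES_PRIMARY_HOST") // hypothetical variable name
	if host == "" {
		return BootstrapConfig{}, errors.New("POSTGRES_PRIMARY_HOST is required")
	}
	port := 5432 // PostgreSQL's default port, used when the variable is unset
	if raw := getenv("POSTGRES_PRIMARY_PORT"); raw != "" {
		p, err := strconv.Atoi(raw)
		if err != nil || p < 1 || p > 65535 {
			return BootstrapConfig{}, fmt.Errorf("invalid POSTGRES_PRIMARY_PORT %q", raw)
		}
		port = p
	}
	return BootstrapConfig{PrimaryHost: host, PrimaryPort: port}, nil
}
```

Because the function fails fast on a missing or malformed value, a bad deployment shows up as a clear startup error instead of a pod that cannot reach its own configuration store.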

Since the systemplane stores its configuration in the same PostgreSQL database it is trying to configure, if someone changes postgres.primary_host via the API to an incorrect value:

  1. The current pod may continue working (existing connection)
  2. All new pods will fail to start - they cannot reach the systemplane store
  3. The configuration cannot be fixed via API - the API is unreachable
  4. Manual database intervention is required to recover

Affected Keys

The following keys are currently marked as MutableAtRuntime: true but should be false:

PostgreSQL (self-locking - systemplane lives here):

  • postgres.primary_host
  • postgres.primary_port
  • postgres.primary_user
  • postgres.primary_db
  • postgres.replica_host
  • postgres.replica_port
  • postgres.replica_user
  • postgres.replica_db

Redis (often used for caching/sessions):

  • redis.host
  • redis.master_name

RabbitMQ (core messaging infrastructure):

  • rabbitmq.host
  • rabbitmq.port
  • rabbitmq.user
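The list above can be encoded as an explicit set. The sketch below is a hypothetical helper, not part of systemplane. It matches keys exactly rather than by prefix, because tunables such as `postgres.max_open_conns` share the `postgres.` prefix but should stay mutable.

```go
package main

// bootstrapOnlyKeys lists the connection keys from this issue that should be
// read once at startup and rejected by the runtime config API.
// Hypothetical helper for illustration.
var bootstrapOnlyKeys = map[string]struct{}{
	// PostgreSQL: the systemplane store itself lives here.
	"postgres.primary_host": {},
	"postgres.primary_port": {},
	"postgres.primary_user": {},
	"postgres.primary_db":   {},
	"postgres.replica_host": {},
	"postgres.replica_port": {},
	"postgres.replica_user": {},
	"postgres.replica_db":   {},
	// Redis
	"redis.host":        {},
	"redis.master_name": {},
	// RabbitMQ
	"rabbitmq.host": {},
	"rabbitmq.port": {},
	"rabbitmq.user": {},
}

// isBootstrapOnly reports whether key must never be changed at runtime.
// Matching is exact, so "postgres.max_open_conns" is not caught by accident.
func isBootstrapOnly(key string) bool {
	_, ok := bootstrapOnlyKeys[key]
	return ok
}
```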

Additional Issue: Configuration Sharing Between Pods

Because systemplane persists non-default values to PostgreSQL via SeedStore(), these values become shared across all pods. If the first pod stores a DNS hostname that has already been resolved to an IP address, or if the infrastructure later changes, new pods read the stale stored value instead of the current value from their own environment variables.
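A resolution rule that avoids this drift ignores the store entirely for bootstrap-only keys. The sketch below is hypothetical: the `candidate` type and `resolveValue` function are illustrative names, and for runtime-mutable keys it assumes a stored override wins over the environment, which matches the behavior this issue observed.

```go
package main

// source records where a resolved value came from; logging it helps diagnose
// drift between pods.
type source string

const (
	fromEnv     source = "env"
	fromStore   source = "store"
	fromDefault source = "default"
)

// candidate is a possibly-unset value from one configuration source.
type candidate struct {
	Value string
	Set   bool
}

// resolveValue picks the effective value for one key.
// For bootstrap-only keys the stored value is never consulted, so a stale
// entry written by another pod (for example, a resolved IP address) cannot
// override the environment. For mutable keys this sketch assumes a stored
// override takes precedence over the environment.
func resolveValue(bootstrapOnly bool, env, stored candidate, def string) (string, source) {
	if bootstrapOnly {
		if env.Set {
			return env.Value, fromEnv
		}
		return def, fromDefault
	}
	if stored.Set {
		return stored.Value, fromStore
	}
	if env.Set {
		return env.Value, fromEnv
	}
	return def, fromDefault
}
```

With this rule, the production scenario described below resolves to the hostname from the environment even though `21.26.7.96` is still present in the store.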

This was observed in production when:

  1. Old pods were healthy with existing connections
  2. New pods (after restart) failed to connect
  3. Investigation revealed that the new pods were using an IP address (21.26.7.96) from the systemplane store instead of the DNS hostname (postgresql.dev.firmino.lerian.net) from their environment variables

Proposed Solution

Change infrastructure connection keys to be bootstrap-only:

{
    Key:              "postgres.primary_host",
    Kind:             domain.KindConfig,
    AllowedScopes:    []domain.Scope{domain.ScopeGlobal},
    DefaultValue:     defaultPGHost,
    ValueType:        domain.ValueTypeString,
    Validator:        validateNonEmptyString,
    ApplyBehavior:    domain.ApplyBootstrapOnly,  // <- Change from ApplyBundleRebuild
    MutableAtRuntime: false,                       // <- Change from true
    Description:      "PostgreSQL primary host address",
    // ...
}

What Should Remain Mutable

Settings that are safe for runtime tuning should stay mutable:

  • Connection pool sizes (postgres.max_open_conns, redis.pool_size)
  • Timeouts (postgres.query_timeout_sec, redis.read_timeout_ms)
  • Application-level settings (rate limits, feature flags, etc.)

Migration Consideration

Existing deployments may have these values stored in the systemplane table. A migration or documentation should be provided to:

  1. Remove stored infrastructure host/port values from the systemplane table
  2. Ensure environment variables are the sole source of truth for these settings

Impact

  • Severity: High - can cause permanent application lockout
  • Affected versions: All versions using systemplane with current key definitions
  • Workaround: Manual PostgreSQL intervention to delete/fix systemplane entries

Metadata

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
