
Define integration strategy with federated control plane for workload deployment scheduling #85

@scotwells

Background

The compute service was originally designed around the MVP architecture defined in the federation enhancement. In that architecture, operators watch project API servers and write resources to a single Datum Platform API Server, which communicates directly with the edge cluster cells at each POP.

The platform is now targeting a launch architecture that replaces this with a Karmada-based federated control plane:

  • Virtual Control Planes (VCPs) replace per-project API servers (lightweight, ~0.8 MB each)
  • Operators watch project VCPs and write resources to a Karmada Federation API Server instead of directly to a platform API server
  • Karmada PropagationPolicies drive resource propagation from the federation API server to the correct Datum POP cell(s)
  • The federation layer sits between the compute operator and the edge cluster cells where workloads actually run
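
To make the propagation step concrete, here is a minimal sketch of the kind of PropagationPolicy manifest the compute operator might write to the federation API server. It is an illustration, not a decided design. It assumes that each POP cell is registered as a Karmada member cluster labeled with `topology.datum.net/city-code`, and the `compute.datumapis.com/v1alpha` API version and the `build_propagation_policy` helper are hypothetical names introduced for this example.

```python
import json

# Label key taken from the issue text. That Karmada member clusters
# carry this label is an assumption made for this sketch.
CITY_CODE_LABEL = "topology.datum.net/city-code"


def build_propagation_policy(
    name,
    namespace,
    city_codes,
    api_version="compute.datumapis.com/v1alpha",  # hypothetical group/version
    kind="WorkloadDeployment",
):
    """Return a PropagationPolicy manifest (as a dict) that selects one
    resource by name and targets member clusters by city-code label."""
    if not city_codes:
        raise ValueError("at least one city code is required")
    return {
        "apiVersion": "policy.karmada.io/v1alpha1",
        "kind": "PropagationPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Which resource in the federation API server to propagate.
            "resourceSelectors": [
                {"apiVersion": api_version, "kind": kind, "name": name}
            ],
            # Which member clusters (POP cells) should receive it.
            "placement": {
                "clusterAffinity": {
                    "labelSelector": {
                        "matchExpressions": [
                            {
                                "key": CITY_CODE_LABEL,
                                "operator": "In",
                                # De-duplicate and sort for a stable manifest.
                                "values": sorted(set(city_codes)),
                            }
                        ]
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    policy = build_propagation_policy("web", "default", ["DFW", "LHR"])
    print(json.dumps(policy, indent=2))
```

Selecting by label rather than by explicit cluster name keeps the policy independent of how cells are named, at the cost of requiring consistent labels on every member cluster.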

For the purposes of this issue, we assume a single federation control plane (as targeted for launch).

The current compute service has no defined integration path for this federated model.

Problem

The compute service's WorkloadDeploymentScheduler and WorkloadReconciler are built assuming a direct connection to a single control plane that has visibility into all Location resources across all POPs. The launch federation architecture breaks that assumption in four ways:

  1. Placement scheduling changes: WorkloadDeployment resources need to be routed to the correct POP cell(s) via Karmada PropagationPolicies. It is unclear whether the current WorkloadDeploymentScheduler should be replaced by, or supplemented with, Karmada propagation rules.

  2. Location resource availability: Location resources (from network-services-operator, keyed by topology.datum.net/city-code) are currently queried from the project control plane. In the federated model, it is unclear which API server hosts these resources and how the compute operator discovers available compute locations across all POP cells.

  3. Resource ownership in the federation layer: It is unclear which resources (WorkloadDeployment, Instance, or new intermediary types) belong in the Federation API Server vs. the edge control plane API server, and how status flows back up through the federation layer to the consumer-facing Workload resource.

  4. Operator topology: In the launch architecture, operators run in a dedicated Control Plane Cell and connect to both the Federation API Server (to write propagation targets) and multiple project VCPs (to read consumer intent). The compute operator's multicluster-runtime setup needs to be adapted to this topology.

Goals

Define a clear integration strategy that answers:

  • What resources does the compute operator write to the Karmada Federation API Server, and in what form?
  • How do Karmada PropagationPolicy resources get created to route WorkloadDeployments to the correct POP cell(s) based on consumer-specified city codes?
  • Does the WorkloadDeploymentScheduler get replaced by Karmada propagation, supplemented by it, or remain as-is with a translation layer?
  • Where do Location resources live in the federated topology, and how does the compute operator discover them?
  • How does Instance status flow from edge cluster cells back through the federation layer to the consumer-facing Workload status?
  • What changes are needed to the compute operator's multicluster-runtime configuration to connect to the federation API server instead of (or in addition to) the single platform API server?
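
As a starting point for the status-flow question, the sketch below shows one way per-cell status could be folded into a single summary for the consumer-facing Workload. It assumes the input is shaped like the items Karmada records under a ResourceBinding's `status.aggregatedStatus` (`clusterName`, `applied`, `status`). The `replicas` and `readyReplicas` fields inside each item's `status`, and the `summarize_status` helper itself, are hypothetical and stand in for whatever the edge resources actually report.

```python
def summarize_status(aggregated_status):
    """Fold per-cell status items into one summary dict.

    Each item is expected to look like:
      {"clusterName": str, "applied": bool, "status": {...} or None}
    The replica field names inside "status" are assumptions for this sketch.
    """
    summary = {"replicas": 0, "readyReplicas": 0, "cellsNotApplied": []}
    for item in aggregated_status:
        if not item.get("applied", False):
            # The resource has not been applied in this cell yet,
            # so its status cannot be trusted or counted.
            summary["cellsNotApplied"].append(item["clusterName"])
            continue
        status = item.get("status") or {}
        summary["replicas"] += status.get("replicas", 0)
        summary["readyReplicas"] += status.get("readyReplicas", 0)
    # Available only if every cell applied the resource and all replicas are ready.
    summary["available"] = (
        not summary["cellsNotApplied"]
        and summary["replicas"] > 0
        and summary["readyReplicas"] == summary["replicas"]
    )
    return summary


if __name__ == "__main__":
    items = [
        {"clusterName": "dfw", "applied": True,
         "status": {"replicas": 2, "readyReplicas": 2}},
        {"clusterName": "lhr", "applied": False, "status": None},
    ]
    print(summarize_status(items))
```

Whether this roll-up runs in the compute operator or is delegated to the federation layer is one of the decisions this issue needs to settle.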

Relevant Context

Datum enhancements:

Karmada documentation:

Out of Scope

  • Changes to the consumer-facing Workload or WorkloadDeployment API surface
