Problem
Today, traffic between VPC workloads follows the shortest available path through the network fabric — but the shortest path isn't always the fastest. Customers running latency-sensitive workloads (real-time APIs, databases, video streaming) have no way to express that their traffic should prefer low-latency paths, and no visibility into the actual latency their traffic experiences.
As the platform scales across regions and availability zones, the gap between "shortest path" and "lowest latency path" will grow. Customers with strict SLA requirements need confidence that their traffic is routed optimally — not just reachably.
Desired Outcome
VPC customers should be able to:
- See the latency their traffic experiences across the network — between workloads, across clusters, and through connectors
- Express latency preferences as simple policies on their VPC (e.g., "optimize for low latency" or "keep latency under 10ms")
- Get automatic path optimization — the platform should continuously measure network conditions and route traffic over the best-performing paths without manual intervention
- Differentiate by tier — premium VPC tiers could receive latency-optimized routing as a value-added capability
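The two example policies above ("optimize for low latency" and "keep latency under 10ms") could be modeled roughly as follows. This is a minimal sketch, not a platform API: `LatencyPolicy`, `choose_path`, the field names, and the candidate path names and latencies are all hypothetical, and the candidate paths are assumed to arrive in default preference order (shortest path first).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LatencyPolicy:
    """Hypothetical VPC-level latency intent (illustrative only)."""
    optimize: bool = False                   # "optimize for low latency"
    max_latency_ms: Optional[float] = None   # "keep latency under 10ms"


def choose_path(policy: LatencyPolicy, candidates: Dict[str, float]) -> Optional[str]:
    """Pick a path name from {path name: measured latency in ms}.

    `candidates` is assumed to be ordered by default preference,
    with the shortest path first.
    """
    eligible = dict(candidates)
    if policy.max_latency_ms is not None:
        # Keep only the paths that satisfy the latency bound.
        eligible = {name: ms for name, ms in eligible.items()
                    if ms < policy.max_latency_ms}
    if not eligible:
        return None  # No path meets the policy; the caller decides how to report it.
    if policy.optimize:
        # Lowest measured latency wins.
        return min(eligible, key=eligible.get)
    # Otherwise keep the default preference order.
    return next(iter(eligible))


# Made-up measurements for three candidate paths.
candidates = {"direct": 12.0, "via-e": 8.0, "via-b": 3.0}
print(choose_path(LatencyPolicy(), candidates))                     # direct
print(choose_path(LatencyPolicy(optimize=True), candidates))        # via-b
print(choose_path(LatencyPolicy(max_latency_ms=10.0), candidates))  # via-e
```

Returning `None` when no path satisfies the bound is one possible design choice; whether the platform should fall back to the best available path instead is a separate question.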
Why This Matters
- Competitive differentiation: To our knowledge, no major cloud provider currently exposes tenant-controllable, latency-aware routing as a first-class VPC feature
- Revenue enabler: Latency SLA tiers (standard, low-latency, ultra-low-latency) create natural pricing differentiation
- Operational simplicity: Measured, automated path selection replaces manual traffic engineering
- Data sovereignty: Latency constraints tend to favor geographically local paths, which can support compliance goals, though they do not guarantee locality the way explicit geo-fencing rules do
- Multi-region readiness: As the platform expands across regions, latency-aware routing prevents cross-region detours that degrade user experience
How This Could Work
Research into modern network path selection suggests a layered approach:
- Measure: Active delay probes between nodes continuously track real-time latency across all network paths. STAMP, the Simple Two-Way Active Measurement Protocol (RFC 8762), provides a proven mechanism for this, and its Segment Routing extensions (RFC 9503), which cover both SR-MPLS and SRv6, help probes and their replies follow the same segment-routed paths as real traffic.
- Advertise: Measured latency data is distributed alongside routing information so the control plane has a complete picture of network performance. Standards like BGP-LS TE Performance Metrics (RFC 8571) and IS-IS TE Metric Extensions (RFC 8570) define how delay values propagate through the routing system.
- Policy: Customers express intent through simple VPC-level settings. The platform translates "optimize for latency" into the appropriate path selection constraints. The SR Policy Architecture (RFC 9256) defines how a "Color" value maps high-level intent to specific forwarding paths, and IGP Flexible Algorithm (RFC 9350) enables the network to compute delay-optimized topologies automatically.
- Steer: The existing SRv6 (RFC 8986) traffic engineering capability in the Galactic data plane can route tenant traffic through specific paths — what's needed is the intelligence layer that picks the right path based on measured delay.
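The Measure and Steer steps above can be sketched as a small path computation. This is an illustrative sketch under stated assumptions, not the platform's implementation: the four-node topology, the delay values, the node SIDs (taken from the 2001:db8::/32 documentation prefix), and the helper names `best_path` and `to_segment_list` are all hypothetical. It runs Dijkstra twice over the same graph, once counting hops and once summing measured delay, to show how the two choices can diverge.

```python
import heapq

# Hypothetical topology: node -> {neighbor: measured delay in ms}.
LINK_DELAY_MS = {
    "a": {"d": 12.0, "b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "a": 12.0},
}

# Hypothetical SRv6 node SIDs, one per node.
NODE_SID = {n: f"2001:db8:0:{n}::" for n in LINK_DELAY_MS}


def best_path(links, src, dst, weight):
    """Dijkstra over `links`; weight(delay_ms) gives each edge's cost."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # Stale heap entry.
        for nbr, delay_ms in links.get(node, {}).items():
            nd = d + weight(delay_ms)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None, float("inf")
    # Walk predecessors back from dst to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[dst]


def to_segment_list(path):
    """Map a node path to SIDs, skipping the source node."""
    return [NODE_SID[n] for n in path[1:]]


hop_path, hops = best_path(LINK_DELAY_MS, "a", "d", lambda _ms: 1.0)
delay_path, delay = best_path(LINK_DELAY_MS, "a", "d", lambda ms: ms)
print(hop_path, hops)               # ['a', 'd'] 1.0
print(delay_path, delay)            # ['a', 'b', 'c', 'd'] 3.0
print(to_segment_list(delay_path))
```

In this made-up topology the one-hop path costs 12 ms, while the three-hop path costs 3 ms, which is the gap between "shortest" and "lowest latency" that the Problem section describes.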
This builds naturally on the existing BGP control plane and SRv6 data plane architecture. The core insight from research into protocols like DDM (Delay Driven Multipath) is that delay is a simple, effective, and universal signal for path quality: it folds congestion, distance, and link health into a single measurable value. Google's Swift paper ("Swift: Delay is Simple and Effective for Congestion Control in the Datacenter", SIGCOMM 2020) reports that delay works well as a congestion-control signal at hyperscaler scale. Swift addresses congestion control rather than path selection, but it supports the premise that delay is a reliable signal to build on.
Related Standards & Background Reading
SRv6 Data Plane & Traffic Engineering
Delay Measurement
Performance-Aware Routing
Policy & Intent Frameworks
Prior Art & Research
Open Questions
- What latency granularity do customers actually need? (per-VPC, per-workload, per-connection?)
- Should latency preferences be a VPC-level setting or expressed through a separate policy resource?
- How should latency data be exposed to customers? (metrics dashboard, API, status on VPC resource?)
- What is the right default — should all VPCs get basic latency optimization, or is it opt-in?