Performance testing: instance launch time and workload throughput #101

@scotwells

Parent Issue

Tracked by datum-cloud/enhancements#682 (Launch Workload Compute Service — "UFOs")

Summary

The initiative goals call out performance validation before launch, but no testing plan or benchmarks exist. Before the compute service goes to customers, we need to know how it performs relative to the alternatives a customer would consider — both to validate the product story and to identify bottlenecks while there is still time to address them.

Goals

  • Establish a repeatable benchmark suite for instance launch time (time from API call to instance ready)
  • Measure workload throughput (requests per second, latency distribution) for representative workload types
  • Compare results against at least two alternatives relevant to the target use case (e.g. AWS Lambda, Fly.io, Cloudflare Workers)
  • Document the results and identify any performance gaps that need addressing before launch
  • Define ongoing performance regression thresholds so regressions are caught before they reach customers
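As a starting point for discussion, the launch-time and regression-threshold goals above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a proposed implementation: `launch_and_wait` is a hypothetical callable that issues the launch API call and blocks until the instance reports ready, and the 10% tolerance is a placeholder rather than an agreed threshold.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of samples; pct is in the range (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_launch_times(launch_and_wait: Callable[[], None], runs: int) -> List[float]:
    """Time each launch from API call to instance ready, in milliseconds.

    launch_and_wait is a hypothetical hook: it should issue the launch
    request and return only once the instance is ready.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        launch_and_wait()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(samples: List[float]) -> Dict[str, float]:
    """Reduce raw timings to the percentiles tracked across runs."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


def find_regressions(
    current: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.10,  # placeholder value, not an agreed threshold
) -> List[str]:
    """Return the metrics where current exceeds baseline by more than tolerance."""
    return [
        name
        for name in baseline
        if current[name] > baseline[name] * (1 + tolerance)
    ]
```

A CI job could call `summarize(measure_launch_times(launch_and_wait, runs=200))`, compare the result against a stored baseline with `find_regressions`, and fail the build when the returned list is non-empty. The same percentile summary would apply to per-request latencies collected during throughput runs.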

Non-Goals

  • Optimizing the Unikraft runtime itself
  • Load testing the control plane (a separate concern from workload performance)

Open Questions

  • Which workload types should be benchmarked — AI inference sidecar, general HTTP, or both?
  • What launch time target is acceptable for a good customer experience (e.g. < 500 ms at p99)?
  • Which PoP should benchmarks run against initially?
