feat(scheduler): optimize worker scheduling to O(1) using Valkey/Redi…#13
Conversation
3a51efa to
a570c3c
Compare
|
Fixed format issue using |
| for _, worker := range workers { | ||
| if worker.GetActorId() == input.ActorID && worker.GetWorkerPool() == state.ActorTemplate.Spec.WorkerPoolRef.Name && worker.GetWorkerNamespace() == state.ActorTemplate.Spec.WorkerPoolRef.Namespace { | ||
| // Re-use previously assigned worker if available. | ||
| if state.Actor.GetAteomPodName() != "" { |
There was a problem hiding this comment.
This will never be non-empty during resume (except maybe on retry of a failed resume?)
|
|
||
| // Run SAdd sequentially outside the transaction to avoid cluster slot restrictions. | ||
| if shouldAddToIdle { | ||
| setKey := fmt.Sprintf("pool:%s:%s:idle_workers", worker.GetWorkerNamespace(), worker.GetWorkerPool()) |
There was a problem hiding this comment.
One of our design goals is to let all ActorTemplates share the same WorkerPool to maximize efficiency. In that case, this will mean that we are sending all restore traffic through a single hash bucket (and thus, one single valkey node).
We are going to need to solve this problem, and I'm having difficulty seeing how it could be solved within redis.
However, this is heaps better than the current strategy of "read every worker all the time".
There was a problem hiding this comment.
100% I have a much more advanced change that handles locality but because it is much more ossifying I felt this was a decent intermediate step while the datamodel and DB choices were finalized.
| } | ||
|
|
||
| // Run SAdd sequentially outside the transaction to avoid cluster slot restrictions. | ||
| if shouldAddToIdle { |
There was a problem hiding this comment.
I think this needs to be a three phase process.
- Mark the worker as free, but not yet returned to the free set.
- Return to the free set
- Make the worker as fully free.
Otherwise, a crash between steps 1 and 2 will permanently leak the worker from consideration.
With the three-phase approach, we can have an additional background thread sweeping workers that are "free, but not yet returned to the free set" back to the free set.
There was a problem hiding this comment.
I think we do have a re-sync process? Will need to double check. But I will make this change. Thanks Taahir.
Description
Addresses #12
This PR optimizes Substrate's critical scheduling path by replacing the legacy$O(N)$ worker range scan bottleneck with an $O(1)$ constant-time idle worker Set queue backed by Valkey/Redis.
Architectural Rationale & Changes
Previously, when an actor was resumed, the scheduler step (
AssignWorkerStep.Executeinworkflow_resume.go) would fetch all registered workers in the pool viastore.ListWorkers(ctx). For a pool of 10,000 workers, this required serial keyspace scans over all master shards, loading and unmarshaling 10,000 JSON blocks on every single wakeup request.This PR completely eliminates that bottleneck by shifting worker state queue management into the database layer:
1. Interface Enhancements (
store.Interface)ClaimIdleWorker(ctx, namespace, pool, actorID, actorNamespace, actorTemplate)to the database contract.2. Set-Based Idle Pool Management (
ateredis.go)We leverage Valkey/Redis's high-performance in-memory
Setindexing to track available capacity:ClaimIdleWorkerexecutes a single, atomicSPOPoperation on thepool:<namespace>:<pool>:idle_workersSet to claim a random free worker in constant time.SPOPis an atomic server-side operation, concurrent scheduler instances are guaranteed to pop unique worker IDs. This completely eliminates optimistic lock collisions and database retries on concurrent wakeups.CreateWorker.DeleteWorker.ActorIdtransitions to empty inUpdateWorker.3. Handling Redis Cluster Slot Restrictions (
CROSSSLOTfix)During integration testing, multi-key transaction pipelines (trying to write a worker record and mutate the idle set in a single transaction block) failed with
CROSSSLOTerrors because the keys hashed to different clustered slots.We solved this by splitting the operations sequentially outside the transactions:
WATCHtransactions).SAdd/SRemon the idle Set) are executed as separate, independent commands immediately upon transaction success. This ensures 100% compatibility with clustered production Valkey.Verification & Tests Completed
go vet(0 errors/warnings).