Improve the performance of creating FairSchedulingAlgoContext #4807
JamesMurkin merged 10 commits into master
Conversation
When creating the fairshare scheduling context, we loop through all jobs in the jobdb and then filter out the ones relevant to the current pool.

We then use these jobs to calculate relevant info about the pool:
- Which jobs are on which nodes
- The demand of each queue (and subsequently the fairshare/adjusted fairshare)

This is simple but quite inefficient if you have a lot (millions) of jobs in the system and many pools (most of which only a small fraction of jobs interact with). As a result, most of the scheduling time can be taken up by this data shuffling.

Now, instead of loading all jobs, we'll load only the relevant jobs:
- All leased jobs (needed to calculate which jobs are on which nodes)
  - Possibly this could be further improved by storing them by node, but for now this is simple
- All queued jobs against the pools we are processing (current pool + away pools)

This should have a significant impact, especially in the case where we have one pool with most jobs queued against it and many smaller pools.

There are larger reworks we could do here to make it even more efficient, but for now this should give us most of the benefit for a minor change.
- An example would be to calculate the demand for each pool and the jobs on each node upfront, then incrementally update these as pools are processed rather than recalculating them entirely.

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
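The description above mentions combining the leased jobs with each pool's queued jobs and deduplicating the result. Below is a minimal sketch of that dedupe step in Go; the `Job` type and `uniqueByID` helper are hypothetical stand-ins, not the actual jobdb types:

```go
package main

import "fmt"

// Job is a minimal stand-in for the jobdb's job type (fields are assumptions).
type Job struct {
	ID     string
	Pool   string
	Queued bool
}

// uniqueByID mirrors the deduplication step: a job can appear both in the
// leased set and in a pool's queued set, so we keep the first occurrence
// of each job ID.
func uniqueByID(jobs []Job) []Job {
	seen := make(map[string]bool, len(jobs))
	out := make([]Job, 0, len(jobs))
	for _, j := range jobs {
		if seen[j.ID] {
			continue
		}
		seen[j.ID] = true
		out = append(out, j)
	}
	return out
}

func main() {
	leased := []Job{{ID: "a", Pool: "gpu"}, {ID: "b", Pool: "cpu"}}
	queued := []Job{{ID: "b", Pool: "cpu", Queued: true}, {ID: "c", Pool: "gpu", Queued: true}}
	all := uniqueByID(append(leased, queued...))
	fmt.Println(len(all)) // prints 3: a, b, c (duplicate b dropped)
}
```

Because the leased set is loaded once for all pools while queued jobs are loaded per pool, a leased-but-requeued job can show up in both lists, which is why this pass is needed.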
Greptile Summary

This PR significantly improves the performance of creating FairSchedulingAlgoContext.

Confidence Score: 5/5 — safe to merge. Both previously reported P0/P1 issues are resolved and only minor P2 suggestions remain. The two critical issues flagged in prior review threads (a `delete()` copy-paste bug assigning to the wrong field, and the oversized pre-allocation in `GetQueuedJobsByPool`) are both corrected in the current HEAD. The remaining findings are P2: a comment/implementation mismatch on terminal job scope, and a minor Go idiom preference for the seen-map type. Neither affects correctness.

Important Files Changed
- `scheduling_algo.go` — verify the `GetAllTerminalJobs()` scope is acceptable for your largest deployments
- `jobdb.go` — no issues remaining after prior fixes
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[newFairSchedulingAlgoContext] --> B[Build allPools\ncurrentPool + AwayPools + awayAllocationPools]
    B --> C[GetAllLeasedJobs\nAll pools - needed for node population]
    B --> D[GetAllTerminalJobs\nAll pools - filtered to currentPool downstream]
    B --> E[For each pool in allPools:\nGetQueuedJobsByPool]
    C --> F[Combine into allJobs]
    D --> F
    E --> F
    F --> G[UniqueBy job.Id\nDeduplicate]
    G --> H[calculateJobSchedulingInfo]
    H --> I{job.InTerminalState?}
    I -- Yes, apply shortJobPenalty\nif jobPool==currentPool --> J[shortJobPenaltyByQueue]
    I -- No --> K{job.Queued?}
    K -- No, has run --> L[jobsByPool / allocation]
    K -- Yes --> M[demand calculation\nif pool in job.Pools]
```
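The per-job branching in the flowchart (terminal vs. queued vs. leased) can be sketched as a small classifier. This is an illustrative Go sketch only; the `Job` fields and the `classify` helper are assumptions, not the scheduler's actual API:

```go
package main

import "fmt"

// Job is a minimal stand-in for the scheduler's job type (fields are assumptions).
type Job struct {
	ID              string
	Pool            string   // pool of the job's run, for leased/terminal jobs
	Pools           []string // pools a queued job may schedule into
	Queued          bool
	InTerminalState bool
}

// classify mirrors the flowchart's calculateJobSchedulingInfo branches for a
// single job, relative to the pool currently being processed.
func classify(j Job, currentPool string) string {
	switch {
	case j.InTerminalState:
		// Terminal jobs only matter for the short-job penalty of the current pool.
		if j.Pool == currentPool {
			return "shortJobPenalty"
		}
		return "ignored"
	case !j.Queued:
		// Leased job with a run: contributes to jobsByPool / node allocation.
		return "allocation"
	default:
		// Queued job: counts toward queue demand only if it targets this pool.
		for _, p := range j.Pools {
			if p == currentPool {
				return "demand"
			}
		}
		return "ignored"
	}
}

func main() {
	fmt.Println(classify(Job{ID: "a", Queued: false, Pool: "cpu"}, "cpu")) // prints allocation
}
```

The `ignored` branches are where the PR's savings come from: queued jobs against other pools are no longer loaded at all, rather than loaded and then filtered out.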