Improve CUDA resource management for MPI jobs #185

Open
vmitq wants to merge 1 commit into wavefunction91:master from vmitq:feature/cuda-oversubscribe

vmitq commented Mar 13, 2026

The code detects the number of MPI processes on the local host and the number of available CUDA devices, then assigns a GPU ID to each process in round-robin fashion. When the available memory is determined, the amount reported for a GPU is divided evenly among the processes sharing that GPU.

This simplifies GPU resource management for jobs that run with multiple GPUs per host or multiple processes per GPU.
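
For illustration, the round-robin assignment and the even memory split can be sketched roughly as below. This is a minimal sketch, not the code in this PR: the helper names `assign_device`, `ranks_on_device`, and `memory_share` are hypothetical, and the inputs are passed in as plain integers so the snippet compiles without MPI or CUDA. In a real job the node-local rank and process count would typically come from a communicator created with `MPI_Comm_split_type` and `MPI_COMM_TYPE_SHARED`, the device count from `cudaGetDeviceCount`, and the memory figure from `cudaMemGetInfo`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: map a node-local MPI rank to a CUDA device ID
// in round-robin order (rank 0 -> device 0, rank 1 -> device 1, ...,
// wrapping around once every device has been handed out).
inline int assign_device(int local_rank, int n_devices) {
  assert(local_rank >= 0 && n_devices > 0);
  return local_rank % n_devices;
}

// Hypothetical helper: number of node-local ranks that end up on `device`
// under the round-robin rule above. Every device gets the integer quotient;
// the first (n_local_ranks % n_devices) devices get one extra rank.
inline int ranks_on_device(int device, int n_local_ranks, int n_devices) {
  assert(n_devices > 0 && n_local_ranks >= 0);
  assert(device >= 0 && device < n_devices);
  const int base  = n_local_ranks / n_devices;
  const int extra = (device < n_local_ranks % n_devices) ? 1 : 0;
  return base + extra;
}

// Hypothetical helper: even split of a device's memory (in bytes) among
// the ranks sharing it. Integer division, so any remainder is left unused.
inline std::size_t memory_share(std::size_t device_bytes, int n_sharing) {
  assert(n_sharing > 0);
  return device_bytes / static_cast<std::size_t>(n_sharing);
}
```

With 6 local ranks and 4 devices, for example, ranks 4 and 5 wrap around to devices 0 and 1, so those two devices are each shared by two ranks and each rank on them would see half of the memory figure, while devices 2 and 3 have a single rank that keeps the full amount.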

Split memory evenly between processes on one GPU