[Core feature] Decouple submitterPod resources from ray task pod_template #5666

@jpoler

Motivation: Why do you think this is important?

Currently the Ray plugin uses the pod_template provided to the task as the basis for every pod spec it creates:

  • The RayCluster head
  • The RayCluster workers
  • The Kubernetes Job that runs ray job submit (the submitter)

This is a pain point when the RayCluster head and workers are intended to be scheduled on GPU nodes: the submitter is built from the same template, and I do not want to waste an entire GPU node on the submitter.
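To make the problem concrete, here is a simplified Python model of the behavior described above. It is only an illustration, not the plugin's actual code: the dict layout, the example values, and the build_pod_specs name are all made up for this sketch.

```python
from copy import deepcopy

# Hypothetical, simplified stand-in for the task's pod_template.
task_pod_template = {
    "resources": {"requests": {"cpu": "8", "memory": "64Gi", "nvidia.com/gpu": "1"}},
    "volumes": ["dshm"],              # shared memory volume needed by Ray head/workers
    "service_account": "ray-gpu",     # illustrative name only
}


def build_pod_specs(pod_template):
    """Model of the current behavior: one template feeds every pod spec."""
    return {
        "head": deepcopy(pod_template),
        "worker": deepcopy(pod_template),
        "submitter": deepcopy(pod_template),
    }


specs = build_pod_specs(task_pod_template)

# The submitter ends up with the same GPU request as the head and workers,
# even though it only needs to run the job submission.
submitter_gpu = specs["submitter"]["resources"]["requests"]["nvidia.com/gpu"]
```

In this model there is no way to give the submitter a smaller spec without also shrinking the head and workers.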

Goal: What should the final outcome look like, ideally?

Today it is not possible to configure the RayCluster pod templates and the submitter pod template separately. Ideally it would be, so that the submitter can be scheduled with appropriately minimal resource requests and without configuration that has nothing to do with the submitter pod (in my use case, only the Ray head/workers need the GPU, shared memory volume mount, service account, etc.).

I found #4170, which looks like it was trying to address this issue, but it hasn't seen any progress since October 2023. At a high level the approach it takes makes sense to me: the pod_template provided to the task configures the resources for the submitter job, and the Ray head/workers get new config fields to configure their resources explicitly. In my opinion this change is headed in the right direction, but it would be improved by a slight adaptation that lets the user provide an entire pod template for the head/workers alongside resources. Otherwise it won't be possible to configure things like volume mounts and env vars on the Ray head/workers.
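A rough sketch of the shape I have in mind, in plain Python. Everything here is hypothetical: the PodTemplate and RayTaskPods classes and the head_pod_template / worker_pod_template field names are placeholders for illustration, not existing flytekit or plugin APIs. The fallback to the task's pod_template when no override is given is also my assumption, added to keep today's behavior as the default.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PodTemplate:
    """Hypothetical, simplified pod template (not a real Kubernetes type)."""
    resources: Dict[str, str] = field(default_factory=dict)
    volumes: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class RayTaskPods:
    """Hypothetical config: the task template drives the submitter,
    while head and workers may each carry a full template of their own."""
    task_pod_template: PodTemplate
    head_pod_template: Optional[PodTemplate] = None
    worker_pod_template: Optional[PodTemplate] = None

    def resolve(self) -> Dict[str, PodTemplate]:
        # Assumption: fall back to the task template when no override is set.
        head = self.head_pod_template if self.head_pod_template is not None else self.task_pod_template
        worker = self.worker_pod_template if self.worker_pod_template is not None else self.task_pod_template
        return {"submitter": self.task_pod_template, "head": head, "worker": worker}


# Small submitter, GPU-backed head and workers with their own volumes and env.
gpu_template = PodTemplate(
    resources={"cpu": "8", "nvidia.com/gpu": "1"},
    volumes=["dshm"],
    env={"NCCL_DEBUG": "INFO"},
)
pods = RayTaskPods(
    task_pod_template=PodTemplate(resources={"cpu": "500m", "memory": "512Mi"}),
    head_pod_template=gpu_template,
    worker_pod_template=gpu_template,
)
resolved = pods.resolve()
```

Taking a whole template per role, rather than only a resources field, is what makes it possible to set volume mounts and env vars on the head/workers without leaking them into the submitter.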

Describe alternatives you've considered

I don't see an alternative to adding separate config parameters for the separate pod specs. Hard-coding the submitter pod spec to minimal resource requests (e.g. just a small request/limit for CPU and memory) doesn't seem like a good idea, because there could well be a use case where someone wants a GPU for the submitter. It wouldn't make much sense to preclude that use case, IMO.

I do see this PR that adds a Resource config to

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

Metadata

Labels

enhancement (New feature or request)

Status

Done
