feat(mlflow): add GPU node scheduling pattern #170

@kriscoleman

Description

Parent epic: #166
Integration branch: feat/mlflow-enterprise-patterns

Add GPU scheduling support for ML/AI workloads with node selectors, tolerations, and nvidia.com/gpu resource limits.

Scope

  • Add a gpu section to charts/mlflow/values.yaml with the keys enabled (default false), nodeSelector, tolerations, and resources.limits["nvidia.com/gpu"]
  • Update charts/mlflow/templates/deployment.yaml to conditionally inject:
    • nodeSelector from .Values.gpu.nodeSelector when gpu.enabled
    • tolerations from .Values.gpu.tolerations when gpu.enabled
    • resources.limits including nvidia.com/gpu when gpu.enabled
  • Add comments explaining the pattern for vendors adapting it to AMD ROCm or other GPU providers
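
As a starting point for discussion, the gpu section described above might look roughly like this. The key names come from the scope list; the node label, the taint key, and the GPU count of 1 are illustrative assumptions, not values the chart defines today.

```yaml
# charts/mlflow/values.yaml (sketch; label, taint, and count values are assumptions)
gpu:
  # Off by default so existing installs render exactly as before.
  enabled: false

  # Assumed example label. Replace with whatever label your GPU nodes carry.
  nodeSelector:
    accelerator: nvidia

  # Assumed example taint. This must match the taint applied to your GPU nodes.
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule

  resources:
    limits:
      # Extended resource advertised by the NVIDIA device plugin.
      # Vendors targeting AMD ROCm would swap this key for amd.com/gpu,
      # which is the resource name the AMD device plugin advertises.
      nvidia.com/gpu: 1
```

Kubernetes expects extended resources such as GPUs to be set under limits (a request, if given, must equal the limit), which is why the sketch only sets limits.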

Files touched

  • applications/mlflow/charts/mlflow/values.yaml (add gpu section)
  • applications/mlflow/charts/mlflow/templates/deployment.yaml (conditional GPU blocks)
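
A minimal sketch of the conditional GPU blocks in deployment.yaml, assuming the values live under .Values.gpu as listed in the scope. The container name mlflow and the surrounding structure are assumptions about the existing template; if the chart already renders nodeSelector, tolerations, or resources from other values, the real change would need to merge with those rather than emit duplicate keys.

```yaml
# charts/mlflow/templates/deployment.yaml (excerpt, sketch only)
spec:
  template:
    spec:
      {{- if .Values.gpu.enabled }}
      {{- with .Values.gpu.nodeSelector }}
      # Pin the pod to GPU nodes.
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.gpu.tolerations }}
      # Allow scheduling onto tainted GPU nodes.
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- end }}
      containers:
        - name: mlflow  # assumed container name
          {{- if .Values.gpu.enabled }}
          resources:
            limits:
              # Renders nvidia.com/gpu (or another vendor's resource key).
              {{- toYaml .Values.gpu.resources.limits | nindent 14 }}
          {{- end }}
```

Rendering with `helm template` once with the default values and once with `--set gpu.enabled=true` is a quick way to confirm that nothing GPU-related is emitted by default.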

PR target

Open PRs against feat/mlflow-enterprise-patterns (not main).