[feat][CGS][application-manager] add intelligent queue selection based on yarn resource usage #986

Merged
casionone merged 9 commits into dev-1.18.2-webank from dev-1.18.0-secondary-queue
Apr 13, 2026

@v-kkhuang

What is the purpose of the change

Background/Problem:
Currently, Linkis submits Yarn jobs to a fixed, preconfigured queue. This leads to inefficient resource utilization when some queues are heavily loaded while others have spare capacity. Linkis needs to be able to select a queue automatically based on real-time resource usage.

Purpose of Change:
This PR adds intelligent queue selection: Linkis monitors Yarn queue resource usage in real time and, based on configurable thresholds, chooses between a primary and a secondary queue. When the secondary queue has spare capacity (its usage is below the threshold), jobs are sent there; otherwise, they go to the primary queue.
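A minimal Java sketch of the threshold rule described above. It assumes the secondary queue's usage has already been reduced to a single used/max ratio, and it treats missing usage data as a reason to stay on the primary queue. The QueueSelector class, its methods, and the queue names are hypothetical illustrations, not the code merged in this PR.

```java
import java.util.OptionalDouble;

/** Hypothetical sketch of the primary/secondary rule; not the code merged in this PR. */
final class QueueSelector {

    private final String primaryQueue;
    private final String secondaryQueue;
    private final double threshold; // e.g. 0.9: use the secondary queue only below 90% usage

    QueueSelector(String primaryQueue, String secondaryQueue, double threshold) {
        if (threshold <= 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in (0, 1]");
        }
        this.primaryQueue = primaryQueue;
        this.secondaryQueue = secondaryQueue;
        this.threshold = threshold;
    }

    /**
     * @param secondaryUsage used/max ratio of the secondary queue, or empty if it
     *                       could not be read from Yarn
     * @return the queue the job should be submitted to
     */
    String select(OptionalDouble secondaryUsage) {
        if (!secondaryUsage.isPresent()) {
            return primaryQueue; // assumed fallback: no usage data, keep the primary queue
        }
        // Strictly below the threshold means the secondary queue has spare capacity.
        return secondaryUsage.getAsDouble() < threshold ? secondaryQueue : primaryQueue;
    }

    public static void main(String[] args) {
        QueueSelector selector = new QueueSelector("root.primary", "root.secondary", 0.9);
        System.out.println(selector.select(OptionalDouble.of(0.5)));  // root.secondary
        System.out.println(selector.select(OptionalDouble.of(0.95))); // root.primary
        System.out.println(selector.select(OptionalDouble.empty()));  // root.primary
    }
}
```

Because the comparison is strict, a secondary queue sitting exactly at the threshold is treated as busy.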

Value/Impact:
After this change, Linkis can balance resource utilization across multiple queues, reduce job wait times, and improve overall cluster efficiency. The feature offers configurable thresholds, filtering by engine type and creator, and automatic fallback to ensure stability.

Related issues/PRs

Related issues: closes apache#5415
Related PRs: none

Brief change log

  • Add a performSmartQueueSelection method to DefaultEngineCreateService that runs queue selection before the YarnResource is created
  • Add configuration parameters for the secondary queue feature (enable switch, usage threshold, supported engines, supported creators)
  • Add queue selection logic that evaluates resource usage along three dimensions (memory, CPU, instances)
  • Update RMConfiguration with the secondary queue settings
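The change log layers engine/creator filtering and a three-dimensional usage check on top of the basic threshold rule. The Java sketch below shows one way to combine them. Assumptions, none of which the PR confirms: the secondary queue counts as having capacity only when memory, CPU (vcores), and instance usage are all below the threshold; the enabled field stands in for the wds.linkis.rm.secondary.yarnqueue.enable switch that appears in the commit history; SecondaryQueuePolicy, QueueUsage, and the example engine/creator values are hypothetical.

```java
import java.util.Set;

/** Hypothetical sketch: engine/creator gating plus a three-dimensional usage check. */
final class SecondaryQueuePolicy {

    /** Used and maximum resources of one Yarn queue (hypothetical shape). */
    static final class QueueUsage {
        final long usedMemoryBytes;
        final long maxMemoryBytes;
        final int usedVCores;
        final int maxVCores;
        final int usedInstances;
        final int maxInstances;

        QueueUsage(long usedMemoryBytes, long maxMemoryBytes, int usedVCores, int maxVCores,
                   int usedInstances, int maxInstances) {
            this.usedMemoryBytes = usedMemoryBytes;
            this.maxMemoryBytes = maxMemoryBytes;
            this.usedVCores = usedVCores;
            this.maxVCores = maxVCores;
            this.usedInstances = usedInstances;
            this.maxInstances = maxInstances;
        }
    }

    private final boolean enabled; // stands in for wds.linkis.rm.secondary.yarnqueue.enable
    private final double threshold;
    private final Set<String> supportedEngines;
    private final Set<String> supportedCreators;

    SecondaryQueuePolicy(boolean enabled, double threshold,
                         Set<String> supportedEngines, Set<String> supportedCreators) {
        this.enabled = enabled;
        this.threshold = threshold;
        this.supportedEngines = supportedEngines;
        this.supportedCreators = supportedCreators;
    }

    /** used/max; a non-positive maximum counts as full so the queue is never chosen. */
    private static double ratio(double used, double max) {
        return max <= 0 ? 1.0 : used / max;
    }

    /** True only if memory, CPU and instance usage are all strictly below the threshold. */
    boolean hasCapacity(QueueUsage u) {
        double memory = ratio(u.usedMemoryBytes, u.maxMemoryBytes);
        double cpu = ratio(u.usedVCores, u.maxVCores);
        double instances = ratio(u.usedInstances, u.maxInstances);
        return Math.max(memory, Math.max(cpu, instances)) < threshold;
    }

    String chooseQueue(String engineType, String creator, String primary, String secondary,
                       QueueUsage secondaryUsage) {
        boolean applicable = enabled
                && supportedEngines.contains(engineType)
                && supportedCreators.contains(creator);
        if (!applicable || secondaryUsage == null) {
            return primary; // feature off, job filtered out, or no usage data
        }
        return hasCapacity(secondaryUsage) ? secondary : primary;
    }

    public static void main(String[] args) {
        SecondaryQueuePolicy policy =
                new SecondaryQueuePolicy(true, 0.8, Set.of("spark"), Set.of("IDE"));
        QueueUsage light = new QueueUsage(40L << 30, 100L << 30, 20, 100, 5, 50);
        QueueUsage cpuBound = new QueueUsage(40L << 30, 100L << 30, 90, 100, 5, 50);
        System.out.println(policy.chooseQueue("spark", "IDE", "root.primary", "root.secondary", light));    // root.secondary
        System.out.println(policy.chooseQueue("spark", "IDE", "root.primary", "root.secondary", cpuBound)); // root.primary
        System.out.println(policy.chooseQueue("hive", "IDE", "root.primary", "root.secondary", light));     // root.primary
    }
}
```

Taking the maximum of the three ratios lets the most constrained resource decide: a queue with plenty of free memory but saturated vcores is still treated as full.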

Checklist

  • I have read the Contributing Guidelines on pull requests.
  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible
  • If this is a code change: I have written unit tests to fully verify the new behavior.

v-kkhuang and others added 3 commits March 31, 2026 09:32
…eption (#964)

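* #AI commit# Development phase: fix bug where sr task retry caused an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry caused an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry caused an init_sql loading exception

* #AI commit# Fix: * extend the coverage of the task retry switch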
@v-kkhuang added the enhancement (New feature or request) label Apr 13, 2026
@v-kkhuang changed the base branch from dev-1.18.0-webank to dev-1.18.2-webank April 13, 2026 08:17

@casionone left a comment

LGTM.

@casionone merged commit ee1213a into dev-1.18.2-webank Apr 13, 2026
12 of 16 checks passed
v-kkhuang added a commit that referenced this pull request Apr 16, 2026
…d on yarn resource usage (#986)

* [fix][CGS][engineconn] fix sr task retry causing init_sql loading exception (#964)

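* #AI commit# Development phase: fix bug where sr task retry caused an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry caused an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry caused an init_sql loading exception

* #AI commit# Fix: * extend the coverage of the task retry switch

* #AI commit# Development phase: support secondary queue selection for Spark

* #AI commit# Development phase: optimize secondary queue logic

* #AI commit# Development phase: optimize the default value of wds.linkis.rm.secondary.yarnqueue.enable

* #AI commit# Development phase: refactor: translate Chinese logs to English in smart queue selection

- Translate all Chinese log messages to English for consistency
- Update comments and documentation to English
- No functional changes, only log message translation

* Revert "#AI commit# Development phase: refactor: translate Chinese logs to English in smart queue selection"

This reverts commit 47fb4e6.

* #AI commit# Development phase: improve English log output

* #AI commit# Development phase: improve English log output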
v-kkhuang added a commit that referenced this pull request Apr 16, 2026

Labels

enhancement (New feature or request)

2 participants