
[refactor][CGS][application-manager] optimize queue selection execution timing and configuration retrieval #988

Merged
casionone merged 16 commits into dev-1.18.2-webank from dev-1.18.0-secondary-queue
Apr 14, 2026

@v-kkhuang

What is the purpose of the change

Background/Problem:
The smart queue selection logic was previously executed in the createEngineConn method before resource generation, which could lead to inconsistent queue configuration between the resource request and the actual engine creation process. Additionally, queue configuration values were not properly trimmed, potentially causing comparison issues.

Purpose of Change:
To address this problem, this PR moves the smart queue selection from createEngineConn into the generateResource method, so that selection runs after configuration enrichment and the resource request and the engine creation see the same queue. It also applies trim() to the queue configuration values to eliminate whitespace issues.

Value/Impact:
After the change, the queue selection logic is better positioned in the execution flow, ensuring consistency between resource generation and engine creation. The trim operation prevents configuration mismatches due to trailing or leading whitespace.
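
The whitespace problem described above can be illustrated with a minimal Python sketch. It is not the Linkis code (which calls trim() in Java/Scala); the keys `primary.queue` and `secondary.queue` and the helper `read_queue_setting` are hypothetical names chosen for the example, and Python's `strip()` is used as the closest analogue of `trim()`.

```python
from typing import Mapping


def read_queue_setting(conf: Mapping[str, str], key: str) -> str:
    """Return the configured queue name with leading/trailing whitespace removed."""
    return conf.get(key, "").strip()


# Hypothetical keys; stray spaces like these are easy to introduce in a config UI.
conf = {"primary.queue": "root.default ", "secondary.queue": " root.default"}

# The raw values differ only by whitespace, so a naive comparison reports a mismatch.
raw_equal = conf["primary.queue"] == conf["secondary.queue"]

# After trimming, both settings name the same queue.
trimmed_equal = (
    read_queue_setting(conf, "primary.queue")
    == read_queue_setting(conf, "secondary.queue")
)
```

Trimming at the point of retrieval, rather than at each comparison, means every later consumer of the value sees the same normalized queue name.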

Related issues/PRs

Related issues: close apache#5415
Related PR: none

Brief change log

  • Move performSmartQueueSelection execution from createEngineConn to generateResource method
  • Add trim() to primaryQueue and secondaryQueue configuration retrieval
  • Update step numbering in code comments to reflect the new execution flow
  • Ensure queue selection happens after console configuration enrichment
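
The reordering in the change log can be sketched roughly as follows, in Python rather than the project's own language. Everything here is an assumption made for illustration: the keys, the defaults, the `enrich_with_console_config` stand-in, and in particular the `prefer_secondary` predicate, since the PR description does not state the criterion that smart queue selection uses. The point is only the ordering: enrichment first, then selection, both inside resource generation.

```python
from typing import Callable, Dict

# Hypothetical keys and console defaults, for illustration only.
PRIMARY_KEY = "primary.queue"
SECONDARY_KEY = "secondary.queue"
CONSOLE_DEFAULTS = {PRIMARY_KEY: "root.default", SECONDARY_KEY: " root.backup "}


def enrich_with_console_config(request_props: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for console configuration enrichment: request values override defaults."""
    enriched = dict(CONSOLE_DEFAULTS)
    enriched.update(request_props)
    return enriched


def perform_smart_queue_selection(
    props: Dict[str, str], prefer_secondary: Callable[[str, str], bool]
) -> str:
    """Pick a queue from trimmed settings; the decision rule is supplied by the caller."""
    primary = props.get(PRIMARY_KEY, "").strip()
    secondary = props.get(SECONDARY_KEY, "").strip()
    if secondary and prefer_secondary(primary, secondary):
        return secondary
    return primary


def generate_resource(
    request_props: Dict[str, str], prefer_secondary: Callable[[str, str], bool]
) -> Dict[str, str]:
    """Enrich the configuration first, then select the queue on the enriched values."""
    enriched = enrich_with_console_config(request_props)
    queue = perform_smart_queue_selection(enriched, prefer_secondary)
    return {"queue": queue}
```

Because selection runs on the enriched properties, a secondary queue that is defined only in the console defaults is still visible to it, which would not be the case if selection ran before enrichment.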

Checklist

  • I have read the Contributing Guidelines on pull requests.
  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible
  • If this is a code change: I have written unit tests to fully verify the new behavior.

v-kkhuang and others added 16 commits March 31, 2026 09:32
…eption (#964)

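* #AI commit# Development phase: fix bug where sr task retry causes an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry causes an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry causes an init_sql loading exception

* #AI commit# Fix: * extend the coverage of the task retry switch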
…t queue selection

- Translate all Chinese log messages to English for consistency
- Update comments and documentation to English
- No functional changes, only log message translation
Add permission validation before using secondary queue to prevent task submission failures:

Features:
- Add configuration SECONDARY_QUEUE_PERMISSION_CHECK_ENABLED to enable/disable permission check
- Add configuration SECONDARY_QUEUE_ALLOWED_USERS to configure user whitelist
- Modify performSmartQueueSelection method to accept user parameter
- Add checkQueuePermission method to validate user access to secondary queue
- If user has no permission, log warning and fallback to primary queue
- Prevents task submission failures due to insufficient queue permissions

Configuration:
- wds.linkis.rm.secondary.yarnqueue.permission.check.enable (default: false)
- wds.linkis.rm.secondary.yarnqueue.allowed.users (default: empty)
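
A minimal Python sketch of the whitelist check described in this commit message (a commit that the squash log shows was later reverted). The two configuration keys and their defaults are taken from the message; the comma-separated format of the user list, the behavior when the check is enabled but the list is empty, and the function names are assumptions.

```python
import logging
from typing import Mapping

logger = logging.getLogger("queue_selection_sketch")

CHECK_ENABLE_KEY = "wds.linkis.rm.secondary.yarnqueue.permission.check.enable"
ALLOWED_USERS_KEY = "wds.linkis.rm.secondary.yarnqueue.allowed.users"


def check_queue_permission(conf: Mapping[str, str], user: str) -> bool:
    """Return True if the user may use the secondary queue under the whitelist rule."""
    # Default is "false": with the check disabled, nobody is blocked.
    enabled = conf.get(CHECK_ENABLE_KEY, "false").strip().lower() == "true"
    if not enabled:
        return True
    # Assumption: the whitelist is a comma-separated list of user names.
    raw_users = conf.get(ALLOWED_USERS_KEY, "")
    allowed = {name.strip() for name in raw_users.split(",") if name.strip()}
    return user in allowed


def choose_queue(conf: Mapping[str, str], user: str, primary: str, secondary: str) -> str:
    """Use the secondary queue only when permitted; otherwise fall back to the primary."""
    if check_queue_permission(conf, user):
        return secondary
    logger.warning("User %s has no permission on %s, falling back to %s",
                   user, secondary, primary)
    return primary
```

In this sketch an enabled check with an empty whitelist denies everyone; the commit message does not say how the original code handled that case.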
…econdary queue

Replace configuration-based whitelist with actual Yarn permission verification:

Changes:
- Remove configuration items SECONDARY_QUEUE_PERMISSION_CHECK_ENABLED and SECONDARY_QUEUE_ALLOWED_USERS
- Rewrite checkQueuePermission method to use Yarn API for real permission validation
- Query Yarn app info via externalResourceService.getAppInfo to verify user access
- Detect permission errors (403/404/forbidden/unauthorized) and fallback to primary queue
- Handle transient errors gracefully to avoid blocking legitimate users

Permission Check Logic:
1. Try to get app info from target queue using Yarn REST API
2. If successful (even with empty app list) → user has permission
3. If permission error (403/404) → log warning and return false
4. If other error (network/timeout) → assume OK to avoid blocking
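
The four-step permission check logic above can be sketched as follows. This is an illustrative Python sketch, not the reverted Linkis implementation: `get_app_info` is a hypothetical callable standing in for externalResourceService.getAppInfo (whose real signature is not shown in this PR), and `QueueAccessError` with its `status` attribute is an invented exception type used to carry an HTTP status code.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("queue_permission_sketch")

PERMISSION_STATUSES = {403, 404}
PERMISSION_WORDS = ("forbidden", "unauthorized")


class QueueAccessError(Exception):
    """Hypothetical error raised by the app-info lookup, carrying an HTTP status."""

    def __init__(self, message: str, status: Optional[int] = None) -> None:
        super().__init__(message)
        self.status = status


def check_queue_permission_via_yarn(
    get_app_info: Callable[[str, str], List[dict]], queue: str, user: str
) -> bool:
    """Return False only on a clear permission error; otherwise let the user through."""
    try:
        # Steps 1-2: any successful response, even an empty app list, means access is OK.
        get_app_info(queue, user)
        return True
    except Exception as exc:
        status = getattr(exc, "status", None)
        text = str(exc).lower()
        if status in PERMISSION_STATUSES or any(word in text for word in PERMISSION_WORDS):
            # Step 3: permission error, so the caller should fall back to the primary queue.
            logger.warning("User %s cannot access queue %s: %s", user, queue, exc)
            return False
        # Step 4: network errors, timeouts, etc. are treated as "assume OK".
        return True
```

Treating unknown failures as success favors availability: a flaky ResourceManager does not lock legitimate users out of the secondary queue, at the cost of occasionally letting a submission fail later.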

@casionone casionone left a comment

LGTM.

@casionone casionone merged commit 40ac87e into dev-1.18.2-webank Apr 14, 2026
12 of 16 checks passed
v-kkhuang added a commit that referenced this pull request Apr 16, 2026
…on timing and configuration retrieval (#988)

* [fix][CGS][engineconn] fix sr task retry causing init_sql loading exception (#964)

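* #AI commit# Development phase: fix bug where sr task retry causes an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry causes an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry causes an init_sql loading exception

* #AI commit# Fix: * extend the coverage of the task retry switch

* #AI commit# Development phase: Spark supports secondary queue selection

* #AI commit# Development phase: optimize secondary queue logic

* #AI commit# Development phase: optimize the default value of wds.linkis.rm.secondary.yarnqueue.enable

* #AI commit# Development phase: refactor: translate Chinese logs to English in smart queue selection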

- Translate all Chinese log messages to English for consistency
- Update comments and documentation to English
- No functional changes, only log message translation

* Revert "#AI commit# Development phase: refactor: translate Chinese logs to English in smart queue selection"

This reverts commit 47fb4e6.

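* #AI commit# Development phase: optimize English log output

* #AI commit# Development phase: optimize English log output

* #AI commit# Development phase: optimize Spark backup queue execution logic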

* #AI commit# feat: add permission check for secondary queue selection

Add permission validation before using secondary queue to prevent task submission failures:

Features:
- Add configuration SECONDARY_QUEUE_PERMISSION_CHECK_ENABLED to enable/disable permission check
- Add configuration SECONDARY_QUEUE_ALLOWED_USERS to configure user whitelist
- Modify performSmartQueueSelection method to accept user parameter
- Add checkQueuePermission method to validate user access to secondary queue
- If user has no permission, log warning and fallback to primary queue
- Prevents task submission failures due to insufficient queue permissions

Configuration:
- wds.linkis.rm.secondary.yarnqueue.permission.check.enable (default: false)
- wds.linkis.rm.secondary.yarnqueue.allowed.users (default: empty)

* #AI commit# refactor: implement Yarn API-based permission check for secondary queue

Replace configuration-based whitelist with actual Yarn permission verification:

Changes:
- Remove configuration items SECONDARY_QUEUE_PERMISSION_CHECK_ENABLED and SECONDARY_QUEUE_ALLOWED_USERS
- Rewrite checkQueuePermission method to use Yarn API for real permission validation
- Query Yarn app info via externalResourceService.getAppInfo to verify user access
- Detect permission errors (403/404/forbidden/unauthorized) and fallback to primary queue
- Handle transient errors gracefully to avoid blocking legitimate users

Permission Check Logic:
1. Try to get app info from target queue using Yarn REST API
2. If successful (even with empty app list) → user has permission
3. If permission error (403/404) → log warning and return false
4. If other error (network/timeout) → assume OK to avoid blocking

* Revert "#AI commit# refactor: implement Yarn API-based permission check for secondary queue"

This reverts commit f91be62.

* Revert "#AI commit# feat: add permission check for secondary queue selection"

This reverts commit 08dad25.

* #AI commit# Development phase: optimize how the configured queue is retrieved

2 participants