Skip to content

<fix>[securitygroup]: remove strict sequential priority restriction#3344

Closed
MatheMatrix wants to merge 84 commits into5.5.6from
sync/ye.zou/fix/ZSTAC-79067
Closed

<fix>[securitygroup]: remove strict sequential priority restriction#3344
MatheMatrix wants to merge 84 commits into5.5.6from
sync/ye.zou/fix/ZSTAC-79067

Conversation

@MatheMatrix
Copy link
Copy Markdown
Owner

Resolves: ZSTAC-79067\n\nRemove overly strict sequential priority validation for security group rules API.

sync from gitlab !9173

…isk offering

Resolves: ZSTAC-74683

Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
When anti-split-brain check selects a disconnected MDS node, the HTTP
call now times out after 30s instead of 5+ minutes, and automatically
retries the next available MDS via tryNext mechanism.

Resolves: ZSTAC-80595

Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
When MN restarts during a destroy operation, the hypervisor may report
the VM as Stopped. Without this transition, the state machine throws
an exception and the VM stays stuck in Destroying state forever.

Resolves: ZSTAC-80620

Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
…pacity

Resolves: ZSTAC-79709

Change-Id: I45a2133bbb8c51c25ae3549d59e588976192a08d
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

放宽安全组规则优先级验证为仅要求正整数并保留全局上限,并发与稳定性改进(管理节点/调度/资源目的地同步化)、若干存储/网络/VM细节修正、测试用例调整以匹配新优先级行为,以及新增错误码与数据脱敏逻辑。

Changes

Cohort / File(s) Summary
安全组 API 验证逻辑
plugin/securityGroup/src/main/java/org/zstack/network/securitygroup/SecurityGroupApiInterceptor.java
移除“从1开始且连续”校验,改为仅检查优先级为正整数(>=1)并保留对全局最大值的检查,更新错误提示。
相关测试用例
test/src/test/groovy/.../securitygroup/AddSecurityGroupRuleOptimizedCase.groovy, test/src/test/groovy/.../securitygroup/ChangeSecurityGroupRuleCase.groovy
调整断言:不再期待非连续优先级抛出异常,改为允许添加/更新并在操作后校验规则存在;添加行为变更注释。
管理节点与并发控制
core/src/main/java/.../cloudbus/ResourceDestinationMakerImpl.java, portal/src/main/java/.../ManagementNodeManagerImpl.java, core/src/main/java/.../thread/DispatchQueueImpl.java
增加 synchronized / lifecycleLock 及两阶段缺失节点检测以防竞态;调度任务在 telemetry 情况下保留或不保留父追踪上下文(Conditional tracing),提高并发安全与心跳/事件一致性。
VM/网络小修正
compute/src/main/java/.../VmNicManagerImpl.java, header/src/main/java/.../vm/VmInstanceSpec.java, header/src/main/java/.../vm/VmInstanceState.java, network/src/main/java/.../l3/L3BasicNetwork.java
添加 VmNic 空值保护,root 磁盘大小回退取 virtual 与 actual 的较大值,Destroying 状态新增 stopped 转换;APIGetFreeIp 结果过滤掉 ReservedIpRange 范围内的 IP。
存储与网络服务改动
plugin/ceph/src/main/java/.../CephPrimaryStorageBase.java, plugin/zbs/src/main/java/.../ZbsStorageController.java, header/src/main/java/.../storage/addon/primary/ExternalPrimaryStorageInventory.java
快照删除容量释放改为按比率调用;ZBS 的 GetVolumeClients 改为可重试的 HttpCaller 调用并新增 setTryNext(bool);为外部主存储脱敏 mdsUrls/mdsInfos。
负载均衡与错误码
plugin/loadBalancer/src/main/java/.../LoadBalancerApiInterceptor.java, utils/src/main/java/.../clouderrorcode/CloudOperationsErrorCode.java
拦截删除 SLB 场景以在灰度升级期间阻断删除操作(受全局配置控制);新增备份取消超时错误码常量。

Sequence Diagram(s)

sequenceDiagram
    participant Heartbeat as Heartbeat/Reconciler
    participant Manager as ManagementNodeManager
    participant DB as Database
    participant HashRing as HashRing

    Note over Heartbeat,Manager: 周期性心跳/重建流程
    Heartbeat->>Manager: 检测到节点列表差异
    Manager->>DB: 查询 ManagementNodeVO(可能缺失)
    alt 节点在 DB 中存在
        Manager->>HashRing: schedule nodeJoin(on node's thread)
        HashRing-->>Manager: 更新/加入节点信息
    else 节点在 DB 中缺失(第一次)
        Manager->>Manager: 标记 suspectedMissingFromDb(暂存)
    else 节点在 DB 中缺失(连续检测)
        Manager->>HashRing: 从 hash ring 移除节点
        HashRing-->>Manager: 触发 nodeLeft 流程
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰🌿 优先级变得随意些,
正数为准不作堆,
节点心跳更谨慎,
存储呼叫会重试,
小兔轻跳庆改回。

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed 标题遵循 [scope]: 格式,共67个字符,清晰准确地描述了PR的主要变化:移除安全组规则的严格顺序优先级限制。
Description check ✅ Passed 描述相关联地指向了具体的问题单号 ZSTAC-79067,说明了变更目的(移除过度严格的顺序优先级验证),并注明了来源。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sync/ye.zou/fix/ZSTAC-79067

Comment @coderabbitai help to get the list of available commands and usage tips.

Resolves: ZSTAC-78989

Change-Id: I0fe3a56ab724978944c69afadaab7ff7353e4c0f
Resolves: ZSTAC-82153

Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
…fterAddIpAddress to prevent NPE during rollback

Resolves: ZSTAC-81741

Change-Id: I53bcf20a10306afc7b6172da294d347b74e6c41f
Resolves: ZSTAC-81182

Change-Id: Id1bb642154dc66ae9995dcc4d9fc00cdce9bcaf8
@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/fix/ZSTAC-79067 branch from bb2a0da to f7b3159 Compare February 13, 2026 10:47
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@plugin/securityGroup/src/main/java/org/zstack/network/securitygroup/SecurityGroupApiInterceptor.java`:
- Around line 491-493: The priority validation in SecurityGroupApiInterceptor
currently only enforces ao.getPriority() > 0 and misses an upper bound; update
the check in the method that validates ao.getPriority() to also ensure
ao.getPriority() <= SECURITY_GROUP_RULES_NUM_LIMIT and throw an
ApiMessageInterceptionException with a new appropriate error code (instead of
ORG_ZSTACK_NETWORK_SECURITYGROUP_10042) when the value exceeds the limit, using
a clear error message that includes the provided priority and the max allowed
(reference SECURITY_GROUP_RULES_NUM_LIMIT and the validation block around
ao.getPriority()).

gitlab and others added 8 commits February 16, 2026 00:50
<fix>[vm]: add Destroying->Stopped state transition

See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume

See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients

See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation

See merge request zstackio/zstack!9155
…talling

In dual management node scenarios, concurrent modifications to the
consistent hash ring from heartbeat reconciliation and canonical event
callbacks can cause NodeHash/Nodes inconsistency, leading to message
routing failures and task timeouts.

Fix: (1) synchronized all ResourceDestinationMakerImpl methods to
ensure atomic nodeHash+nodes updates, (2) added lifecycleLock in
ManagementNodeManagerImpl to serialize heartbeat reconciliation with
event callbacks, (3) added two-round delayed confirmation before
removing nodes from hash ring to avoid race with NodeJoin events.

Resolves: ZSTAC-77711

Change-Id: I3d33d53595dd302784dff17417a5b25f2d0f3426
<fix>[network]: filter reserved IPs in getFreeIp

See merge request zstackio/zstack!9170
<fix>[ceph]: apply over-provisioning ratio when releasing snapshot capacity

See merge request zstackio/zstack!9162
<fix>[network]: add null check for L3 network system tags in API interceptor

See merge request zstackio/zstack!9169
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
test/src/test/groovy/org/zstack/test/integration/networkservice/provider/flat/securitygroup/AddSecurityGroupRuleOptimizedCase.groovy (1)

612-617: 建议添加断言验证规则添加结果。

当前代码块添加了混合规则(ingress + egress)在优先级 12,但没有断言验证规则是否成功添加。与上方 594-600 行的处理方式(有 assert sg3.rules.find { it.priority == 13 } != null)保持一致会更好。

♻️ 建议的改进
         // ZSTAC-79067: mixed ingress+egress add at non-consecutive priority is now allowed
         sg3 = addSecurityGroupRule {
             securityGroupUuid = sg3.uuid
             rules = [rule_12, ingressRule]
             priority = 12
         }
+        assert sg3.rules.find { it.dstIpRange == "2.2.2.2-2.2.2.10" && it.priority == 12 } != null
+        assert sg3.rules.find { it.type == "Ingress" && it.dstPortRange == "12-13" } != null
test/src/test/groovy/org/zstack/test/integration/networkservice/provider/flat/securitygroup/ChangeSecurityGroupRuleCase.groovy (1)

234-236: 存在未使用的变量 rule_1

方法 testChangeRulePriorityError 中,第 232 行获取了 rule_1 变量,但由于移除了连续优先级限制的测试断言,该变量现在未被使用。建议清理未使用的代码。

♻️ 建议的清理
     void testChangeRulePriorityError() {
         sg3 = querySecurityGroup {
             conditions = ["uuid=${sg3.uuid}"]
         }[0]

         assert sg3 != null
-        SecurityGroupRuleInventory rule_1 = sg3.rules.find { it.type == "Ingress" && it.priority == 1 && it.ipVersion == 4 }

         // ZSTAC-79067: priorities beyond current rule count are now allowed
         // (removed consecutive priority restriction)
     }

AlanJager and others added 10 commits February 16, 2026 17:11
The mdsUrls field in ExternalPrimaryStorage config contains
user:password@host format credentials. Add desensitization to
mask credentials as ***@host in API/CLI output.

Resolves: ZSTAC-80664

Change-Id: I94bdede5a1b52eb039de70efb5458693484405f7
Add ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT constant to
CloudOperationsErrorCode for use in premium volumebackup module.

Resolves: ZSTAC-82195

Change-Id: Ibc405876e1171b637cf76b91a6822574fb6e7811
<fix>[core]: synchronize consistent hash ring to prevent dual-MN race condition

See merge request zstackio/zstack!9154
SyncTaskFuture constructor calls Context.current() unconditionally,
triggering ServiceLoader for ContextStorageProvider even when
telemetry is disabled. If sentry-opentelemetry-bootstrap jar is
on classpath, ServiceLoader fails with "not a subtype" due to
ClassLoader isolation in Tomcat, throwing ServiceConfigurationError
(extends Error) that escapes all catch(Exception) blocks.

1. Why is this change necessary?
MN startup crashes with ORG_ZSTACK_CORE_WORKFLOW_10001 because
Context.current() triggers ServiceLoader unconditionally in
SyncTaskFuture constructor, even when telemetry is disabled.

2. How does it address the problem?
Only call Context.current() when isTelemetryEnabled() returns
true, matching the existing guard pattern used in other
DispatchQueueImpl code paths (lines 351, 1069).

3. Are there any side effects?
None. When telemetry is disabled, parentContext was never used.

# Summary of changes (by module):
- core/thread: conditionalize Context.current() in SyncTaskFuture

Related: ZSTAC-82275
Change-Id: I5c0e1f15769c746c630028a29df8cf1815620608
<fix>[thread]: guard Context.current() with telemetry check

See merge request zstackio/zstack!9202
Resolves: ZSTAC-82195

Change-Id: I3d5e91d09d7c088d3c53e3839f8b32f4bce32dec
<fix>[loadBalancer]: block SLB deletion during grayscale upgrade

See merge request zstackio/zstack!9187
<fix>[volumebackup]: add backup cancel timeout error code

See merge request zstackio/zstack!9200
<fix>[core]: add @nologging to sensitive config fields

See merge request zstackio/zstack!9171
…plit-brain

When a management node departs, its VM skip-trace entries were
immediately removed. If VMs were still being started by kvmagent,
the next VM sync would falsely detect them as Stopped and trigger
HA, causing split-brain.

Fix: transfer departed MN skip-trace entries to an orphaned set with
10-minute TTL instead of immediate deletion. VMs in the orphaned set
remain skip-traced until the TTL expires or they are explicitly
continued, preventing false HA triggers during MN restart scenarios.

Resolves: ZSTAC-80821

Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
gitlab and others added 26 commits February 26, 2026 10:36
<fix>[expon]: fix vhost installPath overwrite and test cleanup

See merge request zstackio/zstack!9194
When SelectBackupStorageMsg carries preferBsTypes, the sorting
logic uses indexOf() which returns -1 for non-preferred types
(e.g. VCenter BS), causing them to sort before preferred ones.
This fix filters backup storages by preferBsTypes first, then
sorts within the filtered set. Also adds error code
ADDON_PRIMARY_10015 when no matching backup storage is available.

1. Why is this change necessary?
SelectBackupStorageMsg.preferBsTypes is used to sort backup
storages via indexOf() on the preference list. However,
indexOf() returns -1 for types not in the list, which sorts
non-preferred types (like VCenter BS) before preferred ones.
In a mixed VCenter + Expon environment, this causes VCenter
BS to be incorrectly selected over the intended Expon BS.

2. How does it address the problem?
Instead of only sorting, the code now first filters the
candidate backup storages to only include those whose bsType
matches the preferBsTypes list. Then it sorts within the
filtered set. A new error code ADDON_PRIMARY_10015 is added
for the case where no matching backup storage is available
after filtering. All 10 locale i18n JSON files are updated.

3. Are there any side effects?
None. The filtering is only applied when preferBsTypes is
non-empty, preserving existing behavior for other callers.

# Summary of changes (by module):
- storage: filter BS candidates by preferBsTypes before sort
- utils: add ADDON_PRIMARY_10015 error code constant
- conf/i18n: add i18n entries for new error code (10 locales)
- test: add ExternalPrimaryStorageSelectBackupStorageCase

Related: ZSTAC-71706
Change-Id: Ia3af38cc50e69132a1c769180792363495c1080f
  Add global error code internationalization support that translates
  ErrorCode messages based on the client's Accept-Language header.

  Core changes:
  - Add GlobalErrorCodeI18nService interface and implementation that
    loads locale-specific JSON templates from classpath at startup
  - Add LocaleUtils to parse Accept-Language headers and resolve the
    best matching locale with quality-based priority
  - Extend ErrorCode with message and formatArgs fields to carry
    localized text and format parameters
  - Wire Platform.err to populate formatArgs when args are present

  REST integration:
  - RESTApiController: localize ErrorCode for both MessageReply and
    APIEvent responses, resolve locale from Accept-Language header
  - RestServer: localize error responses using LocaleUtils, use
    LocaleUtils.DEFAULT_LOCALE constant instead of hardcoded string

  Robustness:
  - Filter Accept-Language entries with quality <= 0
  - Defensive copy of formatArgs in ErrorCode getter/setter/copy-ctor
  - try-finally for HttpURLConnection in integration tests
  - Log exception stacktrace in localization failure catch block

  Tests:
  - Unit tests for LocaleUtils, GlobalErrorCodeI18nService
  - Integration test ErrorCodeI18nCase covering sync/async API and
    Accept-Language fallback behavior

 Resolves: ZSTAC-81675

Change-Id: I706d6d6b6f687662656e6576736d73676c7a6e6a
<fix>[storage]: fix wrong BS selected in mixed VCenter env

See merge request zstackio/zstack!9243
<feature>[errorcode]: global error code i18n

See merge request zstackio/zstack!9246
copy zstack-cli/zstack-ctl from venv to /usr/bin before chmod

Python 3.11 pip (>=19.0) no longer installs data_files to absolute
paths when running inside a virtualenv.

Resolves: ZSTAC-73161

Change-Id: I6c74676f6f6b786e616972786a717777797a626f
<fix>[ansible]: support python3 ansible install

See merge request zstackio/zstack!9254
Add global error code i18n mapping for ORG_ZSTACK_AI_MESSAGE_10003
across all 10 language files.

Resolves: ZSTAC-82084

Change-Id: Ic1ce5d983b6651708003098363158728fad2fb85
…/sys

Add ensure_python3_venv() to detect legacy Python 2 venvs and recreate
them as Python 3.11. Previously zstack-cli only checked if the venv
directory existed, causing it to reuse a Python 2 venv (pip 6.1.1)
during upgrade, which failed to install packages.

Also fix:
- ansible version check: 2.11.12.3 -> 2.16.14 to match ansible 9.13.0
- ansible/ansible-playbook wrapper: use absolute path to avoid recursion

Resolves: ZSTAC-82619

Change-Id: I7362737267617a626d75786c646a7a776f69766f
<fix>[ansible]: support python3 ansible install

See merge request zstackio/zstack!9259
Resolves: ZSTAC-72079

Change-Id: Ifa08d55dedbc6ff7beaf96c96142b470d1993dbf
Resolves: ZSTAC-72079

Change-Id: I56358d8064ae020e564f2682518f215c8bb96bd5
…lity

Resolves: ZSTAC-72079

Change-Id: Ie345ac5c788d4b53912ee80ec7dd437bf149601f
use Json_getKeyValue from beforeMigrate.sql to replace
JSON_EXTRACT which requires MySQL 5.7+, and change
endTime column type from TIMESTAMP to DATETIME.

Resolves: ZSTAC-72079

Change-Id: I50f05f114eae474f94f1046811dc0a8734b88c42
Redesign findSimilar() with a three-phase strategy to prevent
performance degradation when operr() format args contain very
long strings (e.g., serialized ErrorCodeList or HTML bodies):

Phase 1: regex match against raw fmt template (always short).
Phase 2: fallback to formatted string only if Phase 1 misses
         and length <= maxElaborationRegex (8192).
Phase 3: distance match always uses raw fmt template.

Also add length guard in findMostSimilarRegex() and remove
redundant String.format length check in Platform.elaborate().

Root cause: StringSimilarity.findSimilar() was running 199
regex patterns via ReTree against multi-KB formatted strings,
causing 7+ second latency in error code creation hot paths.

Resolves: ZSTAC-72079

Change-Id: I38b98a762deb436da31e4884da05d12b38b98a76
<fix>[ai]: add eval task sort columns for ZQL and SDK fields

See merge request zstackio/zstack!9266
… N+1 SQL queries

Reduces SQL count from ~52 to ~6 for VM queries and ~32 to ~5 for Host
queries by enabling Hibernate batch fetching for EAGER collections.

Resolves: ZSTAC-79217

Change-Id: I6bb2a852c1f3f3780f2bc6cf90ce330ae1b6d9fb
Resolves: ZSTAC-51417

Change-Id: I0e49827c4c9c07c1be5e7a03b8817a21d424d95d
refactor[core]: add hibernate.default_batch_fetch_size=50 to reduce N 1 SQL queries

See merge request zstackio/zstack!9282
ZSTAC-80898

Change-Id: If279bcbcd53634346188ddc2f218dd81b2e0075e
fix(vm): handle NoState + Expunging safety net [ZSTAC-80898]

See merge request zstackio/zstack!9275
1. synchronized access to errors/missed LinkedHashMap(accessOrder=true)
   to fix concurrent modification bug causing 7s latency spikes
2. cache miss in findSimilar to avoid repeated expensive regex scans
3. remove unrelated PRE_PUSH_CHECKER_TEST GlobalConfig to fix CI

Resolves: ZSTAC-72079

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
<fix>[compute]: add quota check for VM CPU/memory upgrade to show correct error message (ZSTAC-51417)

See merge request zstackio/zstack!9284
<fix>[utils]: fix StringSimilarity concurrency, distance filter, slow log (ZSTAC-72079)

See merge request zstackio/zstack!9269
<fix>[ai]: add i18n for AI_MESSAGE_10003 ZSTAC-82084

See merge request zstackio/zstack!9256
Resolves: ZSTAC-79067

Change-Id: I5d788cfc99b7292d1078a88fee635bd83fb5b5f0
@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/fix/ZSTAC-79067 branch from 522ed42 to 931d8d9 Compare March 8, 2026 15:45
…s to match new validation

Resolves: ZSTAC-79067

Change-Id: I6f134853bf7e16834a3d38df34334ebafd589167
@zstack-robot-1 zstack-robot-1 deleted the sync/ye.zou/fix/ZSTAC-79067 branch March 13, 2026 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants