<fix>[securitygroup]: remove strict sequential priority restriction#3344
<fix>[securitygroup]: remove strict sequential priority restriction#3344MatheMatrix wants to merge 84 commits into5.5.6from
Conversation
…isk offering Resolves: ZSTAC-74683 Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
When anti-split-brain check selects a disconnected MDS node, the HTTP call now times out after 30s instead of 5+ minutes, and automatically retries the next available MDS via tryNext mechanism. Resolves: ZSTAC-80595 Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
When MN restarts during a destroy operation, the hypervisor may report the VM as Stopped. Without this transition, the state machine throws an exception and the VM stays stuck in Destroying state forever. Resolves: ZSTAC-80620 Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
…pacity Resolves: ZSTAC-79709 Change-Id: I45a2133bbb8c51c25ae3549d59e588976192a08d
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough放宽安全组规则优先级验证为仅要求正整数并保留全局上限,并发与稳定性改进(管理节点/调度/资源目的地同步化)、若干存储/网络/VM细节修正、测试用例调整以匹配新优先级行为,以及新增错误码与数据脱敏逻辑。 Changes
Sequence Diagram(s)sequenceDiagram
participant Heartbeat as Heartbeat/Reconciler
participant Manager as ManagementNodeManager
participant DB as Database
participant HashRing as HashRing
Note over Heartbeat,Manager: 周期性心跳/重建流程
Heartbeat->>Manager: 检测到节点列表差异
Manager->>DB: 查询 ManagementNodeVO(可能缺失)
alt 节点在 DB 中存在
Manager->>HashRing: schedule nodeJoin(on node's thread)
HashRing-->>Manager: 更新/加入节点信息
else 节点在 DB 中缺失(第一次)
Manager->>Manager: 标记 suspectedMissingFromDb(暂存)
else 节点在 DB 中缺失(连续检测)
Manager->>HashRing: 从 hash ring 移除节点
HashRing-->>Manager: 触发 nodeLeft 流程
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Resolves: ZSTAC-78989 Change-Id: I0fe3a56ab724978944c69afadaab7ff7353e4c0f
Resolves: ZSTAC-82153 Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
…fterAddIpAddress to prevent NPE during rollback Resolves: ZSTAC-81741 Change-Id: I53bcf20a10306afc7b6172da294d347b74e6c41f
Resolves: ZSTAC-81182 Change-Id: Id1bb642154dc66ae9995dcc4d9fc00cdce9bcaf8
bb2a0da to
f7b3159
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@plugin/securityGroup/src/main/java/org/zstack/network/securitygroup/SecurityGroupApiInterceptor.java`:
- Around line 491-493: The priority validation in SecurityGroupApiInterceptor
currently only enforces ao.getPriority() > 0 and misses an upper bound; update
the check in the method that validates ao.getPriority() to also ensure
ao.getPriority() <= SECURITY_GROUP_RULES_NUM_LIMIT and throw an
ApiMessageInterceptionException with a new appropriate error code (instead of
ORG_ZSTACK_NETWORK_SECURITYGROUP_10042) when the value exceeds the limit, using
a clear error message that includes the provided priority and the max allowed
(reference SECURITY_GROUP_RULES_NUM_LIMIT and the validation block around
ao.getPriority()).
<fix>[vm]: add Destroying->Stopped state transition See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation See merge request zstackio/zstack!9155
…talling In dual management node scenarios, concurrent modifications to the consistent hash ring from heartbeat reconciliation and canonical event callbacks can cause NodeHash/Nodes inconsistency, leading to message routing failures and task timeouts. Fix: (1) synchronized all ResourceDestinationMakerImpl methods to ensure atomic nodeHash+nodes updates, (2) added lifecycleLock in ManagementNodeManagerImpl to serialize heartbeat reconciliation with event callbacks, (3) added two-round delayed confirmation before removing nodes from hash ring to avoid race with NodeJoin events. Resolves: ZSTAC-77711 Change-Id: I3d33d53595dd302784dff17417a5b25f2d0f3426
<fix>[network]: filter reserved IPs in getFreeIp See merge request zstackio/zstack!9170
<fix>[ceph]: apply over-provisioning ratio when releasing snapshot capacity See merge request zstackio/zstack!9162
<fix>[network]: add null check for L3 network system tags in API interceptor See merge request zstackio/zstack!9169
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/src/test/groovy/org/zstack/test/integration/networkservice/provider/flat/securitygroup/AddSecurityGroupRuleOptimizedCase.groovy (1)
612-617: 建议添加断言验证规则添加结果。当前代码块添加了混合规则(ingress + egress)在优先级 12,但没有断言验证规则是否成功添加。与上方 594-600 行的处理方式(有
assert sg3.rules.find { it.priority == 13 } != null)保持一致会更好。♻️ 建议的改进
// ZSTAC-79067: mixed ingress+egress add at non-consecutive priority is now allowed sg3 = addSecurityGroupRule { securityGroupUuid = sg3.uuid rules = [rule_12, ingressRule] priority = 12 } + assert sg3.rules.find { it.dstIpRange == "2.2.2.2-2.2.2.10" && it.priority == 12 } != null + assert sg3.rules.find { it.type == "Ingress" && it.dstPortRange == "12-13" } != nulltest/src/test/groovy/org/zstack/test/integration/networkservice/provider/flat/securitygroup/ChangeSecurityGroupRuleCase.groovy (1)
234-236: 存在未使用的变量rule_1。方法
testChangeRulePriorityError中,第 232 行获取了rule_1变量,但由于移除了连续优先级限制的测试断言,该变量现在未被使用。建议清理未使用的代码。♻️ 建议的清理
void testChangeRulePriorityError() { sg3 = querySecurityGroup { conditions = ["uuid=${sg3.uuid}"] }[0] assert sg3 != null - SecurityGroupRuleInventory rule_1 = sg3.rules.find { it.type == "Ingress" && it.priority == 1 && it.ipVersion == 4 } // ZSTAC-79067: priorities beyond current rule count are now allowed // (removed consecutive priority restriction) }
The mdsUrls field in ExternalPrimaryStorage config contains user:password@host format credentials. Add desensitization to mask credentials as ***@host in API/CLI output. Resolves: ZSTAC-80664 Change-Id: I94bdede5a1b52eb039de70efb5458693484405f7
Add ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT constant to CloudOperationsErrorCode for use in premium volumebackup module. Resolves: ZSTAC-82195 Change-Id: Ibc405876e1171b637cf76b91a6822574fb6e7811
<fix>[core]: synchronize consistent hash ring to prevent dual-MN race condition See merge request zstackio/zstack!9154
SyncTaskFuture constructor calls Context.current() unconditionally, triggering ServiceLoader for ContextStorageProvider even when telemetry is disabled. If sentry-opentelemetry-bootstrap jar is on classpath, ServiceLoader fails with "not a subtype" due to ClassLoader isolation in Tomcat, throwing ServiceConfigurationError (extends Error) that escapes all catch(Exception) blocks. 1. Why is this change necessary? MN startup crashes with ORG_ZSTACK_CORE_WORKFLOW_10001 because Context.current() triggers ServiceLoader unconditionally in SyncTaskFuture constructor, even when telemetry is disabled. 2. How does it address the problem? Only call Context.current() when isTelemetryEnabled() returns true, matching the existing guard pattern used in other DispatchQueueImpl code paths (lines 351, 1069). 3. Are there any side effects? None. When telemetry is disabled, parentContext was never used. # Summary of changes (by module): - core/thread: conditionalize Context.current() in SyncTaskFuture Related: ZSTAC-82275 Change-Id: I5c0e1f15769c746c630028a29df8cf1815620608
<fix>[thread]: guard Context.current() with telemetry check See merge request zstackio/zstack!9202
Resolves: ZSTAC-82195 Change-Id: I3d5e91d09d7c088d3c53e3839f8b32f4bce32dec
<fix>[loadBalancer]: block SLB deletion during grayscale upgrade See merge request zstackio/zstack!9187
<fix>[volumebackup]: add backup cancel timeout error code See merge request zstackio/zstack!9200
<fix>[core]: add @nologging to sensitive config fields See merge request zstackio/zstack!9171
…plit-brain When a management node departs, its VM skip-trace entries were immediately removed. If VMs were still being started by kvmagent, the next VM sync would falsely detect them as Stopped and trigger HA, causing split-brain. Fix: transfer departed MN skip-trace entries to an orphaned set with 10-minute TTL instead of immediate deletion. VMs in the orphaned set remain skip-traced until the TTL expires or they are explicitly continued, preventing false HA triggers during MN restart scenarios. Resolves: ZSTAC-80821 Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
<fix>[expon]: fix vhost installPath overwrite and test cleanup See merge request zstackio/zstack!9194
When SelectBackupStorageMsg carries preferBsTypes, the sorting logic uses indexOf() which returns -1 for non-preferred types (e.g. VCenter BS), causing them to sort before preferred ones. This fix filters backup storages by preferBsTypes first, then sorts within the filtered set. Also adds error code ADDON_PRIMARY_10015 when no matching backup storage is available. 1. Why is this change necessary? SelectBackupStorageMsg.preferBsTypes is used to sort backup storages via indexOf() on the preference list. However, indexOf() returns -1 for types not in the list, which sorts non-preferred types (like VCenter BS) before preferred ones. In a mixed VCenter + Expon environment, this causes VCenter BS to be incorrectly selected over the intended Expon BS. 2. How does it address the problem? Instead of only sorting, the code now first filters the candidate backup storages to only include those whose bsType matches the preferBsTypes list. Then it sorts within the filtered set. A new error code ADDON_PRIMARY_10015 is added for the case where no matching backup storage is available after filtering. All 10 locale i18n JSON files are updated. 3. Are there any side effects? None. The filtering is only applied when preferBsTypes is non-empty, preserving existing behavior for other callers. # Summary of changes (by module): - storage: filter BS candidates by preferBsTypes before sort - utils: add ADDON_PRIMARY_10015 error code constant - conf/i18n: add i18n entries for new error code (10 locales) - test: add ExternalPrimaryStorageSelectBackupStorageCase Related: ZSTAC-71706 Change-Id: Ia3af38cc50e69132a1c769180792363495c1080f
Add global error code internationalization support that translates
ErrorCode messages based on the client's Accept-Language header.
Core changes:
- Add GlobalErrorCodeI18nService interface and implementation that
loads locale-specific JSON templates from classpath at startup
- Add LocaleUtils to parse Accept-Language headers and resolve the
best matching locale with quality-based priority
- Extend ErrorCode with message and formatArgs fields to carry
localized text and format parameters
- Wire Platform.err to populate formatArgs when args are present
REST integration:
- RESTApiController: localize ErrorCode for both MessageReply and
APIEvent responses, resolve locale from Accept-Language header
- RestServer: localize error responses using LocaleUtils, use
LocaleUtils.DEFAULT_LOCALE constant instead of hardcoded string
Robustness:
- Filter Accept-Language entries with quality <= 0
- Defensive copy of formatArgs in ErrorCode getter/setter/copy-ctor
- try-finally for HttpURLConnection in integration tests
- Log exception stacktrace in localization failure catch block
Tests:
- Unit tests for LocaleUtils, GlobalErrorCodeI18nService
- Integration test ErrorCodeI18nCase covering sync/async API and
Accept-Language fallback behavior
Resolves: ZSTAC-81675
Change-Id: I706d6d6b6f687662656e6576736d73676c7a6e6a
<fix>[storage]: fix wrong BS selected in mixed VCenter env See merge request zstackio/zstack!9243
<feature>[errorcode]: global error code i18n See merge request zstackio/zstack!9246
copy zstack-cli/zstack-ctl from venv to /usr/bin before chmod Python 3.11 pip (>=19.0) no longer installs data_files to absolute paths when running inside a virtualenv. Resolves: ZSTAC-73161 Change-Id: I6c74676f6f6b786e616972786a717777797a626f
<fix>[ansible]: support python3 ansible install See merge request zstackio/zstack!9254
Add global error code i18n mapping for ORG_ZSTACK_AI_MESSAGE_10003 across all 10 language files. Resolves: ZSTAC-82084 Change-Id: Ic1ce5d983b6651708003098363158728fad2fb85
…/sys Add ensure_python3_venv() to detect legacy Python 2 venvs and recreate them as Python 3.11. Previously zstack-cli only checked if the venv directory existed, causing it to reuse a Python 2 venv (pip 6.1.1) during upgrade, which failed to install packages. Also fix: - ansible version check: 2.11.12.3 -> 2.16.14 to match ansible 9.13.0 - ansible/ansible-playbook wrapper: use absolute path to avoid recursion Resolves: ZSTAC-82619 Change-Id: I7362737267617a626d75786c646a7a776f69766f
<fix>[ansible]: support python3 ansible install See merge request zstackio/zstack!9259
Resolves: ZSTAC-72079 Change-Id: Ifa08d55dedbc6ff7beaf96c96142b470d1993dbf
Resolves: ZSTAC-72079 Change-Id: I56358d8064ae020e564f2682518f215c8bb96bd5
…lity Resolves: ZSTAC-72079 Change-Id: Ie345ac5c788d4b53912ee80ec7dd437bf149601f
use Json_getKeyValue from beforeMigrate.sql to replace JSON_EXTRACT which requires MySQL 5.7+, and change endTime column type from TIMESTAMP to DATETIME. Resolves: ZSTAC-72079 Change-Id: I50f05f114eae474f94f1046811dc0a8734b88c42
Redesign findSimilar() with a three-phase strategy to prevent
performance degradation when operr() format args contain very
long strings (e.g., serialized ErrorCodeList or HTML bodies):
Phase 1: regex match against raw fmt template (always short).
Phase 2: fallback to formatted string only if Phase 1 misses
and length <= maxElaborationRegex (8192).
Phase 3: distance match always uses raw fmt template.
Also add length guard in findMostSimilarRegex() and remove
redundant String.format length check in Platform.elaborate().
Root cause: StringSimilarity.findSimilar() was running 199
regex patterns via ReTree against multi-KB formatted strings,
causing 7+ second latency in error code creation hot paths.
Resolves: ZSTAC-72079
Change-Id: I38b98a762deb436da31e4884da05d12b38b98a76
<fix>[ai]: add eval task sort columns for ZQL and SDK fields See merge request zstackio/zstack!9266
… N+1 SQL queries Reduces SQL count from ~52 to ~6 for VM queries and ~32 to ~5 for Host queries by enabling Hibernate batch fetching for EAGER collections. Resolves: ZSTAC-79217 Change-Id: I6bb2a852c1f3f3780f2bc6cf90ce330ae1b6d9fb
Resolves: ZSTAC-51417 Change-Id: I0e49827c4c9c07c1be5e7a03b8817a21d424d95d
refactor[core]: add hibernate.default_batch_fetch_size=50 to reduce N 1 SQL queries See merge request zstackio/zstack!9282
ZSTAC-80898 Change-Id: If279bcbcd53634346188ddc2f218dd81b2e0075e
fix(vm): handle NoState + Expunging safety net [ZSTAC-80898] See merge request zstackio/zstack!9275
1. synchronized access to errors/missed LinkedHashMap(accessOrder=true) to fix concurrent modification bug causing 7s latency spikes 2. cache miss in findSimilar to avoid repeated expensive regex scans 3. remove unrelated PRE_PUSH_CHECKER_TEST GlobalConfig to fix CI Resolves: ZSTAC-72079 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
<fix>[compute]: add quota check for VM CPU/memory upgrade to show correct error message (ZSTAC-51417) See merge request zstackio/zstack!9284
<fix>[utils]: fix StringSimilarity concurrency, distance filter, slow log (ZSTAC-72079) See merge request zstackio/zstack!9269
<fix>[ai]: add i18n for AI_MESSAGE_10003 ZSTAC-82084 See merge request zstackio/zstack!9256
Resolves: ZSTAC-79067 Change-Id: I5d788cfc99b7292d1078a88fee635bd83fb5b5f0
522ed42 to
931d8d9
Compare
…s to match new validation Resolves: ZSTAC-79067 Change-Id: I6f134853bf7e16834a3d38df34334ebafd589167
Resolves: ZSTAC-79067\n\nRemove overly strict sequential priority validation for security group rules API.
sync from gitlab !9173