diff --git a/docs/SKILL-INDEX.md b/docs/SKILL-INDEX.md
index 0007bf4..17aa4f8 100644
--- a/docs/SKILL-INDEX.md
+++ b/docs/SKILL-INDEX.md
@@ -53,6 +53,7 @@
 - [`addon-k3d-backup-restore-prereqs-guide.md`](addon-k3d-backup-restore-prereqs-guide.md): the two layers of prerequisites for running Backup/Restore on k3d: install the VolumeSnapshot CRD + create a default BackupRepo
 - [`addon-idc-image-registry-mirror-guide.md`](addon-idc-image-registry-mirror-guide.md): three-tier decision for supplying images to IDC vclusters (mirror as the main path / direct pull from ACR / sideload as fallback) + chart audit against primary dependencies on ghcr.io / docker.io + two-layer syncer injection for chaos-mesh inside a vcluster; includes a ground-truth cross-engine reuse table with 6 line × 3 IDC rows
 - [`addon-multi-ns-registry-scan-preflight-guide.md`](addon-multi-ns-registry-scan-preflight-guide.md): *(also relevant in: writing new smoke / chaos tests)* before starting multi-ns / multi-topology tests, audit the image source of the live `ComponentVersion` + `ParametersDefinition.toolsSetup.toolConfigs[].image`; verified scope vs scan-only future-gate split
+- [`addon-runtime-contract-preflight-guide.md`](addon-runtime-contract-preflight-guide.md): preflight for the 3 alignment surfaces between the chart spec and the actual runtime contract: Layer 1 (chart spec fields rejected by the KB CRD schema, install-time fail) / Layer 2 (vcluster substrate bootstrap precondition: the CoreDNS image pull fails but the control plane reports Running) / Layer 5 (the chart spec does not declare the addon's own runtime env; the difference between what ActionSet `spec.env` declares and what the backup script references → silent-empty + cryptic surface errors such as ORA-12154). Non-contiguous Layer numbering (0/3/4 reserved) + 4-step preflight + 4-pillar cross-engine table + Layer 1/2/5 mermaid decision tree + Archetype "chart-spec-doesnt-declare-runtime-requirement". Grounded on Oracle line W7 (idc4 19c standalone 553MB 2m32s + 12c standalone 628MB 1m58s, confirmed on two minor versions) + MariaDB line idc bastion view (PR #54 §1 §5 mirror family + 11:16-12:55 hands-on syncer session), co-authored across lines. Together with [`addon-kb-schema-version-preflight-guide.md`](addon-kb-schema-version-preflight-guide.md) (schema dimension) + [`addon-test-script-preflight-guide.md`](addon-test-script-preflight-guide.md) (shared client state dimension) + 
[`addon-vcluster-kb-install-preflight-guide.md`](addon-vcluster-kb-install-preflight-guide.md) (bootstrap harness dimension) + [`addon-multi-ns-registry-scan-preflight-guide.md`](addon-multi-ns-registry-scan-preflight-guide.md) (test scope dimension), it forms the 5-doc preflight family, with strictly non-overlapping scopes
 - [`addon-idc-vcluster-migration-checklist-guide.md`](addon-idc-vcluster-migration-checklist-guide.md): migration checklist for moving an addon test environment from local k3d / single-tenant idc to IDC shared k8s + one vcluster per line: per-cluster baseline dimensions, 4 architecture invariants (runner-on-host / vcluster API via in-cluster access as the main path with NodePort as fallback / all controllers in-vcluster / explicit KUBECONFIG pinning), 13 observed anti-patterns (NodePort SAN / HTTPS_PROXY / libc / cross-arch / default-class not synced / data-protection not installed / VolumeSnapshot CRD missing / chart global nodeSelector leaking via CM_NODE_SELECTOR / IDC LB without EXTERNAL-IP / pod IP cap probing / chaos outrunning CNI reclamation / helm hook image does not exist / enumeration of missing components), + a 5-item cross-cutting checklist for "environment fault before engine fault"
 - [`addon-vanilla-vcluster-bootstrap-guide.md`](addon-vanilla-vcluster-bootstrap-guide.md): complete bootstrap steps for creating a vanilla (OSS 0.19.x, not the Loft enterprise edition) vcluster on the IDC host k8s: 4 prerequisites (default StorageClass / node resources / ghcr.io mirror / kubeconfig isolation) + helm install + kubeconfig (port-forward vs NodePort) + verification + a summary of 8 cross-line Blockers
@@ -128,6 +129,7 @@
 - [`docs/addon-ship-readiness-multi-phase-validation-guide.md`](addon-ship-readiness-multi-phase-validation-guide.md): the three-phase matrix for deciding when an addon is ready to ship (baseline / chaos × N / regression × N), cumulative N, Wilson 95% CI, ship threshold table (data loss 0% / service unavailable 5% / transient 30%), two-part verdict (product fail = 0 + all caveats documented) + 5 common misjudgments
 - [`docs/addon-github-submission-discipline-guide.md`](addon-github-submission-discipline-guide.md): boundary discipline for multi-agent collaboration + a public GitHub repository: (1) the hard rule that AI provenance trailers (`Co-Authored-By: Claude` / `🤖 Generated with` / `noreply@anthropic.com`) must not leak, and the fallback command chain (heredoc + `git commit --amend -m "$(... 
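| sed)"` strip / pre-push grep self-check); (2) incident response playbook for cascades caused by multiple agents pushing concurrently to the same PR branch (force-with-lease lemma: the lease anchors on the last-fetched remote tip and does not protect against concurrent pushes after the fetch / two-way `git log --oneline` ritual / dropped-commit owner self-recover / closing out with single-owner-execute). 5 doctrines (A force-with-lease / B per-commit grep / C cascade single-owner-execute / D forensic self-check / E content-delta verify) + §5 cross-cutting rules (forensic self-review / Doctrine E shorthand / evidence-post obligation / recursive self-application)
 - [`docs/addon-soak-test-result-classification-guide.md`](addon-soak-test-result-classification-guide.md): a methodology for classifying results after long-running tests (24h+ soak / chaos / fault-injection) finish. **Core framing**: fault totals / PASS-FAIL counts cannot answer "is it ACCEPTED"; every fault injection must be placed into a 4-state schema according to "which layer failed first": (1) `invariant-break` (an invariant is broken → ROLLBACK) / (2) `product-path-failure` (the product recovery path fails) / (3) `harness-race` (timing race in the test tooling) / (4) `external-environmental-cascade` (external environment cascade). **Criteria**: Q1 (bad_ack > 0) → Q2 (cluster final state) → Q3 (OpsRequest Failed + N≥2 self-verification) → Q4 (duration exceeds mean+3σ + correlation with an external event + control sample) → product-pass (mermaid flowchart). **N≥2 self-verification as the minimum evidence bar**: harness-race needs a Succeed control of the same kind, cascade needs a control sample within baseline ±1σ; a single sample cannot support a "non-product" conclusion. **ACCEPTED criterion**: `invariant-break = 0 AND product-path-failure = 0`; harness-race / cascade do not block but trigger the corresponding fix tickets. Grounded on the N=3 CH30 harness-race control (fault-026 vs 029/033) + the N=2 CH20 cascade negative-control (fault-028 21min outlier vs 031 95s within 1σ) + a note on the AG quorum non-sticky 3-transition behavior. Forms a complementary pair with the single-failure first-blocker layering methodology in [`addon-test-acceptance-and-first-blocker-guide.md`](addon-test-acceptance-and-first-blocker-guide.md) (this doc covers the aggregate dimension, that one the single-run dimension)
+- [`docs/addon-runtime-contract-preflight-guide.md`](addon-runtime-contract-preflight-guide.md): a preflight methodology for the hidden failure mode where chart install schema checks all pass + smoke all PASS, but the runtime fails with cryptic errors. Focuses on three alignment surfaces: (1) Layer 0/1 chart spec fields → KB CRD schema compatibility (three layers: image / chart / CRD); (2) Layer 2 vcluster substrate bootstrap precondition (offline options when images cannot be pulled from the public internet); (3) Layer 5 ActionSet env contract drift (the chart does not declare the runtime env, the runtime receives an empty value → 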
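cryptic downstream errors). Includes the 4-step preflight flow + the cross-engine 4-pillar alignment table + a decision tree + a 4-case appendix (Oracle W7 ActionSet env / vcluster CoreDNS ImagePullBackOff / chart tracking KB main isExclusive / MariaDB mirror family). Forms an orthogonal pair with [`addon-kb-schema-version-preflight-guide.md`](addon-kb-schema-version-preflight-guide.md) (schema dimension vs runtime contract dimension); complements [`addon-chart-vs-kb-schema-skew-diagnosis-guide.md`](addon-chart-vs-kb-schema-skew-diagnosis-guide.md) across the lifecycle (preflight in this document vs diagnosis in that one); and complements [`addon-soak-test-result-classification-guide.md`](addon-soak-test-result-classification-guide.md) across the lifecycle (preflight before vs classification after)
 ## Case materials
diff --git a/docs/addon-runtime-contract-preflight-guide.md b/docs/addon-runtime-contract-preflight-guide.md
new file mode 100644
index 0000000..fb47999
--- /dev/null
+++ b/docs/addon-runtime-contract-preflight-guide.md
@@ -0,0 +1,709 @@
+# Addon Runtime Contract Preflight Guide
+
+> **Audience**: addon dev / addon test / cross-line TL
+> **Status**: draft (v1)
+> **Applies to**: any KB addon
+> **Applies to KB version**: any (the preflight methodology itself is version-independent; the specific set of ActionSet env fields evolves with the KB version)
+> **Affected by version skew**: yes. Alignment between the chart spec and the actual runtime contract drifts along three axes: KB version / addon version / vcluster substrate version. The methodology here is stable, but the specific declared fields / image paths must be checked against the versions actually tested
+
+This document is for addon development and test engineers. It targets a class of failures that is extremely well hidden: **chart install schema validation passes completely and smoke tests all PASS, yet certain operations fail at runtime with cryptic errors**. The essence is **drift between the chart spec and the actual runtime contract**: a runtime dependency that the spec never declared fails silently at run time or is masked by a cryptic error.
+
+## This document in plain language first
+
+### What problem this document solves
+
+When a test reports "OpsRequest failed" or "Backup Job reports ORA-12154 / DNS resolve fail", the team's first reaction is usually "something is wrong with the DB / network". That attribution is often misplaced: **the same surface error can come from entirely independent root-cause layers**:
+
+1. **The chart does not declare the runtime env** (the KB dataprotection ActionSet `spec.env` is missing a field, so the runtime reference resolves to an empty string)
+2. **The substrate is not properly bootstrapped** (the vcluster's default coredns image cannot be pulled; DNS inside pods does not work, but the control plane reports Running)
+3. 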
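**A genuine DB / network layer problem** (the minority of cases)
+
+Saying "DB / network bug" in broad terms sends the team **troubleshooting in the wrong layer**: an hour spent tuning the listener / TLS / restarting pods, only to find that the ActionSet spec was missing one env declaration.
+
+→ The actual methodology: **before tests start, explicitly audit the 3 alignment surfaces between the chart spec and the runtime contract (chart spec / vcluster substrate / runtime env contract), turning cryptic errors into readable preflight fail-fast checks up front**.
+
+### When this methodology applies
+
+| Scenario | Key decision |
+|---|---|
+| An addon that integrates KB dataprotection writes Backup / Restore ActionSets | Must read §6.2; audit the difference between what `spec.env` declares and what the backup script references |
+| Running smoke / chaos / dataprotection on a vcluster | Must read §6.3; preflight the CoreDNS image + cross-pod DNS resolution |
+| chart `install` reports `field not declared in schema` | Must read §6.1; distinguish three kinds of drift: chart field / KB version field / KB main |
+| smoke all PASS but dataprotection / cross-pod operations fail | Walk the §8 decision tree to distinguish Layer 1 / 2 / 5 |
+| Before moving onto an IDC vcluster | Run through the 4-step preflight (§4) |
+| The same surface error recurs but the root cause keeps hopping | §7 archetype "chart spec doesn't declare a runtime requirement" |
+
+### What decisions you can make after reading
+
+- **When writing a new addon**: list the 3 runtime contract alignment surfaces in 5 seconds and avoid silent drift
+- **When integrating dataprotection**: immediately audit the difference between ActionSet spec.env and what the backup script references
+- **When bootstrapping a vcluster**: preflight the CoreDNS image and choose the right mirror (aliyuncs / dockerproxy.net / ACR / sideload)
+- **When you see a cryptic error such as `ORA-12154 TNS:could not resolve`**: use the §8 decision tree to assign it to the correct layer in 60 seconds instead of wrongly blaming the DB
+- **When reviewing a chart PR**: recognize and block contract gaps where "the chart spec looks OK but the runtime will fail silently"
+
+### Why this is a standalone document
+
+Together with [`addon-kb-schema-version-preflight-guide.md`](addon-kb-schema-version-preflight-guide.md) (KB schema three-layer image/chart/CRD version alignment) + [`addon-test-script-preflight-guide.md`](addon-test-script-preflight-guide.md) (test runner shared client state across lines) + [`addon-vcluster-kb-install-preflight-guide.md`](addon-vcluster-kb-install-preflight-guide.md) (KB install bootstrap inside a vcluster) + [`addon-multi-ns-registry-scan-preflight-guide.md`](addon-multi-ns-registry-scan-preflight-guide.md) (multi-ns test scope split), this document forms the **preflight family**, which covers the different alignment surfaces before the chaos test lifecycle starts.
+
+The scope of each doc is strictly independent:
+- 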
`kb-schema-version-preflight` = **schema dimension** (artifact versions consistent across the three layers image / chart / CRD)
+- This document = **runtime contract dimension** (the chart spec aligned with the actual runtime contract)
+- `test-script-preflight` = **shared client state dimension** (tenant kubeconfig across lines)
+- `vcluster-kb-install-preflight` = **bootstrap harness dimension** (timing of KB install in a vcluster)
+- `multi-ns-registry-scan-preflight` = **test scope dimension** (verified vs scan-only split)
+
+This document focuses on the standalone topic of **"the 3 alignment surfaces between the chart spec and the actual runtime contract"** and does not overlap with schema-version.
+
+---
+
+## Applicable scenarios
+
+This document applies when you own or maintain any of the following:
+
+- Writing ActionSets for a new addon (Backup / Restore / Reconfigure / Switchover) that need to declare `spec.env`
+- Moving an addon onto IDC shared k8s + per-line vclusters
+- Locating the root cause when chart `install` reports `field not declared in schema`
+- Running dataprotection tests / cross-pod replication tests on a vcluster
+- A full preflight audit of the chaos test lifecycle
+
+Not applicable (covered by other docs):
+- KB image / chart / CRD three-layer artifact version alignment → [`addon-kb-schema-version-preflight-guide.md`](addon-kb-schema-version-preflight-guide.md)
+- The runner's shared `~/.kube/config` interfering across lines → [`addon-test-script-preflight-guide.md`](addon-test-script-preflight-guide.md)
+- Timing of helm-installing KB in a vcluster → [`addon-vcluster-kb-install-preflight-guide.md`](addon-vcluster-kb-install-preflight-guide.md)
+- 4-state classification after a test fails → [`addon-soak-test-result-classification-guide.md`](addon-soak-test-result-classification-guide.md)
+- chart `field not declared in schema`: local bug vs generation gap vs tracking KB main → [`addon-chart-vs-kb-schema-skew-diagnosis-guide.md`](addon-chart-vs-kb-schema-skew-diagnosis-guide.md) (§6.1 of this document references that doc and does not repeat it)
+
+## The Three-Layer Runtime Contract Model
+
+The schema dimension in `addon-kb-schema-version-preflight-guide.md` uses the three layers image / chart / CRD to define "which version". The runtime contract dimension in this document uses a **three-Layer model** to define "the alignment surfaces between the chart spec and the actual runtime contract".
+
+> **Note on numbering**: Layer numbers use **non-contiguous numbering (1 / 2 / 5)**; **Layer 0 / 3 / 4 are reserved for future use**. The three layers image / chart / CRD of the schema dimension map to Layer 0 in this document (the image is the underlying artifact; the schema-version doc 
owns it), and Layer 3 / 4 are reserved for future sediment topics such as "storage version migration / API removal". The samples grounded in real practice so far are concentrated in Layer 1 / 2 / 5.
+
+| Layer | Dimension | Drift symptom | Owning doc |
+|---|---|---|---|
+| Layer 0 | Image artifact (reserved) | inconsistent image builds, digest drift | [`addon-kb-schema-version-preflight-guide.md`](addon-kb-schema-version-preflight-guide.md) |
+| **Layer 1** | **Chart spec fields** | chart template fields are rejected by the KB CRD schema (fails already at install) | this document §6.1 + [`addon-chart-vs-kb-schema-skew-diagnosis-guide.md`](addon-chart-vs-kb-schema-skew-diagnosis-guide.md) |
+| **Layer 2** | **vcluster substrate bootstrap precondition** | images for the vcluster's default coredns / metrics-server / key base components cannot be pulled; the control plane reports Running but cluster-internal DNS does not work | this document §6.3 |
+| Layer 3 | Storage version migration (reserved) | CRD storage version upgrade interrupted, objects left on the old version | (planned, future) |
+| Layer 4 | Removed API surface (reserved) | removed APIs still referenced by scripts | (planned, future) |
+| **Layer 5** | **Runtime env contract** | the chart spec does not declare the addon's own env; runtime references resolve to empty strings, causing cryptic downstream errors (not an install failure) | this document §6.2 |
+
+**Telling Layer 1 from Layer 5**:
+- Layer 1 = **install-time schema reject** (helm install fails outright, message: `field not declared in schema`)
+- Layer 5 = **runtime silent-empty** (install / smoke all PASS; cryptic error during dataprotection)
+
+**Telling Layer 2 from Layer 5**:
+- Layer 2 = **substrate side** (the vcluster's own coredns / base components did not come up; the fix is in cluster bootstrap)
+- Layer 5 = **chart spec side** (the chart template is missing a field; the fix is in the chart template)
+- Same surface error (e.g. ORA-12154 TNS:could not resolve), different root causes; the §8 decision tree must be used to tell them apart
+
+## 4-step Preflight Flow
+
+Before integrating dataprotection or moving onto a vcluster, walk through these 4 steps:
+
+### Step 1: Layer 0/1, schema three-layer version alignment
+
+Run the three-layer audit (image / chart / live CRD) from [`addon-kb-schema-version-preflight-guide.md`](addon-kb-schema-version-preflight-guide.md). Only after this step passes, proceed to the Layer 1 chart spec field audit:
+
+```bash
+# Are the chart template fields accepted by the current KB CRD schema? 
+helm template <release> <chart-dir> | kubectl apply --dry-run=server -f - 2>&1 | grep -E '(field not declared|unknown field|forbidden)' && echo "Layer 1 fail" || echo "Layer 1 pass"
+```
+
+### Step 2: Layer 2, vcluster substrate bootstrap
+
+A Running vcluster control plane does not mean the cluster-internal infrastructure is Ready. Mandatory checks:
+
+```bash
+# Is CoreDNS actually running (not merely "the deployment exists")?
+kubectl -n kube-system get pods -l k8s-app=kube-dns | grep -v Running | grep -v NAME && echo "Layer 2 fail" || echo "Layer 2 pass"
+
+# Can any workload pod resolve cluster-internal DNS?
+kubectl exec <pod> -- nslookup kubernetes.default.svc.cluster.local 2>&1 | grep -E '(NXDOMAIN|server can.t find)' && echo "Layer 2 fail" || echo "Layer 2 pass"
+```
+
+### Step 3: Layer 5, ActionSet env contract audit
+
+For every ActionSet (Backup / Restore / etc.), audit the difference between what `spec.env` declares and what the backup script references:
+
+```bash
+# List the env names declared by the ActionSet
+kubectl get actionset <actionset-name> -o jsonpath='{.spec.backup.backupData.env[*].name}' | tr ' ' '\n' | sort > /tmp/declared.txt
+
+# Grep the env names referenced by the backup script
+grep -oE '\$\{[A-Z_]+\}' <backup-script>.sh | sed 's/[${}]//g' | sort -u > /tmp/referenced.txt
+
+# The set difference is the Layer 5 risk surface.
+# DP_* names are injected by the KB dataprotection runner (see the note below), so they are filtered out.
+comm -23 /tmp/referenced.txt /tmp/declared.txt | grep -v '^DP_'
+# Non-empty output = referenced by the script but not declared by the chart = Layer 5 silent-empty risk
+```
+
+Note: the DP_* framework variables (`DP_DB_HOST` / `DP_DB_PORT` / `DP_DB_USER` / `DP_DB_PASSWORD`) are injected automatically by the KB dataprotection runner and do not need to be declared in the ActionSet; variables the addon itself needs, such as `ORACLE_SID` / `MYSQL_PORT` / `PGDATABASE`, must be declared explicitly.
+
+### Step 3.5: Is the systemAccount referenced by the BackupPolicy declared in the CMPD (closing the inter-version chart drift gap)
+
+The W8 audit uncovered a silent drift type in practice: CMPD declarations that are inconsistent across minor variants within one addon chart. For example, in the oracle addon chart, `cmpd.yaml` (12c) declares `systemAccounts: kbdataprotection`, but the sibling `cmpd-19c.yaml` (19c) does not declare this systemAccount → BackupPolicy `target.account: kbdataprotection` references a systemAccount that does not exist → KB never generates the corresponding secret → silent runtime fail (the oracle-expdp worker pod hits `CreateContainerConfigError` on 19c, PASS on 12c, identical KB build).
+
+**Key clarification on the audit shape**: `DP_DB_USER` / 
`DP_DB_PASSWORD` are **not declared in the ActionSet `spec.env`**; the KB **dataprotection Job runner** **injects them automatically** based on `BackupPolicy.spec.backupMethods[].target.account` (which references a systemAccount name). The contract boundary of systemAccount → `DP_DB_*` therefore sits at the **BackupPolicy layer**, not the ActionSet layer: the audit must start from the BackupPolicy and cannot look at the ActionSet alone.
+
+**Audit check (cluster-side, run after install)**:
+
+```bash
+# List the set of systemAccount names declared by the CMPD
+kubectl get cmpd <cmpd-name> -o jsonpath='{.spec.systemAccounts[*].name}' \
+  | tr ' ' '\n' | sort -u > /tmp/cmpd-accounts.txt
+
+# List the systemAccount names referenced by each BackupPolicy method (target.account)
+# (the BackupPolicy is the actual object KB renders from the BackupPolicyTemplate after cluster install)
+kubectl get backuppolicy <backuppolicy-name> -o jsonpath='{.spec.backupMethods[*].target.account}' \
+  | tr ' ' '\n' | sort -u > /tmp/bp-referenced-accounts.txt
+
+# Set difference = silent runtime fail surface (the BackupPolicy references a systemAccount the CMPD does not declare)
+comm -23 /tmp/bp-referenced-accounts.txt /tmp/cmpd-accounts.txt
+# Non-empty = runtime gap: KB never generates the corresponding secret and the dataprotection Job cannot start
+# Note: if the ActionSet injects a raw username/password directly (not via the KB DP runner secret), this check does not apply; that is the minority case, since KB DP runner secret injection is the default contract
+```
+
+**Cross-variant CMPD field diff audit (chart-side, as a pre-install scan)**:
+
+```bash
+# Run in the chart source dir; includes the reverse check against the BackupPolicyTemplate
+diff <(yq '.spec.systemAccounts[].name' templates/cmpd.yaml | sort -u) \
+     <(yq '.spec.systemAccounts[].name' templates/cmpd-19c.yaml | sort -u)
+diff <(yq '.spec.vars[].valueFrom.credentialVarRef.name' templates/cmpd.yaml | sort -u) \
+     <(yq '.spec.vars[].valueFrom.credentialVarRef.name' templates/cmpd-19c.yaml | sort -u)
+diff <(yq '.spec.lifecycleActions | keys' templates/cmpd.yaml) \
+     <(yq '.spec.lifecycleActions | keys' templates/cmpd-19c.yaml)
+
+# BackupPolicyTemplate ↔ CMPD cross-CRD reference check (catches W8-type drift at the chart source)
+diff <(yq '.spec.systemAccounts[].name' templates/cmpd-19c.yaml | sort -u) \
+     <(yq '.spec.backupMethods[].target.account' templates/backuppolicytemplate.yaml | sort -u)
+# Latter minus former 
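non-empty = the chart is broken at the source (the BackupPolicy references a systemAccount the 19c CMPD does not declare)
+```
+
+W8 grounded form: line 33 of `addons/oracle/templates/backuppolicytemplate.yaml` has `account: kbdataprotection`, but `cmpd-19c.yaml` does not declare `kbdataprotection`, so after a 19c cluster is installed the KB DP runner cannot find the secret → the `oracle-expdp` Job hits `CreateContainerConfigError`. The audit value of this step lies precisely in catching this kind of **cross-CRD reference drift**.
+
+### Step 4: Run 1 real dataprotection / cross-pod operation round-trip
+
+After the first 3 steps all pass, one end-to-end verification is mandatory: start a Backup (minimal dataset) and check the env list of the Backup script inside the pod and whether the connection succeeds. An install with nothing exercised does not count as passing preflight.
+
+```bash
+# Create a minimal Backup
+# Assumed manifest shape (apiVersion / kind / metadata are a reconstruction); adjust to your KB version and replace the <...> placeholders
+kubectl create -f - <<YAML
+apiVersion: dataprotection.kubeblocks.io/v1alpha1
+kind: Backup
+metadata:
+  name: preflight-backup
+spec:
+  backupPolicyName: <cluster>-<component>-backup-policy
+  backupMethod: <method>
+YAML
+
+# Wait until it completes / fails
+kubectl get backup -w | grep preflight-backup
+```
+
+Only when all 4 steps pass is the environment ready-for-test. If any step fails, do not proceed to chaos / soak / production handoff.
+
+## Cross-engine Alignment (4-pillar Table)
+
+When any addon integrates dataprotection, fill in the 4 pillars against the table below to make sure the runtime contract is complete:
+
+| Pillar | What is verified | Layer | Command template |
+|---|---|---|---|
+| **Spec declare** | Whether the ActionSet `spec.env` declares the env the engine itself needs | Layer 5 | `kubectl get actionset <actionset-name> -o jsonpath='{.spec.backup.backupData.env[*].name}'` |
+| **Script reference** | Whether every env referenced by the backup script is in the Spec declare set | Layer 5 | `grep -oE '\$\{[A-Z_]+\}'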