docs: add lifecycle script exit code reliability guide #81

Open
nayutah wants to merge 1 commit into main from docs/lifecycle-script-exit-code-reliability
nayutah commented May 6, 2026

Summary

  • Adds addon-lifecycle-script-exit-code-reliability-guide.md — a cross-engine guide on writing and verifying KubeBlocks lifecycle action scripts
  • Covers the exit-0 silent-failure root cause, three mandatory test paths (happy path / forced failure / runtime readback), kbagent 60s clamp, retryPolicy budget, and a pre-merge 7-item checklist
  • Appendix A: Oracle W8b sqlplus case (contributed by @sophia); Appendix B: redis-cli exit-0 pitfalls
  • Adds one entry to docs/SKILL-INDEX.md Section 1 (Designing/developing a new addon)

Test plan

Adds addon-lifecycle-script-exit-code-reliability-guide.md — a
cross-engine reference for writing and reviewing KubeBlocks lifecycle
action scripts. Covers:

- The exit-0 silent-failure root cause and why KubeBlocks cannot detect it
- Three mandatory verification paths: happy path, forced failure, runtime readback
- kbagent 60s hard clamp and retryPolicy budget calculation
- Pre-merge 7-item checklist

Appendix A: Oracle W8b sqlplus WHENEVER SQLERROR / CDB open-mode wait
case (contributed by Sophia).
Appendix B: redis-cli exit-0 pitfalls and output-inspection pattern.

Adds one entry to SKILL-INDEX.md Section 1 (Designing/developing a new addon).
weicao (Contributor) left a comment

Blocking issue before merge:

  • The Oracle forced-failure example says `Expected: exit 1476`. A process exit status cannot carry the full Oracle error number: it is limited to 0-255, so the OS/shell truncates larger values (1476 would surface as 196). The test should assert only exit != 0 and verify that the output contains ORA-01476, not compare the exact numeric exit code.

Please change that expected line to something like: Expected: non-zero exit; output contains ORA-01476. The rest of the structure looks aligned: engine-neutral body, runtime readback retained, and SKILL-INDEX entry present.
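
The truncation can be reproduced without an Oracle instance. The following is a minimal sketch, assuming bash is available; the variable names `rc` and `rc_multiple` are illustrative only, and `bash -c 'exit N'` merely stands in for any CLI that passes a large error number to its exit status.

```shell
# A process exit status keeps only the low 8 bits of the requested value.
# 1476 = 5 * 256 + 196, so the caller observes 196, not 1476.
bash -c 'exit 1476' && rc=0 || rc=$?
echo "requested 1476, observed $rc"

# A value that is an exact multiple of 256 truncates to 0,
# which the caller would read as success.
bash -c 'exit 1280' && rc_multiple=0 || rc_multiple=$?
echo "requested 1280, observed $rc_multiple"
```

The second case is the stronger argument for checking the output text as well: a truncated status is not only inexact, it can in principle collapse to 0.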

weicao commented May 6, 2026

Oracle Appendix A / forced-failure assertion patch proposal from Sophia, reviewed by James.

Allen's requested change is addressed in two places:

  1. Add the engine-neutral rule in §2.2: forced failure asserts rc != 0; exact business error codes are checked from stdout/stderr, not shell rc.
  2. Update Appendix A Oracle W8b: replace Expected: exit 1476 with non-zero exit; output contains ORA-01476.
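
The engine-neutral rule from item 1 could be captured in a small reusable check. This is only a sketch: `assert_forced_failure` is a hypothetical helper name, not something defined in the guide, and the `bash -c` command at the end is a stand-in for a real engine CLI call.

```shell
# Hypothetical helper: run a command, require a non-zero exit status,
# and require the combined stdout/stderr to contain the expected error text.
assert_forced_failure() {
  expected=$1
  shift
  if output=$("$@" 2>&1); then
    rc=0
  else
    rc=$?
  fi
  printf '%s\n' "$output"
  if [ "$rc" -eq 0 ]; then
    echo "FAIL: command exited 0 on a forced failure" >&2
    return 1
  fi
  if ! printf '%s\n' "$output" | grep -qF -- "$expected"; then
    echo "FAIL: output does not contain $expected (rc=$rc)" >&2
    return 1
  fi
  echo "PASS: rc=$rc, output contains $expected"
}

# Stand-in for an engine CLI: prints an Oracle-style error and requests
# exit 1476, which the caller observes as a truncated non-zero status.
assert_forced_failure "ORA-01476" \
  bash -c 'echo "ORA-01476: divisor is equal to zero"; exit 1476'
```

The helper never compares `rc` against a specific number, so the same check would apply to any engine whose CLI reports its error code in the output.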

James completed 8-class XP review and approved the final form. git diff --check passes locally.

diff --git a/docs/addon-lifecycle-script-exit-code-reliability-guide.md b/docs/addon-lifecycle-script-exit-code-reliability-guide.md
index 85e2fba..40d6b2e 100644
--- a/docs/addon-lifecycle-script-exit-code-reliability-guide.md
+++ b/docs/addon-lifecycle-script-exit-code-reliability-guide.md
@@ -82,6 +82,12 @@ This is the path most teams skip. Without it, you cannot know whether your error
 Criterion: exit_code != 0  (the script must propagate the failure)

+Rule: for forced failure, assert rc != 0 only. Check stdout/stderr for the specific error text or code. Do not expect shell exit code to preserve business error numbers; shell exit status is effectively 0-255, and CLIs may truncate or remap it.
+
+Criterion: exit_code != 0 AND output_contains_expected_engine_error == true
+

2.3 Runtime Readback

After a successful action (exit 0), independently verify the side effect occurred:
@@ -217,14 +223,24 @@ Rather than staging a complex "DB not open" scenario, verify `WHENEVER SQLERROR
With fix (expected: non-zero exit — ORA-01476):

-kubectl exec -i -n $NAMESPACE $POD -c oracle -- bash -lc "
+if output=$(kubectl exec -i -n $NAMESPACE $POD -c oracle -- bash -lc "
sqlplus -S / as sysdba <<'SQL'
WHENEVER SQLERROR EXIT SQL.SQLCODE
SELECT 1/0 FROM dual;
EXIT;
-SQL"
-echo "Exit code: $?"
-# Expected: exit 1476 (ORA-01476: divisor is equal to zero)
+SQL" 2>&1)
+then
+  rc=0
+else
+  rc=$?
+fi
+printf '%s\n' "$output"
+echo "Exit code: $rc"
+
+# Expected: non-zero exit; output contains ORA-01476.
+# Do not assert rc == 1476: shell exit status is 8-bit, so SQLCODE is truncated.
+test "$rc" -ne 0
+printf '%s\n' "$output" | grep -q "ORA-01476"

Without fix (expected: exit 0 — old silent behavior):
