Skip to content

Fix exporter state machine: stuck exporters on lease-end during hooks#349

Open
raballew wants to merge 4 commits intojumpstarter-dev:mainfrom
raballew:025-fix-exporter-state-machine
Open

Fix exporter state machine: stuck exporters on lease-end during hooks#349
raballew wants to merge 4 commits intojumpstarter-dev:mainfrom
raballew:025-fix-exporter-state-machine

Conversation

@raballew
Copy link
Contributor

Fixes #236, #237, #241, #245

raballew and others added 2 commits March 19, 2026 16:44
- Wait for beforeLease hook to complete before running afterLease cleanup,
  preventing stuck exporters when lease ends during hook execution (jumpstarter-dev#236)
- Report OFFLINE status before shutdown on onFailure:exit hook failures,
  preventing new lease assignment to dying exporters (jumpstarter-dev#245)
- Add 13 tests covering lease-end during hooks, unused lease timeout,
  consecutive lease ordering, idempotent cleanup, and warn+lease-end

Fixes jumpstarter-dev#236, jumpstarter-dev#237, jumpstarter-dev#241, jumpstarter-dev#245

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@netlify
Copy link

netlify bot commented Mar 19, 2026

Deploy Preview for jumpstarter-docs ready!

Name Link
🔨 Latest commit a9eb00c
🔍 Latest deploy log https://app.netlify.com/projects/jumpstarter-docs/deploys/69bc4136b257bb000982a18a
😎 Deploy Preview https://deploy-preview-349--jumpstarter-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8abc0142-2fe1-4e85-b94a-3571de757515

📥 Commits

Reviewing files that changed from the base of the PR and between 69de0d2 and a9eb00c.

📒 Files selected for processing (2)
  • e2e/tests-hooks.bats
  • python/packages/jumpstarter/jumpstarter/exporter/hooks_test.py
✅ Files skipped from review due to trivial changes (1)
  • python/packages/jumpstarter/jumpstarter/exporter/hooks_test.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • e2e/tests-hooks.bats

📝 Walkthrough

Walkthrough

waits for in-flight before-lease hooks during lease teardown. _cleanup_after_lease now awaits lease_scope.before_lease_hook.wait() before running after_lease logic or transitioning status. Tests and hook-failure reporting were updated to validate ordering, idempotency, and that OFFLINE is reported before shutdown on exit-mode failures.

Changes

Cohort / File(s) Summary
Lease cleanup synchronization
python/packages/jumpstarter/jumpstarter/exporter/exporter.py
Added await lease_scope.before_lease_hook.wait() in _cleanup_after_lease to ensure after_lease does not start while before_lease is still running.
Comprehensive lease transition tests
python/packages/jumpstarter/jumpstarter/exporter/exporter_test.py
New anyio/pytest tests covering before/after hook ordering, premature lease-end recovery, unused-lease timeout behavior, consecutive-lease ordering, strict ordering between leases, and idempotent cleanup behavior.
Hook error status reporting
python/packages/jumpstarter/jumpstarter/exporter/hooks.py
When a hook failure has on_failure="exit", emit ExporterStatus.OFFLINE before invoking shutdown.
Hook error test updates
python/packages/jumpstarter/jumpstarter/exporter/hooks_test.py, e2e/tests-hooks.bats
Adjusted/added unit and e2e assertions to expect OFFLINE reported prior to shutdown, broadened failure-output regexes, and relaxed/delayed e2e timing where shell result is nondeterministic.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Exporter
    participant BeforeHook
    participant AfterHook

    Client->>Exporter: Lease ends / premature end
    Exporter->>BeforeHook: await before_lease_hook.wait()
    Note right of Exporter: wait for in-flight before-lease hook to finish
    BeforeHook-->>Exporter: before hook completes
    Exporter->>AfterHook: start after_lease cleanup
    AfterHook-->>Exporter: after cleanup done
    Exporter->>Client: report status (AVAILABLE / OFFLINE on exit)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

enhancement, python

Suggested reviewers

  • evakhoni
  • mangelajo

Poem

🐇 I paused my hop to let one hook end,

waited patient, then helped the next begin,
no tangled leases, no stuck little den—
a tidy handoff, tidy status, hop!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 54.05% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix exporter state machine: stuck exporters on lease-end during hooks' is concise, clear, and directly describes the main change—addressing stuck exporters when leases end during hook execution.
Description check ✅ Passed The description is well-structured, relating the changeset by explaining the two main fixes (beforeLease/afterLease synchronization and OFFLINE reporting) and the test coverage added, matching the code changes.
Linked Issues check ✅ Passed The pull request directly addresses issue #236 by waiting for beforeLease to complete before running afterLease cleanup, preventing stuck exporters when leases end during hook execution. All code changes align with the stated objectives.
Out of Scope Changes check ✅ Passed All changes are scoped to exporter state machine synchronization, hook status reporting, and their corresponding tests. No unrelated modifications were introduced outside the stated objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@raballew raballew requested review from evakhoni and kirkbrauer March 19, 2026 15:52
The exporter now reports "Exporter shutting down due to beforeLease
hook failure" instead of "beforeLease hook failed", causing regex
mismatches in B2, B3, and C2 tests. Broaden the patterns to match
both "hook failed" and "hook failure" via the prefix "hook fail",
and add "Exporter shutting down" as an alternative match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e/tests-hooks.bats`:
- Line 239: The test named onFailure=endLease is currently permitting "Exporter
shutting down" in the assert_output regex which can hide regressions in hooks
B2; update the assertion in the assert_output call (the one using the regex
"(beforeLease hook fail|Exporter shutting down|Connection to exporter lost)") to
remove "Exporter shutting down" so it only accepts "(beforeLease hook
fail|Connection to exporter lost)", and ensure the surrounding test setup for
hooks B2 and exporter lifecycle (the onFailure=endLease test) explicitly
prepares/reset exporter state so the test is deterministic and does not rely on
prior suite runs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 34fe890a-19c2-49d7-be60-874330fbaed7

📥 Commits

Reviewing files that changed from the base of the PR and between f6d09a1 and 69de0d2.

📒 Files selected for processing (1)
  • e2e/tests-hooks.bats

assert_failure
# endLease may drop connection before status propagates to client
assert_output --regexp "(beforeLease hook failed|Connection to exporter lost)"
assert_output --regexp "(beforeLease hook fail|Exporter shutting down|Connection to exporter lost)"
Copy link
Contributor

@coderabbitai coderabbitai bot Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not accept shutdown output in the onFailure=endLease test.

Line 239 currently allows "Exporter shutting down", which can mask a real regression in hooks B2 because the restart loop may recover the exporter and still pass later checks.

Suggested fix
-  assert_output --regexp "(beforeLease hook fail|Exporter shutting down|Connection to exporter lost)"
+  assert_output --regexp "(beforeLease hook fail|Connection to exporter lost)"
+  refute_output --partial "Exporter shutting down"

Based on learnings: E2E tests in bats should be runnable in isolation. Do not rely on prior suite setup or shared state across tests. Each test (or setup/teardown blocks) should explicitly prepare its prerequisites, reset global state, and ensure deterministic results.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert_output --regexp "(beforeLease hook fail|Exporter shutting down|Connection to exporter lost)"
assert_output --regexp "(beforeLease hook fail|Connection to exporter lost)"
refute_output --partial "Exporter shutting down"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/tests-hooks.bats` at line 239, The test named onFailure=endLease is
currently permitting "Exporter shutting down" in the assert_output regex which
can hide regressions in hooks B2; update the assertion in the assert_output call
(the one using the regex "(beforeLease hook fail|Exporter shutting
down|Connection to exporter lost)") to remove "Exporter shutting down" so it
only accepts "(beforeLease hook fail|Connection to exporter lost)", and ensure
the surrounding test setup for hooks B2 and exporter lifecycle (the
onFailure=endLease test) explicitly prepares/reset exporter state so the test is
deterministic and does not rely on prior suite runs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 69de0d2 -- the regex now matches both "hook failed" and "hook failure" via the prefix "hook fail", plus "Exporter shutting down" as an additional match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raballew, thanks for the explanation. To clarify, the concern I raised wasn't about the "hook fail" prefix (that broadening makes sense to handle both "hook failed" and "hook failure" variants) — it was specifically about keeping "Exporter shutting down" in the onFailure=endLease test (hooks B2).

The "Exporter shutting down" message is newly emitted on the onFailure=exit path (as described in the PR), so allowing it in the endLease test could silently pass even if endLease accidentally triggers a shutdown regression. If a future bug caused endLease to behave like exit, the test would still pass by matching "Exporter shutting down" instead of the expected endLease output.

That said, if you've confirmed through testing that this message can legitimately appear even in the endLease path (e.g., a race between the restart loop reporting shutdown and the endLease path), I understand the pragmatic trade-off. I'll defer to your judgment here.


🧠 Learnings used
Learnt from: mangelajo
Repo: jumpstarter-dev/jumpstarter PR: 180
File: e2e/tests.bats:203-204
Timestamp: 2026-02-03T15:49:06.716Z
Learning: E2E tests in bats should be runnable in isolation. Do not rely on prior suite setup or shared state across tests. Each test (or setup/teardown blocks) should explicitly prepare its prerequisites, reset global state, and ensure deterministic results. Consider adding per-test setup and teardown, or refactor to avoid shared state in e2e/tests.bats and related files.

Learnt from: mangelajo
Repo: jumpstarter-dev/jumpstarter PR: 173
File: e2e/run-e2e.sh:101-103
Timestamp: 2026-02-02T09:17:04.720Z
Learning: In the jumpstarter e2e tests (e2e/run-e2e.sh and e2e/tests.bats), keep tracing enabled (-x) unconditionally because there are no secrets in the bats tests and it makes debugging easier with more understandable logs.

test_exec_bash: Simplify bash-specific syntax from array operations to
substring expansion to reduce PTY timing sensitivity on macOS 3.11.

e2e C2 (afterLease onFailure=exit): The shell command can complete
before the afterLease hook runs, making assert_failure unreliable.
The important behavior is that the exporter exits and goes offline,
which is still asserted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hooks enabled: infinite lease glitch / exporter denial of service

1 participant