
Add --wait flag to babs status#354

Draft
asmacdo wants to merge 2 commits into PennLINC:main from asmacdo:status-wait


Collaborator

@asmacdo asmacdo commented Mar 25, 2026

Add --wait flag to babs status

Problem

babs status is a snapshot — it prints the current state and exits.
Automation needs to wait for jobs to finish before running babs merge, so users running end-to-end pipelines must write wrapper scripts that poll squeue.
Workarounds exist, such as the squeue loop in walkthrough-tests.sh, but users shouldn't have to implement that themselves, and that approach doesn't update job_status.csv.
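
For context, the kind of wrapper described above might look like the sketch below. It is an illustration only, not the loop in walkthrough-tests.sh: the function names, the `user` parameter, and the injectable `run`/`sleep` arguments are hypothetical and exist so the loop can be exercised without a Slurm cluster. The only external command it relies on is `squeue` with its `-h`, `-u`, and `-o "%i"` options. Note that it watches every job belonging to the user and never touches job_status.csv, which are the two gaps this PR addresses.

```python
import subprocess
import time
from typing import Callable, List


def list_queued_job_ids(
    user: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Return the Slurm job IDs that squeue currently reports for `user`."""
    # -h drops the header row, -u filters by user, -o "%i" prints only the job ID.
    result = run(
        ["squeue", "-h", "-u", user, "-o", "%i"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def wait_until_queue_empty(
    user: str,
    interval_s: float = 60.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until squeue lists no jobs for `user`; return how many times we slept."""
    polls = 0
    while list_queued_job_ids(user, run=run):
        polls += 1
        sleep(interval_s)
    return polls
```

An empty queue only says that nothing is pending or running; it says nothing about whether the jobs produced results, which is why the wrapper alone is not enough before babs merge.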

Changes

Add --wait and --wait-interval flags to babs status:

# Poll until all submitted jobs complete or fail (default: check every 5 min)
babs status --wait

# Custom interval
babs status --wait --wait-interval 30

Behavior:

  • Prints job status summary each iteration (same as babs status)
  • Exits 0 when all submitted jobs have results
  • Exits 1 if any jobs failed, or if no jobs have been submitted
  • Exits 130 on Ctrl-C (clean handling)
  • Project-scoped: only watches jobs from this BABS project, not all user jobs
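
The exit-code rules above can be sketched as a small polling loop. This is a minimal illustration under stated assumptions, not the babs_status_wait implementation in this PR: `wait_for_jobs`, the `get_counts` callback, and the `submitted` / `has_results` / `failed` keys are hypothetical names, the interval is assumed to be in seconds, and the loop assumes the reading "wait until every submitted job has finished, then report failure if any failed".

```python
import time
from typing import Callable, Dict

EXIT_OK = 0
EXIT_FAILED = 1
EXIT_INTERRUPTED = 130  # conventional shell code for SIGINT: 128 + 2


def wait_for_jobs(
    get_counts: Callable[[], Dict[str, int]],
    interval_s: float = 300.0,  # assumed seconds; 300 s matches the 5 min default
    report: Callable[[Dict[str, int]], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `get_counts` until all submitted jobs are finished; return an exit code.

    `get_counts` is a hypothetical callback returning a dict with the keys
    'submitted', 'has_results', and 'failed'.
    """
    try:
        while True:
            counts = get_counts()
            report(counts)  # print a summary on every iteration
            if counts["submitted"] == 0:
                return EXIT_FAILED  # nothing to wait for
            finished = counts["has_results"] + counts["failed"]
            if finished >= counts["submitted"]:
                return EXIT_FAILED if counts["failed"] else EXIT_OK
            sleep(interval_s)
    except KeyboardInterrupt:
        # Ctrl-C during the status check or the sleep ends the wait cleanly.
        return EXIT_INTERRUPTED
```

A caller would pass the return value to `sys.exit`, so that automation can run babs merge only when the wait exits 0.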

Test changes

  • 8 unit tests for babs_status_wait covering success, failure, mixed results, polling loop, no-submitted-jobs, report-each-iteration, and Ctrl-C handling
  • E2E walkthrough tests now use babs status --wait instead of manual squeue polling loops
  • E2E tests validate job_status.csv after babs merge — every submitted job must have has_results=True and is_failed=False
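
The job_status.csv check described in the last bullet could be written roughly as follows. This is a sketch, not the E2E test code: the `has_results` and `is_failed` column names come from the description above, while the `submitted` and `sub_id` columns and the `True`/`False` text serialization are assumptions about the file layout, and `find_unfinished_jobs` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List, TextIO


def find_unfinished_jobs(csv_file: TextIO) -> List[Dict[str, str]]:
    """Return rows for submitted jobs that lack results or are marked failed."""
    bad_rows = []
    for row in csv.DictReader(csv_file):
        # Assumed column: only jobs that were actually submitted are checked.
        if row["submitted"].strip() != "True":
            continue
        has_results = row["has_results"].strip() == "True"
        is_failed = row["is_failed"].strip() == "True"
        if not has_results or is_failed:
            bad_rows.append(row)
    return bad_rows


if __name__ == "__main__":
    # Tiny in-memory example standing in for a real job_status.csv.
    example = io.StringIO(
        "sub_id,submitted,has_results,is_failed\n"
        "sub-01,True,True,False\n"
        "sub-02,True,False,True\n"
    )
    for row in find_unfinished_jobs(example):
        print("unfinished or failed:", row["sub_id"])
```

An E2E test would then assert that the returned list is empty after babs merge.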

Housekeeping

  • Move pytest-timeout from conda environment files to pyproject.toml (separate commit)

TODO:

  • Handle SIGINT from con-duct; Ctrl-C isn't working
  • Handle the case where not all jobs are submitted; currently exits 1 if not all jobs have been submitted and finished

Test plan

  • Unit tests pass (pytest tests/test_interaction.py)
  • E2E walkthrough tests pass in container CI


codecov-commenter commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.73%. Comparing base (912ef8a) to head (cb9f079).

Files with missing lines    Patch %    Lines
babs/cli.py                 60.00%     1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #354      +/-   ##
==========================================
+ Coverage   79.55%   79.73%   +0.18%     
==========================================
  Files          17       17              
  Lines        1956     1984      +28     
  Branches      331      335       +4     
==========================================
+ Hits         1556     1582      +26     
- Misses        279      280       +1     
- Partials      121      122       +1     


@asmacdo asmacdo force-pushed the status-wait branch 2 times, most recently from 5cb9844 to 230527c on April 2, 2026 02:22
Collaborator Author

asmacdo commented Apr 2, 2026

I've successfully used this on 2 jobs that passed. A third job failed, and this change caused a problem. I fixed a babs status bug (babs status didn't update jobs that were stuck in R or PD). My failing job took too long to fail for me to try again, but I added an integration test for that case.

If CI is green, I think this one's ready.
Edit: found another bug :(
Fixed by rebase on 359.

@asmacdo asmacdo changed the title WIP: Add --wait flag to babs status Add --wait flag to babs status Apr 2, 2026
Collaborator Author

asmacdo commented Apr 2, 2026

FAILED tests/test_babs_workflow.py::test_babs_init_raw_bids[session] - Failed: Timeout >300.0s

But that one seems flaky; I've seen it time out a handful of times before.

@asmacdo asmacdo marked this pull request as ready for review April 2, 2026 02:34
@asmacdo asmacdo marked this pull request as draft April 2, 2026 16:04
Collaborator Author

asmacdo commented Apr 2, 2026

New issue: succeeded jobs are now marked as both succeeded and failed. Fixed by rebase on 359.

@asmacdo asmacdo force-pushed the status-wait branch 4 times, most recently from 14092b8 to 7b01181 on April 13, 2026 17:17
asmacdo and others added 2 commits April 13, 2026 17:34
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Polls until all submitted jobs complete or fail, then exits.
Exit code 0 if all succeeded, 1 if any failed or none submitted,
130 on Ctrl-C.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@asmacdo asmacdo marked this pull request as ready for review April 13, 2026 19:26
Collaborator Author

asmacdo commented Apr 13, 2026

After rebasing on the modified babs status rewrite, I ran this successfully with my demo dataset. I think we are ready!

@asmacdo asmacdo requested a review from tien-tong April 13, 2026 19:27
@asmacdo asmacdo marked this pull request as draft April 13, 2026 21:15
Collaborator Author

asmacdo commented Apr 13, 2026

Hit 2 smallish issues; updated the description with a TODO list.
