babs-status: replace pandas with dataclasses#359

Open
asmacdo wants to merge 2 commits into PennLINC:main from asmacdo:babs-status-no-pd

Conversation

Collaborator

@asmacdo asmacdo commented Apr 6, 2026

Claude-assisted design doc in this PR:

  • design/status-without-pandas.md for full context and planning.

  • no changes to the following (see the design doc for more on scope):

    • CLI (i.e., does not add the new --wait feature from #354, "Add --wait flag to babs status")
    • model (columns) of job_status.csv. TODO: how badly do we want to keep it as is?
    • pandas is still in requirements, as it is used by other components
  • changed:

    • internal interfaces
    • fixes the ghost-job bug where a completed job with results was counted as failed when squeue returned empty, caused by pandas dtype issues with bitwise NOT on object-dtype booleans.
    • no longer allows job id to be optional for squeue (it was always used before anyway, and returning all user-submitted jobs could cause problems)
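The ghost-job bug mentioned above stems from how Python's bitwise NOT behaves on plain bools, which is what `~` falls back to on an object-dtype pandas Series: the result is a truthy integer rather than a logical negation, so a mask like `~series` can end up selecting every row. A minimal pure-Python illustration:

```python
# Bitwise NOT on a bool operates on it as an int, so the result is a
# (truthy) negative integer, never a logical negation.
assert ~True == -2
assert ~False == -1
assert bool(~True) is True and bool(~False) is True
```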

Similar previous effort

Drop DataFrames from the babs status pipeline. Job state is now represented by a SchedulerState enum and a JobStatus dataclass, with is_failed and submitted as derived properties instead of stored values.
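A sketch of what that shape could look like. The class and member names come from the PR description, but the field set and the exact derivation rules for `submitted` and `is_failed` are assumptions, not the PR's actual code:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class SchedulerState(Enum):
    PENDING = 'PD'
    RUNNING = 'R'
    COMPLETING = 'CG'
    CONFIGURING = 'CF'

@dataclass
class JobStatus:
    sub_id: str
    job_id: Optional[int] = None
    scheduler_state: Optional[SchedulerState] = None
    has_results: bool = False

    @property
    def submitted(self) -> bool:
        # Derived rather than stored: a job counts as submitted once it
        # has been assigned a scheduler job id.
        return self.job_id is not None

    @property
    def is_failed(self) -> bool:
        # Derived in one place: submitted, no longer visible to the
        # scheduler, and no results branch.
        return (
            self.submitted
            and self.scheduler_state is None
            and not self.has_results
        )
```

With derived properties, a completed job with results can never be misclassified as failed by a stale stored flag.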

TODOs:

  • make sure CI is green
  • make sure works "for me"
  • seek feedback on the plausibility of this approach
  • if we agree to proceed, establish the remaining TODOs and finalize

@asmacdo asmacdo force-pushed the babs-status-no-pd branch from 563f239 to d27680b on April 6, 2026 16:03
sources every run. The CSV persists two bits of state destroyed by later
operations: `submitted` (the scheduler forgets completed jobs) and `has_results`
(branches are deleted after merge). Everything else is a convenience artifact.
Is this the right contract, or should we rethink?
Collaborator

per discussion, the question boils down to whether it constitutes some form of "interface", and thus an "interface data model", which we should not change.

@asmacdo asmacdo force-pushed the babs-status-no-pd branch 2 times, most recently from 4714f37 to 9ee19ab on April 9, 2026 15:15
@codecov-commenter

codecov-commenter commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 88.15166% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.51%. Comparing base (25278c6) to head (acaa875).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
babs/base.py 65.21% 6 Missing and 2 partials ⚠️
babs/update.py 56.25% 5 Missing and 2 partials ⚠️
babs/scheduler.py 71.42% 3 Missing and 3 partials ⚠️
babs/status.py 97.20% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #359      +/-   ##
==========================================
+ Coverage   79.36%   79.51%   +0.15%     
==========================================
  Files          16       17       +1     
  Lines        1846     1953     +107     
  Branches      311      330      +19     
==========================================
+ Hits         1465     1553      +88     
- Misses        267      279      +12     
- Partials      114      121       +7     

☔ View full report in Codecov by Sentry.

@asmacdo asmacdo changed the title [WIP] Replace pandas with dataclasses in babs status babs-status: replace pandas with dataclasses Apr 9, 2026
Drop pandas from the status path. DataFrames become dataclasses +
dicts, csv module replaces pandas I/O, magic strings become enums,
and is_failed is derived in one place (a property) instead of three.

Key changes:
- New babs/status.py: JobStatus dataclass, SchedulerState enum,
  CSV read/write, update_from_branches, update_from_scheduler
- run_squeue returns raw text; parsing moves to status.py
- run_squeue job_id is now required (prevents accidental full-queue
  queries and crashes from non-array jobs)
- Strict validation of squeue array job format (jobid_taskid)
- Uses dataclasses.replace to reduce boilerplate in update functions

Bug fix: completed job with results no longer counted as failed
when squeue is empty (the ghost-job dtype bug).

CSV format is unchanged — no migration needed for existing projects.

See design/status-without-pandas.md for full rationale and data flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
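The "strict validation of squeue array job format" line above could be sketched as follows; the helper name and error message are hypothetical, and BABS's actual parser may differ:

```python
def parse_array_id(raw: str) -> tuple[int, int]:
    # Strictly validate the SLURM array job id format "jobid_taskid",
    # e.g. "8055585_1", rejecting plain job ids and malformed input.
    job_part, sep, task_part = raw.partition('_')
    if not sep or not job_part.isdigit() or not task_part.isdigit():
        raise ValueError(f'expected SLURM array id "jobid_taskid", got {raw!r}')
    return int(job_part), int(task_part)
```

Rejecting non-array ids up front is what turns the old silent crash on non-array jobs into a clear error.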
@asmacdo asmacdo force-pushed the babs-status-no-pd branch from 9ee19ab to 4f9e308 on April 9, 2026 16:43
test_babs_init_raw_bids[subject] often exceeded the default 300s
timeout in CI. Increase to 450s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@asmacdo asmacdo marked this pull request as ready for review April 9, 2026 18:38
@asmacdo asmacdo mentioned this pull request Apr 9, 2026
@asmacdo
Collaborator Author

asmacdo commented Apr 9, 2026

One additional manual test we did: on a dataset with 2 subjects, we initially ran babs submit --count 1 followed by babs status. After that, we ran babs submit --count 1 --skip-running-jobs, then babs status again. This had been buggy before, but it worked with this PR: both jobs showed up as "running", and later as "finished".

final csv state:

sub_id,submitted,is_failed,state,time_used,time_limit,nodes,cpus,partition,name,job_id,task_id,has_results
sub-02,True,False,,1:59,1-00:00:00,1,4,standard,bid,8055585,1,True
sub-13,True,False,,0:12,1-00:00:00,1,4,standard,bid,8055616,1,True

@tien-tong
Contributor

Issues raised by Codex review

1) High: Removing all inclusion rows can leave stale job_status.csv on disk

BABSUpdate now writes via write_job_status_csv(), but that writer is a no-op for empty dicts.
If an update removes all tracked jobs, the in-memory `statuses` dict becomes empty and the old CSV is never overwritten or truncated, so the removed jobs remain in job_status.csv.

  • wrong status accounting after inclusion shrink-to-empty
  • stale jobs can be re-evaluated as submitted/failed later
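One plausible fix is to write unconditionally, so an empty dict still truncates the file. This is a sketch: the field list is trimmed to three columns from the CSV sample above, and the dataclass here is a stand-in, not BABS's `JobStatus`:

```python
import csv
from dataclasses import dataclass, asdict

# Trimmed field set for illustration; the real CSV has more columns.
FIELDNAMES = ['sub_id', 'submitted', 'has_results']

@dataclass
class JobStatus:
    sub_id: str
    submitted: bool = False
    has_results: bool = False

def write_job_status_csv(statuses: dict, path: str) -> None:
    # Open in 'w' mode and write the header unconditionally, so a
    # shrink-to-empty update truncates the stale CSV on disk instead
    # of silently leaving removed jobs behind.
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for job in statuses.values():
            writer.writerow(asdict(job))
```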

2) Medium: ses_id truthiness checks can crash or mis-key on NA values

Several places use if ses_id to decide whether key is subject-level or session-level.
With pd.NA, this raises TypeError; with NaN, it is truthy and creates keys like (sub_id, nan) that never match existing (sub_id, 'ses-*').

base.py (Lines 410–414)

ses_id = row.get('ses_id') if 'ses_id' in merged_zip_df.columns else None
key = (row['sub_id'], ses_id) if ses_id else (row['sub_id'],)
if key in statuses:
    statuses[key] = replace(statuses[key], has_results=True)

update.py (Lines 118-127)

ses_id = row.get('ses_id') if 'ses_id' in removed_rows.columns else None
key = (row['sub_id'], ses_id) if ses_id else (row['sub_id'],)
# ...
ses_id = row.get('ses_id') if 'ses_id' in added_rows.columns else None
sub_id = row['sub_id']
key = (sub_id, ses_id) if ses_id else (sub_id,)
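A hypothetical helper that sidesteps the truthiness trap, using `pd.isna()` to treat None, NaN, and pd.NA uniformly (the name `make_key` is an assumption for illustration):

```python
import pandas as pd

def make_key(sub_id, ses_id):
    # `if ses_id:` raises TypeError on pd.NA and is truthy for NaN;
    # pd.isna() classifies None, NaN, and pd.NA uniformly as missing,
    # so the key stays subject-level instead of becoming (sub_id, nan).
    if ses_id is None or pd.isna(ses_id):
        return (sub_id,)
    return (sub_id, ses_id)
```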

3) Medium: New strict scheduler-state parsing can break on valid non-covered Slurm states

SchedulerState.from_slurm_state only handles PD, R, CG, and CF, but SLURM squeue can report many other compact state codes — S (SUSPENDED), ST (STOPPED), CA (CANCELLED), F (FAILED), TO (TIMEOUT), OOM, RQ (REQUEUED), and others. When update_from_scheduler encounters any of these in live squeue output, it raises ValueError, crashing the entire babs status command. The old code tolerated arbitrary state strings without crashing.

status.py (Lines 27-33)

@classmethod
def from_slurm_state(cls, state_str: str) -> 'SchedulerState':
    """Convert a SLURM squeue state string to a SchedulerState."""
    for member in cls:
        if member.value == state_str:
            return member
    raise ValueError(f'Unknown SLURM scheduler state: {state_str!r}')

status.py (Lines 299-304)

if squeue_info is not None:
    updated[key] = replace(
        job,
        scheduler_state=SchedulerState.from_slurm_state(squeue_info['state']),

Backward-compat angle:

  • old path tolerated arbitrary state strings (even if imperfectly classified)
  • new path hard-fails for unrecognized active states
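One tolerant variant would map unrecognized codes to a catch-all member instead of raising. The enum values below are the four the review says are covered; the UNKNOWN sentinel and its value are a sketch, not BABS's code:

```python
from enum import Enum

class SchedulerState(Enum):
    PENDING = 'PD'
    RUNNING = 'R'
    COMPLETING = 'CG'
    CONFIGURING = 'CF'
    UNKNOWN = '??'  # catch-all sentinel (hypothetical)

    @classmethod
    def from_slurm_state(cls, state_str: str) -> 'SchedulerState':
        for member in cls:
            if member.value == state_str:
                return member
        # Tolerate codes not modeled here (S, ST, CA, F, TO, OOM, ...)
        # instead of raising and crashing the whole `babs status` run.
        return cls.UNKNOWN
```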

4) Low: Internal helper API compatibility changed

report_job_status signature changed from DataFrame-based inputs to dict-based status model:

scheduler.py (Lines 311-319)

def report_job_status(statuses, analysis_path):
    """
    Print a report that summarizes the overall status of a BABS project.
    Parameters
    ----------
    statuses : dict[tuple, JobStatus]

CLI behavior is updated, but downstream code importing babs.scheduler.report_job_status directly may break.
