Skip to content

SDK-91: Refactor CLI into modular subcommands for eval, unlearning, and dataset-qa#227

Open
mishana wants to merge 13 commits into
mainfrom
claude/implement-sdk-commands-vXPAm
Open

SDK-91: Refactor CLI into modular subcommands for eval, unlearning, and dataset-qa#227
mishana wants to merge 13 commits into
mainfrom
claude/implement-sdk-commands-vXPAm

Conversation

@mishana
Copy link
Copy Markdown
Contributor

@mishana mishana commented May 5, 2026

User description

Summary

This PR refactors the CLI structure by extracting functionality into separate, modular subcommand modules. The main CLI now delegates to specialized apps for evaluation, unlearning, and dataset QA operations, improving code organization and maintainability.

Key Changes

  • New _cli_common.py module: Extracted shared CLI utilities including:

    • console object for Rich output
    • hirundo_epilog configuration
    • make_app() helper for creating consistent Typer apps
    • validate_enum() helper for enum validation with user-friendly error messages
  • New cli_eval.py module: Extracted evaluation-related commands:

    • eval run: Launch LLM behavior evaluation runs with preset configurations
    • eval list: List evaluation runs with optional filtering
    • eval check: Monitor evaluation run progress
  • New cli_unlearning.py module: Extracted unlearning-related commands:

    • unlearning run: Launch LLM unlearning runs (bias or hallucination types)
    • unlearning list: List unlearning runs with optional filtering
    • unlearning check: Monitor unlearning run progress
  • New cli_dataset_qa.py module: Extracted dataset QA commands:

    • dataset-qa run: Launch Dataset QA runs
    • dataset-qa list: List Dataset QA runs with optional filtering
    • dataset-qa check: Monitor Dataset QA run progress
  • Updated cli.py:

    • Removed duplicated CLI utilities (moved to _cli_common.py)
    • Registered new subcommand apps with the main Typer app
    • Simplified main CLI file to focus on core functionality

Implementation Details

  • All subcommand modules follow a consistent pattern using the make_app() helper
  • Enum validation is centralized in validate_enum() for consistent error handling
  • Rich console output is shared across all modules via _cli_common.console
  • Each subcommand module is self-contained with its own imports and logic
  • The main CLI acts as a router, delegating to specialized apps via add_typer()

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt


Note

Medium Risk
Moderate risk because it restructures the public CLI surface area by introducing new subcommands and centralized argument validation, which can break existing invocation patterns or expose mismatches in run field names.

Overview
Refactors the hirundo Typer CLI into modular sub-apps and registers new top-level subcommands: hirundo eval, hirundo unlearning, and hirundo dataset-qa.

Adds _cli_common.py to centralize shared CLI setup (console, hirundo_epilog, make_app) and introduces validate_enum() for consistent enum parsing and user-friendly failures.

Implements new run/list/check commands for evaluation, unlearning, and dataset QA, including stricter flag validation (mutual exclusivity/required options) and standardized Rich table output.

Reviewed by Cursor Bugbot for commit 7b375da. Configure here.


Generated description

Below is a concise technical summary of the changes proposed in this PR:
Refactor the CLI structure to route hirundo.cli_eval, hirundo.cli_unlearning, and hirundo.cli_dataset_qa through the main Typer app so each flow gets its own run/list/check commands. Share _cli_common helpers for console output, enum/run validation, and table rendering to keep every subcommand self-contained and consistent.

TopicDetails
Subcommand apps Register hirundo.cli_eval, hirundo.cli_unlearning, and hirundo.cli_dataset_qa Typer apps on the main CLI to route run/list/check flows for evaluation, unlearning, and dataset QA.
Modified files (7)
  • docs/hirundo.cli_dataset_qa.rst
  • docs/hirundo.cli_eval.rst
  • docs/hirundo.cli_unlearning.rst
  • hirundo/cli.py
  • hirundo/cli_dataset_qa.py
  • hirundo/cli_eval.py
  • hirundo/cli_unlearning.py
Latest Contributors(1)
UserCommitDate
noreply@anthropic.comFix CI: ruff, basedpyr...May 09, 2026
CLI utilities Share _cli_common helpers (make_app, validate_enum, validate_run_id, wait_or_notify, print_runs_table, console) to centralize epilog handling, run validation, and table output, and cover them with tests.
Modified files (2)
  • hirundo/_cli_common.py
  • tests/test_cli_common.py
Latest Contributors(1)
UserCommitDate
noreply@anthropic.comrefactor(cli): clean u...May 10, 2026
This pull request is reviewed by Baz. Review like a pro on (Baz).

claude added 2 commits April 8, 2026 00:30
Implement three new typer sub-apps following the skills.py pattern:

- `hirundo eval run/list/check` — launch/list/monitor LLM behavior eval runs
  via LlmBehaviorEval.launch_eval_run / check_run_by_id / list_runs
- `hirundo dataset-qa run/list/check` — launch/list/monitor Dataset QA runs
  via QADataset.launch_qa_run / check_run_by_id / list_runs
- `hirundo unlearning run/list/check` — launch/list/monitor LLM unlearning runs
  via LlmUnlearningRun.launch / check_run_by_id / list

Each `run` command accepts --wait/--no-wait (default: wait) to either stream
tqdm progress or immediately return the run ID for scripting.

Closes #220

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
Extract repeated docs/epilog/console/Typer setup from the three
CLI sub-apps into hirundo/_cli_common.py:

- hirundo_epilog, docs, console — computed/instantiated once
- make_app() — factory for sub-app Typer instances
- validate_enum() — replaces copy-paste try/except enum coercion

Remove verbose module docstrings that duplicated typer --help output.
Remove local Console() instantiation from cli.py list_runs().

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
@mishana mishana requested review from a team as code owners May 5, 2026 11:31
Comment thread hirundo/_cli_common.py Outdated
Comment on lines +26 to +30
def validate_enum(value: str, enum_cls, label: str):
try:
return enum_cls(value.upper())
except ValueError:
valid = ", ".join(e.value for e in enum_cls)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shared CLI helper still uses the 1-character variable e and the same short-name refactor shows up in hirundo/cli_eval.py and hirundo/cli_unlearning.py, conflicting with AGENTS.md naming rules—should we rename those loop variables to descriptive identifiers?

Finding type: AI Coding Guidelines | Severity: 🟢 Low


Fix in Cursor

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
hirundo/_cli_common.py around lines 26-30 within the `validate_enum` function, replace
the 1-character generator variable `e` used in `', '.join(e.value for e in enum_cls)`
with a descriptive name (e.g., `enum_member`/`option`) to comply with `AGENTS.md`’s
`Coding Style & Naming` rule. Then scan `hirundo/cli_eval.py` (`col`, `run`) and
`hirundo/cli_unlearning.py` (`col`, `run`) for the same 1–3 character loop variable
pattern in similar join/comprehension logic and rename those variables to descriptive
identifiers as well, keeping the output formatting and error message text identical.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 9a4feff addressed this comment by replacing the single-character generator variable in validate_enum with a descriptive identifier, aligning the shared helper with the AGENTS naming guidelines even though the CLI-specific files were untouched.

Comment thread hirundo/cli_dataset_qa.py Outdated
Comment on lines +24 to +28
run_id = QADataset.launch_qa_run(dataset_id)
console.print(f"Dataset QA run started. Run ID: [bold]{run_id}[/bold]")

if wait:
QADataset.check_run_by_id(run_id)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataset_qa_run duplicates the launch/print/wait branching in eval_run and unlearning_run — should we extract a shared helper, and also capture the return value of QADataset.check_run_by_id(run_id) to surface results.cached_zip_path (matching the check-run output in cli.py:199-200)?

Finding types: Code Dedup and Conventions Breaking Changes | Severity: 🟢 Low


Fix in Cursor

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. 1. Extract the shared
wait/notification branching logic from `dataset_qa_run`, `eval_run`, and
`unlearning_run` into a centralized helper that accepts launch/monitor functions and a
command name. 2. In `dataset_qa_run` (lines 27-28), capture the return value of
`QADataset.check_run_by_id(run_id)` as `results` and print the legacy completion line
including `results.cached_zip_path`, matching the wording/format used by the existing
`check-run` output in cli.py:199-200. 3. Update `dataset_qa_check` (lines 63-72) to
print the same results-location line when streaming completes, so both commands provide
a stable contract for scripts/users.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 36663cf addressed this comment by introducing the shared wait_or_notify helper so dataset-qa, eval, and unlearning runs reuse the same wait/notify branching. dataset_qa_run now captures the check_run_by_id results and prints the cached_zip_path, and dataset_qa_check also emits the same results-location line for streaming completion, matching the check-run output contract.

Comment thread hirundo/cli_dataset_qa.py Outdated
Comment on lines +52 to +56
for run in runs:
table.add_row(
str(run.name),
str(run.run_id),
str(run.status),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table.add_row(...) iteration block in dataset_qa_list is copy/pasted from eval_list and unlearning_list, so should we extract a shared renderer by passing column labels and a row-mapping callable?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 36663cf addressed this comment by centralizing the run-table rendering into the new print_runs_table helper in _cli_common and wiring dataset_qa_list (alongside eval and unlearning lists) to call it with the column labels and row data.

Comment thread hirundo/cli_dataset_qa.py Outdated
Comment on lines +64 to +72
def dataset_qa_check(
run_id: Annotated[str, typer.Argument(help="The run ID to check.")],
):
"""
Check the status of a Dataset QA run and stream progress.
"""
from hirundo.dataset_qa import QADataset

QADataset.check_run_by_id(run_id)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since CLI run_id flows into QADataset.check_run_by_id(run_id) and ends up as ~/.hirundo/cache/{run_id}.zip, should we validate/sanitize run_id (e.g., disallow path separators) to prevent writing outside the cache directory?

Finding type: Basic Security Patterns | Severity: 🔴 High


Fix in Cursor

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
`hirundo/cli_dataset_qa.py` around lines 64-72, in the `dataset_qa_check(run_id:
Annotated[str, ...])` command, validate the CLI-provided `run_id` before calling
`QADataset.check_run_by_id(run_id)`. Reject any values containing path separators (`/`
or `\`), `..`, or any characters outside the expected run-id format (e.g., enforce a
strict regex such as alphanumeric/UUID). If validation fails, print a clear error and
exit without invoking `QADataset` so the downstream cache write cannot be used for path
traversal.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 23e9cc8 addressed this comment by adding a validate_run_id helper that enforces an alphanumeric/_- regex, reports invalid input, and exits before invoking QADataset. The dataset_qa_check command now uses this helper so the CLI-run ID is sanitized before any cache write occurs.

Comment thread hirundo/cli_eval.py Outdated
Comment on lines +100 to +108
def eval_check(
run_id: Annotated[str, typer.Argument(help="The run ID to check.")],
):
"""
Check the status of an LLM behavior evaluation run and stream progress.
"""
from hirundo.llm_behavior_eval import LlmBehaviorEval

LlmBehaviorEval.check_run_by_id(run_id)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_id from the CLI is used directly for ~/.hirundo/cache/{run_id}.zip in download_and_extract_llm_behavior_eval_zip(), so should we sanitize/validate it to block path separators before check_run_by_id(run_id)?

Finding type: Basic Security Patterns | Severity: 🔴 High


Fix in Cursor

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
`hirundo/cli_eval.py` around lines 99-108, in the `eval_check` CLI command, the `run_id`
argument is accepted as an arbitrary string and passed directly to
`LlmBehaviorEval.check_run_by_id(run_id)`, which downstream uses it to form a cache file
path. Add strict validation for `run_id` before calling `check_run_by_id`: reject any
value containing path separators (`/` or `\`), `..`, or other unsafe characters, and
ideally constrain it to the expected run-id format (e.g., UUID/token regex). If
validation fails, print a clear error and exit with code 1 so a crafted `hirundo eval
check` argument cannot traverse outside `~/.hirundo/cache/`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 23e9cc8 addressed this comment by adding a run-id validator in _cli_common.py and wiring eval_check through validate_run_id(run_id) before calling check_run_by_id, ensuring only safe characters are accepted and the CLI exits with an error on invalid input.

Comment thread hirundo/cli.py Outdated
Comment on lines 209 to 214

runs = QADataset.list_runs()

console = Console()
table = Table(
title="Runs:",
expand=True,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy “list-runs” table is rendering the wrong run identifier — should we switch its Rich Table to use runs = QADataset.list_runs() and review the legacy row/column rendering (per hirundo/cli.py:228-232) to ensure it shows run.run_id or clearly labels the numeric DB id?

Finding type: Logical Bugs | Severity: 🟢 Low


Fix in Cursor

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In `hirundo/cli.py`
around lines 209-214 inside the `list_runs()` function, the Rich `Table` is configured
with a “Run ID” column but the implementation for the row identifier is using the
wrong attribute (currently `run.id`). Refactor the table row construction so the “Run
ID” column uses the correct identifier from `QADataset.list_runs()` (use `run.run_id`
if it exists); if only a numeric DB id is available, explicitly rename the column/header
to match (e.g., “DB ID”) so the output is not misleading. Then verify that the
rendered rows match the identifier shown by the `dataset-qa list` implementation.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 36663cf addressed this comment by moving the Dataset QA list command into hirundo/cli_dataset_qa.py and printing the table with the run.run_id attribute (via the new print_runs_table helper) instead of run.id, so the "Run ID" column now matches the dataset-qa list output.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 63c2723 addressed this comment by updating the Legacy Runs table rows to display run.run_id, ensuring the Run ID column now matches the identifier shown by dataset-qa list.

@mishana mishana changed the title Refactor CLI into modular subcommands for eval, unlearning, and dataset-qa SDK-91: Refactor CLI into modular subcommands for eval, unlearning, and dataset-qa May 9, 2026
claude added 2 commits May 9, 2026 21:43
ruff:
- Rename make_app `help` param to `help_text` (A002 builtin shadow)
- raise typer.Exit(code=1) from None in validate_enum (B904)
- Remove unused Optional imports now replaced by X | None (F401)
- Fix import ordering (I001, auto-fixed)
- Convert Optional[X] to X | None (UP045, auto-fixed)

basedpyright:
- Use elif hallucination_type is not None + exhaustive else to let
  pyright narrow the str | None type before passing to validate_enum

module-docs:
- Add .rst files for the three new public CLI modules

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
Comment thread hirundo/cli_eval.py Outdated
Comment on lines +68 to +72
if wait:
LlmBehaviorEval.check_run_by_id(run_id)
else:
console.print(
"Use [bold]hirundo eval check[/bold] [italic]<run_id>[/italic] to monitor progress."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The run subcommands duplicate the same --wait/notification branch across hirundo/cli_eval.py, hirundo/cli_unlearning.py, and hirundo/cli_dataset_qa.py, so should we extract a shared helper to centralize the wait handling and messaging?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 36663cf addressed this comment by introducing the shared wait_or_notify helper in hirundo/_cli_common.py and using it in the eval, dataset-qa, and unlearning run commands to centralize the wait/notification logic and messaging.

claude added 2 commits May 9, 2026 21:47
run_id flows into cache_dir / f"{run_id}.zip" in unzip.py.
A malicious value like ../../evil would escape ~/.hirundo/cache/.

Add validate_run_id() to _cli_common which rejects anything outside
[a-zA-Z0-9_-] (covers UUIDs and integer IDs). Apply it in the check
command of each sub-app before passing run_id to the SDK.

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
Comment thread hirundo/cli_eval.py
Comment on lines +70 to +74
run_info = EvalRunInfo(
model_id=model_id,
source_run_id=source_run_id,
preset_type=preset_type,
name=name,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--source-run-id flows into EvalRunInfo and LlmBehaviorEval.launch_eval_run() without validate_run_id(), so it accepts values eval check/unlearning check reject — should we validate source_run_id with the shared rule before launching?

Finding type: Logical Bugs | Severity: 🟢 Low


Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In hirundo/cli_eval.py
around lines 70-74 where `--source-run-id` is copied into `EvalRunInfo` and passed to
`LlmBehaviorEval.launch_eval_run()`, add validation for `source_run_id` by running it
through the shared `validate_run_id()` (the same one imported from
`hirundo._cli_common`). Refactor so the value is validated before constructing
`EvalRunInfo` (or at least immediately before calling `launch_eval_run`), and ensure any
invalid IDs fail consistently with `eval check <run_id>`/`unlearning check <run_id>`.
After the change, `--source-run-id` should be rejected with the same rules/help behavior
as other run-id parameters.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit cc2e8bf addressed this comment by validating source_run_id via validate_run_id() before constructing EvalRunInfo, ensuring the CLI now rejects invalid source run IDs just like the other run-id parameters.

Comment thread hirundo/_cli_common.py
Comment on lines +29 to +35
def validate_run_id(run_id: str) -> str:
if not _RUN_ID_RE.match(run_id):
console.print(
f"[red]Invalid run ID '{run_id}'. "
"Run IDs may only contain alphanumeric characters, hyphens, and underscores.[/red]"
)
raise typer.Exit(code=1) from None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_run_id() uses match() which can accept trailing newlines — should we switch to fullmatch(), and add unit tests in tests/test_cli_common.py covering valid/invalid inputs?

Finding types: Breaking Changes Add tests for behavior | Severity: 🟢 Low


Prompt for AI Agents:

In hirundo/_cli_common.py around lines 29-35, in the validate_run_id(run_id: str)
function: 1. Replace the `_RUN_ID_RE.match` call with `fullmatch` so the entire string
must consist of exactly the allowed characters, preventing trailing newlines or other
whitespace from slipping through. Ensure the error path still prints the same red error
message and that `validate_run_id` returns the original `run_id` only when it is
strictly canonical. 2. Add a focused unit test in `tests/test_cli_common.py` that
verifies: a valid run ID (containing only letters/numbers/hyphens/underscores) returns
the input unchanged, and an invalid run ID (e.g., containing a space, newline, or other
disallowed character) raises `typer.Exit` with code 1 and that the console output
contains `Invalid run ID '&lt;value&gt;'` and the "may only contain…" text. Use
whatever test utilities the project already uses to capture Rich/console output so the
assertion is deterministic.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 67dc58a addressed this comment by switching validate_run_id to rely on _RUN_ID_RE.fullmatch, ensuring trailing newlines no longer slip through; the requested tests for valid/invalid inputs still need to be added.

claude added 5 commits May 9, 2026 22:04
…ace cached_zip_path

Adds two shared helpers to _cli_common to eliminate copy-paste across the
three CLI sub-apps: wait_or_notify centralises the wait/notify branching,
and print_runs_table centralises Rich table construction. Also captures and
prints results.cached_zip_path for dataset-qa and eval check/run commands,
matching the legacy check-run behaviour.

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
The "Run ID" column was displaying the numeric DB id (run.id) instead of
the string run identifier (run.run_id), inconsistent with the dataset-qa list command.

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
Applies the same validate_run_id() check used by eval check/unlearning check
so invalid run ID formats are rejected consistently before launching.

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
…ines

re.match with $ allows a single trailing newline through; fullmatch requires
the entire string to consist of only the allowed characters.

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
bare `type` is not iterable; `type[Enum]` lets basedpyright know the class
supports iteration via EnumMeta.

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
Comment thread hirundo/cli_dataset_qa.py
Comment on lines +52 to +56
runs = QADataset.list_runs(archived=archived)
print_runs_table(
"Dataset QA Runs:",
("Dataset Name", "Run ID", "Status", "Created At", "Run Args"),
[
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataset_qa_list duplicates QADataset.list_runs + table formatting from hirundo/cli.py:list_runs, should we reuse the existing table helper or delegate to list_runs for a single rendering source of truth?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 9a4feff addressed this comment by importing dataset_qa_list into hirundo/cli and delegating the top-level list command to dataset_qa_list(archived=False), ensuring the Dataset QA table rendering now comes from a single shared helper.

Comment thread hirundo/_cli_common.py
Comment on lines +52 to +60
def wait_or_notify(
run_id: str, check_fn: Callable[[str], Any], cmd_name: str, wait: bool
) -> Any:
if wait:
return check_fn(run_id)
console.print(
f"Use [bold]hirundo {cmd_name} check[/bold] [italic]<run_id>[/italic] to monitor progress."
)
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_or_notify changed visible behavior, but the diff lacks focused tests for the wait=True/wait=False branches (including the emitted prompt), so should we add a fast unit test that covers both cases and asserts the return propagation/message?

Finding type: Add tests for behavior | Severity: 🟢 Low


Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
hirundo/_cli_common.py around lines 52-60, the `wait_or_notify` helper introduces new
visible behavior for both `wait=True` (returns `check_fn(run_id)`) and `wait=False`
(prints the “Use hirundo … check …” guidance and returns None), but there are no
focused tests covering these branches. Add a fast unit test that calls `wait_or_notify`
directly with a stub `check_fn` and asserts the return value is propagated when
`wait=True`, and when `wait=False` it asserts the exact printed message (or a stable
substring) is emitted and the return is None. Reuse the repo’s existing test/CLI
helper utilities and rich `console` capturing mechanism if available (e.g., a fixture
that captures stdout/console output), and keep the test isolated from network/IO.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 9a4feff addressed this comment by adding focused tests for wait_or_notify, covering the wait=True return propagation and the wait=False console prompt/hint output.

Replaces inline Rich Table construction with the shared print_runs_table
helper, removing the last duplication between list-runs and dataset-qa list.

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
Comment thread hirundo/cli.py Outdated
Comment on lines +206 to +217
from hirundo.dataset_qa import QADataset

runs = QADataset.list_runs()

console = Console()
table = Table(
title="Runs:",
expand=True,
)
cols = (
"Dataset name",
"Run ID",
"Status",
"Created At",
"Run Args",
print_runs_table(
"Runs:",
("Dataset name", "Run ID", "Status", "Created At", "Run Args"),
[
(
str(run.name),
str(run.run_id),
str(run.status),
run.created_at.isoformat(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_runs duplicates dataset_qa_list by calling QADataset.list_runs and rendering via print_runs_table — should we extract a shared helper, and add a focused test patching QADataset.list_runs() with deterministic data to assert print_runs_table is called with the expected headers/rows (including the run.run_args is None case)?

Finding types: Code Dedup and Conventions Add tests for behavior | Severity: 🟢 Low


Prompt for AI Agents:

In hirundo/cli.py around lines 206-219, `list_runs` duplicates the listing logic from
`dataset_qa_list`. Consider extracting a shared helper for calling
`QADataset.list_runs()` and rendering via `print_runs_table(...)`. Additionally, add a
focused test that patches `hirundo.cli.QADataset.list_runs()` to return a deterministic
list of run-like objects (with name, status, created_at.isoformat(), and
run_args.model_dump_json() or None), and patches `hirundo.cli.print_runs_table` to
assert it was called once with title "Runs:" and headers ("Dataset name", "Run ID",
"Status", "Created At", "Run Args") and rows matching the expected tuple values. Ensure
the test also covers the case where `run.run_args` is None so the last field is None.
Place the test in the appropriate existing test module for the CLI.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 9a4feff addressed this comment by delegating the CLI list_runs command to dataset_qa_list, so the shared helper now controls fetching the runs and printing the table, which removes the duplicated logic. The focused test patching QADataset.list_runs/print_runs_table mentioned in the comment still needs to be added separately.

- Rename loop variable `e` -> `member` in validate_enum
- Delegate legacy list-runs to dataset_qa_list to eliminate duplicate table rendering
- Add validate_run_id + None guard to legacy check-run for consistency
- Add tests/test_cli_common.py covering validate_run_id (valid/invalid cases)
  and wait_or_notify (both wait=True/False branches)

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants