SDK-91: Refactor CLI into modular subcommands for eval, unlearning, and dataset-qa by mishana · Pull Request #227 · Hirundo-io/hirundo-python-sdk

mishana · 2026-05-05T11:31:24Z

User description

Summary

This PR refactors the CLI structure by extracting functionality into separate, modular subcommand modules. The main CLI now delegates to specialized apps for evaluation, unlearning, and dataset QA operations, improving code organization and maintainability.

Key Changes

New _cli_common.py module: Extracted shared CLI utilities including:
- console object for Rich output
- hirundo_epilog configuration
- make_app() helper for creating consistent Typer apps
- validate_enum() helper for enum validation with user-friendly error messages
New cli_eval.py module: Extracted evaluation-related commands:
- eval run: Launch LLM behavior evaluation runs with preset configurations
- eval list: List evaluation runs with optional filtering
- eval check: Monitor evaluation run progress
New cli_unlearning.py module: Extracted unlearning-related commands:
- unlearning run: Launch LLM unlearning runs (bias or hallucination types)
- unlearning list: List unlearning runs with optional filtering
- unlearning check: Monitor unlearning run progress
New cli_dataset_qa.py module: Extracted dataset QA commands:
- dataset-qa run: Launch Dataset QA runs
- dataset-qa list: List Dataset QA runs with optional filtering
- dataset-qa check: Monitor Dataset QA run progress
Updated cli.py:
- Removed duplicated CLI utilities (moved to _cli_common.py)
- Registered new subcommand apps with the main Typer app
- Simplified main CLI file to focus on core functionality

Implementation Details

All subcommand modules follow a consistent pattern using the make_app() helper
Enum validation is centralized in validate_enum() for consistent error handling
Rich console output is shared across all modules via _cli_common.console
Each subcommand module is self-contained with its own imports and logic
The main CLI acts as a router, delegating to specialized apps via add_typer()

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

Note

Medium Risk
Moderate risk because it restructures the public CLI surface area by introducing new subcommands and centralized argument validation, which can break existing invocation patterns or expose mismatches in run field names.

Overview
Refactors the hirundo Typer CLI into modular sub-apps and registers new top-level subcommands: hirundo eval, hirundo unlearning, and hirundo dataset-qa.

Adds _cli_common.py to centralize shared CLI setup (console, hirundo_epilog, make_app) and introduces validate_enum() for consistent enum parsing and user-friendly failures.

Implements new run/list/check commands for evaluation, unlearning, and dataset QA, including stricter flag validation (mutual exclusivity/required options) and standardized Rich table output.

^{Reviewed by Cursor Bugbot for commit 7b375da. Configure here.}

Generated description

Below is a concise technical summary of the changes proposed in this PR:
Refactor the CLI structure to route hirundo.cli_eval, hirundo.cli_unlearning, and hirundo.cli_dataset_qa through the main Typer app so each flow gets its own run/list/check commands. Share _cli_common helpers for console output, enum/run validation, and table rendering to keep every subcommand self-contained and consistent.

Topic Details

Subcommand apps

Register hirundo.cli_eval, hirundo.cli_unlearning, and hirundo.cli_dataset_qa Typer apps on the main CLI to route run/list/check flows for evaluation, unlearning, and dataset QA.

Modified files (7)

docs/hirundo.cli_dataset_qa.rst
docs/hirundo.cli_eval.rst
docs/hirundo.cli_unlearning.rst
hirundo/cli.py
hirundo/cli_dataset_qa.py
hirundo/cli_eval.py
hirundo/cli_unlearning.py

Latest Contributors(1)

User	Commit	Date
noreply@anthropic.com	Fix CI: ruff, basedpyr...	May 09, 2026

CLI utilities

Share _cli_common helpers (make_app, validate_enum, validate_run_id, wait_or_notify, print_runs_table, console) to centralize epilog handling, run validation, and table output, and cover them with tests.

Modified files (2)

hirundo/_cli_common.py
tests/test_cli_common.py

Latest Contributors(1)

User	Commit	Date
noreply@anthropic.com	refactor(cli): clean u...	May 10, 2026

This pull request is reviewed by Baz. Review like a pro on (Baz).

Implement three new typer sub-apps following the skills.py pattern: - `hirundo eval run/list/check` — launch/list/monitor LLM behavior eval runs via LlmBehaviorEval.launch_eval_run / check_run_by_id / list_runs - `hirundo dataset-qa run/list/check` — launch/list/monitor Dataset QA runs via QADataset.launch_qa_run / check_run_by_id / list_runs - `hirundo unlearning run/list/check` — launch/list/monitor LLM unlearning runs via LlmUnlearningRun.launch / check_run_by_id / list Each `run` command accepts --wait/--no-wait (default: wait) to either stream tqdm progress or immediately return the run ID for scripting. Closes #220 https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

Extract repeated docs/epilog/console/Typer setup from the three CLI sub-apps into hirundo/_cli_common.py: - hirundo_epilog, docs, console — computed/instantiated once - make_app() — factory for sub-app Typer instances - validate_enum() — replaces copy-paste try/except enum coercion Remove verbose module docstrings that duplicated typer --help output. Remove local Console() instantiation from cli.py list_runs(). https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

baz-reviewer · 2026-05-05T11:39:14Z

+def validate_enum(value: str, enum_cls, label: str):
+    try:
+        return enum_cls(value.upper())
+    except ValueError:
+        valid = ", ".join(e.value for e in enum_cls)


The shared CLI helper still uses the 1-character variable e and the same short-name refactor shows up in hirundo/cli_eval.py and hirundo/cli_unlearning.py, conflicting with AGENTS.md naming rules—should we rename those loop variables to descriptive identifiers?

_{Finding type: AI Coding Guidelines | Severity: 🟢 Low}

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In hirundo/_cli_common.py around lines 26-30 within the `validate_enum` function, replace the 1-character generator variable `e` used in `', '.join(e.value for e in enum_cls)` with a descriptive name (e.g., `enum_member`/`option`) to comply with `AGENTS.md`’s `Coding Style & Naming` rule. Then scan `hirundo/cli_eval.py` (`col`, `run`) and `hirundo/cli_unlearning.py` (`col`, `run`) for the same 1–3 character loop variable pattern in similar join/comprehension logic and rename those variables to descriptive identifiers as well, keeping the output formatting and error message text identical.

Commit 9a4feff addressed this comment by replacing the single-character generator variable in validate_enum with a descriptive identifier, aligning the shared helper with the AGENTS naming guidelines even though the CLI-specific files were untouched.

baz-reviewer · 2026-05-05T11:39:15Z

+    run_id = QADataset.launch_qa_run(dataset_id)
+    console.print(f"Dataset QA run started. Run ID: [bold]{run_id}[/bold]")
+
+    if wait:
+        QADataset.check_run_by_id(run_id)


dataset_qa_run duplicates the launch/print/wait branching in eval_run and unlearning_run — should we extract a shared helper, and also capture the return value of QADataset.check_run_by_id(run_id) to surface results.cached_zip_path (matching the check-run output in cli.py:199-200)?

_{Finding types: Code Dedup and Conventions Breaking Changes | Severity: 🟢 Low}

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. 1. Extract the shared wait/notification branching logic from `dataset_qa_run`, `eval_run`, and `unlearning_run` into a centralized helper that accepts launch/monitor functions and a command name. 2. In `dataset_qa_run` (lines 27-28), capture the return value of `QADataset.check_run_by_id(run_id)` as `results` and print the legacy completion line including `results.cached_zip_path`, matching the wording/format used by the existing `check-run` output in cli.py:199-200. 3. Update `dataset_qa_check` (lines 63-72) to print the same results-location line when streaming completes, so both commands provide a stable contract for scripts/users.

Commit 36663cf addressed this comment by introducing the shared wait_or_notify helper so dataset-qa, eval, and unlearning runs reuse the same wait/notify branching. dataset_qa_run now captures the check_run_by_id results and prints the cached_zip_path, and dataset_qa_check also emits the same results-location line for streaming completion, matching the check-run output contract.

baz-reviewer · 2026-05-05T11:39:15Z

+    for run in runs:
+        table.add_row(
+            str(run.name),
+            str(run.run_id),
+            str(run.status),


The table.add_row(...) iteration block in dataset_qa_list is copy/pasted from eval_list and unlearning_list, so should we extract a shared renderer by passing column labels and a row-mapping callable?

_{Finding type: Code Dedup and Conventions | Severity: 🟢 Low}

Commit 36663cf addressed this comment by centralizing the run-table rendering into the new print_runs_table helper in _cli_common and wiring dataset_qa_list (alongside eval and unlearning lists) to call it with the column labels and row data.

baz-reviewer · 2026-05-05T11:39:15Z

+def dataset_qa_check(
+    run_id: Annotated[str, typer.Argument(help="The run ID to check.")],
+):
+    """
+    Check the status of a Dataset QA run and stream progress.
+    """
+    from hirundo.dataset_qa import QADataset
+
+    QADataset.check_run_by_id(run_id)


Since CLI run_id flows into QADataset.check_run_by_id(run_id) and ends up as ~/.hirundo/cache/{run_id}.zip, should we validate/sanitize run_id (e.g., disallow path separators) to prevent writing outside the cache directory?

_{Finding type: Basic Security Patterns | Severity: 🔴 High}

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In `hirundo/cli_dataset_qa.py` around lines 64-72, in the `dataset_qa_check(run_id: Annotated[str, ...])` command, validate the CLI-provided `run_id` before calling `QADataset.check_run_by_id(run_id)`. Reject any values containing path separators (`/` or `\`), `..`, or any characters outside the expected run-id format (e.g., enforce a strict regex such as alphanumeric/UUID). If validation fails, print a clear error and exit without invoking `QADataset` so the downstream cache write cannot be used for path traversal.

Commit 23e9cc8 addressed this comment by adding a validate_run_id helper that enforces an alphanumeric/_- regex, reports invalid input, and exits before invoking QADataset. The dataset_qa_check command now uses this helper so the CLI-run ID is sanitized before any cache write occurs.

baz-reviewer · 2026-05-05T11:39:15Z

+def eval_check(
+    run_id: Annotated[str, typer.Argument(help="The run ID to check.")],
+):
+    """
+    Check the status of an LLM behavior evaluation run and stream progress.
+    """
+    from hirundo.llm_behavior_eval import LlmBehaviorEval
+
+    LlmBehaviorEval.check_run_by_id(run_id)


run_id from the CLI is used directly for ~/.hirundo/cache/{run_id}.zip in download_and_extract_llm_behavior_eval_zip(), so should we sanitize/validate it to block path separators before check_run_by_id(run_id)?

_{Finding type: Basic Security Patterns | Severity: 🔴 High}

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In `hirundo/cli_eval.py` around lines 99-108, in the `eval_check` CLI command, the `run_id` argument is accepted as an arbitrary string and passed directly to `LlmBehaviorEval.check_run_by_id(run_id)`, which downstream uses it to form a cache file path. Add strict validation for `run_id` before calling `check_run_by_id`: reject any value containing path separators (`/` or `\`), `..`, or other unsafe characters, and ideally constrain it to the expected run-id format (e.g., UUID/token regex). If validation fails, print a clear error and exit with code 1 so a crafted `hirundo eval check` argument cannot traverse outside `~/.hirundo/cache/`.

Commit 23e9cc8 addressed this comment by adding a run-id validator in _cli_common.py and wiring eval_check through validate_run_id(run_id) before calling check_run_by_id, ensuring only safe characters are accepted and the CLI exits with an error on invalid input.

baz-reviewer · 2026-05-05T11:39:15Z


    runs = QADataset.list_runs()

-    console = Console()
    table = Table(
        title="Runs:",
        expand=True,


The legacy “list-runs” table is rendering the wrong run identifier — should we switch its Rich Table to use runs = QADataset.list_runs() and review the legacy row/column rendering (per hirundo/cli.py:228-232) to ensure it shows run.run_id or clearly labels the numeric DB id?

_{Finding type: Logical Bugs | Severity: 🟢 Low}

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In `hirundo/cli.py` around lines 209-214 inside the `list_runs()` function, the Rich `Table` is configured with a “Run ID” column but the implementation for the row identifier is using the wrong attribute (currently `run.id`). Refactor the table row construction so the “Run ID” column uses the correct identifier from `QADataset.list_runs()` (use `run.run_id` if it exists); if only a numeric DB id is available, explicitly rename the column/header to match (e.g., “DB ID”) so the output is not misleading. Then verify that the rendered rows match the identifier shown by the `dataset-qa list` implementation.

Commit 36663cf addressed this comment by moving the Dataset QA list command into hirundo/cli_dataset_qa.py and printing the table with the run.run_id attribute (via the new print_runs_table helper) instead of run.id, so the "Run ID" column now matches the dataset-qa list output.

Commit 63c2723 addressed this comment by updating the Legacy Runs table rows to display run.run_id, ensuring the Run ID column now matches the identifier shown by dataset-qa list.

ruff: - Rename make_app `help` param to `help_text` (A002 builtin shadow) - raise typer.Exit(code=1) from None in validate_enum (B904) - Remove unused Optional imports now replaced by X | None (F401) - Fix import ordering (I001, auto-fixed) - Convert Optional[X] to X | None (UP045, auto-fixed) basedpyright: - Use elif hallucination_type is not None + exhaustive else to let pyright narrow the str | None type before passing to validate_enum module-docs: - Add .rst files for the three new public CLI modules https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

baz-reviewer · 2026-05-09T21:45:17Z

+    if wait:
+        LlmBehaviorEval.check_run_by_id(run_id)
+    else:
+        console.print(
+            "Use [bold]hirundo eval check[/bold] [italic]<run_id>[/italic] to monitor progress."


The run subcommands duplicate the same --wait/notification branch across hirundo/cli_eval.py, hirundo/cli_unlearning.py, and hirundo/cli_dataset_qa.py, so should we extract a shared helper to centralize the wait handling and messaging?

_{Finding type: Code Dedup and Conventions | Severity: 🟢 Low}

Commit 36663cf addressed this comment by introducing the shared wait_or_notify helper in hirundo/_cli_common.py and using it in the eval, dataset-qa, and unlearning run commands to centralize the wait/notification logic and messaging.

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

run_id flows into cache_dir / f"{run_id}.zip" in unzip.py. A malicious value like ../../evil would escape ~/.hirundo/cache/. Add validate_run_id() to _cli_common which rejects anything outside [a-zA-Z0-9_-] (covers UUIDs and integer IDs). Apply it in the check command of each sub-app before passing run_id to the SDK. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

baz-reviewer · 2026-05-09T21:57:29Z

+    run_info = EvalRunInfo(
+        model_id=model_id,
+        source_run_id=source_run_id,
+        preset_type=preset_type,
+        name=name,


--source-run-id flows into EvalRunInfo and LlmBehaviorEval.launch_eval_run() without validate_run_id(), so it accepts values eval check/unlearning check reject — should we validate source_run_id with the shared rule before launching?

_{Finding type: Logical Bugs | Severity: 🟢 Low}

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In hirundo/cli_eval.py around lines 70-74 where `--source-run-id` is copied into `EvalRunInfo` and passed to `LlmBehaviorEval.launch_eval_run()`, add validation for `source_run_id` by running it through the shared `validate_run_id()` (the same one imported from `hirundo._cli_common`). Refactor so the value is validated before constructing `EvalRunInfo` (or at least immediately before calling `launch_eval_run`), and ensure any invalid IDs fail consistently with `eval check <run_id>`/`unlearning check <run_id>`. After the change, `--source-run-id` should be rejected with the same rules/help behavior as other run-id parameters.

Commit cc2e8bf addressed this comment by validating source_run_id via validate_run_id() before constructing EvalRunInfo, ensuring the CLI now rejects invalid source run IDs just like the other run-id parameters.

baz-reviewer · 2026-05-09T21:57:29Z

+def validate_run_id(run_id: str) -> str:
+    if not _RUN_ID_RE.match(run_id):
+        console.print(
+            f"[red]Invalid run ID '{run_id}'. "
+            "Run IDs may only contain alphanumeric characters, hyphens, and underscores.[/red]"
+        )
+        raise typer.Exit(code=1) from None


validate_run_id() uses match() which can accept trailing newlines — should we switch to fullmatch(), and add unit tests in tests/test_cli_common.py covering valid/invalid inputs?

_{Finding types: Breaking Changes Add tests for behavior | Severity: 🟢 Low}

Prompt for AI Agents:

In hirundo/_cli_common.py around lines 29-35, in the validate_run_id(run_id: str) function: 1. Replace the `_RUN_ID_RE.match` call with `fullmatch` so the entire string must consist of exactly the allowed characters, preventing trailing newlines or other whitespace from slipping through. Ensure the error path still prints the same red error message and that `validate_run_id` returns the original `run_id` only when it is strictly canonical. 2. Add a focused unit test in `tests/test_cli_common.py` that verifies: a valid run ID (containing only letters/numbers/hyphens/underscores) returns the input unchanged, and an invalid run ID (e.g., containing a space, newline, or other disallowed character) raises `typer.Exit` with code 1 and that the console output contains `Invalid run ID '<value>'` and the "may only contain…" text. Use whatever test utilities the project already uses to capture Rich/console output so the assertion is deterministic.

Commit 67dc58a addressed this comment by switching validate_run_id to rely on _RUN_ID_RE.fullmatch, ensuring trailing newlines no longer slip through; the requested tests for valid/invalid inputs still need to be added.

…ace cached_zip_path Adds two shared helpers to _cli_common to eliminate copy-paste across the three CLI sub-apps: wait_or_notify centralises the wait/notify branching, and print_runs_table centralises Rich table construction. Also captures and prints results.cached_zip_path for dataset-qa and eval check/run commands, matching the legacy check-run behaviour. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

The "Run ID" column was displaying the numeric DB id (run.id) instead of the string run identifier (run.run_id), inconsistent with the dataset-qa list command. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

Applies the same validate_run_id() check used by eval check/unlearning check so invalid run ID formats are rejected consistently before launching. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

…ines re.match with $ allows a single trailing newline through; fullmatch requires the entire string to consist of only the allowed characters. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

bare `type` is not iterable; `type[Enum]` lets basedpyright know the class supports iteration via EnumMeta. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

baz-reviewer · 2026-05-09T22:17:08Z

+    runs = QADataset.list_runs(archived=archived)
+    print_runs_table(
+        "Dataset QA Runs:",
+        ("Dataset Name", "Run ID", "Status", "Created At", "Run Args"),
+        [


dataset_qa_list duplicates QADataset.list_runs + table formatting from hirundo/cli.py:list_runs, should we reuse the existing table helper or delegate to list_runs for a single rendering source of truth?

_{Finding type: Code Dedup and Conventions | Severity: 🟢 Low}

Commit 9a4feff addressed this comment by importing dataset_qa_list into hirundo/cli and delegating the top-level list command to dataset_qa_list(archived=False), ensuring the Dataset QA table rendering now comes from a single shared helper.

baz-reviewer · 2026-05-09T22:17:08Z

+def wait_or_notify(
+    run_id: str, check_fn: Callable[[str], Any], cmd_name: str, wait: bool
+) -> Any:
+    if wait:
+        return check_fn(run_id)
+    console.print(
+        f"Use [bold]hirundo {cmd_name} check[/bold] [italic]<run_id>[/italic] to monitor progress."
+    )
+    return None


wait_or_notify changed visible behavior, but the diff lacks focused tests for the wait=True/wait=False branches (including the emitted prompt), so should we add a fast unit test that covers both cases and asserts the return propagation/message?

_{Finding type: Add tests for behavior | Severity: 🟢 Low}

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In hirundo/_cli_common.py around lines 52-60, the `wait_or_notify` helper introduces new visible behavior for both `wait=True` (returns `check_fn(run_id)`) and `wait=False` (prints the “Use hirundo … check …” guidance and returns None), but there are no focused tests covering these branches. Add a fast unit test that calls `wait_or_notify` directly with a stub `check_fn` and asserts the return value is propagated when `wait=True`, and when `wait=False` it asserts the exact printed message (or a stable substring) is emitted and the return is None. Reuse the repo’s existing test/CLI helper utilities and rich `console` capturing mechanism if available (e.g., a fixture that captures stdout/console output), and keep the test isolated from network/IO.

Commit 9a4feff addressed this comment by adding focused tests for wait_or_notify, covering the wait=True return propagation and the wait=False console prompt/hint output.

Replaces inline Rich Table construction with the shared print_runs_table helper, removing the last duplication between list-runs and dataset-qa list. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

baz-reviewer · 2026-05-09T22:19:00Z

    from hirundo.dataset_qa import QADataset

    runs = QADataset.list_runs()
-
-    console = Console()
-    table = Table(
-        title="Runs:",
-        expand=True,
-    )
-    cols = (
-        "Dataset name",
-        "Run ID",
-        "Status",
-        "Created At",
-        "Run Args",
+    print_runs_table(
+        "Runs:",
+        ("Dataset name", "Run ID", "Status", "Created At", "Run Args"),
+        [
+            (
+                str(run.name),
+                str(run.run_id),
+                str(run.status),
+                run.created_at.isoformat(),


list_runs duplicates dataset_qa_list by calling QADataset.list_runs and rendering via print_runs_table — should we extract a shared helper, and add a focused test patching QADataset.list_runs() with deterministic data to assert print_runs_table is called with the expected headers/rows (including the run.run_args is None case)?

_{Finding types: Code Dedup and Conventions Add tests for behavior | Severity: 🟢 Low}

Prompt for AI Agents:

In hirundo/cli.py around lines 206-219, `list_runs` duplicates the listing logic from `dataset_qa_list`. Consider extracting a shared helper for calling `QADataset.list_runs()` and rendering via `print_runs_table(...)`. Additionally, add a focused test that patches `hirundo.cli.QADataset.list_runs()` to return a deterministic list of run-like objects (with name, status, created_at.isoformat(), and run_args.model_dump_json() or None), and patches `hirundo.cli.print_runs_table` to assert it was called once with title "Runs:" and headers ("Dataset name", "Run ID", "Status", "Created At", "Run Args") and rows matching the expected tuple values. Ensure the test also covers the case where `run.run_args` is None so the last field is None. Place the test in the appropriate existing test module for the CLI.

Commit 9a4feff addressed this comment by delegating the CLI list_runs command to dataset_qa_list, so the shared helper now controls fetching the runs and printing the table, which removes the duplicated logic. The focused test patching QADataset.list_runs/print_runs_table mentioned in the comment still needs to be added separately.

- Rename loop variable `e` -> `member` in validate_enum - Delegate legacy list-runs to dataset_qa_list to eliminate duplicate table rendering - Add validate_run_id + None guard to legacy check-run for consistency - Add tests/test_cli_common.py covering validate_run_id (valid/invalid cases) and wait_or_notify (both wait=True/False branches) https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

claude added 2 commits April 8, 2026 00:30

mishana requested review from a team as code owners May 5, 2026 11:31

baz-reviewer Bot reviewed May 5, 2026

View reviewed changes

mishana changed the title ~~Refactor CLI into modular subcommands for eval, unlearning, and dataset-qa~~ SDK-91: Refactor CLI into modular subcommands for eval, unlearning, and dataset-qa May 9, 2026

claude added 2 commits May 9, 2026 21:43

Remove extra blank line in cli.py (linter)

d235f6b

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

baz-reviewer Bot reviewed May 9, 2026

View reviewed changes

claude added 2 commits May 9, 2026 21:47

Fix ruff format failures

f1a4e0f

https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

baz-reviewer Bot reviewed May 9, 2026

View reviewed changes

claude added 5 commits May 9, 2026 22:04

fix(cli): use run.run_id instead of run.id in legacy list-runs table

63c2723

The "Run ID" column was displaying the numeric DB id (run.id) instead of the string run identifier (run.run_id), inconsistent with the dataset-qa list command. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

fix(cli): validate --source-run-id in eval run command

cc2e8bf

Applies the same validate_run_id() check used by eval check/unlearning check so invalid run ID formats are rejected consistently before launching. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

fix(cli): use fullmatch for run ID validation to reject trailing newl…

67dc58a

…ines re.match with $ allows a single trailing newline through; fullmatch requires the entire string to consist of only the allowed characters. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

fix(cli): type validate_enum enum_cls as type[Enum] for basedpyright

474d44a

bare `type` is not iterable; `type[Enum]` lets basedpyright know the class supports iteration via EnumMeta. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

baz-reviewer Bot reviewed May 9, 2026

View reviewed changes

refactor(cli): use print_runs_table in legacy list-runs command

95812ac

Replaces inline Rich Table construction with the shared print_runs_table helper, removing the last duplication between list-runs and dataset-qa list. https://claude.ai/code/session_01QGjQJb8G8DM8UaYt66Jewt

baz-reviewer Bot reviewed May 9, 2026

View reviewed changes

Conversation

mishana commented May 5, 2026 • edited by baz-reviewer Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

Summary

Key Changes

Implementation Details

Generated description

Uh oh!

baz-reviewer Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 9, 2026

Choose a reason for hiding this comment

mishana commented May 5, 2026 •

edited by baz-reviewer Bot

Loading