Expand CLI to launch eval, dataset-qa, and unlearning runs #220

@mishana

Background

The existing hirundo CLI (hirundo/cli.py) can only inspect runs (check-run, list-runs) and manage config (setup, set-api-key, change-remote). It cannot start runs.

This issue tracks adding sub-commands that launch all three run types (LLM behavior evaluation, dataset QA, and LLM unlearning), so the CLI can be used in scripts and CI pipelines.

Proposed commands

hirundo eval

hirundo eval run   --name TEXT --model-id INT  --preset [BBQ_BIAS|BBQ_UNBIAS|UNQOVER_BIAS|HALU_EVAL|MED_HALLU|INJECTION_EVAL]
                   --bias-type TEXT            # optional, for BBQ/Unqover presets
                   --source-run-id TEXT        # alternative to --model-id
                   --wait / --no-wait          # block until completion (default: --wait)
hirundo eval list  [--archived]
hirundo eval check <run-id>

Calls: LlmBehaviorEval.launch_eval_run(), LlmBehaviorEval.check_run_by_id(), LlmBehaviorEval.list_runs()
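The synopsis implies two validation rules for `hirundo eval run`: `--source-run-id` is an alternative to `--model-id`, and `--bias-type` only applies to the BBQ/Unqover presets. Below is a minimal, stdlib-only sketch of those checks. It assumes exactly one of the two model options is required. The `EvalPreset` enum and `validate_eval_args` helper are hypothetical names, not part of hirundo. A `str`-valued `Enum` like this is the form Typer accepts for choice options such as `--preset`.

```python
from enum import Enum
from typing import Optional


class EvalPreset(str, Enum):
    """Choices for --preset, copied from the proposed synopsis (hypothetical enum)."""

    BBQ_BIAS = "BBQ_BIAS"
    BBQ_UNBIAS = "BBQ_UNBIAS"
    UNQOVER_BIAS = "UNQOVER_BIAS"
    HALU_EVAL = "HALU_EVAL"
    MED_HALLU = "MED_HALLU"
    INJECTION_EVAL = "INJECTION_EVAL"


# Presets for which --bias-type is accepted ("for BBQ/Unqover presets").
BIAS_PRESETS = frozenset(
    {EvalPreset.BBQ_BIAS, EvalPreset.BBQ_UNBIAS, EvalPreset.UNQOVER_BIAS}
)


def validate_eval_args(
    preset: EvalPreset,
    model_id: Optional[int] = None,
    source_run_id: Optional[str] = None,
    bias_type: Optional[str] = None,
) -> None:
    """Reject option combinations the proposed `hirundo eval run` should not accept.

    Assumption: exactly one of --model-id / --source-run-id must be given.
    Inside a Typer command these errors would be raised as typer.BadParameter.
    """
    if (model_id is None) == (source_run_id is None):
        raise ValueError("pass exactly one of --model-id or --source-run-id")
    if bias_type is not None and preset not in BIAS_PRESETS:
        raise ValueError(
            f"--bias-type only applies to BBQ/Unqover presets, not {preset.value}"
        )
```

Keeping the checks in a plain function lets them be unit-tested without invoking the CLI. The Typer command would call it before `LlmBehaviorEval.launch_eval_run()`.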

hirundo dataset-qa

hirundo dataset-qa run   --dataset-id INT      # run QA on an already-registered dataset
                         --wait / --no-wait
hirundo dataset-qa list  [--archived]
hirundo dataset-qa check <run-id>

Calls: QADataset.launch_qa_run(), QADataset.check_run_by_id(), QADataset.list_runs()

hirundo unlearning

hirundo unlearning run   --model-id INT  --name TEXT
                         --bias-type [ALL|RACE|NATIONALITY|GENDER|PHYSICAL_APPEARANCE|RELIGION|AGE]
                         --hallucination-type [GENERAL|MEDICAL|LEGAL|DEFENSE]
                         --security          # flag to include SecurityBehavior
                         --wait / --no-wait
hirundo unlearning list  [--archived]
hirundo unlearning check <run-id>

Calls: LlmUnlearningRun.launch(), LlmUnlearningRun.check_run_by_id(), LlmUnlearningRun.list()
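The three unlearning flags (`--bias-type`, `--hallucination-type`, `--security`) have to be turned into a set of target behaviors before `LlmUnlearningRun.launch()` is called. Below is a stdlib-only sketch of that step. The enum values are copied from the synopsis. `BehaviorSelection` and `collect_behaviors` are hypothetical stand-ins: the real behavior classes (such as `SecurityBehavior`) live in `hirundo/unlearning_llm.py`, and their constructors are not described in this issue. The sketch also assumes that the flags may be combined and that at least one must be given.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class BiasType(str, Enum):
    ALL = "ALL"
    RACE = "RACE"
    NATIONALITY = "NATIONALITY"
    GENDER = "GENDER"
    PHYSICAL_APPEARANCE = "PHYSICAL_APPEARANCE"
    RELIGION = "RELIGION"
    AGE = "AGE"


class HallucinationType(str, Enum):
    GENERAL = "GENERAL"
    MEDICAL = "MEDICAL"
    LEGAL = "LEGAL"
    DEFENSE = "DEFENSE"


@dataclass(frozen=True)
class BehaviorSelection:
    """Hypothetical placeholder for hirundo's real behavior objects."""

    kind: str  # "bias", "hallucination" or "security"
    value: Optional[str] = None


def collect_behaviors(
    bias_type: Optional[BiasType] = None,
    hallucination_type: Optional[HallucinationType] = None,
    security: bool = False,
) -> List[BehaviorSelection]:
    """Turn the unlearning flags into a list of selected behaviors, in flag order."""
    behaviors: List[BehaviorSelection] = []
    if bias_type is not None:
        behaviors.append(BehaviorSelection("bias", bias_type.value))
    if hallucination_type is not None:
        behaviors.append(BehaviorSelection("hallucination", hallucination_type.value))
    if security:
        behaviors.append(BehaviorSelection("security"))
    if not behaviors:
        # Assumption: a run with no target behavior is rejected before launch.
        raise ValueError("select --bias-type, --hallucination-type and/or --security")
    return behaviors
```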

Implementation notes

  • Each command group should be its own typer.Typer() sub-app, defined in a separate module and registered on the main Typer app in cli.py with add_typer(), so cli.py stays thin.
  • --wait (the default) streams progress through the existing tqdm-based check_run_by_id methods; --no-wait prints the run_id and exits.
  • list commands should render a rich.table.Table consistent with the existing list-runs command.
  • New modules: hirundo/cli_eval.py, hirundo/cli_dataset_qa.py, hirundo/cli_unlearning.py.
  • Keep existing check-run / list-runs commands as-is for backward compatibility (they cover dataset-qa runs).
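All three `run` commands share the same `--wait` / `--no-wait` behavior, so one helper can hold it. Below is a minimal sketch. It assumes the launch call returns the new run_id and that the check callable blocks (showing tqdm progress) until the run finishes. `run_command` is a hypothetical helper, and the launch/check functions are injected because their exact hirundo signatures are not given here.

```python
from typing import Any, Callable


def run_command(
    launch: Callable[[], str],
    check: Callable[[str], Any],
    wait: bool = True,
    echo: Callable[[str], None] = print,
) -> Any:
    """Shared body for the proposed `eval run`, `dataset-qa run` and `unlearning run`.

    Assumes `launch` returns the new run_id and `check` blocks until the run
    finishes, returning its result (as the tqdm-based check_run_by_id does).
    """
    run_id = launch()
    if not wait:
        # --no-wait: report the run_id and return immediately.
        echo(run_id)
        return run_id
    # --wait (default): hand off to the blocking check_run_by_id-style call.
    return check(run_id)
```

In a Typer command, the flag pair can be declared as `wait: bool = typer.Option(True, "--wait/--no-wait")`, and `echo` can be `typer.echo`. Injecting `launch` and `check` as callables keeps the helper testable without network access.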

Relevant source files

| File | Key APIs |
|---|---|
| hirundo/llm_behavior_eval.py | LlmBehaviorEval.launch_eval_run, check_run_by_id, list_runs |
| hirundo/dataset_qa.py | QADataset.launch_qa_run, check_run_by_id, list_runs |
| hirundo/unlearning_llm.py | LlmUnlearningRun.launch, check_run_by_id, list |
| hirundo/cli.py | Main Typer app; add the add_typer() calls here |
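The method names differ across the three classes: `launch_eval_run` vs `launch_qa_run` vs `launch`, and `list_runs` vs `list`. One option is to keep the three CLI modules symmetric with a small lookup table. Below is a sketch under that assumption. `GroupMethods`, `GROUP_METHODS` and `resolve` are hypothetical names. The issue does not say whether these are class-level or instance methods, so `resolve` works with whichever object is passed in.

```python
from typing import Any, Callable, Dict, NamedTuple


class GroupMethods(NamedTuple):
    launch: str
    check: str
    list_runs: str


# Method names per CLI group, as listed in the issue's "Calls:" lines.
GROUP_METHODS: Dict[str, GroupMethods] = {
    "eval": GroupMethods("launch_eval_run", "check_run_by_id", "list_runs"),
    "dataset-qa": GroupMethods("launch_qa_run", "check_run_by_id", "list_runs"),
    "unlearning": GroupMethods("launch", "check_run_by_id", "list"),
}


def resolve(group: str, action: str, target: Any) -> Callable[..., Any]:
    """Return the method on `target` that implements `action` for `group`.

    `action` is one of "launch", "check" or "list_runs"; `target` is whatever
    object exposes the methods (class or instance).
    """
    method_name = getattr(GROUP_METHODS[group], action)
    return getattr(target, method_name)
```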

Labels: enhancement (New feature or request)