Background
The existing hirundo CLI (hirundo/cli.py) can only inspect runs (check-run, list-runs) and manage config (setup, set-api-key, change-remote). It cannot start runs.
This issue tracks adding sub-commands to launch all three run types, making the CLI useful for scripting and CI pipelines.
Proposed commands
hirundo eval
hirundo eval run --name TEXT --model-id INT --preset [BBQ_BIAS|BBQ_UNBIAS|UNQOVER_BIAS|HALU_EVAL|MED_HALLU|INJECTION_EVAL]
    --bias-type TEXT        # optional, for BBQ/Unqover presets
    --source-run-id TEXT    # alternative to --model-id
    --wait / --no-wait      # block until completion (default: --wait)
hirundo eval list [--archived]
hirundo eval check <run-id>
Calls: LlmBehaviorEval.launch_eval_run(), LlmBehaviorEval.check_run_by_id(), LlmBehaviorEval.list_runs()
hirundo dataset-qa
hirundo dataset-qa run --dataset-id INT    # run QA on an already-registered dataset
    --wait / --no-wait
hirundo dataset-qa list [--archived]
hirundo dataset-qa check <run-id>
Calls: QADataset.launch_qa_run(), QADataset.check_run_by_id(), QADataset.list_runs()
hirundo unlearning
hirundo unlearning run --model-id INT --name TEXT
    --bias-type [ALL|RACE|NATIONALITY|GENDER|PHYSICAL_APPEARANCE|RELIGION|AGE]
    --hallucination-type [GENERAL|MEDICAL|LEGAL|DEFENSE]
    --security              # flag to include SecurityBehavior
    --wait / --no-wait
hirundo unlearning list [--archived]
hirundo unlearning check <run-id>
Calls: LlmUnlearningRun.launch(), LlmUnlearningRun.check_run_by_id(), LlmUnlearningRun.list()
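All three run commands above share the --wait / --no-wait flag, so the branching could live in one small helper. The following is a minimal sketch, assuming the launch and check calls are passed in as callables. The names launch_and_maybe_wait, launch, check, and echo are hypothetical and chosen for illustration; nothing here assumes the real signatures of the hirundo launch or check_run_by_id methods.

```python
from typing import Callable, Optional


def launch_and_maybe_wait(
    launch: Callable[[], str],
    check: Callable[[str], object],
    wait: bool = True,
    echo: Callable[[str], None] = print,
) -> Optional[object]:
    """Start a run, then either block on it or print its ID and return.

    `launch` and `check` are hypothetical adapters around a run type's
    launch and check_run_by_id calls; they are injected so the helper
    stays independent of any one run type.
    """
    run_id = launch()
    if not wait:
        # --no-wait: print the run ID so scripts can capture it, then stop.
        echo(run_id)
        return None
    # --wait: delegate to the blocking check call and return its result.
    return check(run_id)
```

Injecting the callables also means the branching can be exercised in tests without contacting the API.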
Implementation notes
- Each group should be a typer.Typer() sub-app added to the main app in cli.py using separate modules and add_typer() registrations to keep the CLI structure modular.
- --wait (default on) streams progress via the existing tqdm-based check_run_by_id methods. --no-wait just prints the run_id and exits.
- list commands should render a rich.Table consistent with the existing list-runs command.
- New modules: hirundo/cli_eval.py, hirundo/cli_dataset_qa.py, hirundo/cli_unlearning.py (keeps cli.py thin).
- Keep existing check-run / list-runs commands as-is for backward compatibility (they cover dataset-qa runs).
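The sub-app layout described in these notes might be wired up roughly as follows. This is a sketch under stated assumptions, not the intended implementation: _launch_eval_run is a hypothetical stand-in for LlmBehaviorEval.launch_eval_run(), whose real signature is not shown in this issue. The option set is trimmed to --name, --model-id, and --wait / --no-wait, the list command prints a placeholder instead of a rich.Table, and both apps are shown in one file for brevity.

```python
import typer

# Sub-app for the proposed `hirundo eval` group; under the proposal this
# would live in hirundo/cli_eval.py.
eval_app = typer.Typer(help="Launch and inspect LLM behavior eval runs.")


def _launch_eval_run(name: str, model_id: int) -> str:
    """Hypothetical stand-in for LlmBehaviorEval.launch_eval_run()."""
    return f"run-{model_id}-{name}"


@eval_app.command("run")
def run(
    name: str = typer.Option(..., "--name", help="Run name."),
    model_id: int = typer.Option(..., "--model-id", help="Model ID."),
    wait: bool = typer.Option(
        True, "--wait/--no-wait", help="Block until completion."
    ),
) -> None:
    run_id = _launch_eval_run(name, model_id)
    if not wait:
        # --no-wait: print the run ID and exit.
        typer.echo(run_id)
        return
    # Placeholder: the real command would stream progress through the
    # existing check_run_by_id method here.
    typer.echo(f"waiting for {run_id}")


@eval_app.command("list")
def list_runs(archived: bool = typer.Option(False, "--archived")) -> None:
    # Placeholder output; the proposal calls for a rich.Table here.
    typer.echo("archived runs" if archived else "active runs")


# In hirundo/cli.py, the main app would register each group like this.
app = typer.Typer()
app.add_typer(eval_app, name="eval")
```

The dataset-qa and unlearning groups would follow the same pattern, each with its own module and one add_typer() call in cli.py.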
Relevant source files
| File | Key APIs |
| --- | --- |
| hirundo/llm_behavior_eval.py | LlmBehaviorEval.launch_eval_run, check_run_by_id, list_runs |
| hirundo/dataset_qa.py | QADataset.launch_qa_run, check_run_by_id, list_runs |
| hirundo/unlearning_llm.py | LlmUnlearningRun.launch, check_run_by_id, list |
| hirundo/cli.py | Main Typer app; add add_typer() calls here |