LLM-81: Add Excel export for summary comparisons and CLI command by benglewis · Pull Request #111 · Hirundo-io/llm-behavior-eval

benglewis · 2026-03-05T10:33:17Z

User description

Motivation

Provide a convenient way to export per-dataset comparison workbooks (one sheet per dataset) from summary CSVs for analysis and reporting.
Make metric column names consistent across the codebase to avoid string duplication and reduce the chance of typos.

Description

Added an export_excel utility (llm_behavior_eval/export_excel.py) that merges two summary_brief.csv files, normalizes metrics, writes an .xlsx workbook with one sheet per dataset, and inserts comparison charts. It uses xlsxwriter and validates that the two inputs overlap and that metric values are numeric.
Introduced metric name constants in llm_behavior_eval/evaluation_utils/metrics.py and updated BaseEvaluator to use these constants instead of hardcoded column header strings.
Exposed the Excel export as a CLI command export-excel in llm_behavior_eval/evaluate.py via the export_excel_command wrapper and registered it on the typer app.
Documented the feature in README.md, including a usage example and an install note for enabling the optional excel dependency.
Added an optional dependency group excel with xlsxwriter>=3.2.0 and included excel in the dev dependency group in pyproject.toml.
Added tests: a new tests/test_export_excel.py suite validating sheet sanitization, file output, and overlap checking, and a CLI test in tests/test_evaluate_cli.py that checks the new command forwards options correctly.
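
For readers unfamiliar with why sheet sanitization needs its own tests: Excel limits worksheet names to 31 characters, rejects the characters `[ ] : * ? / \`, and compares names case-insensitively. The sketch below is one way to handle that. The function name, the `_` replacement character, and the numeric suffix scheme are assumptions for illustration, not the helper added in this PR.

```python
from __future__ import annotations

import re

# Characters Excel rejects in worksheet names: [ ] : * ? / \
_INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
_MAX_SHEET_NAME_LENGTH = 31  # Excel's limit on worksheet name length


def sanitize_sheet_name(name: str, used: set[str]) -> str:
    """Return an Excel-safe, unique sheet name derived from `name`.

    `used` holds the lowercased names already taken and is updated in place.
    Hypothetical helper: the PR's real function may behave differently.
    """
    cleaned = _INVALID_SHEET_CHARS.sub("_", name).strip()
    # Names may not start or end with an apostrophe, so strip after truncating.
    cleaned = cleaned[:_MAX_SHEET_NAME_LENGTH].strip("'") or "Sheet"
    candidate = cleaned
    suffix = 2
    # Excel treats "Data" and "data" as the same sheet, so compare lowercased.
    while candidate.lower() in used:
        tail = f"_{suffix}"
        candidate = cleaned[: _MAX_SHEET_NAME_LENGTH - len(tail)] + tail
        suffix += 1
    used.add(candidate.lower())
    return candidate
```

Passing the same `used` set across all datasets keeps two long names that share a 31-character prefix from colliding after truncation.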

Testing

Ran the test suite with pytest, including the new tests/test_export_excel.py and the modified tests/test_evaluate_cli.py; all tests passed.
Verified the new CLI unit test test_export_excel_command_passes_new_option_names passed and correctly exercises the export_excel_command argument mapping.

Codex Task

Generated description

Below is a concise technical summary of the changes proposed in this PR:
Add Excel export as a CLI workflow via the new export_excel helper and register the export-excel command so teams can generate comparison workbooks directly from the summary_brief.csv outputs, complete with optional labels and dataset filtering along with docs and dependency updates to install xlsxwriter. Introduce metric column constants in evaluation_utils.metrics and consume them in BaseEvaluator so CSV generation stays consistent while also powering the Excel exporter.
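
As a rough illustration of the merge step described here, the following sketch loads two summaries, keeps only datasets present in both, applies an optional dataset filter, and fails when there is no overlap. The column headers `Dataset` and `Accuracy (%)`, the function names, and the error messages are all assumptions; the real summary_brief.csv layout and the PR's implementation may differ.

```python
from __future__ import annotations

import csv
import io
from typing import Optional, Sequence

# Hypothetical header names; the real summary_brief.csv columns may differ.
DATASET_COLUMN = "Dataset"
METRIC_COLUMN = "Accuracy (%)"


def load_summary(text: str) -> dict[str, float]:
    """Parse CSV text into {dataset: metric value}, rejecting non-numeric cells."""
    values: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(text)):
        dataset = row[DATASET_COLUMN]
        try:
            values[dataset] = float(row[METRIC_COLUMN])
        except (TypeError, ValueError) as error:
            raise ValueError(f"Non-numeric {METRIC_COLUMN!r} for {dataset!r}") from error
    return values


def merge_summaries(
    first: dict[str, float],
    second: dict[str, float],
    datasets: Optional[Sequence[str]] = None,
) -> dict[str, tuple[float, float]]:
    """Pair up values for datasets present in both summaries."""
    shared = [name for name in first if name in second]
    if datasets is not None:
        # A requested dataset that is not in both inputs is an error.
        unknown = sorted(set(datasets) - set(shared))
        if unknown:
            raise ValueError(f"Datasets missing from one or both summaries: {unknown}")
        shared = [name for name in shared if name in datasets]
    if not shared:
        raise ValueError("The two summaries have no datasets in common")
    return {name: (first[name], second[name]) for name in shared}
```

In this sketch the loaders take CSV text rather than file paths so the example stays self-contained; a file-based variant would read each summary_brief.csv and pass its contents through the same functions.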

Topic Details

Excel export flow

Implement Excel export workbook generation, CLI wiring, docs, dependency extras, and tests to let users compare pretrained/unlearned summaries via export_excel and export-excel commands with customizable labels and sheet selection.

Modified files (7)

README.md
llm_behavior_eval/evaluate.py
llm_behavior_eval/export_excel.py
pyproject.toml
tests/test_evaluate_cli.py
tests/test_export_excel.py
uv.lock

Latest Contributors (2)

User	Commit	Date
orr@hirundo.io	LLM-75-Fix-Result-Dire...	March 01, 2026
mishana4life@gmail.com	LLM-64-Change-results-...	February 16, 2026

Summary metrics

Standardize summary metric headers by using evaluation_utils.metrics constants inside BaseEvaluator so generated CSVs remain aligned with the export tooling.
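
A minimal sketch of the idea, assuming hypothetical constant names and header strings (the actual values in evaluation_utils/metrics.py are not shown on this page): the CSV writer and any downstream reader import the same constants, so a header can only change in one place.

```python
from __future__ import annotations

import csv
import io

# Hypothetical constants standing in for those in evaluation_utils/metrics.py.
DATASET_COLUMN = "Dataset"
ACCURACY_COLUMN = "Accuracy (%)"
ERROR_COLUMN = "Error (%)"
SUMMARY_COLUMNS = (DATASET_COLUMN, ACCURACY_COLUMN, ERROR_COLUMN)


def summary_csv(rows: list[dict[str, object]]) -> str:
    """Producer side: build the CSV header from the shared constants."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SUMMARY_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_accuracy(text: str) -> dict[str, float]:
    """Consumer side: look columns up through the same constants."""
    return {
        row[DATASET_COLUMN]: float(row[ACCURACY_COLUMN])
        for row in csv.DictReader(io.StringIO(text))
    }
```

If a header string were retyped by hand on the consumer side, a typo would surface only as a KeyError at export time; routing both sides through the constants removes that failure mode.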

Modified files (2)

llm_behavior_eval/evaluation_utils/base_evaluator.py
llm_behavior_eval/evaluation_utils/metrics.py

Latest Contributors (2)

User	Commit	Date
orr@hirundo.io	LLM-76-Add-option-to-r...	March 02, 2026
blewis@hirundo.io	LLM-68-Remove-empty-co...	February 19, 2026

baz-reviewer · 2026-03-05T10:51:27Z

Spec Reviewer Report 📬

2 / 3 requirements met for ticket:

Please add the ability to create an Excel spreadsheet (.xlsx) as an output

1 unmet requirement

#	Requirement	Explanation
1	Dev group installs Excel optional extra - Not Met	pyproject.toml was not modified in the diff to declare an 'excel' optional extra or to include it in the dev dependency group; therefore packaging metadata changes required by this requirement are missing. Evidence: pyproject.toml:50-53 (optional 'excel' extra with xlsxwriter); pyproject.toml:64-72 (dev dependency group installs llm-behavior-eval[excel,…]).

2 met requirements

#	Requirement	Explanation
1	Export Excel comparison subcommand	New CLI command forwards dataset filters and model names to export logic that merges summary CSVs, produces one sheet per dataset, and adds a comparison bar chart for the two models. Evidence: llm_behavior_eval/evaluate.py:584-644 (new export-excel CLI command); llm_behavior_eval/export_excel.py:117-189 (merges data, filters datasets, writes sheets, adds chart); tests/test_evaluate_cli.py:463-492 (confirms CLI options forward to export function).
2	Per-dataset Excel sheet with comparison chart	The new export path groups merged metrics by dataset, writes each dataset’s category table with both model columns, and inserts a horizontal bar chart comparing the two models per sheet. Evidence: llm_behavior_eval/export_excel.py:145-189 (dataset loop writes tables and adds bar charts per sheet); tests/test_export_excel.py:41-87 (ensures the Excel file contains the requested dataset sheet and numeric values for both models).
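
For context on requirement 2, here is a hedged sketch of how a per-dataset sheet with a horizontal bar chart can be produced with xlsxwriter. In xlsxwriter the `bar` chart type is the horizontal one (`column` is vertical). The function names, the `tables` input shape, and the chart placement at cell E2 are assumptions; the default labels follow the pretrained/unlearned wording used elsewhere on this page. xlsxwriter is imported lazily because it is an optional dependency.

```python
from __future__ import annotations

from pathlib import Path


def series_range(sheet: str, column: int, row_count: int) -> list:
    """xlsxwriter list-style range: [sheet, first_row, first_col, last_row, last_col].

    Row 0 holds the header, so data occupies rows 1..row_count.
    """
    return [sheet, 1, column, row_count, column]


def write_comparison_workbook(
    path: Path,
    tables: dict[str, list[tuple[str, float, float]]],
    labels: tuple[str, str] = ("pretrained", "unlearned"),
) -> None:
    """Write one sheet per dataset with a category table and a bar chart.

    `tables` maps an already-sanitized sheet name to (category, first, second) rows.
    Hypothetical sketch, not the implementation in export_excel.py.
    """
    try:
        import xlsxwriter  # optional dependency, imported only when exporting
    except ImportError as error:
        raise RuntimeError("Excel export requires the optional xlsxwriter package") from error

    workbook = xlsxwriter.Workbook(str(path))
    for sheet_name, rows in tables.items():
        worksheet = workbook.add_worksheet(sheet_name)
        worksheet.write_row(0, 0, ["Category", *labels])
        for index, row in enumerate(rows, start=1):
            worksheet.write_row(index, 0, row)
        if not rows:
            continue  # nothing to chart on an empty sheet
        chart = workbook.add_chart({"type": "bar"})  # horizontal bars
        for column, label in enumerate(labels, start=1):
            chart.add_series(
                {
                    "name": label,
                    "categories": series_range(sheet_name, 0, len(rows)),
                    "values": series_range(sheet_name, column, len(rows)),
                }
            )
        chart.set_title({"name": sheet_name})
        worksheet.insert_chart("E2", chart)
    workbook.close()
```

Each series shares the category column and points at its own value column, which is what lets the chart place the two models side by side per category.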

Note: Some optional integrations are missing, so it might not be possible to check some of the requirements.
For best results, make sure the following are integrated: Figma

Used resources:
Hash: 30befef | Ticket: link

To rerun the Spec Reviewer, comment "baz rerun spec review".

shmuelyo

lgtm

Address LLM-81 Excel export review feedback

30befef

benglewis added the codex label Mar 5, 2026 — with ChatGPT Codex Connector

benglewis changed the title ~~Add Excel export for summary comparisons and CLI command~~ LLM-81: Add Excel export for summary comparisons and CLI command Mar 5, 2026

benglewis self-assigned this Mar 5, 2026

Update uv.lock for excel extra

ec7f08b

shmuelyo approved these changes May 5, 2026

benglewis wants to merge 2 commits into main from codex/2026-03-05/linear-mention-llm-81-please-add-the-ability-to-create-an
