LLM-81: Add Excel export for summary comparisons and CLI command #111

Draft
benglewis wants to merge 2 commits into main from codex/2026-03-05/linear-mention-llm-81-please-add-the-ability-to-create-an

@benglewis
Contributor

@benglewis benglewis commented Mar 5, 2026

User description

Motivation

  • Provide a convenient way to export per-dataset comparison workbooks (one sheet per dataset) from summary CSVs for analysis and reporting.
  • Make metric column names consistent across the codebase to avoid string duplication and reduce chance of typos.

Description

  • Added an export_excel utility (llm_behavior_eval/export_excel.py) that merges two summary_brief.csv files, normalizes metrics, writes an .xlsx workbook with one sheet per dataset, and inserts comparison charts; it uses xlsxwriter and validates input overlap and numeric values.
  • Introduced metric name constants in llm_behavior_eval/evaluation_utils/metrics.py and updated BaseEvaluator to use these constants instead of hardcoded column header strings.
  • Exposed the Excel export as an export-excel CLI command in llm_behavior_eval/evaluate.py via the export_excel_command wrapper and registered it on the Typer app.
  • Documented the feature in README.md, including a usage example and an install note for enabling the optional excel dependency.
  • Added an optional dependency extra, excel, requiring xlsxwriter>=3.2.0, and included excel in the dev dependency group in pyproject.toml.
  • Added tests: a new tests/test_export_excel.py suite validating sheet-name sanitization, file output, and overlap checking, and a CLI test in tests/test_evaluate_cli.py that checks that the new command forwards options correctly.
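The merge-and-validate step described above can be sketched with the standard library alone. This is a minimal illustration, not the code in llm_behavior_eval/export_excel.py: it assumes each summary CSV is in long format with hypothetical dataset, metric, and value columns (the real summary_brief.csv layout may differ), and the function names load_summary and merge_summaries are likewise hypothetical.

```python
from __future__ import annotations

import csv
import io
import math
from collections.abc import Iterable

# Hypothetical column names; the real summary_brief.csv layout may differ.
DATASET_COL = "dataset"
METRIC_COL = "metric"
VALUE_COL = "value"


def load_summary(rows: Iterable[dict[str, str]]) -> dict[tuple[str, str], float]:
    """Index rows by (dataset, metric), rejecting missing, non-numeric, or non-finite values."""
    indexed: dict[tuple[str, str], float] = {}
    for row in rows:
        key = (row[DATASET_COL].strip(), row[METRIC_COL].strip())
        raw = row[VALUE_COL]
        try:
            value = float(raw)
        except (TypeError, ValueError) as exc:  # None for short rows, or unparsable text
            raise ValueError(f"Non-numeric value for {key}: {raw!r}") from exc
        if not math.isfinite(value):  # float() happily accepts "nan" and "inf"
            raise ValueError(f"Non-finite value for {key}: {raw!r}")
        indexed[key] = value
    return indexed


def merge_summaries(
    first: dict[tuple[str, str], float],
    second: dict[tuple[str, str], float],
) -> dict[tuple[str, str], tuple[float, float]]:
    """Inner-join two indexed summaries; fail loudly if they share no rows."""
    shared = first.keys() & second.keys()
    if not shared:
        raise ValueError("The two summaries have no (dataset, metric) rows in common")
    return {key: (first[key], second[key]) for key in sorted(shared)}


pretrained = "dataset,metric,value\nds_a,Accuracy,0.81\nds_a,Error,0.12\n"
unlearned = "dataset,metric,value\nds_a,Accuracy,0.77\n"
merged = merge_summaries(
    load_summary(csv.DictReader(io.StringIO(pretrained))),
    load_summary(csv.DictReader(io.StringIO(unlearned))),
)
print(merged)  # {('ds_a', 'Accuracy'): (0.81, 0.77)}
```

An inner join silently drops rows that appear in only one file (the Error row above); a stricter variant could report those rows instead of discarding them.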

Testing

  • Ran the pytest suite, including the new tests/test_export_excel.py and the modified tests/test_evaluate_cli.py; all tests passed.
  • Verified the new CLI unit test test_export_excel_command_passes_new_option_names passed and correctly exercises the export_excel_command argument mapping.

Codex Task


Generated description

Below is a concise technical summary of the changes proposed in this PR:
Add Excel export as a CLI workflow via the new export_excel helper and register the export-excel command, so teams can generate comparison workbooks directly from the summary_brief.csv outputs, with optional labels and dataset filtering. The change also updates the docs and dependencies to install xlsxwriter. Introduce metric column constants in evaluation_utils.metrics and consume them in BaseEvaluator, so CSV generation stays consistent while the same constants also power the Excel exporter.
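Dataset filtering and optional labels can be layered on top of an already merged summary. A hedged sketch, assuming the merged data maps hypothetical (dataset, metric) keys to (first model, second model) value pairs; the function name build_sheet_tables and the default labels are illustrative, not taken from the PR.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Mapping


def build_sheet_tables(
    merged: Mapping[tuple[str, str], tuple[float, float]],
    datasets: Iterable[str] | None = None,
    labels: tuple[str, str] = ("pretrained", "unlearned"),
) -> dict[str, list[list[object]]]:
    """Group merged metrics into one table per dataset: a header row, then one row per metric."""
    if len(labels) != 2:
        raise ValueError("Exactly two column labels are required")
    grouped: dict[str, list[list[object]]] = defaultdict(list)
    for (dataset, metric), (first, second) in merged.items():
        grouped[dataset].append([metric, first, second])
    if datasets is not None:
        requested = list(dict.fromkeys(datasets))  # de-duplicate while keeping order
        missing = [name for name in requested if name not in grouped]
        if missing:
            raise ValueError(f"Unknown dataset(s): {', '.join(missing)}")
        grouped = {name: grouped[name] for name in requested}
    # Build a fresh header list per table so tables never share a mutable row.
    return {name: [["metric", *labels], *rows] for name, rows in grouped.items()}


merged = {
    ("ds_a", "Accuracy"): (0.81, 0.77),
    ("ds_a", "Error"): (0.12, 0.15),
    ("ds_b", "Accuracy"): (0.90, 0.60),
}
tables = build_sheet_tables(merged, datasets=["ds_a"], labels=("base", "unlearned"))
print(tables)
# {'ds_a': [['metric', 'base', 'unlearned'], ['Accuracy', 0.81, 0.77], ['Error', 0.12, 0.15]]}
```

Rejecting unknown dataset names up front, instead of silently producing fewer sheets, makes a typo in a filter option visible immediately.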

Topic: Excel export flow
Details: Implement Excel workbook generation, CLI wiring, docs, dependency extras, and tests so users can compare pretrained and unlearned summaries via the export_excel helper and the export-excel command, with customizable labels and sheet selection.
Modified files (7)
  • README.md
  • llm_behavior_eval/evaluate.py
  • llm_behavior_eval/export_excel.py
  • pyproject.toml
  • tests/test_evaluate_cli.py
  • tests/test_export_excel.py
  • uv.lock
Latest Contributors (2)
  User | Commit | Date
  orr@hirundo.io | LLM-75-Fix-Result-Dire... | March 01, 2026
  mishana4life@gmail.com | LLM-64-Change-results-... | February 16, 2026

Topic: Summary metrics
Details: Standardize summary metric headers by using evaluation_utils.metrics constants inside BaseEvaluator, so that generated CSVs stay aligned with the export tooling.
Modified files (2)
  • llm_behavior_eval/evaluation_utils/base_evaluator.py
  • llm_behavior_eval/evaluation_utils/metrics.py
Latest Contributors (2)
  User | Commit | Date
  orr@hirundo.io | LLM-76-Add-option-to-r... | March 02, 2026
  blewis@hirundo.io | LLM-68-Remove-empty-co... | February 19, 2026
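The point of shared metric constants is that the code writing the summary CSV and the code reading it refer to the same Python names rather than retyping header strings. A minimal sketch; the constant names and header values below are hypothetical and do not reproduce the contents of evaluation_utils/metrics.py.

```python
import csv
import io
from typing import Final

# Hypothetical constants standing in for evaluation_utils/metrics.py.
ACCURACY: Final = "Accuracy"
ERROR_RATE: Final = "Error rate"
SUMMARY_COLUMNS: Final = ("dataset", ACCURACY, ERROR_RATE)


def write_summary(rows):
    """Writer side (e.g. an evaluator): headers come from the shared constants."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SUMMARY_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_accuracy(text):
    """Reader side (e.g. an exporter): looks columns up by the same constants."""
    return {row["dataset"]: float(row[ACCURACY]) for row in csv.DictReader(io.StringIO(text))}


text = write_summary([{"dataset": "ds_a", ACCURACY: 0.81, ERROR_RATE: 0.12}])
print(text.splitlines()[0])  # dataset,Accuracy,Error rate
print(read_accuracy(text))   # {'ds_a': 0.81}
```

With this arrangement a misspelled constant raises NameError where it is used, whereas a misspelled string literal would only surface later as a KeyError or a mismatched column header.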
This pull request is reviewed by Baz.

@benglewis benglewis changed the title from "Add Excel export for summary comparisons and CLI command" to "LLM-81: Add Excel export for summary comparisons and CLI command" on Mar 5, 2026
@benglewis benglewis self-assigned this Mar 5, 2026
@benglewis

This comment was marked as resolved.

@chatgpt-codex-connector

This comment was marked as outdated.

@baz-reviewer
Contributor

baz-reviewer Bot commented Mar 5, 2026

Spec Reviewer Report    📬

2 / 3 requirements met for ticket:

Please add the ability to create an Excel spreadsheet (.xlsx) as an output


1 unmet requirement
1. Dev group installs Excel optional extra (Not Met)
   Explanation: pyproject.toml was not modified in the diff to declare an 'excel' optional extra or to include it in the dev dependency group; therefore, the packaging metadata changes required by this requirement are missing.
   Evidence:
     • pyproject.toml:50-53: optional 'excel' extra with xlsxwriter
     • pyproject.toml:64-72: dev dependency group installs llm-behavior-eval[excel,…]
   attempted: true
2 met requirements
1. Export Excel comparison subcommand
   Explanation: A new CLI command forwards dataset filters and model names to export logic that merges summary CSVs, produces one sheet per dataset, and adds a comparison bar chart for the two models.
   Evidence:
     • llm_behavior_eval/evaluate.py:584-644: new export-excel CLI command
     • llm_behavior_eval/export_excel.py:117-189: merges data, filters datasets, writes sheets, adds chart
     • tests/test_evaluate_cli.py:463-492: confirms CLI options forward to the export function
2. Per-dataset Excel sheet with comparison chart
   Explanation: The new export path groups merged metrics by dataset, writes each dataset’s category table with both model columns, and inserts a horizontal bar chart comparing the two models on each sheet.
   Evidence:
     • llm_behavior_eval/export_excel.py:145-189: dataset loop writes tables and adds bar charts per sheet
     • tests/test_export_excel.py:41-87: ensures the Excel file contains the requested dataset sheet and numeric values for both models
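The per-dataset sheet and chart step can be sketched with XlsxWriter, assuming tables shaped as a header row followed by (metric, first value, second value) rows. Excel limits sheet names to 31 characters and rejects the characters [ ] : * ? / \ as well as leading or trailing apostrophes, so dataset names are sanitized first. The helper names sanitize_sheet_name and write_comparison_workbook are hypothetical and this is not the exporter's actual code; XlsxWriter is imported inside the function because the PR ships it as an optional extra.

```python
from __future__ import annotations

import re
from collections.abc import Mapping, Sequence

_INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
_MAX_SHEET_NAME = 31  # Excel's sheet-name length limit


def sanitize_sheet_name(name: str, used: set[str]) -> str:
    """Return an Excel-safe sheet name that is unique (case-insensitively) within `used`."""
    # Replace forbidden characters, then trim apostrophes/spaces from both ends.
    cleaned = _INVALID_SHEET_CHARS.sub("_", name).strip("' ") or "sheet"
    candidate = cleaned[:_MAX_SHEET_NAME].rstrip("' ")
    suffix = 1
    while candidate.lower() in used:  # Excel treats "Data" and "data" as the same sheet
        tag = f"_{suffix}"
        candidate = cleaned[: _MAX_SHEET_NAME - len(tag)] + tag
        suffix += 1
    used.add(candidate.lower())
    return candidate


def write_comparison_workbook(
    path: str,
    tables: Mapping[str, Sequence[Sequence[object]]],
    labels: Sequence[str],
) -> None:
    """Write one sheet per dataset plus a horizontal bar chart comparing the two models."""
    import xlsxwriter  # optional dependency, installed via the "excel" extra

    used: set[str] = set()
    with xlsxwriter.Workbook(path) as workbook:
        for dataset, rows in tables.items():
            sheet_name = sanitize_sheet_name(dataset, used)
            worksheet = workbook.add_worksheet(sheet_name)
            for row_index, row in enumerate(rows):
                worksheet.write_row(row_index, 0, row)
            last_row = len(rows) - 1
            if last_row < 1:  # header only: nothing to chart
                continue
            chart = workbook.add_chart({"type": "bar"})  # "bar" is horizontal in XlsxWriter
            for column, label in enumerate(labels, start=1):
                chart.add_series({
                    "name": label,
                    # [sheet, first_row, first_col, last_row, last_col], zero-indexed
                    "categories": [sheet_name, 1, 0, last_row, 0],
                    "values": [sheet_name, 1, column, last_row, column],
                })
            chart.set_title({"name": dataset})
            worksheet.insert_chart(0, len(labels) + 2, chart)
```

Given tables in the assumed shape, a call such as write_comparison_workbook("comparison.xlsx", tables, ("base", "unlearned")) would produce one sheet per dataset with the chart placed to the right of its table. Keeping the name sanitizer free of XlsxWriter also lets it be unit-tested without the optional dependency installed.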

Note: Some optional integrations are missing, so it might not be possible to check some of the requirements.
For best results, make sure the following are integrated: Figma



Used resources:
Hash: 30befef | Ticket: link

To rerun the Spec Reviewer, comment "baz rerun spec review".

Contributor

@shmuelyo shmuelyo left a comment

lgtm

2 participants