
AORTA-23 : Phase 6 : Added plots for comparison pipeline #103

Open
prosenjitdhole wants to merge 1 commit into prosenj_cli_hq_eval_report_phase_5 from prosenj_cli_hq_eval_report_phase_6

Conversation

@prosenjitdhole
Collaborator

No description provided.

@prosenjitdhole prosenjitdhole changed the title [WIP] AORTA-23 : Phase 6 : Added plots for comparison pipeline AORTA-23 : Phase 6 : Added plots for comparison pipeline Feb 18, 2026
@prosenjitdhole prosenjitdhole marked this pull request as ready for review February 18, 2026 09:34
@prosenjitdhole prosenjitdhole requested a review from Copilot March 4, 2026 16:09
Contributor

Copilot AI left a comment


Pull request overview

Adds Phase 6 support for generating and wiring up “comparison mode” (Mode C) plots in the HWQueue report pipeline, enabling visual baseline-vs-test analysis across multiple workloads.

Changes:

  • Implemented Mode C plot generation (throughput comparison, delta summary, regression heatmap, latency delta).
  • Hooked comparison plot generation into the HWQueue comparison pipeline step.
  • Exported generate_comparison_plots from the generators package.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Changed files:

  • src/aorta/report/pipelines/hwqueue_pipeline.py: Calls the new comparison plot generator and records generated plot outputs in the pipeline result.
  • src/aorta/report/generators/hwqueue_plots.py: Adds plot functions and an orchestrator for Mode C multi-workload baseline/test comparisons.
  • src/aorta/report/generators/__init__.py: Re-exports generate_comparison_plots for public generator access.


    threshold=config.threshold,
    verbose=config.verbose,
)
result.files_generated["plots"] = plot_files

Copilot AI Mar 4, 2026


HWQueuePipelineResult.files_generated is annotated as Dict[str, Path], but here it is assigned a List[Path] (plot_files). This also means the CLI output (which only prints entries that are isinstance(path, Path)) will silently omit the generated plots. Consider updating the result type to accept lists (e.g., Path | list[Path]) and/or store a single plots_dir Path plus individual plot paths under separate keys so downstream code can display them consistently.

Suggested change
result.files_generated["plots"] = plot_files
# Record the directory containing plots and each individual plot file.
result.files_generated["plots_dir"] = plots_dir
for plot_file in plot_files:
    result.files_generated[f"plot:{plot_file.name}"] = plot_file
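The suggestion above can be sketched end to end. This is a minimal, self-contained illustration, not the project's actual `HWQueuePipelineResult` class: the `PipelineResult` dataclass, the paths, and the key naming are stand-ins that mirror the snippet, showing how every value stays a single `Path` so downstream `isinstance(path, Path)` checks keep working.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict


# Hypothetical stand-in for HWQueuePipelineResult; only the
# files_generated annotation matters for this sketch.
@dataclass
class PipelineResult:
    files_generated: Dict[str, Path] = field(default_factory=dict)


result = PipelineResult()
plots_dir = Path("out/plots")
plot_files = [plots_dir / "throughput.png", plots_dir / "latency.png"]

# Record the directory once, then each plot under its own key, so every
# dict value is a single Path rather than a List[Path].
result.files_generated["plots_dir"] = plots_dir
for plot_file in plot_files:
    result.files_generated[f"plot:{plot_file.name}"] = plot_file
```

With this shape, CLI code that iterates the dict and prints `Path` entries will show each generated plot without any type-annotation change.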

Comment on lines +658 to +664
_, b_tp = baseline_data[wl].get_best_throughput()
_, t_tp = test_data[wl].get_best_throughput()
if b_tp > 0:
    change_pct = (t_tp - b_tp) / b_tp * 100
else:
    change_pct = 0
changes.append((wl, change_pct))

Copilot AI Mar 4, 2026


When b_tp is 0 (or results are empty), this sets change_pct = 0, which can be misleading (e.g., baseline 0 → test >0 shows as 0% change). Consider using np.nan/None (and labeling as N/A), or handling the division-by-zero case explicitly (e.g., treat as +∞ / -100% depending on semantics) so the plot reflects missing/invalid baselines accurately.
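One way to realize the NaN suggestion is a small helper. This is a sketch of the reviewer's idea, not the PR's actual code; `pct_change` is a hypothetical name, and the real pipeline computes the percentage inline.

```python
import math


def pct_change(baseline: float, test: float) -> float:
    """Percent change from baseline to test.

    Returns NaN when the baseline is zero or negative, so a missing or
    invalid baseline is distinguishable from a genuine 0% change and can
    be rendered as N/A in the plot.
    """
    if baseline > 0:
        return (test - baseline) / baseline * 100
    return math.nan
```

A baseline of 0 with a positive test value now yields NaN instead of masquerading as "0% change", and matplotlib simply leaves NaN points unplotted.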

for sc in sorted_streams:
    b_val = b_by_streams.get(sc)
    t_val = t_by_streams.get(sc)
    if b_val and t_val and b_val > 0:

Copilot AI Mar 4, 2026


The if b_val and t_val and b_val > 0: check treats t_val == 0 as missing data and writes np.nan, which hides real regressions (e.g., throughput dropping to 0 should show as -100%). Use explicit is not None checks (and keep the b_val > 0 guard) so zero values are still plotted.

Suggested change
if b_val and t_val and b_val > 0:
if b_val is not None and t_val is not None and b_val > 0:

Comment on lines +756 to +760
sorted_streams = sorted(all_streams)

# Build heatmap data (% change)
heatmap_data = []
for wl in common_workloads:

Copilot AI Mar 4, 2026


sorted_streams can be empty if sweep JSONs contain no results. In that case heatmap_data becomes an array with shape (len(common_workloads), 0) and ax.imshow(...) will raise due to invalid image dimensions. Add an early guard (e.g., if no stream counts or heatmap_data.size == 0) to skip this plot with a clear message or return None.
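The suggested early guard could look like the sketch below. The function name `build_heatmap` and the `pct_change_lookup` accessor are hypothetical stand-ins for the real per-workload, per-stream-count computation; the point is only the empty-input check that keeps `ax.imshow` from seeing a zero-width array.

```python
import numpy as np


def build_heatmap(common_workloads, sorted_streams, pct_change_lookup):
    """Return a (workloads x streams) array of % changes, or None if empty.

    Returning None lets the caller skip ax.imshow entirely (e.g. with a
    log message) instead of raising on invalid image dimensions.
    """
    if not common_workloads or not sorted_streams:
        return None
    data = np.array([[pct_change_lookup(wl, sc) for sc in sorted_streams]
                     for wl in common_workloads])
    if data.size == 0:
        return None
    return data
```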

Comment on lines +881 to +885
if b_p99 and t_p99 and b_p99 > 0:
    # Note: For latency, lower is better, so positive change is regression
    change_pct = (t_p99 - b_p99) / b_p99 * 100
else:
    change_pct = 0

Copilot AI Mar 4, 2026


The if b_p99 and t_p99 and b_p99 > 0: condition treats valid zero values as missing (because 0 is falsy), which can hide large improvements/regressions (e.g., test P99 = 0 would currently fall back to change_pct = 0). Use is not None checks for b_p99/t_p99, plus the b_p99 > 0 guard, to compute percentage changes correctly.
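The fix can be sketched as a helper that only treats `None` as missing. `latency_delta_pct` is a hypothetical name for illustration; the PR computes this inline.

```python
def latency_delta_pct(b_p99, t_p99):
    """Percent change in P99 latency; None marks missing/invalid data.

    Zero is a real measurement here: a test P99 of 0 against a positive
    baseline reports -100.0 rather than being silently treated as missing.
    """
    if b_p99 is not None and t_p99 is not None and b_p99 > 0:
        return (t_p99 - b_p99) / b_p99 * 100
    return None  # missing value or zero baseline: no meaningful percentage
```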
