AORTA-23 : Phase 8 : Added a summary in generated HTML #105
prosenjitdhole wants to merge 5 commits into prosenj_cli_hq_eval_report_phase_7 from …
Conversation
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
```python
# Check latency trend (P99 at peak vs P99 at lowest stream count)
sorted_results = sorted(data.results, key=lambda r: r.stream_count)
if sorted_results:
    p99_at_1 = sorted_results[0].latency.p99
    p99_at_peak = data.results[peak_idx].latency.p99 if peak_idx < len(data.results) else p99_at_1
```
The comment says latency_ratio is "P99 at peak / P99 at 1 stream", but the code uses the lowest available stream count (sorted_results[0]) rather than explicitly using stream_count==1. Either update the wording/variable naming to reflect "lowest stream count" or explicitly select the 1-stream result when present to avoid misleading summaries.
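A minimal sketch of the second option, reusing the `data.results` fields from the diff above (the `baseline` and `p99_baseline` names are illustrative):

```python
# Prefer an explicit 1-stream run as the latency baseline; otherwise fall
# back to the lowest stream count actually present.
sorted_results = sorted(data.results, key=lambda r: r.stream_count)
if sorted_results:
    baseline = next(
        (r for r in sorted_results if r.stream_count == 1),
        sorted_results[0],  # fallback: lowest available stream count
    )
    p99_baseline = baseline.latency.p99
```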
```python
return f"""
<div class="verdict-box verdict-{summary.verdict_class}">
    <h2>📊 Comparison Summary: {baseline_label} → {test_label}</h2>
    <div class="verdict-headline {summary.verdict_class}">{summary.verdict}</div>
    {stats_html}
    {summary_table}
</div>
```
The new verdict HTML interpolates baseline_label, test_label, and workload names (summary.top_improvement[0] / summary.top_regression[0]) directly into HTML. If these values can contain <, &, quotes, etc. (e.g., derived from filenames or user-provided labels), the generated report becomes vulnerable to HTML injection. Escape dynamic text with html.escape(...) (or a centralized escaping helper) before embedding it into the HTML string.
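A minimal sketch of that escaping, built on the standard-library `html.escape` (the `esc` helper name and the sample label are illustrative):

```python
import html

def esc(value: object) -> str:
    # Escape <, >, &, and quotes so labels cannot break out of the markup.
    return html.escape(str(value), quote=True)

baseline_label = "run<script>alert(1)</script>"  # e.g. a hostile filename-derived label
header = f"<h2>📊 Comparison Summary: {esc(baseline_label)}</h2>"
# -> <h2>📊 Comparison Summary: run&lt;script&gt;alert(1)&lt;/script&gt;</h2>
```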
@copilot open a new pull request to apply changes based on this feedback
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
```python
def __post_init__(self) -> None:
    """Normalize status fields based on verdicts if not explicitly set."""
    valid_statuses = {"good", "warning", "poor"}

    if self.scaling_status not in valid_statuses:
```
SweepSummary.__post_init__() is intended to infer scaling_status/latency_status from the verdict text when they aren't explicitly provided, but the defaults are set to "good" (which is a valid status). As a result the inference branches never run and get_overall_status() will always return "good" unless a caller manually overrides the status fields. Consider defaulting these fields to None/"" (or always inferring from verdicts and only overriding when an explicit non-empty status is passed).
@copilot open a new pull request to apply changes based on this feedback
```python
peak_idx = throughputs.index(max(throughputs))
peak_throughput = throughputs[peak_idx]
peak_streams = stream_counts[peak_idx]
peak_efficiency = efficiencies[peak_idx] if peak_idx < len(efficiencies) else 0
```
peak_idx is derived from the throughputs list (which may come from data.analysis), but later logic uses that index against stream_counts and data.results (e.g., to derive peak_streams and p99_at_peak). If the analysis arrays are missing/misaligned or ordered differently than data.results, this can raise IndexError or compute the wrong peak latency. Safer approach: derive peak_stream_count from stream_counts[peak_idx] (after validating lengths) and then fetch the corresponding run via data.get_result_by_stream_count(...) (or compute peak from data.results directly).
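A sketch of the safer lookup, assuming the `get_result_by_stream_count` accessor mentioned above exists on `SweepData` and returns `None` when no run matches:

```python
# Only index across arrays whose lengths have been validated to match.
if throughputs and len(throughputs) == len(stream_counts):
    peak_idx = throughputs.index(max(throughputs))
    peak_stream_count = stream_counts[peak_idx]
    # Resolve the actual run by its stream count instead of reusing peak_idx
    # against data.results, which may be ordered differently.
    peak_run = data.get_result_by_stream_count(peak_stream_count)
    p99_at_peak = peak_run.latency.p99 if peak_run else None
```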
```python
throughputs = [r.throughput for r in data.results]
stream_counts = [r.stream_count for r in data.results]
# Compute efficiency (actual / ideal where ideal = single_stream * n)
base_throughput = throughputs[0] if throughputs else 1
efficiencies = [
    t / (base_throughput * s) if base_throughput > 0 and s > 0 else 0
    for t, s in zip(throughputs, stream_counts)
]
```
When computing efficiencies from data.results, base_throughput = throughputs[0] assumes the results are already ordered by increasing stream count and that the first entry represents the baseline. SweepData.from_dict() does not sort results, so efficiency (and downstream optimal-stream selection) can be incorrect if JSON ordering changes. Consider sorting by stream_count (or explicitly selecting the lowest stream count) before choosing the baseline throughput.
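A sketch of that ordering fix, sorting once before any of the arrays are derived:

```python
# Sort so index 0 is genuinely the lowest-stream-count (baseline) run,
# regardless of the ordering in the source JSON.
ordered = sorted(data.results, key=lambda r: r.stream_count)
throughputs = [r.throughput for r in ordered]
stream_counts = [r.stream_count for r in ordered]
base_throughput = throughputs[0] if throughputs else 1
```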
```python
optimal_idx = 0
for i, (eff, tput, sc) in enumerate(zip(efficiencies, throughputs, stream_counts)):
    if eff >= 0.70 and tput >= 0.80 * peak_throughput:
        optimal_idx = i
optimal_streams = stream_counts[optimal_idx] if stream_counts else peak_streams
```
optimal_idx is updated based on iteration order over zip(efficiencies, throughputs, stream_counts). If those arrays are not sorted by stream count, the chosen optimal_streams may not be the highest stream count meeting the criteria (as the comment states). Consider selecting candidates that meet the thresholds and then choosing max(stream_count) (or explicitly iterating in sorted stream-count order).
Suggested change:
```python
optimal_idx: Optional[int] = None
optimal_streams = peak_streams
for i, (eff, tput, sc) in enumerate(zip(efficiencies, throughputs, stream_counts)):
    if eff >= 0.70 and tput >= 0.80 * peak_throughput:
        # Choose the qualifying configuration with the highest stream count,
        # independent of the iteration order of the input arrays.
        if optimal_idx is None or sc > optimal_streams:
            optimal_idx = i
            optimal_streams = sc
# If no configuration met the thresholds, fall back to peak_streams
if optimal_idx is None:
    optimal_streams = peak_streams
```
```python
num_improved = len(workloads_with_improvement - workloads_with_regression)
num_degraded = len(workloads_with_regression - workloads_with_improvement)
# Workloads with both are counted as mixed - we'll count them as degraded for simplicity
num_mixed = len(workloads_with_regression & workloads_with_improvement)
num_degraded += num_mixed
num_unchanged = total - num_improved - num_degraded
```
Mixed workloads (those appearing in both regressions and improvements) are currently added into num_degraded, which makes the displayed counts (and the overall verdict thresholds) misleading: a workload can have both latency regressions and throughput improvements, but it will be reported as "degraded" only. Consider tracking num_mixed separately (and optionally exposing it in ComparisonSummary/HTML) so the summary matches the underlying data.
Suggested change:
```python
# Disjoint categories:
#   - improved: workloads with improvements but no regressions
#   - degraded: workloads with regressions but no improvements
#   - mixed: workloads that appear in both sets
num_improved = len(workloads_with_improvement - workloads_with_regression)
num_degraded = len(workloads_with_regression - workloads_with_improvement)
num_mixed = len(workloads_with_regression & workloads_with_improvement)
num_unchanged = total - num_improved - num_degraded - num_mixed
```
@prosenjitdhole I've opened a new pull request, #121, to work on those changes. Once the pull request is ready, I'll request review from you.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
```python
# Determine scaling verdict
if peak_efficiency >= 0.80:
    scaling_verdict = ("✓ Excellent", "Near-linear scaling")
elif peak_efficiency >= 0.60:
    scaling_verdict = ("✓ Good", "Good scaling with some overhead")
```
The worst scaling verdict text is labeled "⚠ Poor", but the status-mapping logic treats "⚠" as a warning (and "✗" as poor). If you want this case to render as a poor/failed status, the verdict marker should align with the mapping (e.g., use "✗"), or set scaling_status explicitly for this branch.
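One way to keep the marker and the mapping aligned is to set the status explicitly in the same branch that builds the verdict text. A sketch, extending the branches quoted above (the `else` branch's description text is an assumption):

```python
# Pair each verdict with its status so the marker and CSS class cannot drift.
if peak_efficiency >= 0.80:
    scaling_verdict, scaling_status = ("✓ Excellent", "Near-linear scaling"), "good"
elif peak_efficiency >= 0.60:
    scaling_verdict, scaling_status = ("✓ Good", "Good scaling with some overhead"), "good"
else:
    scaling_verdict, scaling_status = ("✗ Poor", "Significant scaling overhead"), "poor"
```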
```python
return f"""
<div class="verdict-box verdict-{status_class}">
    <h2>📊 Analysis Summary</h2>
    <table class="verdict-table">
        <tr>
            <td>Peak Performance</td>
            <td class="good">✓ {summary.peak_throughput:.0f} {summary.throughput_unit} at {summary.peak_streams} streams</td>
```
The Peak Performance row is always rendered with a green "good" class and a "✓" marker. When generate_sweep_summary returns an "Unknown/No data" summary (e.g., empty results), this will still display as a success ("✓ 0 ops/sec at 0 streams"). Consider deriving this row’s marker/class from the summary status (or suppressing the verdict box when peak_streams == 0).
Suggested change:
```python
# Determine marker and CSS class for the Peak Performance row.
# This avoids showing a green "✓" when there is no data or a non-good status.
if summary.peak_streams == 0:
    peak_marker = "⚠"
    peak_class = "warning"
else:
    if status_class == "good":
        peak_marker = "✓"
        peak_class = "good"
    elif status_class == "warning":
        peak_marker = "⚠"
        peak_class = "warning"
    else:
        peak_marker = "✗"
        peak_class = "poor"
return f"""
<div class="verdict-box verdict-{status_class}">
    <h2>📊 Analysis Summary</h2>
    <table class="verdict-table">
        <tr>
            <td>Peak Performance</td>
            <td class="{peak_class}">{peak_marker} {summary.peak_throughput:.0f} {summary.throughput_unit} at {summary.peak_streams} streams</td>
```
```python
# Normalized status fields used for logic/CSS: "good", "warning", "poor"
scaling_status: str = "good"
latency_status: str = "good"

def __post_init__(self) -> None:
    """Normalize status fields based on verdicts if not explicitly set."""
    valid_statuses = {"good", "warning", "poor"}

    if self.scaling_status not in valid_statuses:
        self.scaling_status = self._infer_status_from_verdict_text(
            self.scaling_verdict[0]
```
SweepSummary.__post_init__ never infers scaling_status/latency_status because both fields default to the valid value "good". As a result get_overall_status() will always return "good" and the verdict box will always render as green even when verdicts indicate warnings/regressions. Consider defaulting these fields to None/"auto" (or field(init=False)) and inferring unconditionally unless explicitly provided.
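A sketch of the `Optional` default approach, reusing the `_infer_status_from_verdict_text` name from the diff; the inference body here is an assumption based on the ✓/⚠/✗ marker convention used in this thread, and other `SweepSummary` fields are omitted:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SweepSummary:
    scaling_verdict: Tuple[str, str] = ("", "")
    latency_verdict: Tuple[str, str] = ("", "")
    # None means "not provided", so inference below runs unless a caller
    # explicitly passes a valid status.
    scaling_status: Optional[str] = None
    latency_status: Optional[str] = None

    def __post_init__(self) -> None:
        valid_statuses = {"good", "warning", "poor"}
        if self.scaling_status not in valid_statuses:
            self.scaling_status = self._infer_status_from_verdict_text(self.scaling_verdict[0])
        if self.latency_status not in valid_statuses:
            self.latency_status = self._infer_status_from_verdict_text(self.latency_verdict[0])

    @staticmethod
    def _infer_status_from_verdict_text(text: str) -> str:
        # Map the leading marker to a status; unknown text degrades to "warning".
        if text.startswith("✓"):
            return "good"
        if text.startswith("✗"):
            return "poor"
        return "warning"
```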
```python
# Compute efficiency (actual / ideal where ideal uses normalized single-stream throughput)
base_throughput = throughputs[0] if throughputs else 1
baseline_streams = stream_counts[0] if stream_counts else 1
if base_throughput > 0 and baseline_streams > 0:
    single_stream_throughput = base_throughput / baseline_streams
else:
    single_stream_throughput = 0
efficiencies = [
    (t / (single_stream_throughput * s)) if single_stream_throughput > 0 and s > 0 else 0
    for t, s in zip(throughputs, stream_counts)
]
```
Efficiency computation uses throughputs[0]/stream_counts[0] as the baseline, but data.results order isn't guaranteed (and may not start at the lowest stream count / 1 stream). This can produce incorrect efficiencies and therefore incorrect scaling/optimal-streams verdicts. Consider selecting the baseline run by the minimum stream count (prefer stream_count==1 if present) and using that throughput to derive single_stream_throughput.
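A short sketch of that baseline selection, preferring an explicit 1-stream run and otherwise taking the minimum stream count:

```python
# Pick the baseline run by stream count rather than by list position.
baseline_run = None
if data.results:
    baseline_run = next(
        (r for r in data.results if r.stream_count == 1),
        min(data.results, key=lambda r: r.stream_count),
    )
if baseline_run and baseline_run.stream_count > 0:
    single_stream_throughput = baseline_run.throughput / baseline_run.stream_count
else:
    single_stream_throughput = 0
```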
@prosenjitdhole I've opened a new pull request, #122, to work on those changes. Once the pull request is ready, I'll request review from you.
No description provided.