feat: add web dashboard by rioloc · Pull Request #172 · lightspeed-core/lightspeed-evaluation

rioloc · 2026-02-25T22:43:13Z

Description

lightspeed-eval dashboard

First draft of lightspeed-eval dashboard to visualize lightspeed-eval reports locally.

A browser-based dashboard for visualizing, managing, and running LightSpeed evaluation pipelines. Built with React 19 and Vite, it requires no separate backend — all API routes are served directly by the Vite dev server middleware.

More instructions on how to run it in dashboard/README.md

Type of change

Preview

recording.21.webm

Summary by CodeRabbit

New Features
- Full web evaluation dashboard: interactive charts (trends, percentiles, stacked/metric/exec-time), pageable results table, drill‑down detail modals, graph previews, comparison and PDF export, and a Run page for starting/stopping evaluations with live logs.
- Theming, improved chart legends/interactions, cascading filters, and accessible UI components.
Documentation
- Added dashboard quickstart and developer guide with setup, env overrides, and usage.
Chores
- Added dashboard ignore patterns and Makefile targets for install/dev.

Assisted-by: Claude Code

coderabbitai · 2026-02-25T22:43:18Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds a self-contained React + Vite evaluation dashboard: UI components (charts, tables, modals, theming), client hooks for data/filters/metadata, a Vite plugin exposing /api endpoints (file APIs, run orchestration, SSE streaming), packaging/dev tooling, and supporting docs and configs.

Changes

Cohort / File(s)	Summary
Project config & tooling `/.gitignore`, `dashboard/Makefile`, `dashboard/package.json`, `dashboard/vite.config.js`	New ignore patterns; Makefile with env defaults and dev/install targets; package.json with deps and scripts; Vite config adds an evalDataPlugin implementing many /api endpoints, filesystem/result routes, subprocess run orchestration (spawn/stop), SSE streaming, and deletion APIs.
Docs & entry `dashboard/README.md`, `dashboard/index.html`, `dashboard/src/main.jsx`, `dashboard/eslint.config.js`	New README with quickstart and data conventions; HTML scaffold and React entry wiring ThemeProvider; ESLint config for Vite/React.
Styling & theme `dashboard/src/index.css`, `dashboard/src/App.css`	Global theme variables, dark/light themes, and comprehensive UI styling for dashboard components and responsive layouts.
App shell & core hooks `dashboard/src/App.jsx`, `dashboard/src/hooks/useEvalData.js`, `dashboard/src/hooks/useFilters.js`, `dashboard/src/hooks/useMetricMetadata.js`, `dashboard/src/hooks/useTheme.jsx`	Main App with tabs, data loading, modals, run gating; hooks to load/parse CSV & YAML, cascade filters, metric metadata parsing, and persistent theme/chart helpers.
Chart utilities `dashboard/src/components/ChartSetup.js`	Chart.js integration: colors, zoom options, CHART_DEFAULTS, HTML legend plugin factory, solo-legend behavior and formatting helpers used by charts.
Charts & stats `dashboard/src/components/...` `AvgScoreTrendChart.jsx`, `ScoreTrendChart.jsx`, `PercentilesChart.jsx`, `OverallAvgChart.jsx`, `ExecTimeTrendChart.jsx`, `MetricBarChart.jsx`, `StackedBarChart.jsx`, `ResultsPieChart.jsx`, `StatsCards.jsx`	New chart components and stats cards computing aggregates, with theming, legends, zoom controls, and click interactions for drilldowns.
Data UI components `dashboard/src/components/ResultsTable.jsx`, `dashboard/src/components/FilterBar.jsx`, `dashboard/src/components/DetailModal.jsx`, `dashboard/src/components/CollapsiblePanel.jsx`	Results table with filtering/sorting/pagination/expansion; FilterBar with cascading selects; DetailModal for per-date details (Markdown support); CollapsiblePanel UI element.
Evaluations & run UI `dashboard/src/components/Evaluations.jsx`, `dashboard/src/components/RunPage.jsx`	Evaluations viewer with CSV/YAML/graph handling, compare/export and deletion workflows; RunPage provides run-config, start/stop actions, polling, SSE log streaming, and terminal-like UI.
IDE & examples `dashboard/.vscode/settings.json`, `dashboard/eval.yaml.old`, `dashboard/system.yaml.old`	VSCode setting and example eval/system YAML files added for demonstration/test content.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant App as React App
    participant API as Vite Plugin (/api)
    participant FS as File System

    User->>App: Open dashboard
    App->>API: GET /api/manifest
    API->>FS: List reports/eval files
    FS-->>API: file list
    API-->>App: manifest
    App->>API: GET /api/eval-summaries
    API->>FS: Read summary JSONs
    FS-->>API: summaries
    API-->>App: model map
    App->>API: GET /api/eval-data-content or /api/eval-data
    API->>FS: Read eval YAML / CSVs
    FS-->>API: contents
    API-->>App: data -> App builds entries and renders charts/tables

sequenceDiagram
    participant User
    participant App as React App
    participant API as Vite Plugin (/api)
    participant Proc as Child Process
    participant FS as File System

    User->>App: POST /api/run-eval (tag, config)
    App->>API: request run
    API->>Proc: spawn lightspeed-eval subprocess
    Proc-->>API: stdout/stderr events
    API-->>App: run created (runId)
    App->>API: GET /api/running-evals
    API-->>App: running list
    User->>App: GET /api/eval-stream/{runId} (SSE)
    App->>API: open SSE
    API->>App: SSE output events (streamed logs)
    User->>App: POST /api/stop-eval/{runId}
    App->>API: stop request
    API->>Proc: terminate process
    Proc-->>API: exited
    API->>FS: persist results/assets
    API-->>App: run stopped/completed

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 3.77% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: add web dashboard' clearly and concisely summarizes the main change—adding a new web dashboard feature to the project.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

CodeRabbit can use your project's `ruff` configuration to improve the quality of Python code reviews.

Add a Ruff configuration file to your project to customize how CodeRabbit runs ruff.

asamal4 · 2026-02-26T01:10:16Z

Thank you @rioloc for adding this..
Just for everyone's awareness - Goal is to add this ASAP with minimal changes, we can improve this incrementally.

coderabbitai

Actionable comments posted: 20

🧹 Nitpick comments (11)

dashboard/src/hooks/useEvalData.js (1)

61-69: Consider parallel CSV loading for large report sets.

Fetching each CSV sequentially will make reload time grow linearly with file count; this is a good candidate for batched/parallel fetch+parse.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/hooks/useEvalData.js` around lines 61 - 69, The loop in
useEvalData.js fetches CSVs sequentially (for const file of files ->
fetchAndParseCsv(file)), causing slow reloads for many reports; change it to
fetch and parse files in parallel (e.g., map files to promises and await
Promise.all or process in controlled batches) while preserving existing logic
that skips files without parseDateFromFilename(file) or uses summaries[ts].model
to populate newModelMap; ensure you still assign newModelMap[date.toISOString()]
for matching ts keys and then iterate the combined parsed rows per file (e.g.,
keep association of parsed rows with their file/date) so downstream code that
currently consumes rows remains correct.

dashboard/vite.config.js (1)

198-199: Avoid execSync in a polled request handler.

/api/running-evals is polled every 3s from the UI; using execSync here blocks the event loop and can stall other API calls.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/vite.config.js` around lines 198 - 199, The handler for
/api/running-evals uses synchronous execSync which blocks the event loop;
replace execSync with a non-blocking child process call (e.g.,
util.promisify(require('child_process').exec) or child_process.spawn) and make
the /api/running-evals request handler async so it awaits the promise, captures
stdout/stderr, and returns the trimmed result; update any references to execSync
in vite.config.js to use the async exec/promise approach and handle errors
without blocking the event loop.

dashboard/src/components/ChartSetup.js (1)

199-209: CHART_DEFAULTS uses hardcoded dark theme colors.

The defaults are set to dark theme colors, but the actual chart components use useChartTheme() for dynamic theming. If CHART_DEFAULTS is intended as a fallback, it won't adapt to light theme.

If this export is unused, consider removing it. Otherwise, document that it's only for dark theme or make it theme-agnostic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/ChartSetup.js` around lines 199 - 209,
CHART_DEFAULTS currently hardcodes dark-theme colors and therefore doesn't adapt
when components use useChartTheme(); either remove CHART_DEFAULTS if it's
unused, or convert it into a theme-aware export (e.g., export a function like
getChartDefaults(themeOrColors) or accept values from useChartTheme()) so
defaults are computed from the dynamic theme, and update any consumers to call
that function (reference CHART_DEFAULTS and useChartTheme() in the change).

dashboard/src/components/AvgScoreTrendChart.jsx (1)

102-104: Unused soloLegendClick configuration.

The legend has display: false (line 86) so the built-in legend isn't shown. The onClick: soloLegendClick handler (line 103) will never be called since the HTML legend plugin handles clicks instead.

🧹 Clean up unused config

           legend: {
             display: false,
             labels: {
               // ... label config
             },
-            onClick: soloLegendClick
           },

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/AvgScoreTrendChart.jsx` around lines 102 - 104, The
legend block in AvgScoreTrendChart.jsx includes an unused onClick:
soloLegendClick handler while the legend is set to display: false; remove the
onClick: soloLegendClick entry from the chart options' legend configuration (or,
if you intended the built-in legend to be used, set legend.display to true and
ensure soloLegendClick is wired correctly) so the unused soloLegendClick handler
is not left in the options.

dashboard/src/App.jsx (3)

160-176: Silent error handling may hide configuration issues.

Both fetch calls silently swallow errors with empty catch blocks. Consider logging errors or showing user feedback when configuration fails to load, especially for the system-config which affects UI features.

♻️ Proposed improvement

   useEffect(() => {
     fetch('/api/system-config').then(r => r.json())
       .then(data => setSystemConfig(data))
-      .catch(() => {})
+      .catch(err => console.warn('Failed to load system config:', err))
   }, [])

   const openEvalData = async () => {
     setEvalDataLoading(true)
     try {
       const res = await fetch('/api/eval-data-content')
+      if (!res.ok) throw new Error(`HTTP ${res.status}`)
       const data = await res.json()
       setEvalDataConfig({ path: data.path, content: data.content })
-    } catch {
+    } catch (err) {
+      console.warn('Failed to load eval data:', err)
       setEvalDataConfig(null)
     }
     setEvalDataLoading(false)
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/App.jsx` around lines 160 - 176, The fetch calls in the
useEffect (loading '/api/system-config') and in openEvalData silently swallow
errors; update both to handle failures by logging the error (e.g., using
console.error or an existing logger) and updating state so the UI can show
feedback (for example setSystemConfig to null/error state or set an explicit
error flag and setEvalDataConfig to null with an error state), and ensure
evalDataLoading is cleared in a finally-equivalent path; reference the useEffect
fetch block and the openEvalData function to implement these changes.

46-81: Consider extracting a shared YAML modal component to reduce duplication.

SystemConfigModal and EvalDataModal are nearly identical, differing only in the title. This duplication could be consolidated into a reusable component.

♻️ Proposed refactor

function YamlModal({ title, content, onClose }) {
  const { theme } = useTheme()
  return (
    <div className="modal-overlay" onClick={onClose}>
      <div className="modal-content amended-modal" onClick={e => e.stopPropagation()}>
        <div className="modal-header">
          <div><h2>{title}</h2></div>
          <button className="modal-close" onClick={onClose} aria-label="Close">
            {/* close icon */}
          </button>
        </div>
        <div className="modal-body">
          <SyntaxHighlighter
            language="yaml"
            style={theme === 'dark' ? atomOneDark : atomOneLight}
            customStyle={{ margin: 0, borderRadius: '6px', fontSize: '13px', lineHeight: '1.6' }}
            showLineNumbers
          >
            {content}
          </SyntaxHighlighter>
        </div>
      </div>
    </div>
  )
}

Then use as:

<YamlModal title="System Configuration" content={systemConfig.content} onClose={() => setShowSystemConfig(false)} />

Also applies to: 113-148

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/App.jsx` around lines 46 - 81, SystemConfigModal and
EvalDataModal duplicate the same YAML-rendering modal UI; extract a reusable
YamlModal component that accepts props (title, content, onClose) and internally
uses useTheme and SyntaxHighlighter with the same style selection (atomOneDark /
atomOneLight) and customStyle, then replace usages of SystemConfigModal and
EvalDataModal with YamlModal (e.g., <YamlModal title="System Configuration"
content={config.content} onClose={onClose} />) ensuring the overlay click and
stopPropagation behavior and modal-close button handlers are preserved.

334-366: Consider adding keyboard support for modal dismissal.

The modals support click-outside-to-close but don't handle the Escape key, which is a common accessibility expectation. Consider adding an onKeyDown handler to close modals on Escape.

♻️ Example implementation

Add to modal components:

useEffect(() => {
  const handleEscape = (e) => {
    if (e.key === 'Escape') onClose()
  }
  document.addEventListener('keydown', handleEscape)
  return () => document.removeEventListener('keydown', handleEscape)
}, [onClose])

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/App.jsx` around lines 334 - 366, Add Escape-key dismissal to
the modal components by registering a keydown listener that calls the provided
onClose when e.key === 'Escape' and cleaning it up on unmount; implement this in
the modal components referenced here (DetailModal, SystemConfigModal,
EvalDataModal) using a useEffect that adds the document keydown handler and
removes it in the cleanup, with the dependency on onClose.

dashboard/src/components/Evaluations.jsx (3)

1042-1048: Expandable rows lack keyboard accessibility.

Rows with onClick handlers for expansion are not keyboard-accessible. Users should be able to expand/collapse rows using Enter or Space.

♿ Accessibility improvement

                     <tr key={rowKey}
                       className={`expandable-row${isRowExpanded ? ' expanded' : ''}`}
                       onClick={() => { /* toggle */ }}
+                      tabIndex={0}
+                      onKeyDown={(e) => { if (e.key === 'Enter' || e.key === ' ') { e.preventDefault(); /* toggle */ } }}
+                      role="row"
+                      aria-expanded={isRowExpanded}
                     >

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/Evaluations.jsx` around lines 1042 - 1048, The
expandable table row (the <tr> with className `expandable-row` that uses
`rowKey`, `isRowExpanded` and the `setExpandedRows` updater) needs keyboard
support: make the row focusable (add tabIndex=0), expose its state (add
aria-expanded={isRowExpanded}), and add an onKeyDown handler that toggles
expansion when Enter or Space is pressed by calling the same `setExpandedRows`
logic used in the onClick; keep the existing onClick intact and ensure the key
handler prevents default for Space so it doesn't scroll.

553-553: Empty catch block silently ignores comparison errors.

The handleCompare function catches errors with catch { /* ignore */ }, which could hide fetch failures or parsing issues from users.

🛡️ Proposed improvement

-    } catch { /* ignore */ }
+    } catch (err) {
+      console.error('Failed to compare evaluations:', err)
+      // Optionally: set an error state to show user feedback
+    }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/Evaluations.jsx` at line 553, The empty catch in the
handleCompare function swallows errors; replace it with a catch(error) that logs
the error (e.g. console.error or processLogger.error) and surfaces a user-facing
failure state (e.g. setCompareError or call the existing toast/notification
helper) and ensure any in-progress UI state (like isComparing) is cleared so the
UI doesn't hang; update the catch block inside handleCompare to accept the
error, log it with context ("handleCompare failed") and set the component
error/notification state.

468-488: Fetching all CSV files on load may cause performance issues.

The effect fetches every CSV file in files to compute stats (pass count, conversations). With many evaluation files, this could be slow and bandwidth-intensive.

Consider:

Computing stats server-side and including in the manifest
Lazy-loading stats only for visible files
Caching stats in localStorage

💡 Alternative: Server-side stats in manifest

Have the server include stats in /api/manifest:

{
  "files": [
    { "name": "evaluation_xxx.csv", "passed": 45, "total": 50, "conversations": ["conv1"] }
  ]
}

This avoids N+1 fetches on the client.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/Evaluations.jsx` around lines 468 - 488, The current
useEffect fetches every file in files via fetch(`/results/${file}`) and
Papa.parse which causes N+1 network load; change it to lazy-load and cache
per-file stats: create a helper getFileStats(file) that first checks
localStorage (key like `evalStats:${file}`) and returns cached stats if present,
otherwise fetches `/results/${file}`, uses Papa.parse to compute { passed,
total, conversations } and writes that to localStorage before returning; then
update the effect that currently uses Promise.all(files.map(...)) to only call
Promise.all on the visible subset (derive visibleFiles from the component
state/props) and merge results into the stats object before calling
setFileStats(stats); keep existing symbols useEffect, files, setFileStats,
Papa.parse and fetch(`/results/${file}`) so reviewers can find the changes.

dashboard/src/components/ResultsTable.jsx (1)

172-182: Sortable table headers lack keyboard accessibility.

The <th> elements with onClick are not keyboard-accessible. Consider adding tabIndex={0} and onKeyDown handlers for Enter/Space, or use <button> elements inside headers.

♿ Accessibility improvement

                 <th
                   key={col.key}
                   className="sortable-th"
                   onClick={() => handleSort(col.key)}
+                  tabIndex={0}
+                  onKeyDown={(e) => { if (e.key === 'Enter' || e.key === ' ') handleSort(col.key) }}
+                  role="columnheader"
+                  aria-sort={sortKey === col.key ? (sortDir === 'asc' ? 'ascending' : 'descending') : 'none'}
                 >

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/ResultsTable.jsx` around lines 172 - 182, The header
cells using className "sortable-th" call handleSort(col.key) on click but are
not keyboard-accessible; update the rendering of the sortable headers (the <th>
mapping that references handleSort, sortKey and sortDir) to either: 1) add
tabIndex=0 and an onKeyDown handler that invokes handleSort(col.key) when Enter
or Space is pressed and also manage role="button" and aria-sort attributes, or
2) replace the clickable content with a native <button> inside the <th> that
calls handleSort(col.key) (and reflect sortKey/sortDir in
aria-sort/aria-pressed) so keyboard and screen-reader users can operate sorting.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dashboard/Makefile`:
- Line 13: The Makefile recipe currently expands and echoes the full command
(including API_KEY) before running npx vite; to avoid leaking the token,
suppress shell echo for that recipe by prefixing the recipe line with @ so the
full command (containing API_KEY) is not printed, e.g. change the line
containing LS_EVAL_SYSTEM_CFG_PATH, LS_EVAL_DATA_PATH, LS_EVAL_REPORTS_PATH,
LS_EVAL_DASHBOARD_RUN_ENABLED, API_KEY and npx vite to a silenced recipe (start
with @); ensure you do not add any separate echo statements that interpolate
API_KEY anywhere else in the same target.

In `@dashboard/README.md`:
- Line 15: Update the quick-start env line to fix placeholder typos and make
names consistent: change LS_EVAL_DATA_PATH=<path_to_eval.yaml.> to
LS_EVAL_DATA_PATH=<path_to_eval.yaml> (remove the stray period) and change
LS_EVAL_REPORTS_PATH=<path_to_reports_dir.yaml> to
LS_EVAL_REPORTS_PATH=<path_to_reports_dir> (use a directory placeholder, not a
.yaml file); keep the other variables LS_EVAL_SYSTEM_CFG_PATH,
LS_EVAL_DASHBOARD_RUN_ENABLED and API_KEY as-is so the final command reads all
env vars followed by npx vite.

In `@dashboard/src/App.css`:
- Line 725: Remove the empty CSS rule block `.explorer-viewer { }` from App.css
to satisfy Stylelint and eliminate noise; if styles were intended for the
explorer viewer, replace the empty block by adding the required declarations
inside the `.explorer-viewer` rule (or add a clear comment explaining why it
must remain) so the file no longer contains a bare/empty selector.
- Line 621: Replace deprecated CSS property usage: locate occurrences of
"word-break: break-word" (e.g., the rule that currently has "white-space:
pre-wrap; word-break: break-word;") and change them to "overflow-wrap:
break-word" to preserve behavior and satisfy Stylelint; update all other
instances of "word-break: break-word" in the file as well so the stylesheet uses
overflow-wrap consistently.

In `@dashboard/src/App.jsx`:
- Around line 27-33: The "details" tab icon and content are defined but missing
from the navigation; update TAB_GROUPS to include a navigation entry for the
"details" tab (key "details") so the UI can reach the details content rendered
around lines 324-326, or if the tab is truly unused, remove TAB_ICONS.details
and the details content instead; locate TAB_ICONS and TAB_GROUPS in App.jsx and
either add a matching entry (with label, key "details", and any group placement)
or delete the unused icon and associated render block.

In `@dashboard/src/components/ChartSetup.js`:
- Around line 181-196: getOrCreateLegendList currently assumes
document.getElementById(id) returns an element; add a null-check in
getOrCreateLegendList so it either creates and appends a new container element
with the given id or returns null (pick one consistent behavior), and then
update the caller (the afterUpdate logic) to guard against a null return (skip
legend creation or wait until the element exists) by checking the returned value
before querying or appending to it; reference getOrCreateLegendList and
afterUpdate to locate where to add the null check and the caller-side guard.

In `@dashboard/src/components/CollapsiblePanel.jsx`:
- Around line 8-10: Replace the non-interactive div used as the toggle (the
element with className "collapsible-header" that calls setOpen(o => !o)) with a
semantic <button type="button"> so keyboard users can toggle via Enter/Space;
preserve the onClick handler, move the "collapsible-header" className onto the
button, bind aria-expanded to the open state and add aria-controls pointing to
the collapsible content id, and make the same change for the second instance
referenced around line 46 to ensure both toggles are keyboard-accessible.

In `@dashboard/src/components/DetailModal.jsx`:
- Around line 13-24: The DetailModal component is missing dialog semantics and
Escape-key support; add role="dialog" and aria-modal="true" to the element
representing the modal container (the element with className "modal-content" or
its wrapper), provide an aria-labelledby that points to the <h2> (create an id
for the heading) and an aria-describedby if needed for the subtitle section, and
ensure the close button retains aria-label="Close"; additionally, implement
Escape-key handling by adding a keydown listener in the component lifecycle
(useEffect for functional component or componentDidMount/componentWillUnmount
for class) that calls the existing onClose prop when Escape is pressed and
cleans up the listener on unmount, and move focus into the modal on open (e.g.,
focus the modal container or first focusable element) to improve keyboard/AT
navigation.

In `@dashboard/src/components/Evaluations.jsx`:
- Around line 852-874: The handleDelete function currently swallows errors and
gives no user feedback; update it to surface failures by checking the fetch
response and catching exceptions: if res.ok is false, read the response body
(res.json() or res.text()) and display the message to the user (e.g., via an
existing toast/error UI or by adding a state like setDeleteError), and in the
catch block show the caught error; ensure you always call setDeleting(false) and
setDeleteConfirm(null) in a finally block so UI state is cleared regardless of
success or failure; reference handleDelete, the fetch('/api/delete-evaluation')
call, and the state setters setDeleting and setDeleteConfirm when making the
changes.

In `@dashboard/src/components/FilterBar.jsx`:
- Around line 5-37: The labels in FilterBar.jsx are not associated with their
corresponding select elements which breaks screen-reader accessibility; update
each label to include an htmlFor that matches a unique id on its related select
(e.g., for the "Conversation" control link label htmlFor="filter-group" to the
select id="filter-group"), and apply the same pattern for "Turn" (id like
"filter-turn"), "Model" (e.g., "filter-model"), "Metric" (e.g.,
"filter-metric"), and "Time Window" (e.g., "filter-timeWindow"); keep the
existing value/onChange handlers (filters.group, setters.setGroup,
availableOptions.groups, availableOptions.turns, setters.setTurn,
setters.setModel, setters.setMetric, setters.setTimeWindow) unchanged—only add
matching id/htmlFor pairs to labels and selects to restore proper accessibility.

In `@dashboard/src/components/ResultsTable.jsx`:
- Around line 103-108: The numeric formatting branch in ResultsTable.jsx (inside
the conditional checking col.type === 'number') calls toFixed on val but only
guards against null/undefined, which can throw if val is a non-numeric string;
update this branch to ensure val is a valid number before calling toFixed (use
Number(val) or parseFloat and check Number.isFinite/typeof to coerce/validate),
handle invalid numbers by rendering '-' and keep the special-case for col.key
=== 'score' to use 4 decimals, referencing the val variable and col.key 'score'
to locate and update the logic in the numeric formatting block.

In `@dashboard/src/components/RunPage.jsx`:
- Around line 241-245: The run row is currently an un-focusable, click-only div;
make it keyboard-accessible by converting the clickable element (the div with
className "run-eval-item" that uses selectedRunId and setSelectedRunId with
run.id) into a semantic interactive control or adding accessibility props:
either replace it with a <button> or add tabIndex={0}, role="button" (or
role="option"/aria-selected if used in a listbox), and wire an onKeyDown handler
that calls setSelectedRunId(run.id) on Enter/Space; also ensure the selected
state is exposed via aria-pressed or aria-selected so screen readers can detect
selection.
- Around line 133-142: startRun currently assumes the POST to '/api/run-eval'
succeeded and immediately reads { id } which can produce invalid state; before
calling await res.json() check res.ok (or throw/read error body) and only call
setSelectedRunId(id) when the response is successful, otherwise surface/log the
error and avoid selecting an invalid run; also ensure the following fetch to
'/api/running-evals' still runs only after a successful start or handle its
errors, and replace the empty catch with logging or user-visible error handling
(referencing the POST response handling around the fetch to '/api/run-eval', the
setSelectedRunId call, and the subsequent setRuns call).

In `@dashboard/src/components/StackedBarChart.jsx`:
- Around line 12-14: The current logic in the StackedBarChart component
increments byDate[e.date].error for any non-PASS/non-FAIL status, which
incorrectly buckets unknown statuses; update the conditional around e.result to
explicitly handle known statuses (e.g., 'PASS', 'FAIL', 'ERROR', 'SKIP' or other
expected values) and add a separate branch or counter for unknown statuses (or
ignore them) instead of defaulting to error—locate the block that mutates byDate
using e.result and change the else to either an explicit else if (e.result ===
'ERROR') ... else if (e.result === 'SKIP') ... else { byDate[e.date].unknown++ }
(or skip increment) so new/invalid statuses are not misclassified.

In `@dashboard/src/hooks/useEvalData.js`:
- Around line 41-47: The hook's load flow never clears the previous error, so
reset the error state at the start of each fetch and after a successful fetch:
inside useEffect before calling load() (or at the top of async function load())
call setError(null) (or clear the error variable) so a stale error doesn't
persist, and ensure after the Promise.all resolves you also clear/setError(null)
before setting refreshing to false; apply the same change to the other load
block referenced around the 89-99 code (the second load() implementation) so
both fetch paths call setError(null) at start and on success.
- Around line 11-14: In fetchManifest (and the similar fetch call around lines
16-18), check the response status before parsing: after awaiting
fetch('/api/manifest') inspect res.ok and if false throw an Error that includes
the response.status and a brief body message (e.g., await res.text() or
res.statusText) so the caller fails fast instead of attempting res.json()/CSV
parsing; update the function(s) to only call res.json() or CSV parse when res.ok
is true and propagate the thrown error.

In `@dashboard/src/hooks/useFilters.js`:
- Around line 31-34: The current afterTurn filtering allows rows with missing
turnId when a specific turn is selected; change the predicate so that when turn
!== ALL you only keep entries with a defined turnId that equals the selected
turn (replace filter(e => !e.turnId || e.turnId === turn) with a predicate that
requires e.turnId && e.turnId === turn), and apply the same change to the other
analogous blocks referenced (the filters around the sections corresponding to
lines 82-89 and 100-105) so all turn-specific filters exclude rows with no
turnId.

In `@dashboard/vite.config.js`:
- Around line 112-113: The POST handler registered at
server.middlewares.use('/api/run-eval', ...) (and the similar handler for stop
endpoints) does not check the LS_EVAL_DASHBOARD_RUN_ENABLED flag; update the
middleware(s) to read the run-mode flag
(process.env.LS_EVAL_DASHBOARD_RUN_ENABLED or the app config accessor) and, when
the flag is falsy/disabled, short-circuit the request with an appropriate HTTP
error (e.g., 403) before any run/stop logic executes; apply the same guard to
the '/api/stop-eval' middleware so direct API calls cannot start/stop runs when
run mode is disabled.
- Around line 150-153: The activeRuns map stores run objects (created with id,
pid, output, listeners, etc.) and currently retains completed runs and their
full output indefinitely; add retention limits by enforcing a maximum number of
activeRuns and a maximum output buffer size per run. Update the logic around
where runs are added/updated (the code that creates the run object and the code
that appends to run.output) to: 1) drop or prune oldest completed runs once
activeRuns.size exceeds a configured maxActiveRuns (e.g., remove entries with
status !== 'running'), and 2) truncate or rotate run.output when it grows beyond
a configured maxOutputBytes (e.g., keep only the tail and/or maintain a capped
buffer), and also ensure listeners (run.listeners) are cleared for removed runs
to avoid leaks; expose sensible defaults (constants like MAX_ACTIVE_RUNS,
MAX_OUTPUT_BYTES) and use them when creating/updating runs.
- Around line 24-31: The middleware registered in
server.middlewares.use('/results', ...) constructs filePath from evalOutputDir
and req.url and is vulnerable to path traversal; instead resolve and normalize
the requested path and verify it is contained within evalOutputDir before
serving. Replace the naive path.join(decodeURIComponent(req.url)) usage with a
safe flow: decode the URL, reject null bytes, compute const resolved =
path.resolve(evalOutputDir, decodedPath) and ensure path.relative(evalOutputDir,
resolved) does not start with '..' (or check
resolved.startsWith(path.resolve(evalOutputDir) + path.sep)); only then check
fs.existsSync/res.statSync and stream the file, otherwise return 404/403.

---

Nitpick comments:
In `@dashboard/src/App.jsx`:
- Around line 160-176: The fetch calls in the useEffect (loading
'/api/system-config') and in openEvalData silently swallow errors; update both
to handle failures by logging the error (e.g., using console.error or an
existing logger) and updating state so the UI can show feedback (for example
setSystemConfig to null/error state or set an explicit error flag and
setEvalDataConfig to null with an error state), and ensure evalDataLoading is
cleared in a finally-equivalent path; reference the useEffect fetch block and
the openEvalData function to implement these changes.
- Around line 46-81: SystemConfigModal and EvalDataModal duplicate the same
YAML-rendering modal UI; extract a reusable YamlModal component that accepts
props (title, content, onClose) and internally uses useTheme and
SyntaxHighlighter with the same style selection (atomOneDark / atomOneLight) and
customStyle, then replace usages of SystemConfigModal and EvalDataModal with
YamlModal (e.g., <YamlModal title="System Configuration"
content={config.content} onClose={onClose} />) ensuring the overlay click and
stopPropagation behavior and modal-close button handlers are preserved.
- Around line 334-366: Add Escape-key dismissal to the modal components by
registering a keydown listener that calls the provided onClose when e.key ===
'Escape' and cleaning it up on unmount; implement this in the modal components
referenced here (DetailModal, SystemConfigModal, EvalDataModal) using a
useEffect that adds the document keydown handler and removes it in the cleanup,
with the dependency on onClose.

In `@dashboard/src/components/AvgScoreTrendChart.jsx`:
- Around line 102-104: The legend block in AvgScoreTrendChart.jsx includes an
unused onClick: soloLegendClick handler while the legend is set to display:
false; remove the onClick: soloLegendClick entry from the chart options' legend
configuration (or, if you intended the built-in legend to be used, set
legend.display to true and ensure soloLegendClick is wired correctly) so the
unused soloLegendClick handler is not left in the options.

In `@dashboard/src/components/ChartSetup.js`:
- Around line 199-209: CHART_DEFAULTS currently hardcodes dark-theme colors and
therefore doesn't adapt when components use useChartTheme(); either remove
CHART_DEFAULTS if it's unused, or convert it into a theme-aware export (e.g.,
export a function like getChartDefaults(themeOrColors) or accept values from
useChartTheme()) so defaults are computed from the dynamic theme, and update any
consumers to call that function (reference CHART_DEFAULTS and useChartTheme() in
the change).

In `@dashboard/src/components/Evaluations.jsx`:
- Around line 1042-1048: The expandable table row (the <tr> with className
`expandable-row` that uses `rowKey`, `isRowExpanded` and the `setExpandedRows`
updater) needs keyboard support: make the row focusable (add tabIndex=0), expose
its state (add aria-expanded={isRowExpanded}), and add an onKeyDown handler that
toggles expansion when Enter or Space is pressed by calling the same
`setExpandedRows` logic used in the onClick; keep the existing onClick intact
and ensure the key handler prevents default for Space so it doesn't scroll.
- Line 553: The empty catch in the handleCompare function swallows errors;
replace it with a catch(error) that logs the error (e.g. console.error or
processLogger.error) and surfaces a user-facing failure state (e.g.
setCompareError or call the existing toast/notification helper) and ensure any
in-progress UI state (like isComparing) is cleared so the UI doesn't hang;
update the catch block inside handleCompare to accept the error, log it with
context ("handleCompare failed") and set the component error/notification state.
- Around line 468-488: The current useEffect fetches every file in files via
fetch(`/results/${file}`) and Papa.parse which causes N+1 network load; change
it to lazy-load and cache per-file stats: create a helper getFileStats(file)
that first checks localStorage (key like `evalStats:${file}`) and returns cached
stats if present, otherwise fetches `/results/${file}`, uses Papa.parse to
compute { passed, total, conversations } and writes that to localStorage before
returning; then update the effect that currently uses
Promise.all(files.map(...)) to only call Promise.all on the visible subset
(derive visibleFiles from the component state/props) and merge results into the
stats object before calling setFileStats(stats); keep existing symbols
useEffect, files, setFileStats, Papa.parse and fetch(`/results/${file}`) so
reviewers can find the changes.

In `@dashboard/src/components/ResultsTable.jsx`:
- Around line 172-182: The header cells using className "sortable-th" call
handleSort(col.key) on click but are not keyboard-accessible; update the
rendering of the sortable headers (the <th> mapping that references handleSort,
sortKey and sortDir) to either: 1) add tabIndex=0 and an onKeyDown handler that
invokes handleSort(col.key) when Enter or Space is pressed and also manage
role="button" and aria-sort attributes, or 2) replace the clickable content with
a native <button> inside the <th> that calls handleSort(col.key) (and reflect
sortKey/sortDir in aria-sort/aria-pressed) so keyboard and screen-reader users
can operate sorting.

In `@dashboard/src/hooks/useEvalData.js`:
- Around line 61-69: The loop in useEvalData.js fetches CSVs sequentially (for
const file of files -> fetchAndParseCsv(file)), causing slow reloads for many
reports; change it to fetch and parse files in parallel (e.g., map files to
promises and await Promise.all or process in controlled batches) while
preserving existing logic that skips files without parseDateFromFilename(file)
or uses summaries[ts].model to populate newModelMap; ensure you still assign
newModelMap[date.toISOString()] for matching ts keys and then iterate the
combined parsed rows per file (e.g., keep association of parsed rows with their
file/date) so downstream code that currently consumes rows remains correct.

In `@dashboard/vite.config.js`:
- Around line 198-199: The handler for /api/running-evals uses synchronous
execSync which blocks the event loop; replace execSync with a non-blocking child
process call (e.g., util.promisify(require('child_process').exec) or
child_process.spawn) and make the /api/running-evals request handler async so it
awaits the promise, captures stdout/stderr, and returns the trimmed result;
update any references to execSync in vite.config.js to use the async
exec/promise approach and handle errors without blocking the event loop.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2d0f5192-4f3a-462c-b152-e596cd7f2757

📥 Commits

Reviewing files that changed from the base of the PR and between 5e8be40 and 48dae78.

⛔ Files ignored due to path filters (1)

dashboard/package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (31)

.gitignore
dashboard/Makefile
dashboard/README.md
dashboard/eslint.config.js
dashboard/index.html
dashboard/package.json
dashboard/src/App.css
dashboard/src/App.jsx
dashboard/src/components/AvgScoreTrendChart.jsx
dashboard/src/components/ChartSetup.js
dashboard/src/components/CollapsiblePanel.jsx
dashboard/src/components/DetailModal.jsx
dashboard/src/components/Evaluations.jsx
dashboard/src/components/ExecTimeTrendChart.jsx
dashboard/src/components/FilterBar.jsx
dashboard/src/components/MetricBarChart.jsx
dashboard/src/components/OverallAvgChart.jsx
dashboard/src/components/PercentilesChart.jsx
dashboard/src/components/ResultsPieChart.jsx
dashboard/src/components/ResultsTable.jsx
dashboard/src/components/RunPage.jsx
dashboard/src/components/ScoreTrendChart.jsx
dashboard/src/components/StackedBarChart.jsx
dashboard/src/components/StatsCards.jsx
dashboard/src/hooks/useEvalData.js
dashboard/src/hooks/useFilters.js
dashboard/src/hooks/useMetricMetadata.js
dashboard/src/hooks/useTheme.jsx
dashboard/src/index.css
dashboard/src/main.jsx
dashboard/vite.config.js

dashboard/Makefile

dashboard/README.md

dashboard/src/App.css

dashboard/src/App.jsx

dashboard/src/hooks/useEvalData.js

dashboard/src/hooks/useFilters.js

dashboard/vite.config.js

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (2)

dashboard/README.md (1)

15-15: Inconsistent placeholder syntax for API_KEY.

The API_KEY=$(API_KEY) uses shell expansion syntax while other placeholders use <...> format. For consistency and clarity, consider using API_KEY=<api_key>.

📝 Suggested fix

-LS_EVAL_SYSTEM_CFG_PATH=<path_to_system.yaml> LS_EVAL_DATA_PATH=<path_to_eval.yaml> LS_EVAL_REPORTS_PATH=<path_to_reports_dir> LS_EVAL_DASHBOARD_RUN_ENABLED=<true|false> API_KEY=$(API_KEY) npx vite
+LS_EVAL_SYSTEM_CFG_PATH=<path_to_system.yaml> LS_EVAL_DATA_PATH=<path_to_eval.yaml> LS_EVAL_REPORTS_PATH=<path_to_reports_dir> LS_EVAL_DASHBOARD_RUN_ENABLED=<true|false> API_KEY=<api_key> npx vite

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/README.md` at line 15, Update the README example to use consistent
placeholder syntax: replace the shell expansion token API_KEY=$(API_KEY) with a
bracketed placeholder like API_KEY=<api_key> so it matches the other variables
(LS_EVAL_SYSTEM_CFG_PATH, LS_EVAL_DATA_PATH, LS_EVAL_REPORTS_PATH,
LS_EVAL_DASHBOARD_RUN_ENABLED) in the example command that ends with npx vite;
keep the same spacing and order so the runtime invocation remains unchanged.

dashboard/src/App.css (1)

294-296: Remove unnecessary quotes from single-word font family names.

Stylelint flags 'Menlo' and 'Consolas' as having unnecessary quotes. Single-word font names don't require quotes per CSS specification.

♻️ Proposed fix

-.run-terminal {
-  margin: 0; padding: 16px; font-family: 'Menlo', 'Consolas', 'Courier New', monospace;
+.run-terminal {
+  margin: 0; padding: 16px; font-family: Menlo, Consolas, 'Courier New', monospace;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/App.css` around lines 294 - 296, The font-family declaration in
App.css currently uses unnecessary quotes for single-word fonts; open the rule
containing "font-family: 'Menlo', 'Consolas', 'Courier New', monospace;" and
remove the quotes around Menlo and Consolas so it reads font-family: Menlo,
Consolas, 'Courier New', monospace; keep quotes for "Courier New" (multi-word)
and adjust any other occurrences of 'Menlo'/'Consolas' elsewhere in the file
similarly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dashboard/src/App.jsx`:
- Around line 167-176: The openEvalData function treats any fetch response as
success; update it to check the response status before parsing JSON: after
awaiting fetch('/api/eval-data-content') inspect res.ok (or res.status) and if
false throw or handle an error (e.g., setEvalDataConfig(null) and surface an
error) instead of calling res.json(); only call res.json() when res.ok is true
and then setEvalDataConfig({ path: data.path, content: data.content }); also
ensure setEvalDataLoading(false) runs in a finally-equivalent path so the
spinner is always cleared on success or error.
- Around line 234-245: The tab items implemented with a clickable <div> (see
TAB_GROUPS mapping and the element rendering TAB_ICONS, t.label, using activeTab
and setActiveTab) are not keyboard-accessible; replace the interactive <div>
with a semantic <button> (or add role="tab", tabindex="0") and implement
keyboard handlers to mirror click behavior: handle Enter/Space to call
setActiveTab(t.id) and handle ArrowLeft/ArrowRight (and ArrowUp/ArrowDown if
vertical) to move focus/activate the previous/next tab within TAB_GROUPS,
ensuring the active tab gets aria-selected and focus is managed; update
className logic (uses activeTab) to reflect the active state and include
appropriate ARIA attributes (aria-selected, role="tablist" on the container) to
complete accessibility.

In `@dashboard/src/components/Evaluations.jsx`:
- Around line 554-557: The catch block in handleCompare is currently swallowing
all errors (it uses an empty catch) which hides CSV fetch/compare failures;
update handleCompare to catch the error as a variable, set a visible error state
(e.g., create/use setCompareError or dispatch a user-visible toast/message) and
log the error for diagnostics, ensure you still clear selection via
setCompareSelection([]) and always clear loading via setCompareLoading(false)
(use finally or rework try/catch) so users see an error message when CSV
fetching or comparison fails.

In `@dashboard/src/components/ResultsTable.jsx`:
- Around line 173-181: Replace the clickable TH behavior so keyboard users can
focus and activate sorting: move the onClick from the <th
className="sortable-th"> into a native <button> inside the TH (use
handleSort(col.key) as the button's onClick), remove the TH-level click handler,
and keep the visual label and sort indicator inside that button; expose the
current state by setting aria-sort on the header (use 'ascending' when sortKey
=== col.key && sortDir === 'asc', 'descending' when sortKey === col.key &&
sortDir === 'desc', otherwise 'none'); ensure the button is keyboard-focusable
and retains existing styling via the sortable-th class and the sort-indicator
element.
- Around line 191-193: The row currently uses onClick on the <tr> (in
ResultsTable.jsx) which makes the entire row the expander; remove the onClick
from the <tr> and add a dedicated toggle button inside a cell (e.g., first or a
new "expander" <td>) that calls toggleRow(rowKey) and includes
aria-expanded={isExpanded}; ensure the button is keyboard-focusable and visible
to assistive tech, and update any interactive cell rendering
(renderCell/COLUMNS) so their events do not rely on row-level clicks (stop
relying on tr click and avoid interference by not calling toggleRow from inside
other interactive elements).
- Around line 35-39: The compare function mishandles the case where both values
are missing: currently it returns 1 when both a and b are null/undefined,
violating the comparator contract; update compare (the compare function in
ResultsTable.jsx) so that it first checks if both a and b are null or undefined
and returns 0 in that case, then keep the existing single-missing checks (return
1 if only a is missing, return -1 if only b is missing), followed by the numeric
subtraction branch and fallback to String(a).localeCompare(String(b)).
- Around line 187-193: The current rowKey (const rowKey = safePage * PAGE_SIZE +
i) uses a visible index so keys and expansion state jump when sorting/filtering;
change to use a stable immutable identifier from the entry (e.g., const rowId =
e.id || e._id || another unique field) and use that as the React key, the value
stored/checked in expandedRows, and the argument passed to toggleRow; update any
code that assumes numeric rowKey (expandedRows, toggleRow, and any lookups) to
work with the entry id type.

In `@dashboard/src/components/RunPage.jsx`:
- Around line 309-334: The icon-only download and fullscreen buttons in
RunPage.jsx (the <button className="run-terminal-fullscreen"> that creates the
Blob/download and the <button ... onClick={() => setFullscreen(f => !f)}>
toggle) lack reliable accessible names; add explicit accessible names by adding
aria-label attributes (e.g., aria-label="Download log" on the download button
and aria-label={fullscreen ? "Exit fullscreen" : "Enter fullscreen"} on the
fullscreen toggle) or include visually hidden text inside the buttons to provide
a programmatic name for screen readers while keeping the icons visible.
- Around line 255-283: The Stop button inside the option div causes the parent
onKeyDown (the option's handler that checks e.key === 'Enter' || ' ') to run
first and change selection; modify the Stop button (the element with className
"run-eval-stop" that calls stopRun(e, run.id)) to add an onKeyDown handler that
intercepts Enter and Space by calling e.stopPropagation() and e.preventDefault()
so the key event does not bubble to the parent option (the div rendering each
run with role="option" and onKeyDown toggling selectedRunId), ensuring the
button's click/stopRun behavior executes without changing selection.

---

Nitpick comments:
In `@dashboard/README.md`:
- Line 15: Update the README example to use consistent placeholder syntax:
replace the shell expansion token API_KEY=$(API_KEY) with a bracketed
placeholder like API_KEY=<api_key> so it matches the other variables
(LS_EVAL_SYSTEM_CFG_PATH, LS_EVAL_DATA_PATH, LS_EVAL_REPORTS_PATH,
LS_EVAL_DASHBOARD_RUN_ENABLED) in the example command that ends with npx vite;
keep the same spacing and order so the runtime invocation remains unchanged.

In `@dashboard/src/App.css`:
- Around line 294-296: The font-family declaration in App.css currently uses
unnecessary quotes for single-word fonts; open the rule containing "font-family:
'Menlo', 'Consolas', 'Courier New', monospace;" and remove the quotes around
Menlo and Consolas so it reads font-family: Menlo, Consolas, 'Courier New',
monospace; keep quotes for "Courier New" (multi-word) and adjust any other
occurrences of 'Menlo'/'Consolas' elsewhere in the file similarly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a4bdf69f-3026-4eb5-9350-f42373f19b52

📥 Commits

Reviewing files that changed from the base of the PR and between 48dae78 and 4ede703.

📒 Files selected for processing (15)

dashboard/Makefile
dashboard/README.md
dashboard/src/App.css
dashboard/src/App.jsx
dashboard/src/components/ChartSetup.js
dashboard/src/components/CollapsiblePanel.jsx
dashboard/src/components/DetailModal.jsx
dashboard/src/components/Evaluations.jsx
dashboard/src/components/FilterBar.jsx
dashboard/src/components/ResultsTable.jsx
dashboard/src/components/RunPage.jsx
dashboard/src/components/StackedBarChart.jsx
dashboard/src/hooks/useEvalData.js
dashboard/src/hooks/useFilters.js
dashboard/vite.config.js

🚧 Files skipped from review as they are similar to previous changes (4)

dashboard/src/components/FilterBar.jsx
dashboard/src/components/StackedBarChart.jsx
dashboard/src/components/CollapsiblePanel.jsx
dashboard/src/components/DetailModal.jsx

dashboard/src/App.jsx

dashboard/src/components/Evaluations.jsx

dashboard/src/components/ResultsTable.jsx

dashboard/src/components/RunPage.jsx

coderabbitai

Actionable comments posted: 10

♻️ Duplicate comments (1)

dashboard/README.md (1)
15-15: ⚠️ Potential issue | 🟡 Minor

Fix the API key example so it works when pasted into a shell.

Line 15 is presented as a bash command, but $(API_KEY) is command substitution there, not variable expansion. Copy/pasting this will try to execute a command named API_KEY instead of passing the token. Use API_KEY=<api_key> for a placeholder or API_KEY="$API_KEY" if you want to reuse an exported variable. (gnu.org)
Suggested README fix
-LS_EVAL_SYSTEM_CFG_PATH=<path_to_system.yaml> LS_EVAL_DATA_PATH=<path_to_eval.yaml> LS_EVAL_REPORTS_PATH=<path_to_reports_dir> LS_EVAL_DASHBOARD_RUN_ENABLED=<true|false> API_KEY=$(API_KEY) npx vite
+LS_EVAL_SYSTEM_CFG_PATH=<path_to_system.yaml> LS_EVAL_DATA_PATH=<path_to_eval.yaml> LS_EVAL_REPORTS_PATH=<path_to_reports_dir> LS_EVAL_DASHBOARD_RUN_ENABLED=<true|false> API_KEY=<api_key> npx vite
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dashboard/README.md` at line 15, The example shell command using
LS_EVAL_SYSTEM_CFG_PATH, LS_EVAL_DATA_PATH, LS_EVAL_REPORTS_PATH,
LS_EVAL_DASHBOARD_RUN_ENABLED and API_KEY incorrectly uses command substitution
`$(API_KEY)` so a shell will try to run a command named API_KEY; change the
example to use variable assignment instead (e.g., use API_KEY=<api_key> as a
placeholder or API_KEY="$API_KEY" to reuse an exported variable) so the API
token is passed correctly when copy/pasted into a shell.

🧹 Nitpick comments (1)

dashboard/src/components/ChartSetup.js (1)

113-141: Consider extracting duplicated solo legend logic.

The solo legend click logic (lines 119-138) duplicates the exported soloLegendClick function (lines 48-69). While both work correctly, consolidating would reduce maintenance burden.

♻️ Proposed refactor to reuse soloLegendClick logic

Extract the core logic into a shared helper:

+function applySoloLegendLogic(chart, clickedIndex, allIndices) {
+  const visibleIndices = allIndices.filter(i => chart.isDatasetVisible(i))
+  const allVisible = visibleIndices.length === allIndices.length
+  const clickedIsVisible = chart.isDatasetVisible(clickedIndex)
+
+  if (allVisible) {
+    for (const i of allIndices) {
+      chart.setDatasetVisibility(i, i === clickedIndex)
+    }
+  } else if (clickedIsVisible && visibleIndices.length === 1) {
+    for (const i of allIndices) {
+      chart.setDatasetVisibility(i, true)
+    }
+  } else {
+    chart.setDatasetVisibility(clickedIndex, !clickedIsVisible)
+  }
+}
+
 export function soloLegendClick(event, legendItem, legend) {
   const chart = legend.chart
   const clickedIndex = legendItem.datasetIndex
   const legendIndices = legend.legendItems.map(item => item.datasetIndex)
-  const visibleIndices = legendIndices.filter(i => chart.isDatasetVisible(i))
-  const allVisible = visibleIndices.length === legendIndices.length
-  const clickedIsVisible = chart.isDatasetVisible(clickedIndex)
-
-  if (allVisible) {
-    for (const i of legendIndices) {
-      chart.setDatasetVisibility(i, i === clickedIndex)
-    }
-  } else if (clickedIsVisible && visibleIndices.length === 1) {
-    for (const i of legendIndices) {
-      chart.setDatasetVisibility(i, true)
-    }
-  } else {
-    chart.setDatasetVisibility(clickedIndex, !clickedIsVisible)
-  }
-
+  applySoloLegendLogic(chart, clickedIndex, legendIndices)
   chart.update()
 }

Then in createHtmlLegendPlugin, the click handler can call:

} else {
  const allIndices = items.map(i => i.datasetIndex)
  applySoloLegendLogic(chart, item.datasetIndex, allIndices)
}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/ChartSetup.js` around lines 113 - 141, Replace the
duplicated "solo legend" logic in the click handler with a call to the existing
exported helper soloLegendClick: compute allIndices as items.map(i =>
i.datasetIndex) and call soloLegendClick(chart, item.datasetIndex, allIndices)
(or adjust argument order to match the exported function signature), then remove
the inline block (lines 119-138) and leave chart.update() after the call so
behavior is unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dashboard/README.md`:
- Line 8: Update the README references that currently state "Node.js 18+" to the
correct Vite 7 engine requirement by replacing both occurrences of the Node.js
prerequisite text with "Node.js 20.19+ or 22.12+" (and remove mention of Node.js
18 support); ensure both places where Node.js is documented in the README are
changed so users see the accurate requirements.

In `@dashboard/src/App.css`:
- Line 294: Remove the unnecessary quotation marks around single-word font names
in the CSS font stack: update the font-family declaration that currently
contains "'Menlo', 'Consolas', 'Courier New', monospace" to remove quotes from
Menlo and Consolas so only multi-word fonts (e.g., Courier New) remain quoted;
locate the font-family line in App.css (the rule containing
margin/padding/font-family) and change the entries accordingly to satisfy
stylelint's font-family-name-quotes rule.

In `@dashboard/src/App.jsx`:
- Around line 250-253: The header refresh currently only re-runs useEvalData()
and doesn't tell the Evaluations screens to reload, so thread a reload token
from the header into Evaluations (or consolidate these screens onto the shared
loader): add a refreshKey state/prop updated by the header refresh button (the
refresh function and refreshing state) and pass it into the Evaluations
component (and any child tabs rendered at the header tabs), then update
Evaluations' internal useEffect that fetches manifest/amended/graph data to
include that refreshKey in its dependency array (or move those data loads into
the shared useEvalData loader and remove the separate useEffect). Ensure the
prop name matches the header component and the useEffect references the same
symbol so the explorer views are invalidated when header refresh fires.

In `@dashboard/src/components/Evaluations.jsx`:
- Around line 1055-1065: The expansion state is keyed by a transient index
(rowKey = start + i), which shifts with sorting/paging and causes expansion to
move to the wrong record; change to use a stable identifier from the row (e.g.,
row.id or row.uuid) instead of start + i when computing rowKey and when toggling
expandedRows (referenced in pageRows, rowKey, expandedRows, setExpandedRows),
and ensure you clear expandedRows (setExpandedRows(new Set())) when switching
files or changing the current file identifier so expansions do not persist
across different file contexts.
- Around line 881-884: The failure branch reads only .message from the JSON
response so the backend's `{ error: ... }` payload is ignored; update the delete
error handling (the block that setsDeleteError after awaiting res.json()) to
parse JSON and prefer json.error || json.message, falling back to res.text() and
then to a generic `Delete failed (${res.status})` string so real validation
errors from the backend are shown to users.
- Around line 1133-1135: When filteredFiles shrinks the current listPage can
become out-of-range causing pageFiles to be empty; clamp listPage to the new max
page index whenever filteredFiles (or its length) changes. After computing
listTotalPages (using LIST_PAGE_SIZE), ensure you compute a safe page index like
Math.min(listPage, Math.max(0, listTotalPages - 1)) and update the listPage
state (or use that clamped value for computing pageFiles) so pageFiles always
reflects a valid page; reference listTotalPages, listPage, filteredFiles,
LIST_PAGE_SIZE and pageFiles when applying the change.
- Around line 1211-1229: The file-card div with className "explorer-file-card"
is not keyboard-accessible; replace or update it so keyboard users can focus and
activate the same behavior as the onClick handler: either convert the div into a
semantic control (e.g., a <button> or a <label> tied to the checkbox) or add
tabIndex="0" plus an onKeyDown handler that triggers the same logic used in the
onClick (the compareMode branch that calls setCompareSelection and the else
branch that calls openFile). Also make the checkbox (class "compare-checkbox")
interactive instead of readOnly if you keep it visible, and add focus styles
(via :focus/:focus-visible) to "explorer-file-card" to provide a visible focus
indicator for accessibility.

In `@dashboard/vite.config.js`:
- Around line 45-50: The middleware for routes '/results', '/api/eval-stream',
and '/api/stop-eval' calls decodeURIComponent on req.url without guarding
against malformed percent-encodings; wrap each decodeURIComponent(...) call in a
try-catch and, on catching a URIError (or any error), set res.statusCode = 400,
call res.end(), and return so the exception cannot escape the middleware (update
the server.middlewares.use('/results', the '/api/eval-stream' handler, and the
'/api/stop-eval' handler accordingly).
- Around line 437-454: The cleanup currently deletes any .yml/.yaml/.json files
whose filename contains the timestamp string (variable ts) under evalOutputDir
and graphsDir, which is too broad; change the matching to only remove the known
generated companion filenames (use the evaluation's base name or explicit
suffixes instead of a substring match) — e.g., derive the evaluation base name
from ts or the original result filename and only unlink files that exactly match
the expected patterns like "<base>.amended.yaml", "<base>.yaml",
"<base>.summary.json" (and their graphs counterparts) when iterating
fs.readdirSync(evalOutputDir) or fs.readdirSync(graphsDir), leaving other files
untouched and still pushing deleted entries into deleted. Ensure you keep the
existing try/catch blocks around the directory reads (using evalOutputDir,
graphsDir, ts, deleted) while replacing the broad f.includes(ts) checks with
strict pattern/regex matching for those owned filenames.
- Around line 42-45: The plugin returned by the 'eval-data' setup only
implements configureServer so the dashboard middleware (server.middlewares.use
for '/results' and any '/api/*' routes) isn't registered for vite preview; add a
matching configurePreviewServer implementation that mirrors the middleware
registration (same route handlers used in configureServer) so preview serves
those routes, or alternatively update README/usage to state the dashboard is
dev-only; locate the middleware registration inside the plugin (name:
'eval-data', function configureServer) and duplicate the same route wiring
inside configurePreviewServer to fix the preview behavior.

---

Duplicate comments:
In `@dashboard/README.md`:
- Line 15: The example shell command using LS_EVAL_SYSTEM_CFG_PATH,
LS_EVAL_DATA_PATH, LS_EVAL_REPORTS_PATH, LS_EVAL_DASHBOARD_RUN_ENABLED and
API_KEY incorrectly uses command substitution `$(API_KEY)` so a shell will try
to run a command named API_KEY; change the example to use variable assignment
instead (e.g., use API_KEY=<api_key> as a placeholder or API_KEY="$API_KEY" to
reuse an exported variable) so the API token is passed correctly when
copy/pasted into a shell.

---

Nitpick comments:
In `@dashboard/src/components/ChartSetup.js`:
- Around line 113-141: Replace the duplicated "solo legend" logic in the click
handler with a call to the existing exported helper soloLegendClick: compute
allIndices as items.map(i => i.datasetIndex) and call soloLegendClick(chart,
item.datasetIndex, allIndices) (or adjust argument order to match the exported
function signature), then remove the inline block (lines 119-138) and leave
chart.update() after the call so behavior is unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b347fae2-68d9-42c0-96ea-21acc0595c3b

📥 Commits

Reviewing files that changed from the base of the PR and between 4ede703 and a4b0541.

📒 Files selected for processing (15)

dashboard/Makefile
dashboard/README.md
dashboard/src/App.css
dashboard/src/App.jsx
dashboard/src/components/ChartSetup.js
dashboard/src/components/CollapsiblePanel.jsx
dashboard/src/components/DetailModal.jsx
dashboard/src/components/Evaluations.jsx
dashboard/src/components/FilterBar.jsx
dashboard/src/components/ResultsTable.jsx
dashboard/src/components/RunPage.jsx
dashboard/src/components/StackedBarChart.jsx
dashboard/src/hooks/useEvalData.js
dashboard/src/hooks/useFilters.js
dashboard/vite.config.js

🚧 Files skipped from review as they are similar to previous changes (5)

dashboard/src/components/ResultsTable.jsx
dashboard/src/components/StackedBarChart.jsx
dashboard/src/components/RunPage.jsx
dashboard/src/components/FilterBar.jsx
dashboard/src/components/CollapsiblePanel.jsx

dashboard/README.md

dashboard/src/App.css

dashboard/src/App.jsx

dashboard/src/components/Evaluations.jsx

dashboard/vite.config.js

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (5)

dashboard/vite.config.js (1)

476-591: ⚠️ Potential issue | 🟠 Major

Preview mode still omits the run/manage API surface.

This preview block only exposes the read-only routes. The frontend still depends on /api/run-config, /api/run-eval, /api/running-evals, /api/eval-stream, /api/stop-eval, and /api/delete-evaluation, so vite preview will still break those flows. Extract the middleware registration into one shared helper and call it from both hooks so the two modes cannot drift again.

Run this read-only script to diff the route sets registered in each hook. Expected result: the configureServer-only routes list includes the missing run/delete endpoints.

#!/bin/bash
python - <<'PY'
from pathlib import Path
import re

text = Path('dashboard/vite.config.js').read_text()

def extract_block(name: str) -> str:
    start = text.index(f'{name}(server)')
    brace = text.index('{', start)
    depth = 0
    for i in range(brace, len(text)):
        ch = text[i]
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return text[brace:i + 1]
    raise RuntimeError(f'Could not parse {name}')

server_block = extract_block('configureServer')
preview_block = extract_block('configurePreviewServer')

route_re = re.compile(r"server\.middlewares\.use\('([^']+)'")
server_routes = set(route_re.findall(server_block))
preview_routes = set(route_re.findall(preview_block))

print('configureServer-only routes:')
for route in sorted(server_routes - preview_routes):
    print(f'  {route}')

print('\nconfigurePreviewServer-only routes:')
for route in sorted(preview_routes - server_routes):
    print(f'  {route}')
PY

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/vite.config.js` around lines 476 - 591, The preview hook
(configurePreviewServer) only registers read-only routes while configureServer
registers additional run/manage endpoints (e.g., /api/run-config, /api/run-eval,
/api/running-evals, /api/eval-stream, /api/stop-eval, /api/delete-evaluation),
causing preview mode to break; to fix, extract the common middleware
registrations into a shared function (e.g., registerPreviewAndApiRoutes(server)
or attachCommonMiddlewares) and move all route.server.middlewares.use(...) calls
currently duplicated into that helper, then call that helper from both
configureServer and configurePreviewServer so both hooks register the same full
API surface.

dashboard/src/components/Evaluations.jsx (1)

1059-1069: ⚠️ Potential issue | 🟡 Minor

Remove start + i from the viewer row key.

Appending the page-local index means the same record gets a new key after sorting or filtering, so expansion state is lost and the row remounts unnecessarily. The (conversation_group_id, turn_id, metric_identifier) triplet is already treated as the stable identity in handleCompare.

💡 Suggested fix

-                  const rowKey = `${row.conversation_group_id || ''}||${row.turn_id || ''}||${row.metric_identifier || ''}||${start + i}`
+                  const rowKey = `${row.conversation_group_id || ''}||${row.turn_id || ''}||${row.metric_identifier || ''}`
                   const isRowExpanded = expandedRows.has(rowKey)
                   return (
                     <tr key={rowKey} className={`expandable-row${isRowExpanded ? ' expanded' : ''}`} onClick={() => {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/Evaluations.jsx` around lines 1059 - 1069, The row
key currently includes the page-local index (`start + i`) which causes keys to
change on sorting/filtering and drops expansion state; update the key generation
inside the pageRows.map so rowKey is built only from the stable identity triplet
(use `${row.conversation_group_id || ''}||${row.turn_id ||
''}||${row.metric_identifier || ''}`) and remove the `start + i` suffix, leaving
expandedRows, setExpandedRows and handleCompare logic unchanged.

dashboard/src/components/ResultsTable.jsx (1)

188-200: ⚠️ Potential issue | 🟠 Major

Use a stable row ID for keys and expansion state.

rowKey is still the row’s visible position, so sorting or date filtering can expand the wrong entry and gives React unstable keys. The data already carries immutable fields you can combine for a stable ID; keep the display number separate.

💡 Suggested fix

             {pageEntries.map((e, i) => {
-              const rowKey = safePage * PAGE_SIZE + i
-              const isExpanded = expandedRows.has(rowKey)
+              const rowId = [e.file, e.conversationGroupId, e.turnId, e.metric].join('||')
+              const rowNumber = safePage * PAGE_SIZE + i + 1
+              const isExpanded = expandedRows.has(rowId)
               return (
-                <tr key={rowKey} className={`expandable-row${isExpanded ? ' expanded' : ''}`}>
+                <tr key={rowId} className={`expandable-row${isExpanded ? ' expanded' : ''}`}>
                   <td className="row-num">
                     <button
+                      type="button"
                       className="row-toggle"
                       aria-expanded={isExpanded}
-                      aria-label={`${isExpanded ? 'Collapse' : 'Expand'} row ${rowKey + 1}`}
-                      onClick={() => toggleRow(rowKey)}
+                      aria-label={`${isExpanded ? 'Collapse' : 'Expand'} row ${rowNumber}`}
+                      onClick={() => toggleRow(rowId)}
                     >
-                      {isExpanded ? '\u25BC' : '\u25B6'} {rowKey + 1}
+                      {isExpanded ? '\u25BC' : '\u25B6'} {rowNumber}
                     </button>

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/ResultsTable.jsx` around lines 188 - 200, The
current key and expansion identifier (rowKey) is derived from visible position
(safePage * PAGE_SIZE + i), causing unstable keys when sorting/filtering;
instead build a stable ID from the row data (e.g., combine a unique immutable
field(s) on each entry in pageEntries such as entry.id or entry.createdAt) and
use that stableId for the React key and for expandedRows/toggleRow lookups,
while keeping the displayed index (rowKey + 1) purely for UI; update usages of
rowKey, expandedRows, toggleRow and any aria-labels to reference stableId for
identity and retain the numeric display separately.

dashboard/src/App.jsx (2)

168-177: ⚠️ Potential issue | 🟡 Minor

Don't open the eval-data modal on API errors.

fetch only throws on network failures. Lines 171-173 still treat 4xx/5xx JSON bodies as success, so /api/eval-data-content can open a blank modal instead of surfacing the failure.

🔧 Suggested fix

   const openEvalData = async () => {
     setEvalDataLoading(true)
     try {
       const res = await fetch('/api/eval-data-content')
       const data = await res.json()
+      if (!res.ok) {
+        throw new Error(data?.error || `Failed to load eval data (${res.status})`)
+      }
       setEvalDataConfig({ path: data.path, content: data.content })
     } catch {
       setEvalDataConfig(null)
+    } finally {
+      setEvalDataLoading(false)
     }
-    setEvalDataLoading(false)
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/App.jsx` around lines 168 - 177, The openEvalData function
treats any fetch response as success and can open the modal on 4xx/5xx
responses; update openEvalData to check res.ok after await
fetch('/api/eval-data-content') and only parse JSON and call setEvalDataConfig({
path: ..., content: ... }) when res.ok is true, otherwise
setEvalDataConfig(null) (and optionally log or surface the error); ensure
setEvalDataLoading(false) still runs in finally-style flow so loading state is
cleared.

235-247: ⚠️ Potential issue | 🟠 Major

Use real tab controls for the dashboard navigation.

Lines 240-246 are still click-only divs, so keyboard users cannot reach the main sections from the primary nav. Please render these as buttons/tabs and expose the selected state.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/App.jsx` around lines 235 - 247, Replace the click-only <div>
tab items with accessible tab controls: render each tab using a semantic button
or an element with role="tab" (inside a container with role="tablist") instead
of the current divs referenced by TAB_GROUPS / TAB_ICONS; ensure the element
uses type="button" (if button) or role="tab", update className logic that checks
activeTab to remain, and expose selection with aria-selected={activeTab ===
t.id} and proper tabIndex (0 for selected, -1 for others); keep the onClick
calling setActiveTab(t.id) and add keyboard handling
(ArrowLeft/ArrowRight/Home/End or at minimum Enter/Space activating the tab) so
keyboard users can focus and change tabs.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dashboard/Makefile`:
- Line 13: The dev Makefile command exports environment variables unquoted which
breaks when values contain spaces or shell metacharacters; update the dev target
so each assignment (LS_EVAL_SYSTEM_CFG_PATH, LS_EVAL_DATA_PATH,
LS_EVAL_REPORTS_PATH, LS_EVAL_DASHBOARD_RUN_ENABLED, API_KEY) is quoted when
invoking npx vite (e.g. LS_EVAL_SYSTEM_CFG_PATH="$(LS_EVAL_SYSTEM_CFG_PATH)") to
ensure values are passed whole and safe to the npx vite invocation.

In `@dashboard/src/components/Evaluations.jsx`:
- Around line 446-469: The useEffect Promise.all currently collapses any
bootstrap fetch failure into the empty state by only calling setLoading(false)
in the catch; update the bootstrap logic in the useEffect so that critical
failures are surfaced and non-critical endpoints are best-effort: either switch
to Promise.allSettled and handle each result (or add per-fetch .catch handlers)
so manifest/amended-files/eval-data/graphs/eval-summaries are parsed
independently,
setFiles/setAmendedMap/setEvalDataGroups/setGraphMap/setSummaryMap only when
their fetch succeeded, and introduce or use an error state (e.g. setError or
setBootstrapError) to display real errors instead of silently falling back to an
empty view while still ensuring setLoading(false) is called.
- Around line 1216-1259: The checkbox's click is bubbling to the parent card
handlers causing the selection to immediately flip; update the compare checkbox
input (the JSX input inside the compareMode block) to stop event propagation
before running its selection logic: add handlers that call e.stopPropagation()
(for click and key events as needed) so the input's onChange/setCompareSelection
logic runs without triggering the parent div's onClick/onKeyDown (which call
openFile and the same toggle logic). Ensure you still respect isIncompatible and
keep using setCompareSelection for the toggle behavior.

In `@dashboard/src/components/RunPage.jsx`:
- Around line 43-48: The fetch in the useEffect poll function should validate
the response before calling setRuns: check response.ok, await r.json(), and
verify Array.isArray(data) before calling setRuns(data); if the response is not
ok or the parsed JSON is not an array, log the error and setRuns([]) (or skip
updating) to avoid converting runs into an object that breaks runs.some(...) and
runs.map(...). Update the poll implementation (the fetch('/api/running-evals')
block) to perform these checks and handle non-array payloads gracefully.
- Around line 31-40: The fetch in the useEffect handling '/api/run-config'
treats any JSON response (including 4xx/5xx error bodies) as valid config;
update the logic in the useEffect that calls fetch('/api/run-config') so it
first checks response.ok before using r.json(), and for non-ok responses read
the error text/json and surface it (e.g., set an error state and/or log it)
instead of calling setSystemConfig/setApiKeySet/setTags with error payload;
ensure setLoading(false) runs in a finally path (or both success and error
branches) and also handle JSON parse errors separately so only valid config
populates setSystemConfig, setApiKeySet, and setTags.

---

Duplicate comments:
In `@dashboard/src/App.jsx`:
- Around line 168-177: The openEvalData function treats any fetch response as
success and can open the modal on 4xx/5xx responses; update openEvalData to
check res.ok after await fetch('/api/eval-data-content') and only parse JSON and
call setEvalDataConfig({ path: ..., content: ... }) when res.ok is true,
otherwise setEvalDataConfig(null) (and optionally log or surface the error);
ensure setEvalDataLoading(false) still runs in finally-style flow so loading
state is cleared.
- Around line 235-247: Replace the click-only <div> tab items with accessible
tab controls: render each tab using a semantic button or an element with
role="tab" (inside a container with role="tablist") instead of the current divs
referenced by TAB_GROUPS / TAB_ICONS; ensure the element uses type="button" (if
button) or role="tab", update className logic that checks activeTab to remain,
and expose selection with aria-selected={activeTab === t.id} and proper tabIndex
(0 for selected, -1 for others); keep the onClick calling setActiveTab(t.id) and
add keyboard handling (ArrowLeft/ArrowRight/Home/End or at minimum Enter/Space
activating the tab) so keyboard users can focus and change tabs.

In `@dashboard/src/components/Evaluations.jsx`:
- Around line 1059-1069: The row key currently includes the page-local index
(`start + i`) which causes keys to change on sorting/filtering and drops
expansion state; update the key generation inside the pageRows.map so rowKey is
built only from the stable identity triplet (use `${row.conversation_group_id ||
''}||${row.turn_id || ''}||${row.metric_identifier || ''}`) and remove the
`start + i` suffix, leaving expandedRows, setExpandedRows and handleCompare
logic unchanged.

In `@dashboard/src/components/ResultsTable.jsx`:
- Around line 188-200: The current key and expansion identifier (rowKey) is
derived from visible position (safePage * PAGE_SIZE + i), causing unstable keys
when sorting/filtering; instead build a stable ID from the row data (e.g.,
combine a unique immutable field(s) on each entry in pageEntries such as
entry.id or entry.createdAt) and use that stableId for the React key and for
expandedRows/toggleRow lookups, while keeping the displayed index (rowKey + 1)
purely for UI; update usages of rowKey, expandedRows, toggleRow and any
aria-labels to reference stableId for identity and retain the numeric display
separately.

In `@dashboard/vite.config.js`:
- Around line 476-591: The preview hook (configurePreviewServer) only registers
read-only routes while configureServer registers additional run/manage endpoints
(e.g., /api/run-config, /api/run-eval, /api/running-evals, /api/eval-stream,
/api/stop-eval, /api/delete-evaluation), causing preview mode to break; to fix,
extract the common middleware registrations into a shared function (e.g.,
registerPreviewAndApiRoutes(server) or attachCommonMiddlewares) and move all
route.server.middlewares.use(...) calls currently duplicated into that helper,
then call that helper from both configureServer and configurePreviewServer so
both hooks register the same full API surface.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5f008140-ecfa-42a8-8a94-61b3bac4e8d1

📥 Commits

Reviewing files that changed from the base of the PR and between a4b0541 and b9d3e4f.

📒 Files selected for processing (18)

dashboard/.vscode/settings.json
dashboard/Makefile
dashboard/README.md
dashboard/eval.yaml.old
dashboard/src/App.css
dashboard/src/App.jsx
dashboard/src/components/ChartSetup.js
dashboard/src/components/CollapsiblePanel.jsx
dashboard/src/components/DetailModal.jsx
dashboard/src/components/Evaluations.jsx
dashboard/src/components/FilterBar.jsx
dashboard/src/components/ResultsTable.jsx
dashboard/src/components/RunPage.jsx
dashboard/src/components/StackedBarChart.jsx
dashboard/src/hooks/useEvalData.js
dashboard/src/hooks/useFilters.js
dashboard/system.yaml.old
dashboard/vite.config.js

🚧 Files skipped from review as they are similar to previous changes (7)

dashboard/src/hooks/useFilters.js
dashboard/src/components/StackedBarChart.jsx
dashboard/README.md
dashboard/src/components/DetailModal.jsx
dashboard/src/components/CollapsiblePanel.jsx
dashboard/src/App.css
dashboard/src/components/FilterBar.jsx

dashboard/Makefile

dashboard/src/components/Evaluations.jsx

dashboard/src/components/RunPage.jsx

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (2)

dashboard/vite.config.js (1)

473-475: ⚠️ Potential issue | 🟠 Major

Preview mode still skips the run API.

configurePreviewServer() only wires registerSharedRoutes(server), so /api/run-config, /api/run-eval, /api/running-evals, /api/eval-stream, and /api/stop-eval remain dev-only. npm run preview will still render the Run tab, but those requests 404.

Run this to confirm that preview only registers the shared routes while RunPage.jsx depends on the run endpoints:

#!/bin/bash
printf '=== vite hooks ===\n'
sed -n '267,475p' dashboard/vite.config.js | rg -n 'configureServer|configurePreviewServer|registerSharedRoutes|/api/run-|/api/eval-stream|/api/stop-eval'

printf '\n=== RunPage callers ===\n'
sed -n '31,175p' dashboard/src/components/RunPage.jsx | rg -n '/api/run-config|/api/run-eval|/api/running-evals|/api/eval-stream|/api/stop-eval'

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/vite.config.js` around lines 473 - 475, configurePreviewServer
currently only calls registerSharedRoutes(server) so the run-related endpoints
used by RunPage.jsx (/api/run-config, /api/run-eval, /api/running-evals,
/api/eval-stream, /api/stop-eval) are not registered in preview; update
configurePreviewServer to also register the run endpoints (the same route
registration invoked by configureServer) — e.g., call the existing function that
registers run routes (or invoke registerRunRoutes/registerRunEndpoints if
present) or factor out and call the route-registration logic used for dev so
preview and dev both register the run API endpoints.

dashboard/src/components/RunPage.jsx (1)

159-160: ⚠️ Potential issue | 🟠 Major

Validate /api/running-evals before storing it in runs.

These refresh paths can still replace runs with an error object, and the next render then fails on runs.some(...) / runs.map(...). The polling effect already has the right Array.isArray(...) guard; mirror it here.

💡 Suggested fix

       const { id } = await res.json()
       setSelectedRunId(id)
       const runsRes = await fetch('/api/running-evals')
-      if (runsRes.ok) setRuns(await runsRes.json())
+      const runsPayload = await runsRes.json().catch(() => null)
+      if (runsRes.ok && Array.isArray(runsPayload)) setRuns(runsPayload)
@@
     try {
       await fetch(`/api/stop-eval/${id}`, { method: 'POST' })
       const runsRes = await fetch('/api/running-evals')
-      setRuns(await runsRes.json())
+      const runsPayload = await runsRes.json().catch(() => null)
+      if (runsRes.ok && Array.isArray(runsPayload)) setRuns(runsPayload)
     } catch { /* ignore */ }
   }

Also applies to: 172-173

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/RunPage.jsx` around lines 159 - 160, When fetching
'/api/running-evals' in the RunPage component, don't directly setRuns with the
raw JSON response — validate it's an array first; change the code around the
fetch that uses runsRes and setRuns so you parse await runsRes.json() into a
variable (e.g., data) and only call setRuns(data) if Array.isArray(data),
otherwise leave runs unchanged (or handle error); apply the same Array.isArray
guard for the other fetch block around the later fetch at the lines referencing
runsRes / setRuns (the second fetch at 172-173).

🧹 Nitpick comments (6)

dashboard/src/components/Evaluations.jsx (5)

38-73: Consider adding Escape key handler for modal accessibility.

The modal closes on overlay click and has an accessible close button, but keyboard users expect Escape to close modals. This pattern repeats across all modals in this file.

💡 Example enhancement for modals

 function AmendedModal({ content, onClose }) {
   const { theme } = useTheme()
+  useEffect(() => {
+    const handleEscape = (e) => { if (e.key === 'Escape') onClose() }
+    document.addEventListener('keydown', handleEscape)
+    return () => document.removeEventListener('keydown', handleEscape)
+  }, [onClose])
   return (

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/Evaluations.jsx` around lines 38 - 73, Add an Escape
key handler to the modal components (e.g., AmendedModal) so keyboard users can
close the modal: inside the component use useEffect to add a window keydown
listener that calls onClose when event.key === 'Escape', and clean up the
listener on unmount; ensure the listener is added only while the modal is
mounted and avoid interfering with the existing onClick overlay handler and
modal-close button by simply invoking the same onClose prop.

113-118: Consider adding image error handling.

If a graph image fails to load (e.g., file deleted), the thumbnail will show a broken image. A fallback or error state would improve UX.

💡 Example fix

               <div key={g} className="graph-thumb" onClick={() => onPreview(g)}>
-                <img src={`/results/graphs/${g}`} alt={labelFromFilename(g)} />
+                <img
+                  src={`/results/graphs/${g}`}
+                  alt={labelFromFilename(g)}
+                  onError={(e) => { e.target.style.display = 'none' }}
+                />
                 <div className="graph-thumb-label">{labelFromFilename(g)}</div>

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/Evaluations.jsx` around lines 113 - 118, Add image
error handling to the thumbnail rendering inside the graphs.map loop so broken
images show a fallback UI; update the <img> used in the graph-thumb for each g
(the img output of graphs.map) to handle load errors by swapping to a fallback
image URL or replacing the <img> with a placeholder element and ensure the
labelFromFilename(g) and onPreview(g) behavior still work; implement this by
adding an onError handler (or local state per thumbnail) that sets a fallback
src or toggles an error flag used to render a placeholder with the existing
graph-thumb-label.

502-522: Consider batching requests for large file counts.

When there are many evaluation files, Promise.all fires all fetch requests simultaneously. This could overwhelm the browser's connection limit or the server.

💡 Optional: Batch requests

For a dashboard with potentially many evaluations, consider limiting concurrent requests:

// Example: Process files in batches of 5
const BATCH_SIZE = 5
for (let i = 0; i < files.length; i += BATCH_SIZE) {
  const batch = files.slice(i, i + BATCH_SIZE)
  await Promise.all(batch.map(file => /* fetch and parse */))
}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/Evaluations.jsx` around lines 502 - 522, The current
useEffect fires all fetches at once via Promise.all(files.map(...)) which can
overwhelm connections; change the effect to run an async function (or async
IIFE) that processes files in bounded batches (e.g., const BATCH_SIZE = 5) and
for each batch await Promise.all(batch.map(file =>
fetch(`/results/${file}`).then(...).catch(...))) so requests are limited
concurrently, accumulate results into the existing stats object (the same stats
variable used for Papa.parse results) across batches, and call
setFileStats(stats) after all batches complete; keep existing parsing with
Papa.parse, error handling, and references to files, setFileStats, and
Papa.parse intact.

305-310: The "total rows" stat toggles but has no filtering effect.

Clicking "total rows" sets deltaFilter to 'total', which is styled as active, but the filter logic at lines 179-186 doesn't recognize 'total' and falls through to return true (showing all rows). This is effectively correct behavior, but visually toggling it as "active" suggests a filter is applied when it isn't.

💡 Suggestion: Remove toggle for "total" or make it explicit

Either remove the onClick toggle from the total stat (it's informational), or add explicit handling:

-          <span className={`compare-stat total${deltaFilter === 'total' ? ' active' : ''}`} onClick={() => { setDeltaFilter(f => f === 'total' ? null : 'total'); setPage(0) }}>{data.rows.length} total rows</span>
+          <span className="compare-stat total">{data.rows.length} total rows</span>

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/Evaluations.jsx` around lines 305 - 310, The "total
rows" stat currently toggles deltaFilter to 'total' and shows as active but the
filtering code (the function that reads deltaFilter to decide which rows to
include) doesn't handle 'total', so the UI misleads users; fix by making the
total stat purely informational: remove its onClick that calls setDeltaFilter
and remove the active class logic for the compare-stat total span (leave the
displayed count using data.rows.length), or alternatively update the filter
logic that reads deltaFilter (the same function referencing deltaFilter, e.g.,
the block around the existing improved/regressed/unchanged checks) to explicitly
handle 'total' as equivalent to no filter; reference symbols: deltaFilter,
setDeltaFilter, compare-stat total, data.rows.length, and the filtering function
that currently checks improved/regressed/unchanged.

1305-1314: Yellow color is unreachable for odd totals.

The condition passed === half uses strict equality, but half = total / 2 is not an integer for odd totals. For example, with 5 total items, half = 2.5, and passed (an integer) can never equal it.

💡 Consider range-based coloring

 const { passed, total } = fileStats[file]
-const half = total / 2
-const color = passed > half ? 'var(--green)' : passed === half ? 'var(--yellow)' : 'var(--red)'
+const ratio = total > 0 ? passed / total : 0
+const color = ratio >= 0.6 ? 'var(--green)' : ratio >= 0.4 ? 'var(--yellow)' : 'var(--red)'

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/components/Evaluations.jsx` around lines 1305 - 1314, The color
logic is unreachable for odd totals because half = total / 2 can be non-integer;
update the comparison to use integer midpoint logic: compute half = total / 2 as
before but compare passed against Math.ceil(total / 2) for the yellow case (or
use range-based checks), e.g. set color = passed > half ? 'var(--green)' :
passed === Math.ceil(total / 2) ? 'var(--yellow)' : 'var(--red)'; apply this
change inside the inline renderer that reads fileStats[file] and destructures
passed and total to ensure odd totals can hit the yellow branch.

dashboard/src/App.jsx (1)

47-82: Consider consolidating duplicate modal components.

SystemConfigModal and EvalDataModal are nearly identical—only the title differs. A single reusable YamlModal component accepting a title prop would reduce duplication.

♻️ Suggested refactor

-function SystemConfigModal({ config, onClose }) {
+function YamlModal({ title, config, onClose }) {
   const { theme } = useTheme()
 
   return (
     <div className="modal-overlay" onClick={onClose}>
       <div className="modal-content amended-modal" onClick={e => e.stopPropagation()}>
         <div className="modal-header">
           <div>
-            <h2>System Configuration</h2>
+            <h2>{title}</h2>
           </div>
           ...
         </div>
         ...
       </div>
     </div>
   )
 }
-
-function EvalDataModal({ config, onClose }) {
-  // ... duplicate code removed
-}

Usage:

<YamlModal title="System Configuration" config={systemConfig} onClose={() => setShowSystemConfig(false)} />
<YamlModal title="Evaluation Data" config={evalDataConfig} onClose={() => setEvalDataConfig(null)} />

Also applies to: 114-149

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dashboard/src/App.jsx` around lines 47 - 82, SystemConfigModal and
EvalDataModal are duplicated; create a single reusable YamlModal component that
accepts props (title, config, onClose) and renders the same structure and
SyntaxHighlighter used in SystemConfigModal (preserve useTheme and theme-based
style selection), then replace both SystemConfigModal and EvalDataModal usages
with <YamlModal title="..." config={...} onClose={...}> to remove duplication
and keep behavior identical. Ensure YamlModal stops propagation on the inner
modal container, forwards the aria-label close button, and uses config.content
for the SyntaxHighlighter children so existing callers need only pass title and
config.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dashboard/src/App.jsx`:
- Around line 162-166: The fetch in the useEffect for '/api/system-config'
treats any HTTP response as success and swallows errors; update the fetch
handling inside the useEffect so you first check response.ok before calling
response.json(), handle non-ok responses by logging the status/text (using
console.error or a logger) and avoid calling setSystemConfig with error
payloads, and improve the catch block to log unexpected fetch/parsing errors;
locate the useEffect that calls fetch('/api/system-config') and adjust the
promise chain (or switch to async/await) to perform the response.ok check, then
setSystemConfig(data) only on success and log errors otherwise.

In `@dashboard/src/components/Evaluations.jsx`:
- Around line 646-881: The exportPdf function currently swallows all errors in
its try/finally and never notifies the user; update exportPdf to catch errors
(around the main work: CSV/YAML fetches, image loading, and PDF generation) and
surface them by calling the existing state updater or UI feedback mechanism
(e.g., setPdfExporting to null plus set an error state or call a notify/alert
function) so users see meaningful error messages; specifically wrap the body
inside exportPdf (the work after setting setPdfExporting) with try { ... } catch
(err) { /* record/log err and present a message using the component's error
state or notification helper */ } finally { setPdfExporting(null) } and include
error details (err.message) in the displayed message.
- Around line 597-616: openFile currently swallows fetch/parse errors by only
toggling setCsvLoading(false); update openFile to capture and surface errors: in
the try/catch and in Papa.parse's error handler, capture the thrown/parse error
and call a UI-facing error updater (e.g., setCsvError or a toast notifier) with
the error message and setCsvLoading(false); ensure setCsvData is not set on
failure and include the error text when fetching (res.ok check) before parsing
so failures from fetch also produce an error via the same setCsvError or
notification path.

In `@dashboard/src/components/ResultsTable.jsx`:
- Around line 104-108: The current numeric rendering coerces null/empty-string
to 0 via Number(val), causing absent numeric cells to show "0.000" instead of
"-". Update the block handling col.type === 'number' (where const num =
Number(val) and formatted is computed) to first treat null/undefined/'' as
missing and set formatted = '-' for those cases; otherwise parse Number(val) and
keep the existing Number.isFinite check and score-specific toFixed logic (use
col.key === 'score' as before).

In `@dashboard/src/components/RunPage.jsx`:
- Around line 37-40: The command preview is hard-coded to "eval.yaml" and
"results" even though /api/run-config returns evalDataPath and evalReportsPath;
update the code that handles the /api/run-config response (where
setSystemConfig, setApiKeySet, setTags are set) to also read and store
data.evalDataPath and data.evalReportsPath into state, and then change the
JSX/func that builds the preview command (the command preview rendering at the
current spot and the other occurrence around lines 138-140) to use those state
values instead of the literal "eval.yaml" and "results" so the UI shows the
exact paths the server will use.

---

Duplicate comments:
In `@dashboard/src/components/RunPage.jsx`:
- Around line 159-160: When fetching '/api/running-evals' in the RunPage
component, don't directly setRuns with the raw JSON response — validate it's an
array first; change the code around the fetch that uses runsRes and setRuns so
you parse await runsRes.json() into a variable (e.g., data) and only call
setRuns(data) if Array.isArray(data), otherwise leave runs unchanged (or handle
error); apply the same Array.isArray guard for the other fetch block around the
later fetch at the lines referencing runsRes / setRuns (the second fetch at
172-173).

In `@dashboard/vite.config.js`:
- Around line 473-475: configurePreviewServer currently only calls
registerSharedRoutes(server) so the run-related endpoints used by RunPage.jsx
(/api/run-config, /api/run-eval, /api/running-evals, /api/eval-stream,
/api/stop-eval) are not registered in preview; update configurePreviewServer to
also register the run endpoints (the same route registration invoked by
configureServer) — e.g., call the existing function that registers run routes
(or invoke registerRunRoutes/registerRunEndpoints if present) or factor out and
call the route-registration logic used for dev so preview and dev both register
the run API endpoints.

---

Nitpick comments:
In `@dashboard/src/App.jsx`:
- Around line 47-82: SystemConfigModal and EvalDataModal are duplicated; create
a single reusable YamlModal component that accepts props (title, config,
onClose) and renders the same structure and SyntaxHighlighter used in
SystemConfigModal (preserve useTheme and theme-based style selection), then
replace both SystemConfigModal and EvalDataModal usages with <YamlModal
title="..." config={...} onClose={...}> to remove duplication and keep behavior
identical. Ensure YamlModal stops propagation on the inner modal container,
forwards the aria-label close button, and uses config.content for the
SyntaxHighlighter children so existing callers need only pass title and config.

In `@dashboard/src/components/Evaluations.jsx`:
- Around line 38-73: Add an Escape key handler to the modal components (e.g.,
AmendedModal) so keyboard users can close the modal: inside the component use
useEffect to add a window keydown listener that calls onClose when event.key ===
'Escape', and clean up the listener on unmount; ensure the listener is added
only while the modal is mounted and avoid interfering with the existing onClick
overlay handler and modal-close button by simply invoking the same onClose prop.
- Around line 113-118: Add image error handling to the thumbnail rendering
inside the graphs.map loop so broken images show a fallback UI; update the <img>
used in the graph-thumb for each g (the img output of graphs.map) to handle load
errors by swapping to a fallback image URL or replacing the <img> with a
placeholder element and ensure the labelFromFilename(g) and onPreview(g)
behavior still work; implement this by adding an onError handler (or local state
per thumbnail) that sets a fallback src or toggles an error flag used to render
a placeholder with the existing graph-thumb-label.
- Around line 502-522: The current useEffect fires all fetches at once via
Promise.all(files.map(...)) which can overwhelm connections; change the effect
to run an async function (or async IIFE) that processes files in bounded batches
(e.g., const BATCH_SIZE = 5) and for each batch await Promise.all(batch.map(file
=> fetch(`/results/${file}`).then(...).catch(...))) so requests are limited
concurrently, accumulate results into the existing stats object (the same stats
variable used for Papa.parse results) across batches, and call
setFileStats(stats) after all batches complete; keep existing parsing with
Papa.parse, error handling, and references to files, setFileStats, and
Papa.parse intact.
- Around line 305-310: The "total rows" stat currently toggles deltaFilter to
'total' and shows as active but the filtering code (the function that reads
deltaFilter to decide which rows to include) doesn't handle 'total', so the UI
misleads users; fix by making the total stat purely informational: remove its
onClick that calls setDeltaFilter and remove the active class logic for the
compare-stat total span (leave the displayed count using data.rows.length), or
alternatively update the filter logic that reads deltaFilter (the same function
referencing deltaFilter, e.g., the block around the existing
improved/regressed/unchanged checks) to explicitly handle 'total' as equivalent
to no filter; reference symbols: deltaFilter, setDeltaFilter, compare-stat
total, data.rows.length, and the filtering function that currently checks
improved/regressed/unchanged.
- Around line 1305-1314: The color logic is unreachable for odd totals because
half = total / 2 can be non-integer; update the comparison to use integer
midpoint logic: compute half = total / 2 as before but compare passed against
Math.ceil(total / 2) for the yellow case (or use range-based checks), e.g. set
color = passed > half ? 'var(--green)' : passed === Math.ceil(total / 2) ?
'var(--yellow)' : 'var(--red)'; apply this change inside the inline renderer
that reads fileStats[file] and destructures passed and total to ensure odd
totals can hit the yellow branch.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 09852592-e4ba-4555-bbfb-54894210c123

📥 Commits

Reviewing files that changed from the base of the PR and between b9d3e4f and 46fd846.

📒 Files selected for processing (18)

dashboard/.vscode/settings.json
dashboard/Makefile
dashboard/README.md
dashboard/eval.yaml.old
dashboard/src/App.css
dashboard/src/App.jsx
dashboard/src/components/ChartSetup.js
dashboard/src/components/CollapsiblePanel.jsx
dashboard/src/components/DetailModal.jsx
dashboard/src/components/Evaluations.jsx
dashboard/src/components/FilterBar.jsx
dashboard/src/components/ResultsTable.jsx
dashboard/src/components/RunPage.jsx
dashboard/src/components/StackedBarChart.jsx
dashboard/src/hooks/useEvalData.js
dashboard/src/hooks/useFilters.js
dashboard/system.yaml.old
dashboard/vite.config.js

🚧 Files skipped from review as they are similar to previous changes (8)

dashboard/system.yaml.old
dashboard/src/components/StackedBarChart.jsx
dashboard/src/components/DetailModal.jsx
dashboard/eval.yaml.old
dashboard/src/components/CollapsiblePanel.jsx
dashboard/src/components/FilterBar.jsx
dashboard/src/hooks/useFilters.js
dashboard/src/hooks/useEvalData.js

dashboard/src/App.jsx

dashboard/src/components/Evaluations.jsx

dashboard/src/components/ResultsTable.jsx

dashboard/src/components/RunPage.jsx

rioloc force-pushed the feat/web-dashboard branch 2 times, most recently from b802697 to 3d98fd3 Compare February 25, 2026 23:51

rioloc changed the title ~~feat: add dashboard~~ [WIP] feat: add dashboard Feb 26, 2026

rioloc force-pushed the feat/web-dashboard branch 9 times, most recently from 157e742 to e6390ee Compare March 5, 2026 11:31

rioloc added 4 commits March 5, 2026 12:33

feat: add dashboard

b9cc092

feat: add flag to enable/disable run mode + show model tag

c23fc71

feat: added overall avg chart

36c6f2b

feat: updated README for dashboard

48dae78

rioloc force-pushed the feat/web-dashboard branch from e6390ee to 48dae78 Compare March 5, 2026 11:33

rioloc changed the title ~~[WIP] feat: add dashboard~~ feat: add web dashboard Mar 5, 2026

rioloc marked this pull request as ready for review March 5, 2026 11:35

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

rioloc marked this pull request as draft March 5, 2026 11:46

rioloc marked this pull request as ready for review March 6, 2026 17:06

coderabbitai bot reviewed Mar 6, 2026

View reviewed changes

rioloc force-pushed the feat/web-dashboard branch from 4ede703 to a4b0541 Compare March 6, 2026 17:22

coderabbitai bot reviewed Mar 6, 2026

View reviewed changes

rioloc force-pushed the feat/web-dashboard branch from a4b0541 to b9d3e4f Compare March 9, 2026 09:11

coderabbitai bot reviewed Mar 9, 2026

View reviewed changes

rioloc force-pushed the feat/web-dashboard branch from b9d3e4f to 46fd846 Compare March 9, 2026 15:44

coderabbitai bot reviewed Mar 9, 2026

View reviewed changes

rioloc added 2 commits March 9, 2026 17:29

feat(dashboard): addressed coderabbit suggestions

c11b1a0

feat: don't coerce missing numeric cells to zero

62f1f0d

rioloc force-pushed the feat/web-dashboard branch from f2fe08f to 62f1f0d Compare March 9, 2026 16:30

feat: default mode is api key disabled

0d77f6a

Conversation

rioloc commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

lightspeed-eval dashboard

Type of change

Preview

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

asamal4 commented Feb 26, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

rioloc commented Feb 25, 2026 •

edited

Loading

coderabbitai bot commented Feb 25, 2026 •

edited

Loading