Regenerate benchmarks/results.json during Pages build so the benchmarks page shows real data

## Problem

`https://silver-funicular-2qkwekw.pages.github.io/benchmarks.html` renders **"📊 No benchmark data available yet"** even though `benchmarks/` contains 600+ TS/Python benchmark pairs.

Root cause: `benchmarks/results.json` on `main` has been the 40-byte stub `{ "benchmarks": [], "timestamp": null }` since 2026-04-12. `.github/workflows/pages.yml` copies that stub into `playground/benchmarks/results.json`, and `benchmarks.html` renders the "no data" message when the `benchmarks` array is empty. Nothing in CI runs `benchmarks/run_benchmarks.sh` to populate the file.
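
The empty-data condition can be sketched as below. This is an illustration in Python, not the page's actual code: the real check lives in `benchmarks.html`, the function name `has_benchmark_data` is hypothetical, and the shape of a populated entry is assumed.

```python
import json

# The stub currently committed on main, as quoted above.
STUB = '{ "benchmarks": [], "timestamp": null }'


def has_benchmark_data(raw: str) -> bool:
    """Return True when the JSON payload holds at least one benchmark entry.

    Hypothetical helper: it mirrors the described behaviour (empty array
    means "no data"), not the page's real implementation.
    """
    payload = json.loads(raw)
    benchmarks = payload.get("benchmarks")
    return isinstance(benchmarks, list) and len(benchmarks) > 0


# The stub is valid JSON but carries no entries, so the page has nothing to show.
print(has_benchmark_data(STUB))
```

Because the stub parses cleanly, nothing errors; the page simply falls through to its "no data" branch.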

## Why PR #154 didn't fix it

PR #154 moved benchmark execution into the autoloop program's Evaluation step, so every iteration was supposed to regenerate `results.json` and commit it. That fails in practice because the autoloop agent sandbox doesn't have `bun`, and the `curl | bash` fallback doesn't produce a usable binary there. As a result, every post-#154 iteration evaluates to metric = 0 and is rejected, so nothing is committed. The perf-comparison autoloop is now stuck and can't ratchet at all (see run 24696210026 for the agent's own diagnosis).

## Fix

Regenerate `benchmarks/results.json` during the Pages build — not in the autoloop. Pages already triggers on push to `main`, so any benchmark change auto-publishes fresh data. No new workflow, no commit-back-to-main, no autoloop-agent plumbing needed.

### Change 1 — `.github/workflows/pages.yml`

Move the Python setup earlier, run the benchmark suite, then copy the regenerated `results.json` into the playground artifact. Drop the `if [ -f ... ]` guard since the file will always exist post-step.

```diff
       - name: Install dependencies
         run: bun install

       - name: Build library for browser
         run: bun build ./src/index.ts --outdir ./playground/dist --target browser --minify

       - name: Bundle TypeScript compiler for offline playground
         run: cp node_modules/typescript/lib/typescript.js ./playground/dist/typescript.js

+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install Python dependencies
+        run: pip install pandas numpy
+
+      - name: Run benchmarks
+        run: bash benchmarks/run_benchmarks.sh
+
       - name: Copy benchmark results to playground
         run: |
           mkdir -p ./playground/benchmarks
-          if [ -f benchmarks/results.json ]; then
-            cp benchmarks/results.json ./playground/benchmarks/results.json
-          fi
+          cp benchmarks/results.json ./playground/benchmarks/results.json

-      - name: Setup Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
-      - name: Install Python dependencies
-        run: pip install pandas numpy
-
       - name: Validate Python playground examples
         run: python scripts/validate-python-examples.py playground/
```
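
With the guard gone, the build would still publish the stub if the benchmark script ran but wrote nothing useful. One optional safeguard, sketched here as an assumption and not as part of the proposed change, is a small check between the benchmark step and the copy step. The names `find_problems` and `main` are hypothetical, and treating a null `timestamp` as a problem assumes `run_benchmarks.sh` always sets it.

```python
import json
import sys
from pathlib import Path


def find_problems(path: Path) -> list[str]:
    """Return human-readable reasons why the results file is unusable."""
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(payload, dict):
        return [f"{path} does not contain a JSON object"]

    problems = []
    if not payload.get("benchmarks"):
        problems.append("'benchmarks' is missing or empty")
    # Assumption: a real run always records a timestamp.
    if payload.get("timestamp") is None:
        problems.append("'timestamp' is null")
    return problems


def main(argv: list[str]) -> int:
    """Report problems on stderr and return a process exit code."""
    target = Path(argv[1]) if len(argv) > 1 else Path("benchmarks/results.json")
    issues = find_problems(target)
    for issue in issues:
        print(f"results check: {issue}", file=sys.stderr)
    return 1 if issues else 0
```

A wrapper script could call `sys.exit(main(sys.argv))` so that a stub or missing file fails the Pages build instead of silently publishing an empty page.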

### Change 2 — revert PR #154's evaluation changes in `.autoloop/programs/perf-comparison/program.md`

Restore the pre-#154 metric (file-count based) so the autoloop can ratchet again. Execution belongs in the Pages workflow, not the agent sandbox. Keep the per-iteration checklist changes from #154 that are orthogonal (e.g., any prompt clarifications); just revert the Evaluation section.

## Trade-offs

- **Pages builds get slower.** 600+ pairs × (warmup + measured iterations) is plausibly 10–30 min per build. The workflow runs only on push to `main`, so it doesn't block PRs. If it becomes painful, split the run with a matrix later.
- **Broken benchmarks surface late** (on merge to `main`) instead of in-PR. Acceptable because the autoloop still validates that benchmark files are syntactically valid Python/TS before landing.
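
The 10–30 min figure in the first trade-off can be sanity-checked with simple arithmetic. The sketch below assumes the pairs run serially and uses illustrative per-pair times of 1 to 3 seconds; these are assumptions, not measurements from the repository.

```python
PAIRS = 600  # approximate number of TS/Python benchmark pairs


def build_minutes(seconds_per_pair: float, pairs: int = PAIRS) -> float:
    """Total serial runtime in minutes for the given per-pair cost."""
    return pairs * seconds_per_pair / 60


# Assumed per-pair costs (warmup plus measured iterations), in seconds.
for seconds in (1.0, 2.0, 3.0):
    print(f"{seconds:.0f} s/pair -> {build_minutes(seconds):.0f} min")
```

Under these assumptions the estimate corresponds to roughly 1 to 3 seconds per pair, which is the number to measure before deciding whether a matrix split is worth it.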

## Acceptance

- `benchmarks.html` on the Pages site shows a populated benchmarks table after a merge to `main`.
- The perf-comparison autoloop accepts iterations again (visible in issue #130's experiment log).
- No new workflows added; no commits pushed back to `main` from CI.
