From b109eceaf79a95977bf5f1addde017010c9a172e Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 21 Apr 2026 03:55:24 +0000 Subject: [PATCH 1/2] Initial plan From 549a1b237d39c937325196874a15bda1620e0938 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 21 Apr 2026 03:57:48 +0000 Subject: [PATCH 2/2] Regenerate benchmarks/results.json during Pages build; revert perf-comparison eval to file-count Agent-Logs-Url: https://github.com/githubnext/tsessebe/sessions/bc8c8dec-26dd-46a4-89e0-62e7f4d245c3 Co-authored-by: mrjf <180956+mrjf@users.noreply.github.com> --- .autoloop/programs/perf-comparison/program.md | 63 +++++-------------- .github/workflows/pages.yml | 15 ++--- 2 files changed, 22 insertions(+), 56 deletions(-) diff --git a/.autoloop/programs/perf-comparison/program.md b/.autoloop/programs/perf-comparison/program.md index 1108380d..8b15b72a 100644 --- a/.autoloop/programs/perf-comparison/program.md +++ b/.autoloop/programs/perf-comparison/program.md @@ -24,7 +24,7 @@ This is an open-ended program — it runs continuously, always adding the next b - Outputs the same JSON format 5. **Update `playground/benchmarks.html`** if needed to display the new function's comparison metrics. -The evaluation step (below) runs `benchmarks/run_benchmarks.sh` to execute **every** TS/Python benchmark pair and regenerates `benchmarks/results.json` with the real timing data. That regenerated file is what gets committed on a successful iteration, so when the autoloop branch is merged to `main`, the pages workflow (`.github/workflows/pages.yml`) picks up the real results and `playground/benchmarks.html` renders real comparison data instead of "No benchmark data available yet." +The autoloop iteration only needs to add the benchmark scripts; it does **not** need to run them or update `benchmarks/results.json`. The pages workflow (`.github/workflows/pages.yml`) executes `benchmarks/run_benchmarks.sh` on every push to `main` and publishes the regenerated `results.json` to the playground site, so real benchmark data appears on `playground/benchmarks.html` once the autoloop branch is merged. ### Key constraints @@ -50,58 +50,23 @@ Do NOT modify: ## Evaluation -The evaluation runs `benchmarks/run_benchmarks.sh`, which executes every TS/Python -benchmark pair and writes real timing data to `benchmarks/results.json`. The metric -is the number of benchmarks that appear in that regenerated file — i.e. the number -of function pairs whose benchmarks actually ran to completion and produced valid -JSON output. This means a benchmark pair is only "counted" if it truly runs, and -the committed `benchmarks/results.json` always reflects real data that the -`pages.yml` workflow will copy to the playground on merge to `main`. - ```bash -set -euo pipefail - -# Ensure Python and pandas are available +# Set up Python environment if needed if ! command -v python3 &>/dev/null; then - echo "ERROR: python3 is required but not found" >&2 - exit 1 -fi -python3 -c "import pandas" 2>/dev/null || pip3 install pandas --quiet - -# Ensure Bun is available (install if missing — autoloop runners may not have it). -# Failure to install Bun is logged but does not abort the script, because we must -# still emit the final metric line for autoloop to parse. -if ! command -v bun &>/dev/null; then - curl -fsSL https://bun.sh/install | bash || echo "WARN: bun install script failed" >&2 - export PATH="$HOME/.bun/bin:$PATH" -fi -if ! command -v bun &>/dev/null; then - echo "ERROR: bun is not available after install attempt; benchmarks will fail" >&2 + echo "Python3 not found, skipping" fi +pip3 install pandas --quiet 2>/dev/null || true + +# Count the number of benchmark pairs (functions with both TS and Python benchmarks) +ts_benchmarks=$(ls benchmarks/tsb/bench_*.ts 2>/dev/null | wc -l | tr -d ' ') +py_benchmarks=$(ls benchmarks/pandas/bench_*.py 2>/dev/null | wc -l | tr -d ' ') -# Install JS/TS dependencies so benchmark scripts can import from src/. -# `|| true` keeps the script alive so the final metric is still emitted; any -# errors are visible in the autoloop logs for debugging. -bun install --silent || echo "WARN: bun install failed; benchmarks may fail to import src/" >&2 - -# Run every benchmark pair and regenerate benchmarks/results.json with real data. -# This is the file .github/workflows/pages.yml copies into the playground, so -# committing it here is what makes real benchmark data appear on the pages site -# once the autoloop branch is merged to main. Output is left visible so -# per-benchmark failures can be diagnosed from autoloop logs; `|| true` ensures -# we still reach the metric emission below if the script exits nonzero. -bash benchmarks/run_benchmarks.sh || echo "WARN: run_benchmarks.sh exited nonzero" >&2 - -# Metric: number of benchmark entries in the regenerated results.json. -count=$(python3 -c " -import json -try: - with open('benchmarks/results.json') as f: - data = json.load(f) - print(len(data.get('benchmarks', []))) -except Exception: - print(0) -") +# The metric is the minimum of the two (both must exist for a complete benchmark) +if [ "$ts_benchmarks" -lt "$py_benchmarks" ]; then + count=$ts_benchmarks +else + count=$py_benchmarks +fi echo "{\"benchmarked_functions\": ${count:-0}}" ``` diff --git a/.github/workflows/pages.yml b/.github/workflows/pages.yml index 127a90d6..d5354bf3 100644 --- a/.github/workflows/pages.yml +++ b/.github/workflows/pages.yml @@ -36,13 +36,6 @@ jobs: - name: Bundle TypeScript compiler for offline playground run: cp node_modules/typescript/lib/typescript.js ./playground/dist/typescript.js - - name: Copy benchmark results to playground - run: | - mkdir -p ./playground/benchmarks - if [ -f benchmarks/results.json ]; then - cp benchmarks/results.json ./playground/benchmarks/results.json - fi - - name: Setup Python uses: actions/setup-python@v5 with: @@ -51,6 +44,14 @@ jobs: - name: Install Python dependencies run: pip install pandas numpy + - name: Run benchmarks + run: bash benchmarks/run_benchmarks.sh + + - name: Copy benchmark results to playground + run: | + mkdir -p ./playground/benchmarks + cp benchmarks/results.json ./playground/benchmarks/results.json + - name: Validate Python playground examples run: python scripts/validate-python-examples.py playground/