perf(startup): add report-only startup measurement by humancto · Pull Request #149 · humancto/forge-lang

humancto · 2026-05-03T18:54:24Z

Summary

Adds a custom process-level startup timing harness for source, bytecode, native source-runtime, and bytecode AOT modes.
Wires a report-only CI job that builds release forge, builds libforge_lang.a, and measures startup without enforcing the <10ms target yet.
Documents measurement scope and why the roadmap checkbox remains unchecked until a hard native startup gate is justified.

Roadmap

Starts ROADMAP.md: Startup time: < 10ms (vs ~100ms for interpreter).
This PR establishes measurement and CI visibility only; it does not mark the roadmap item done.

Local startup sample

startup.source_run median_ms=4.169 p95_ms=4.667
startup.bytecode_run median_ms=3.858 p95_ms=4.088
startup.native_source_runtime median_ms=4.193 p95_ms=7.165
startup.aot_bytecode median_ms=4.309 p95_ms=4.547

Test plan

cargo fmt -- --check
rustfmt --check tools/startup_time.rs
cargo test
cargo clippy --all-targets -- -A clippy::approx_constant -A clippy::result_large_err -A clippy::only_used_in_recursion -A clippy::len_zero
cargo run -- --allow-run test tests/
cargo audit
rustc tools/startup_time.rs -O -o target/startup_time
FORGE_LIB_DIR=target/release ./target/startup_time --forge ./target/release/forge --warmups 1 --reps 5

Made with Cursor

Summary by CodeRabbit

New Features
- Added a startup time measurement tool to benchmark Forge across source, bytecode, native, and AOT execution modes.
Documentation
- Added a guide describing measurement modes, correctness checks, local usage, and CI reporting behavior.
Chores
- Integrated a report-only startup benchmark into CI to run measurements and surface results.
- Updated the changelog to document the startup measurement capability.

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-05-03T18:54:36Z

📝 Walkthrough

Walkthrough

Adds a startup-time measurement harness (Rust CLI), documentation and planning files, and a report-only CI job that builds Forge artifacts and runs the harness to measure process spawn-to-exit startup latency across four execution modes.

Changes

Startup Time Measurement Harness

Layer / File(s)	Summary
Data / Config `.planning/startup-time-under-10ms.plan.md`, `CHANGELOG.md`	Defines measurement scope, modes, correctness requirements, CI report-only policy, and records the new harness in the changelog.
Documentation `docs/performance/startup.md`	Documents measurement goals, modes, FORGE_LIB_DIR requirement, local run commands, and CI report-only behavior.
Core Implementation `tools/startup_time.rs`	New Rust CLI: parses `--forge`, `--reps`, `--warmups`; validates `FORGE_LIB_DIR`/`libforge_lang.a`; writes fixture; builds bytecode/native/AOT artifacts; runs warmups and timed child executions with 5s timeout; validates child stdout equals `ok`; computes min/median/p95/max (ms) per mode and reports results; cleans up temp workdir; exits nonzero on harness/child errors.
CI Wiring `.github/workflows/ci.yml`	Adds `startup-benchmark` job: checks out repo, installs Rust with caching, builds `cargo build --release --lib --bin forge`, compiles `tools/startup_time.rs` to `target/startup_time` (`rustc -O`), and runs it with `FORGE_LIB_DIR=target/release --warmups 2 --reps 20`.

Sequence Diagram

sequenceDiagram
  autonumber
  participant CI as CI Job / Dev Shell
  participant Harness as startup_time (parent)
  participant FS as Filesystem
  participant ForgeBuilder as cargo/rustc
  participant Forge as Forge binary (child)
  CI->>Harness: run ./target/startup_time --forge <path> --warmups 2 --reps 20
  Harness->>FS: create temp workdir + write fixture.fg
  Harness->>ForgeBuilder: run `forge build` (bytecode/native/AOT) with FORGE_LIB_DIR as needed
  ForgeBuilder->>FS: write built artifacts (bytecode, native, AOT)
  loop for each mode
    Harness->>Forge: spawn child process (mode-specific command)
    Forge-->>Harness: child exits with stdout "ok"
    Harness->>Harness: record duration
  end
  Harness->>CI: print per-mode stats (min/median/p95/max) and exit status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat(native): support standalone HTTP server binaries #148 — Related: introduces/affects the standalone/native source-runtime behavior and artifacts (libforge_lang.a) exercised by the harness.

"🐰 I hop to time each tiny start,
A little ok warms my heart.
Warmups counted, reps in line,
Milliseconds fall in time.
Benchmarks hum — the meadow's bright!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding startup measurement harness as a report-only feature, matching the primary objective of providing CI visibility without enforcing the <10ms target yet.
Description check	✅ Passed	The description covers summary, changes, and includes a test plan. However, it deviates from the template by omitting the standard 'Changes', 'Testing', and 'Related Issues' sections in favor of custom sections like 'Roadmap' and 'Local startup sample'.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/startup-time-measurement

_{Review rate limit: 6/10 reviews remaining, refill in 23 minutes and 4 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Co-authored-by: Cursor <cursoragent@cursor.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2d5070860

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-03T18:57:16Z

+
+fn checked_command(command: &mut Command, label: &str) -> Result<Output, String> {
+    let output = command
+        .output()


Enforce child-process timeout in benchmark harness

checked_command uses Command::output(), which waits indefinitely for the child to exit. If any measured mode hangs (for example from a startup deadlock/regression), the new startup-benchmark CI job will block until GitHub Actions kills the job at its global timeout instead of failing fast, which defeats the intended report signal and burns CI time. Please run children with an explicit deadline and terminate them on timeout.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 13: Update the CHANGELOG entry for "Startup time measurement harness" by
appending the required PR reference suffix in the format " ([`#PR`](link))" so it
reads "- **Startup time measurement harness** — `tools/startup_time.rs` …
correctness checks. CI runs it as a report-only signal before the `<10ms` native
startup target becomes a hard gate. ([`#PR`](link))"; replace "PR" and "link" with
the actual pull request number and its GitHub URL and ensure the entry remains
under the [Unreleased] section.

In `@tools/startup_time.rs`:
- Around line 219-223: The checked_command function currently uses
Command::output() which can block; replace it with Command::spawn() to get a
Child, then poll the child using Child::try_wait in a loop with a chosen timeout
duration (e.g. const TIMEOUT: Duration = Duration::from_secs(30)); if try_wait
returns Some(status) return the captured output via Child::wait and
Child::stdout/stderr handling, but if the timeout elapses call Child::kill(),
wait for termination with Child::wait(), and return an Err including the label
and that the process timed out; use Child::kill, Child::try_wait, and
Child::wait to implement the logic inside checked_command.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 77834cc7-57f2-40bd-b648-ea090700afee

📥 Commits

Reviewing files that changed from the base of the PR and between 7d1d295 and d2d5070.

📒 Files selected for processing (5)

.github/workflows/ci.yml
.planning/startup-time-under-10ms.plan.md
CHANGELOG.md
docs/performance/startup.md
tools/startup_time.rs

coderabbitai · 2026-05-03T18:58:00Z

 ### Added

 - **Standalone source-runtime native binaries for Forge servers** — `forge build --native` now links against `libforge_lang.a` when available and emits a single executable that embeds Forge source and starts interpreter-only runtime features like `@server` without shelling out to the `forge` CLI. `--aot` remains bytecode/VM-only and continues to reject decorator-driven servers with guidance to use `--native`.
+- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the PR reference link to this changelog entry.

This entry is missing the required ([#PR](link)) suffix.

Suggested fix

-- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate. +- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate. ([`#149`](https://github.com/humancto/forge-lang/pull/149))

As per coding guidelines, `CHANGELOG.md` entries must use the format `- Description of change ([`#PR`](link))` under the `[Unreleased]` section.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate.

- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate. ([`#149`](https://github.com/humancto/forge-lang/pull/149))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` at line 13, Update the CHANGELOG entry for "Startup time measurement harness" by appending the required PR reference suffix in the format " ([`#PR`](link))" so it reads "- **Startup time measurement harness** — `tools/startup_time.rs` … correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate. ([`#PR`](link))"; replace "PR" and "link" with the actual pull request number and its GitHub URL and ensure the entry remains under the [Unreleased] section.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/startup_time.rs`:
- Line 290: The poll loop currently sleeps with
std::thread::sleep(Duration::from_millis(5)), which is too coarse for a <10ms
benchmark and can bias timings; change the backoff in that wait loop to a much
shorter or adaptive interval (e.g., start with microsecond sleeps and
progressively increase up to at most 1ms) so that the call site using try_wait()
doesn't introduce ~5ms of artificial latency—locate the sleep call
(std::thread::sleep(Duration::from_millis(5))) and replace it with a
short/adaptive backoff strategy (microseconds-first with an upper bound of 1ms
or exponential backoff capped at 1ms).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4fb60961-6dd1-4e2d-8d85-3518cc6e9327

📥 Commits

Reviewing files that changed from the base of the PR and between d2d5070 and e6bff65.

📒 Files selected for processing (2)

.planning/startup-time-under-10ms.plan.md
tools/startup_time.rs

coderabbitai · 2026-05-03T19:01:20Z

+                CHILD_TIMEOUT.as_secs_f64()
+            ));
+        }
+        std::thread::sleep(Duration::from_millis(5));


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Polling granularity is too coarse for a <10ms benchmark.

Line 290 sleeps for 5ms between try_wait() polls, which can add up to ~5ms artificial latency per run and distort results in the exact range this benchmark targets. Use a shorter/adaptive backoff (e.g., microseconds to 1ms max).

Suggested change

- std::thread::sleep(Duration::from_millis(5)); + // Keep timeout polling overhead below the measured startup budget. + std::thread::sleep(Duration::from_micros(200));

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tools/startup_time.rs` at line 290, The poll loop currently sleeps with std::thread::sleep(Duration::from_millis(5)), which is too coarse for a <10ms benchmark and can bias timings; change the backoff in that wait loop to a much shorter or adaptive interval (e.g., start with microsecond sleeps and progressively increase up to at most 1ms) so that the call site using try_wait() doesn't introduce ~5ms of artificial latency—locate the sleep call (std::thread::sleep(Duration::from_millis(5))) and replace it with a short/adaptive backoff strategy (microseconds-first with an upper bound of 1ms or exponential backoff capped at 1ms).

Archith and others added 2 commits May 3, 2026 11:48

plan: define startup time measurement pass

7652d13

Co-authored-by: Cursor <cursoragent@cursor.com>

perf(startup): add report-only startup measurement

d2d5070

Co-authored-by: Cursor <cursoragent@cursor.com>

fix(startup): add child timeout to measurement harness

e6bff65

Co-authored-by: Cursor <cursoragent@cursor.com>

chatgpt-codex-connector Bot reviewed May 3, 2026

View reviewed changes

coderabbitai Bot reviewed May 3, 2026

View reviewed changes

humancto merged commit b6ca67f into main May 3, 2026
13 checks passed

humancto deleted the perf/startup-time-measurement branch May 3, 2026 19:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(startup): add report-only startup measurement#149

perf(startup): add report-only startup measurement#149
humancto merged 3 commits into
mainfrom
perf/startup-time-measurement

humancto commented May 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 3, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 3, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 3, 2026

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	- Startup time measurement harness — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate.
	- Startup time measurement harness — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate. ([`#149`](https://github.com/humancto/forge-lang/pull/149))

Conversation

humancto commented May 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Roadmap

Local startup sample

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

humancto commented May 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 3, 2026 •

edited

Loading