Skip to content

perf(startup): add report-only startup measurement#149

Merged
humancto merged 3 commits into
mainfrom
perf/startup-time-measurement
May 3, 2026
Merged

perf(startup): add report-only startup measurement#149
humancto merged 3 commits into
mainfrom
perf/startup-time-measurement

Conversation

@humancto
Copy link
Copy Markdown
Owner

@humancto humancto commented May 3, 2026

Summary

  • Adds a custom process-level startup timing harness for source, bytecode, native source-runtime, and bytecode AOT modes.
  • Wires a report-only CI job that builds release forge, builds libforge_lang.a, and measures startup without enforcing the <10ms target yet.
  • Documents measurement scope and why the roadmap checkbox remains unchecked until a hard native startup gate is justified.

Roadmap

  • Starts ROADMAP.md: Startup time: < 10ms (vs ~100ms for interpreter).
  • This PR establishes measurement and CI visibility only; it does not mark the roadmap item done.

Local startup sample

  • startup.source_run median_ms=4.169 p95_ms=4.667
  • startup.bytecode_run median_ms=3.858 p95_ms=4.088
  • startup.native_source_runtime median_ms=4.193 p95_ms=7.165
  • startup.aot_bytecode median_ms=4.309 p95_ms=4.547

Test plan

  • cargo fmt -- --check
  • rustfmt --check tools/startup_time.rs
  • cargo test
  • cargo clippy --all-targets -- -A clippy::approx_constant -A clippy::result_large_err -A clippy::only_used_in_recursion -A clippy::len_zero
  • cargo run -- --allow-run test tests/
  • cargo audit
  • rustc tools/startup_time.rs -O -o target/startup_time
  • FORGE_LIB_DIR=target/release ./target/startup_time --forge ./target/release/forge --warmups 1 --reps 5

Made with Cursor

Summary by CodeRabbit

  • New Features

    • Added a startup time measurement tool to benchmark Forge across source, bytecode, native, and AOT execution modes.
  • Documentation

    • Added a guide describing measurement modes, correctness checks, local usage, and CI reporting behavior.
  • Chores

    • Integrated a report-only startup benchmark into CI to run measurements and surface results.
    • Updated the changelog to document the startup measurement capability.

Archith and others added 2 commits May 3, 2026 11:48
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 3, 2026

📝 Walkthrough

Walkthrough

Adds a startup-time measurement harness (Rust CLI), documentation and planning files, and a report-only CI job that builds Forge artifacts and runs the harness to measure process spawn-to-exit startup latency across four execution modes.

Changes

Startup Time Measurement Harness

Layer / File(s) Summary
Data / Config
.planning/startup-time-under-10ms.plan.md, CHANGELOG.md
Defines measurement scope, modes, correctness requirements, CI report-only policy, and records the new harness in the changelog.
Documentation
docs/performance/startup.md
Documents measurement goals, modes, FORGE_LIB_DIR requirement, local run commands, and CI report-only behavior.
Core Implementation
tools/startup_time.rs
New Rust CLI: parses --forge, --reps, --warmups; validates FORGE_LIB_DIR/libforge_lang.a; writes fixture; builds bytecode/native/AOT artifacts; runs warmups and timed child executions with 5s timeout; validates child stdout equals ok; computes min/median/p95/max (ms) per mode and reports results; cleans up temp workdir; exits nonzero on harness/child errors.
CI Wiring
.github/workflows/ci.yml
Adds startup-benchmark job: checks out repo, installs Rust with caching, builds cargo build --release --lib --bin forge, compiles tools/startup_time.rs to target/startup_time (rustc -O), and runs it with FORGE_LIB_DIR=target/release --warmups 2 --reps 20.

Sequence Diagram

sequenceDiagram
  autonumber
  participant CI as CI Job / Dev Shell
  participant Harness as startup_time (parent)
  participant FS as Filesystem
  participant ForgeBuilder as cargo/rustc
  participant Forge as Forge binary (child)
  CI->>Harness: run ./target/startup_time --forge <path> --warmups 2 --reps 20
  Harness->>FS: create temp workdir + write fixture.fg
  Harness->>ForgeBuilder: run `forge build` (bytecode/native/AOT) with FORGE_LIB_DIR as needed
  ForgeBuilder->>FS: write built artifacts (bytecode, native, AOT)
  loop for each mode
    Harness->>Forge: spawn child process (mode-specific command)
    Forge-->>Harness: child exits with stdout "ok"
    Harness->>Harness: record duration
  end
  Harness->>CI: print per-mode stats (min/median/p95/max) and exit status
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

"🐰 I hop to time each tiny start,
A little ok warms my heart.
Warmups counted, reps in line,
Milliseconds fall in time.
Benchmarks hum — the meadow's bright!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding startup measurement harness as a report-only feature, matching the primary objective of providing CI visibility without enforcing the <10ms target yet.
Description check ✅ Passed The description covers summary, changes, and includes a test plan. However, it deviates from the template by omitting the standard 'Changes', 'Testing', and 'Related Issues' sections in favor of custom sections like 'Roadmap' and 'Local startup sample'.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/startup-time-measurement

Review rate limit: 6/10 reviews remaining, refill in 23 minutes and 4 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2d5070860

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/startup_time.rs

fn checked_command(command: &mut Command, label: &str) -> Result<Output, String> {
let output = command
.output()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce child-process timeout in benchmark harness

checked_command uses Command::output(), which waits indefinitely for the child to exit. If any measured mode hangs (for example from a startup deadlock/regression), the new startup-benchmark CI job will block until GitHub Actions kills the job at its global timeout instead of failing fast, which defeats the intended report signal and burns CI time. Please run children with an explicit deadline and terminate them on timeout.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 13: Update the CHANGELOG entry for "Startup time measurement harness" by
appending the required PR reference suffix in the format " ([`#PR`](link))" so it
reads "- **Startup time measurement harness** — `tools/startup_time.rs` …
correctness checks. CI runs it as a report-only signal before the `<10ms` native
startup target becomes a hard gate. ([`#PR`](link))"; replace "PR" and "link" with
the actual pull request number and its GitHub URL and ensure the entry remains
under the [Unreleased] section.

In `@tools/startup_time.rs`:
- Around line 219-223: The checked_command function currently uses
Command::output() which can block; replace it with Command::spawn() to get a
Child, then poll the child using Child::try_wait in a loop with a chosen timeout
duration (e.g. const TIMEOUT: Duration = Duration::from_secs(30)); if try_wait
returns Some(status) return the captured output via Child::wait and
Child::stdout/stderr handling, but if the timeout elapses call Child::kill(),
wait for termination with Child::wait(), and return an Err including the label
and that the process timed out; use Child::kill, Child::try_wait, and
Child::wait to implement the logic inside checked_command.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 77834cc7-57f2-40bd-b648-ea090700afee

📥 Commits

Reviewing files that changed from the base of the PR and between 7d1d295 and d2d5070.

📒 Files selected for processing (5)
  • .github/workflows/ci.yml
  • .planning/startup-time-under-10ms.plan.md
  • CHANGELOG.md
  • docs/performance/startup.md
  • tools/startup_time.rs

Comment thread CHANGELOG.md
### Added

- **Standalone source-runtime native binaries for Forge servers** — `forge build --native` now links against `libforge_lang.a` when available and emits a single executable that embeds Forge source and starts interpreter-only runtime features like `@server` without shelling out to the `forge` CLI. `--aot` remains bytecode/VM-only and continues to reject decorator-driven servers with guidance to use `--native`.
- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the PR reference link to this changelog entry.

This entry is missing the required ([#PR](link)) suffix.

Suggested fix
-- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate.
+- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate. ([`#149`](https://github.com/humancto/forge-lang/pull/149))
As per coding guidelines, `CHANGELOG.md` entries must use the format `- Description of change ([`#PR`](link))` under the `[Unreleased]` section.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- **Startup time measurement harness**`tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate.
- **Startup time measurement harness**`tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate. ([`#149`](https://github.com/humancto/forge-lang/pull/149))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` at line 13, Update the CHANGELOG entry for "Startup time
measurement harness" by appending the required PR reference suffix in the format
" ([`#PR`](link))" so it reads "- **Startup time measurement harness** —
`tools/startup_time.rs` … correctness checks. CI runs it as a report-only signal
before the `<10ms` native startup target becomes a hard gate. ([`#PR`](link))";
replace "PR" and "link" with the actual pull request number and its GitHub URL
and ensure the entry remains under the [Unreleased] section.

Comment thread tools/startup_time.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/startup_time.rs`:
- Line 290: The poll loop currently sleeps with
std::thread::sleep(Duration::from_millis(5)), which is too coarse for a <10ms
benchmark and can bias timings; change the backoff in that wait loop to a much
shorter or adaptive interval (e.g., start with microsecond sleeps and
progressively increase up to at most 1ms) so that the call site using try_wait()
doesn't introduce ~5ms of artificial latency—locate the sleep call
(std::thread::sleep(Duration::from_millis(5))) and replace it with a
short/adaptive backoff strategy (microseconds-first with an upper bound of 1ms
or exponential backoff capped at 1ms).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4fb60961-6dd1-4e2d-8d85-3518cc6e9327

📥 Commits

Reviewing files that changed from the base of the PR and between d2d5070 and e6bff65.

📒 Files selected for processing (2)
  • .planning/startup-time-under-10ms.plan.md
  • tools/startup_time.rs

Comment thread tools/startup_time.rs
CHILD_TIMEOUT.as_secs_f64()
));
}
std::thread::sleep(Duration::from_millis(5));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Polling granularity is too coarse for a <10ms benchmark.

Line 290 sleeps for 5ms between try_wait() polls, which can add up to ~5ms artificial latency per run and distort results in the exact range this benchmark targets. Use a shorter/adaptive backoff (e.g., microseconds to 1ms max).

Suggested change
-        std::thread::sleep(Duration::from_millis(5));
+        // Keep timeout polling overhead below the measured startup budget.
+        std::thread::sleep(Duration::from_micros(200));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/startup_time.rs` at line 290, The poll loop currently sleeps with
std::thread::sleep(Duration::from_millis(5)), which is too coarse for a <10ms
benchmark and can bias timings; change the backoff in that wait loop to a much
shorter or adaptive interval (e.g., start with microsecond sleeps and
progressively increase up to at most 1ms) so that the call site using try_wait()
doesn't introduce ~5ms of artificial latency—locate the sleep call
(std::thread::sleep(Duration::from_millis(5))) and replace it with a
short/adaptive backoff strategy (microseconds-first with an upper bound of 1ms
or exponential backoff capped at 1ms).

@humancto humancto merged commit b6ca67f into main May 3, 2026
13 checks passed
@humancto humancto deleted the perf/startup-time-measurement branch May 3, 2026 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant