From 03de742eb2f0365b288035ce76f6e7ac8e2cbb3b Mon Sep 17 00:00:00 2001 From: Manuel Saelices Date: Sun, 22 Mar 2026 20:51:15 +0100 Subject: [PATCH 1/7] [Skills] Add mojo-stdlib-contributing skill Add a skill capturing patterns and pitfalls for contributing to the Mojo standard library, distilled from 30+ reviewed PRs. Covers process, assertion semantics, optimization verification with compile_info, benchmarks, testing, SIMD safety, and Writable trait patterns. Signed-off-by: Manuel Saelices --- README.md | 9 ++ mojo-stdlib-contributing/SKILL.md | 169 ++++++++++++++++++++++++++++++ 2 files changed, 178 insertions(+) create mode 100644 mojo-stdlib-contributing/SKILL.md diff --git a/README.md b/README.md index 6775602..100ce0e 100644 --- a/README.md +++ b/README.md @@ -82,6 +82,15 @@ triggered when Python types are used Mojo or a Python module needs to interact with Mojo code. Many capabilities of Mojo - Python interoperability are fairly new, and existing coding agents don't handle them correctly without guidance. +### `mojo-stdlib-contributing` + +[This skill](mojo-stdlib-contributing/SKILL.md) captures patterns and pitfalls +for contributing to the Mojo standard library, distilled from 30+ reviewed PRs. +It covers process (GitHub issues before new APIs, draft PRs), assertion +semantics (`assert_mode="safe"` is intentional), benchmark patterns, testing +requirements, SIMD/memory safety, and common reviewer feedback. Use when +modifying code under `mojo/stdlib/`. + ## Examples Once these skills are installed, you can use them for many common tasks. diff --git a/mojo-stdlib-contributing/SKILL.md b/mojo-stdlib-contributing/SKILL.md new file mode 100644 index 0000000..8b1ce67 --- /dev/null +++ b/mojo-stdlib-contributing/SKILL.md @@ -0,0 +1,169 @@ +--- +name: mojo-stdlib-contributing +description: Patterns and pitfalls for contributing to the Mojo standard library. Use when modifying code under mojo/stdlib/, writing tests or benchmarks for stdlib, or preparing PRs to the modular/modular repository. Distilled from 30+ reviewed PRs. +--- + + + +## Process — before writing code + +- **New APIs require a GitHub issue first.** Do not add methods to existing types (String, List, Deque, etc.) without prior consensus. Python parity alone is not justification. Open the issue, then create a draft PR linking it. +- **Keep PRs minimal and focused.** One logical change per PR. Never mix unrelated changes (e.g. don't add `@always_inline` to Set/Dict in a Deque fix PR). Benchmark utilities and their usage belong in separate PRs. +- **Always create draft PRs.** Never mark a PR as ready for review yourself. +- **Branch from `upstream/main`.** Never work on `main` directly. Each PR branch must contain only commits for that specific change. + +## Assertion semantics — critical + +The `assert` statement desugars to `debug_assert()` with `assert_mode="none"`. The default `ASSERT_MODE` is `"safe"` (from `get_defined_string["ASSERT", "safe"]()`). This means: + +| Form | Runs when `ASSERT_MODE="safe"` (default) | Runs when `-D ASSERT=all` | +|---|---|---| +| `debug_assert[assert_mode="safe"](...)` | Yes | Yes | +| `debug_assert(...)` (default `assert_mode="none"`) | No | Yes | +| `assert cond, msg` (desugars to above) | No | Yes | + +**Do NOT downgrade `debug_assert[assert_mode="safe"]` to `debug_assert` or `assert`.** These are intentional safety invariants. The maintainers want aggressive checking on operations like `byte=` indexing. Users who need to bypass safety should use the existing unsafe escape hatches (e.g. `.as_bytes().unsafe_get(idx)` for byte access). + +## Optimizations — verify codegen and benchmark before submitting + +**Every optimization PR must include both IR evidence and benchmark results.** + +### Verify codegen with `compile_info` + +Use `std.compile.compile_info` to inspect the generated IR *before* and *after* your change. If the IR is identical, the optimization is a no-op regardless of what the source looks like. + +```mojo +from std.compile import compile_info + +# Define the function to inspect +def my_function(x: SIMD[DType.float32, 4]) -> SIMD[DType.float32, 4]: + return x + x + +# Inspect optimized LLVM IR +comptime info = compile_info[my_function, emission_kind="llvm-opt"]() +print(info) # prints optimized IR + +# Check for specific instructions +assert "fadd" in compile_info[my_function, emission_kind="llvm-opt"]() + +# Write IR to file for detailed comparison +compile_info[my_function, emission_kind="llvm-opt"]().write_text("after.ll") +``` + +Supported `emission_kind` values: + +| Kind | Output | +|---|---| +| `"asm"` | Assembly (default) | +| `"llvm"` | Unoptimized LLVM IR | +| `"llvm-opt"` | Optimized LLVM IR (use this to compare) | +| `"llvm-bitcode"` | LLVM bitcode | + +**Workflow for optimization PRs:** +1. Write a small test that calls `compile_info` on the function before your change. Save the IR. +2. Apply your change. +3. Compare IR. If identical, the optimization does nothing. Do not submit. +4. If IR differs, run benchmarks to confirm the improvement is measurable. + +### Common pitfalls + +- **Do not add manual fast paths that LLVM already optimizes.** For example, `1 if b < 128 else _utf8_first_byte_sequence_length(b)` produces identical IR to just calling `_utf8_first_byte_sequence_length(b)` because LLVM sees through the branch. +- **Measure before optimizing compile-time-evaluated paths.** Format strings may be evaluated at compile time, so runtime SIMD scanning provides no benefit. +- **Verify alignment claims with data.** Cache-line alignment (e.g. 64-byte `@align`) requires benchmark evidence. Do not apply speculatively. +- **Include before/after comparisons in the PR description.** Show the benchmark numbers and, if relevant, the IR diff. + +## Code design + +- **Reuse existing stdlib primitives.** For SIMD byte search, use `_memchr`/`_memchr_impl` rather than writing a new loop. If a new algorithm is better, update the existing primitive so all callers benefit. Use `clamp`, `Int64.__xor__`, etc. instead of reimplementing. +- **Implement fast paths on the lowest-level type** (`Span`), not higher-level types (`List`). Then delegate: `List.__contains__` calls `Span.__contains__`. +- **Generalize type constraints.** Don't hard-code a single type (e.g. `Byte`). Use trait conformance like `TrivialRegisterPassable` to cover all applicable types. +- **Use move semantics (`^`)** when transferring ownership, not `.copy()`. +- **Use `uninit_copy_n`, `uninit_move_n`, `destroy_n`** for bulk operations on trivial types instead of manual loops. +- **Prefer lazy evaluation (iterators) over eager allocation** for collection operations like slicing. E.g., `deque[0:1:2]` should return an iterator, not allocate a new `Deque`. +- **Check if traits synthesize default methods** before writing explicit implementations. E.g., `Equatable` may already synthesize `__ne__` from `__eq__`. +- **Lift constants into traits** when they vary across implementations (e.g., `MAX_NAME_SIZE` into a `_DirentLike` trait). +- **Keep t-string expressions simple.** Assign complex sub-expressions to local variables before interpolating. +- **Do not add redundant logic** just for Python API parity if Mojo already has a better primitive. +- **Question two-pass patterns.** A "collect then delete" approach with an extra allocation may be slower than a single-pass approach. Always benchmark to confirm. +- **Ensure PR description matches reality.** If `discard()` internally uses try/except, don't claim "avoids exception overhead". + +## Benchmarks + +```mojo +# Correct benchmark pattern: +@parameter +def bench_something(mut b: Bencher) raises: + var data = setup_data() + @always_inline + def call_fn() unified {read}: + var result = black_box(data).some_operation(black_box(arg)) + keep(result) + b.iter(call_fn) +``` + +- **Always wrap inputs with `black_box()` and results with `keep()`** to prevent the compiler from optimizing away the benchmark. +- **Do not include construction inside the hot loop.** Setup goes before `call_fn`. Use `iter_with_setup` for destructive benchmarks. +- **Data must match the described scenario.** If the docstring says "element in the middle", verify it is actually there. +- **Use `comptime` instead of `@parameter`** for compile-time tuples/loops in benchmark `main()`. (`@parameter` is still used as a function decorator on benchmark functions themselves.) +- **Setup functions passed to `iter_with_setup` must not raise.** +- **Provide benchmarks for performance PRs**, covering both small and large inputs. + +## Testing + +- **Add unicode test cases** for any string/byte operation. +- **Cover both hit and miss paths** in search/contains tests. +- **Add edge cases:** `start > end`, out-of-bounds indices, empty input, exact-width (no padding). +- **Use `debug_assert`** for internal preconditions on values that must be non-negative by invariant. +- **Tests for `Writable` must use `check_write_to`**, not `String(x)` or `repr(x)`. + +## SIMD / memory safety + +- **Validate pointer arithmetic** before low-level memory access. Clamp/normalize `[start, end]` into `[0, len]` before `memcmp`. +- **Check for negative `pos_in_block`** when computing leading/trailing zeros in reverse SIMD scans. +- **Verify `vectorized_end`** accounts for full SIMD block width to prevent OOB reads. +- **SIMD test comments must be platform-agnostic.** Write "exercises both SIMD path and scalar tail" instead of "one full 64-byte SIMD block + scalar tail". +- **Consider endianness** when working with bit-level SIMD operations (e.g. `count_trailing_zeros` on packed bitmasks). Implementations may need a branch for big-endian targets. + +## Writable trait patterns + +```mojo +# Correct signature (not generic [W: Writer]): +def write_to(self, mut writer: Some[Writer]): + writer.write_string("literal") # not writer.write("literal") + writer.write(self.field) # non-literals use .write() + +def write_repr_to(self, mut writer: Some[Writer]): + writer.write("TypeName.", self) # include type name for enums +``` + +- **`write_to` and `write_repr_to` must not allocate.** No `repr()`, `String(x)`, or heap-allocating calls. +- **Use `FormatStruct`** from `format._utils` for consistent repr formatting of structured types. +- **Add an explicit `else` fallback** in enum-style `write_to`. +- **`Stringable` and `Representable` are deprecated.** Only implement `Writable`. Do not add `__str__` or `__repr__`. + +## Changelog + +- **Do NOT add changelog entries for NFC/implementation-detail changes.** Only user-facing behavior changes belong in the changelog. Internal optimization, refactoring, or performance improvements that don't change the public API are not changelisted. + +## Build and lint + +```bash +# Build stdlib +./bazelw build //mojo/stdlib/std + +# Run specific tests +./bazelw test //mojo/stdlib/test/collections/string:test_string_slice.mojo.test + +# Run all stdlib tests +./bazelw test mojo/stdlib/test/... + +# Format check (run before pushing) +./bazelw run format +``` + +- **Always run `./bazelw run format` before pushing.** The CI lint check will reject unformatted code. +- **Sign commits** with `git commit -s`. +- **Include `Assisted-by: AI`** as a trailer in every PR description per `AI_TOOL_POLICY.md`. From cd08f91a63371a0b590c027c0815e63e874f97df Mon Sep 17 00:00:00 2001 From: Manuel Saelices Date: Sun, 22 Mar 2026 20:55:59 +0100 Subject: [PATCH 2/7] [Skills] Ask human to review before marking PR as ready Maintainer review cycles are expensive. The skill should instruct the agent to ask the human to review the diff, PR description, and benchmarks before converting from draft to ready for review. Signed-off-by: Manuel Saelices --- mojo-stdlib-contributing/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mojo-stdlib-contributing/SKILL.md b/mojo-stdlib-contributing/SKILL.md index 8b1ce67..7dc33c0 100644 --- a/mojo-stdlib-contributing/SKILL.md +++ b/mojo-stdlib-contributing/SKILL.md @@ -12,7 +12,7 @@ patterns that are non-obvious or that a model would get wrong. --> - **New APIs require a GitHub issue first.** Do not add methods to existing types (String, List, Deque, etc.) without prior consensus. Python parity alone is not justification. Open the issue, then create a draft PR linking it. - **Keep PRs minimal and focused.** One logical change per PR. Never mix unrelated changes (e.g. don't add `@always_inline` to Set/Dict in a Deque fix PR). Benchmark utilities and their usage belong in separate PRs. -- **Always create draft PRs.** Never mark a PR as ready for review yourself. +- **Always create draft PRs.** Never mark a PR as ready for review yourself. Before the human marks it as ready, ask them to review the diff, PR description, and benchmark results carefully. Maintainer review cycles are expensive -- catching issues before requesting review avoids wasted rounds. - **Branch from `upstream/main`.** Never work on `main` directly. Each PR branch must contain only commits for that specific change. ## Assertion semantics — critical From 6469dd93b1f19c99f3e080a0618cf38188a9dc96 Mon Sep 17 00:00:00 2001 From: Manuel Saelices Date: Sun, 22 Mar 2026 21:00:23 +0100 Subject: [PATCH 3/7] [Skills] Simplify and align with existing skill conventions - Use standard editorial comment template from other skills - Add opening paragraph consistent with mojo-syntax/gpu-fundamentals - Remove content duplicated from mojo-syntax (Writable basics, deprecated APIs, comptime vs @parameter, move semantics) - Remove general git/build commands already in CLAUDE.md - Consolidate compile_info section, cut niche bullets - Merge changelog into process section - Reorder: process, assertions, optimizations, design, benchmarks, testing, SIMD, writable - Reduce from 169 to 111 lines (34% smaller) Signed-off-by: Manuel Saelices --- mojo-stdlib-contributing/SKILL.md | 154 ++++++++++-------------------- 1 file changed, 48 insertions(+), 106 deletions(-) diff --git a/mojo-stdlib-contributing/SKILL.md b/mojo-stdlib-contributing/SKILL.md index 7dc33c0..8a6ec3c 100644 --- a/mojo-stdlib-contributing/SKILL.md +++ b/mojo-stdlib-contributing/SKILL.md @@ -4,99 +4,78 @@ description: Patterns and pitfalls for contributing to the Mojo standard library --- +This file is loaded into an agent's context window as a correction layer for +pretrained contribution knowledge. Every line costs context. When editing: +- Be terse. Use tables and inline code over prose where possible. +- Never duplicate information from the mojo-syntax skill. +- Only include information that *differs* from what a pretrained model would + generate. Don't document things models already get right. +- Prefer one consolidated code block over multiple small ones. +- If adding a new section, ask: "Would a model get this wrong?" If not, skip it. +These same principles apply to any files this skill references. +--> -## Process — before writing code +Contributing to the Mojo stdlib has non-obvious patterns that differ from typical open-source projects. **Always follow this skill to avoid common rejection reasons.** -- **New APIs require a GitHub issue first.** Do not add methods to existing types (String, List, Deque, etc.) without prior consensus. Python parity alone is not justification. Open the issue, then create a draft PR linking it. -- **Keep PRs minimal and focused.** One logical change per PR. Never mix unrelated changes (e.g. don't add `@always_inline` to Set/Dict in a Deque fix PR). Benchmark utilities and their usage belong in separate PRs. -- **Always create draft PRs.** Never mark a PR as ready for review yourself. Before the human marks it as ready, ask them to review the diff, PR description, and benchmark results carefully. Maintainer review cycles are expensive -- catching issues before requesting review avoids wasted rounds. -- **Branch from `upstream/main`.** Never work on `main` directly. Each PR branch must contain only commits for that specific change. +## Process -## Assertion semantics — critical +- **New APIs require a GitHub issue first.** Do not implement new methods on existing types without prior consensus. Open the issue, then create a draft PR linking it. +- **Keep PRs focused.** One logical change per PR. Benchmark utilities and their usage belong in separate PRs. +- **Always create draft PRs.** Never mark as ready yourself. Ask the human to review the diff, description, and benchmarks before requesting maintainer review. +- **No changelog entries for internal changes.** Only user-facing behavior changes belong in the changelog. -The `assert` statement desugars to `debug_assert()` with `assert_mode="none"`. The default `ASSERT_MODE` is `"safe"` (from `get_defined_string["ASSERT", "safe"]()`). This means: +## Assertion semantics -| Form | Runs when `ASSERT_MODE="safe"` (default) | Runs when `-D ASSERT=all` | +The default `ASSERT_MODE` is `"safe"` (from `get_defined_string["ASSERT", "safe"]()`). + +| Form | Runs by default (`"safe"`) | Runs with `-D ASSERT=all` | |---|---|---| | `debug_assert[assert_mode="safe"](...)` | Yes | Yes | -| `debug_assert(...)` (default `assert_mode="none"`) | No | Yes | -| `assert cond, msg` (desugars to above) | No | Yes | +| `debug_assert(...)` / `assert cond, msg` | No | Yes | -**Do NOT downgrade `debug_assert[assert_mode="safe"]` to `debug_assert` or `assert`.** These are intentional safety invariants. The maintainers want aggressive checking on operations like `byte=` indexing. Users who need to bypass safety should use the existing unsafe escape hatches (e.g. `.as_bytes().unsafe_get(idx)` for byte access). +**Do NOT downgrade `debug_assert[assert_mode="safe"]`.** These are intentional safety invariants, not performance bugs. -## Optimizations — verify codegen and benchmark before submitting +## Optimizations **Every optimization PR must include both IR evidence and benchmark results.** -### Verify codegen with `compile_info` - -Use `std.compile.compile_info` to inspect the generated IR *before* and *after* your change. If the IR is identical, the optimization is a no-op regardless of what the source looks like. +Use `std.compile.compile_info` to compare generated IR before and after your change: ```mojo from std.compile import compile_info -# Define the function to inspect def my_function(x: SIMD[DType.float32, 4]) -> SIMD[DType.float32, 4]: return x + x -# Inspect optimized LLVM IR +# Inspect optimized LLVM IR ("llvm" for unoptimized, "asm" for assembly) comptime info = compile_info[my_function, emission_kind="llvm-opt"]() -print(info) # prints optimized IR +print(info) -# Check for specific instructions +# Pattern-match on IR content assert "fadd" in compile_info[my_function, emission_kind="llvm-opt"]() - -# Write IR to file for detailed comparison -compile_info[my_function, emission_kind="llvm-opt"]().write_text("after.ll") ``` -Supported `emission_kind` values: - -| Kind | Output | -|---|---| -| `"asm"` | Assembly (default) | -| `"llvm"` | Unoptimized LLVM IR | -| `"llvm-opt"` | Optimized LLVM IR (use this to compare) | -| `"llvm-bitcode"` | LLVM bitcode | - -**Workflow for optimization PRs:** -1. Write a small test that calls `compile_info` on the function before your change. Save the IR. -2. Apply your change. -3. Compare IR. If identical, the optimization does nothing. Do not submit. -4. If IR differs, run benchmarks to confirm the improvement is measurable. +**Workflow:** save IR before your change, apply change, compare. If IR is identical, the optimization is a no-op. Do not submit. If IR differs, run benchmarks to confirm measurable improvement. -### Common pitfalls - -- **Do not add manual fast paths that LLVM already optimizes.** For example, `1 if b < 128 else _utf8_first_byte_sequence_length(b)` produces identical IR to just calling `_utf8_first_byte_sequence_length(b)` because LLVM sees through the branch. -- **Measure before optimizing compile-time-evaluated paths.** Format strings may be evaluated at compile time, so runtime SIMD scanning provides no benefit. -- **Verify alignment claims with data.** Cache-line alignment (e.g. 64-byte `@align`) requires benchmark evidence. Do not apply speculatively. -- **Include before/after comparisons in the PR description.** Show the benchmark numbers and, if relevant, the IR diff. +- **Do not add manual fast paths that LLVM already optimizes.** E.g., `1 if b < 128 else _utf8_first_byte_sequence_length(b)` produces identical IR to just calling the function. +- **Verify alignment claims with benchmarks.** Cache-line alignment requires evidence. +- **Include before/after benchmark numbers in the PR description.** ## Code design -- **Reuse existing stdlib primitives.** For SIMD byte search, use `_memchr`/`_memchr_impl` rather than writing a new loop. If a new algorithm is better, update the existing primitive so all callers benefit. Use `clamp`, `Int64.__xor__`, etc. instead of reimplementing. -- **Implement fast paths on the lowest-level type** (`Span`), not higher-level types (`List`). Then delegate: `List.__contains__` calls `Span.__contains__`. -- **Generalize type constraints.** Don't hard-code a single type (e.g. `Byte`). Use trait conformance like `TrivialRegisterPassable` to cover all applicable types. -- **Use move semantics (`^`)** when transferring ownership, not `.copy()`. -- **Use `uninit_copy_n`, `uninit_move_n`, `destroy_n`** for bulk operations on trivial types instead of manual loops. -- **Prefer lazy evaluation (iterators) over eager allocation** for collection operations like slicing. E.g., `deque[0:1:2]` should return an iterator, not allocate a new `Deque`. -- **Check if traits synthesize default methods** before writing explicit implementations. E.g., `Equatable` may already synthesize `__ne__` from `__eq__`. -- **Lift constants into traits** when they vary across implementations (e.g., `MAX_NAME_SIZE` into a `_DirentLike` trait). -- **Keep t-string expressions simple.** Assign complex sub-expressions to local variables before interpolating. -- **Do not add redundant logic** just for Python API parity if Mojo already has a better primitive. -- **Question two-pass patterns.** A "collect then delete" approach with an extra allocation may be slower than a single-pass approach. Always benchmark to confirm. -- **Ensure PR description matches reality.** If `discard()` internally uses try/except, don't claim "avoids exception overhead". +- **Reuse existing stdlib primitives.** Use `_memchr`/`_memchr_impl`, `clamp`, etc. rather than reimplementing. If a new algorithm is better, update the existing primitive. +- **Implement fast paths on `Span`**, not `List`. Then delegate upward. +- **Generalize type constraints.** Use trait conformance (e.g. `TrivialRegisterPassable`) rather than hard-coding a single type. +- **Use move semantics (`^`)**, not `.copy()`. Use `uninit_copy_n`, `uninit_move_n`, `destroy_n` for bulk operations. +- **Question two-pass patterns.** A "collect then delete" approach may be slower than single-pass. Benchmark to confirm. ## Benchmarks ```mojo -# Correct benchmark pattern: +# CORRECT @parameter def bench_something(mut b: Bencher) raises: - var data = setup_data() + var data = setup_data() # setup outside hot loop @always_inline def call_fn() unified {read}: var result = black_box(data).some_operation(black_box(arg)) @@ -104,66 +83,29 @@ def bench_something(mut b: Bencher) raises: b.iter(call_fn) ``` -- **Always wrap inputs with `black_box()` and results with `keep()`** to prevent the compiler from optimizing away the benchmark. -- **Do not include construction inside the hot loop.** Setup goes before `call_fn`. Use `iter_with_setup` for destructive benchmarks. -- **Data must match the described scenario.** If the docstring says "element in the middle", verify it is actually there. -- **Use `comptime` instead of `@parameter`** for compile-time tuples/loops in benchmark `main()`. (`@parameter` is still used as a function decorator on benchmark functions themselves.) -- **Setup functions passed to `iter_with_setup` must not raise.** +- **Always `black_box()` inputs and `keep()` results.** Otherwise the compiler may optimize away the benchmark. +- **Setup goes outside `call_fn`.** Use `iter_with_setup` for destructive benchmarks. Setup must not raise. +- **Data must match the described scenario.** If the docstring says "element in the middle", verify it. - **Provide benchmarks for performance PRs**, covering both small and large inputs. ## Testing - **Add unicode test cases** for any string/byte operation. - **Cover both hit and miss paths** in search/contains tests. -- **Add edge cases:** `start > end`, out-of-bounds indices, empty input, exact-width (no padding). -- **Use `debug_assert`** for internal preconditions on values that must be non-negative by invariant. +- **Add edge cases:** `start > end`, out-of-bounds indices, empty input, exact-width. - **Tests for `Writable` must use `check_write_to`**, not `String(x)` or `repr(x)`. ## SIMD / memory safety -- **Validate pointer arithmetic** before low-level memory access. Clamp/normalize `[start, end]` into `[0, len]` before `memcmp`. -- **Check for negative `pos_in_block`** when computing leading/trailing zeros in reverse SIMD scans. +- **Validate pointer arithmetic** before low-level memory access. Clamp `[start, end]` into `[0, len]` before `memcmp`. +- **Check for negative `pos_in_block`** in reverse SIMD scans with `count_leading_zeros`. - **Verify `vectorized_end`** accounts for full SIMD block width to prevent OOB reads. -- **SIMD test comments must be platform-agnostic.** Write "exercises both SIMD path and scalar tail" instead of "one full 64-byte SIMD block + scalar tail". -- **Consider endianness** when working with bit-level SIMD operations (e.g. `count_trailing_zeros` on packed bitmasks). Implementations may need a branch for big-endian targets. - -## Writable trait patterns +- **SIMD test comments must be platform-agnostic.** Write "exercises both SIMD path and scalar tail" instead of assuming 64-byte width. -```mojo -# Correct signature (not generic [W: Writer]): -def write_to(self, mut writer: Some[Writer]): - writer.write_string("literal") # not writer.write("literal") - writer.write(self.field) # non-literals use .write() +## Writable trait (stdlib-specific additions to mojo-syntax) -def write_repr_to(self, mut writer: Some[Writer]): - writer.write("TypeName.", self) # include type name for enums -``` +The `mojo-syntax` skill covers basic `Writable` patterns. Additional stdlib-contributing rules: -- **`write_to` and `write_repr_to` must not allocate.** No `repr()`, `String(x)`, or heap-allocating calls. +- **`write_to` and `write_repr_to` must not allocate.** No `repr()`, `String(x)`, or heap-allocating calls inside these methods. - **Use `FormatStruct`** from `format._utils` for consistent repr formatting of structured types. - **Add an explicit `else` fallback** in enum-style `write_to`. -- **`Stringable` and `Representable` are deprecated.** Only implement `Writable`. Do not add `__str__` or `__repr__`. - -## Changelog - -- **Do NOT add changelog entries for NFC/implementation-detail changes.** Only user-facing behavior changes belong in the changelog. Internal optimization, refactoring, or performance improvements that don't change the public API are not changelisted. - -## Build and lint - -```bash -# Build stdlib -./bazelw build //mojo/stdlib/std - -# Run specific tests -./bazelw test //mojo/stdlib/test/collections/string:test_string_slice.mojo.test - -# Run all stdlib tests -./bazelw test mojo/stdlib/test/... - -# Format check (run before pushing) -./bazelw run format -``` - -- **Always run `./bazelw run format` before pushing.** The CI lint check will reject unformatted code. -- **Sign commits** with `git commit -s`. -- **Include `Assisted-by: AI`** as a trailer in every PR description per `AI_TOOL_POLICY.md`. From f65062182783a4fdf17783719d37a913c02ff63c Mon Sep 17 00:00:00 2001 From: Manuel Saelices Date: Sun, 22 Mar 2026 21:29:19 +0100 Subject: [PATCH 4/7] [Skills] Add references to contributing docs in modular repo Signed-off-by: Manuel Saelices --- mojo-stdlib-contributing/SKILL.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mojo-stdlib-contributing/SKILL.md b/mojo-stdlib-contributing/SKILL.md index 8a6ec3c..52ddc4f 100644 --- a/mojo-stdlib-contributing/SKILL.md +++ b/mojo-stdlib-contributing/SKILL.md @@ -17,6 +17,8 @@ These same principles apply to any files this skill references. Contributing to the Mojo stdlib has non-obvious patterns that differ from typical open-source projects. **Always follow this skill to avoid common rejection reasons.** +**Read before contributing:** `mojo/CONTRIBUTING.md` (accepted/avoided changes, proposal process), `mojo/stdlib/docs/development.md` (building, testing), `mojo/stdlib/docs/style-guide.md` (coding standards), and `AI_TOOL_POLICY.md` (labeling, human-in-the-loop requirements). + ## Process - **New APIs require a GitHub issue first.** Do not implement new methods on existing types without prior consensus. Open the issue, then create a draft PR linking it. From 34f8ca21074404d14a96e5d1c7685a5a60794475 Mon Sep 17 00:00:00 2001 From: Manuel Saelices Date: Wed, 13 May 2026 09:45:49 +0200 Subject: [PATCH 5/7] [Skills] Add `mojo-stdlib-review` skill Adds an adversarial red-team review skill for Mojo stdlib PRs. The skill spawns one parallel agent per PR that: - Reads the full diff and extracts every verifiable claim across 14 categories (process, API design, trait conformance, docstrings, tests, iterators, benchmarks, optimizations, code reuse, memory/lifetime, asserts, style, removed-line side effects, renames). - Verifies each claim against the actual stdlib source and the rules in `mojo-stdlib-contributing`. - Either posts review comments on the PR (default) or writes findings to `.specs/stdlib-review-.md` for the author to fix before flipping a draft PR to ready-for-review. Distilled from 30+ reviewed PRs against `modular/modular`. Assisted-by: AI --- mojo-stdlib-review/SKILL.md | 495 ++++++++++++++++++++++++++++++++++++ 1 file changed, 495 insertions(+) create mode 100644 mojo-stdlib-review/SKILL.md diff --git a/mojo-stdlib-review/SKILL.md b/mojo-stdlib-review/SKILL.md new file mode 100644 index 0000000..20d1a2a --- /dev/null +++ b/mojo-stdlib-review/SKILL.md @@ -0,0 +1,495 @@ +--- +name: mojo-stdlib-review +description: > + Red-team adversarial review of Mojo stdlib PRs. Spawns parallel agents that deep-read every + changed file, extract factual claims, verify each against the actual stdlib source of truth and + the rules in `mojo-stdlib-contributing`, and either post review comments on the PR (default "post + mode") or write findings to `.specs/` ("local mode", triggered by "in local mode" / "no post" in + the arguments). Use when the user mentions "adversarial review", "red team review", "deep review", + "fact-check PR", or wants thorough verification of a Mojo stdlib PR before requesting maintainer + review. Distilled from 30+ reviewed PRs to `modular/modular`. +user-invocable: true +argument-hint: "[PR numbers or URLs, space-separated] [\"in local mode\"]" +--- + +Red-team adversarial review of Mojo stdlib PRs: $ARGUMENTS + +## Overview + +This skill performs adversarial (red-team) reviews of pull requests against the Mojo standard +library. Unlike friendly reviews that skim for obvious issues, adversarial reviews actively try +to prove every claim wrong by cross-referencing the actual stdlib source, the docstrings of +sibling APIs, the trait definitions, the existing tests/benchmarks, and the rules in +[`mojo-stdlib-contributing`](../mojo-stdlib-contributing/SKILL.md). + +The skill spawns one `general-purpose` agent per PR in parallel. Each agent: + +1. Reads the full PR diff +2. Loads `mojo-stdlib-contributing` rules and the local stdlib source as ground truth +3. Extracts every verifiable claim and every removed line +4. Verifies each claim against the source of truth +5. Double-checks findings before reporting +6. Either posts review comments on the PR (default) or writes findings to a local file for the + author to auto-fix (local mode) + +## Output modes + +Default is **post mode**: findings are posted on the PR via `gh pr review --comment`. Use this +when reviewing someone else's PR. + +If `$ARGUMENTS` contains "local", "local mode", "local fix", "no post", or equivalent phrasing, +switch to **local mode**: write findings to `.specs/stdlib-review-.md` and return the path +plus per-category counts to the caller. Do NOT post PR comments. + +Local mode is the canonical "self-review gate" used by stdlib contributors before flipping a +draft PR to ready-for-review. It runs on the local Claude Code subscription and avoids spamming +the PR with comments that the author will resolve before any maintainer sees them. + +## Phase 1: Parse Input + +### Step 1: Extract PR identifiers + +Parse `$ARGUMENTS` to extract PR numbers or URLs. Supported formats: + +- **PR numbers**: `6276 6510 6530` +- **PR URLs**: `https://github.com/modular/modular/pull/6276` +- **Mixed**: `6276 https://github.com/modular/modular/pull/6510` +- **No arguments**: Ask the user for PR numbers. + +Normalize all inputs to PR numbers against the `modular/modular` repo. + +### Step 2: Fetch PR metadata + +For each PR, fetch basic info: + +```bash +gh pr view $PR_NUMBER --repo modular/modular \ + --json number,title,state,headRefName,changedFiles,additions,deletions,files,isDraft,body +``` + +Present a summary table and ask for confirmation: + +| # | PR | Title | Files | +/- | Draft? | +|---|-----|-------|-------|-----|--------| +| 1 | #6276 | Add `take()`/`skip()` iterator adapters | 5 | +180/-0 | yes | + +### Step 3: Scope guard + +This skill targets PRs that touch the Mojo stdlib. From the `files` list returned by +`gh pr view`, verify at least one path matches `mojo/stdlib/std/**`, `mojo/stdlib/test/**`, +`mojo/stdlib/benchmarks/**`, `mojo/stdlib/docs/**`, `mojo/docs/nightly-changelog.md`, or +`mojo/docs/code/**`. If none match, warn the user and ask whether to proceed (a non-stdlib PR is +better served by the general `adversarial-review` skill). + +Do NOT do any further "domain detection". This skill has a single domain: Mojo stdlib. + +--- + +## Phase 2: Spawn Review Agents + +### Step 4: Launch agents in parallel + +For each PR, spawn a `general-purpose` agent using the Agent tool with `run_in_background: true`. + +**All agents MUST be launched in a single message** to maximize parallelism. + +Use `stdlib-review-{pr_number}` as the agent `description` (e.g., `"stdlib-review-6276"`). +Capture the returned agent ID from each Agent tool call. + +Before spawning, choose **one** emit block (post mode default OR local mode) based on +`$ARGUMENTS` and inline it as `{EMIT_SECTION}` so the agent receives a single unambiguous +instruction. + +### Agent Prompt Template + +Fill in `{PR_NUMBER}`, `{PR_TITLE}`, and `{EMIT_SECTION}`. + +```` +You are doing an ADVERSARIAL REVIEW of Mojo stdlib PR #{PR_NUMBER} +({PR_TITLE}): https://github.com/modular/modular/pull/{PR_NUMBER} + +## What is an adversarial review? + +You are NOT a friendly reviewer. You are a red-team reviewer whose job is to find every +factual error, behavioral inconsistency, missing test, undocumented edge case, and +optimization that is actually a no-op. You actively try to prove claims wrong by checking +the actual stdlib source. + +## Ground truth, in priority order + +1. The Mojo stdlib source under `mojo/stdlib/std/**` (read the actual files; do not rely on + pretrained knowledge of older Mojo syntax). +2. `mojo-stdlib-contributing/SKILL.md` (recurring rejection reasons; treat its rules as if + they were a maintainer's standing review checklist). +3. `mojo/CONTRIBUTING.md`, `mojo/stdlib/docs/style-guide.md`, `mojo/stdlib/docs/development.md`, + `mojo/stdlib/docs/docstring-style-guide.md`, `AI_TOOL_POLICY.md`. +4. Sibling APIs in the same module (for consistency/symmetry checks). +5. CPython documentation, only when the PR explicitly claims Python equivalence. + +Use Grep, Glob, and Read extensively. Do NOT skip Step 3 below. + +## Methodology -- follow ALL steps + +### Step 1: Read the full PR + +```bash +gh pr diff {PR_NUMBER} --repo modular/modular +gh pr view {PR_NUMBER} --repo modular/modular --json title,body,commits,labels,isDraft +``` + +Read the full diff carefully. Note the PR description, the linked GitHub issue (if any), and +whether the PR is a draft. + +### Step 2: Extract every verifiable claim + +For each changed file, extract claims of every category below. Each table row's "Verify +against" column is where the agent must look to confirm or refute the claim. + +#### A. Process & scope (sources: PR body, commits, `mojo/CONTRIBUTING.md`, `AI_TOOL_POLICY.md`) + +| Claim type | Verify against | +|---|---| +| Adds a new public API on an existing type | `gh issue list --repo modular/modular --search ""` -- a GH issue MUST exist before new APIs | +| PR is marked draft | `isDraft` field -- new-API PRs should stay draft until consensus | +| `Assisted-by: AI` present in every commit and the PR body | `gh pr view ... --json commits,body` | +| One logical change per PR | The set of files touched -- flag PRs that mix unrelated concerns (e.g. `ArcPointer` hashable + new `String.capitalize`) | +| Changelog entry exists / does not exist | `mojo/docs/nightly-changelog.md` -- entries are required for user-facing changes and FORBIDDEN for NFC / implementation-detail PRs (e.g. DRY refactors) | +| Changelog scope matches PR scope | The diff -- flag entries that document changes not in the PR, or omit user-facing additions present in the PR | + +#### B. API design & symmetry (sources: sibling methods, CPython docs when claimed) + +| Claim type | Verify against | +|---|---| +| "Equivalent to Python's `math.X`" / "matches Python" | CPython behavior. Check positional-only (`/`), keyword-only (`*`), exception types, edge cases (e.g. `k > n` in `math.perm`). | +| New method's error type matches sibling methods | The sibling implementations (e.g. `tell()` vs `seek()` on `FileHandle` -- both should raise `Error("...")`, not bare strings) | +| New method duplicates existing logic | Sibling methods -- e.g. `tell()` could be `seek(0, SEEK_CUR)`; new ordering dunders could delegate to `_compare()` | +| Takes `Int` for an integer arg | Consider whether `Scalar[dtype]` is the right generic; flag `Int`-only when other integer dtypes are natural | +| Adds `Hashable` without `Equatable` | Dict/Set key element traits in `std/collections/dict.mojo` -- both are required | + +#### C. Trait conformance (sources: trait definitions in `std/builtin/`) + +| Claim type | Verify against | +|---|---| +| Conforms to `Comparable` | `std/builtin/comparable.mojo` -- requires `def __lt__`; defaults provide `__le__/__gt__/__ge__`; `Comparable` extends `Equatable` so don't list both | +| Required dunders implemented as `fn` | Trait requires `def`; `fn` may fail to satisfy conformance | +| Duplicates default trait methods | Drop redundant overrides unless there's a measured reason | +| Generic constrained-conformance dunder (e.g. `Tuple.__lt__` over `AllComparable`) | Existing pattern: use `_constrained_conforms_to_*` + `trait_downcast`; do not re-implement `_compare`-style logic | + +#### D. Docstring accuracy (sources: the implementation directly below) + +| Claim type | Verify against | +|---|---| +| "ASCII-only, non-ASCII passed through unchanged" | The implementation -- if it calls `self.lower()`, it Unicode-lowercases and the claim is wrong | +| "Undefined for negative inputs" | If the body asserts, the docstring must say "asserts", not "undefined" | +| "Equivalent to Python's X" with `k > n` etc. | CPython's actual behavior + the implementation | +| Args section reflects positional/keyword constraints | The signature (`/`, `*`) | +| `@no_inline` annotation on `write_repr_to` | Sibling `write_repr_to` implementations in `std/benchmark/bencher.mojo` etc. | + +#### E. Tests (sources: existing tests, the implementation) + +| Claim type | Verify against | +|---|---| +| Asserts `hash(a) != hash(b)` for unequal values | Hash contract: only `a == b => hash(a) == hash(b)` is required. Inequality assertions are flaky by definition and forbidden. | +| Tests for string/byte operations cover Unicode | Look for emoji / multi-byte test cases; flag if missing | +| Search/contains tests cover BOTH hit and miss paths | The diff -- both required | +| Edge cases: empty input, exact-width input, `start > end`, OOB indices | The diff | +| `String("...").method()` on a temporary | Bind to a `var` first when the method returns an aliasing slice (dangling-slice risk) | +| Asserts on `assert_equal` failure message wording for a type that just gained `Writable` | Overload selection -- the format may now legitimately be `String(x)` instead of `repr(x)` | +| Tests for `Writable` use `String(x)` / `repr(x)` instead of `check_write_to` | Flag and recommend `check_write_to` | +| Tests for runtime aborts using `assert_raises` | Aborts in Mojo are not catchable -- abort tests must be FileCheck (`_FILECHECK_TESTS=True`, `_EXPECT_CRASH=True`, `# CHECK:` directives). See `test_list_getitem_invalid_index_int.mojo`. | +| New public API without negative tests (overflow, wrong arity, wrong arg type) | Demand them | +| Pack-forwarding / variadic tests cover empty pack | The diff -- length-0 case is often missed | +| Compile-fail tests for `constrained_*` error messages | Done via `not %mojo ... \| FileCheck` | + +#### F. Iterator design (sources: existing iterators in `std/itertools/`) + +| Claim type | Verify against | +|---|---| +| State mutation in `__next__` before the inner `next()` succeeds | The inner iterator can raise `StopIteration`; state must change AFTER successful element consumption | +| Public `take(it, n)` / `skip(it, n)` accept negative `n` silently | `repeat()` asserts non-negative; new adapters should match (assert OR document) | +| `bounds()` can return a negative number | Clamp to 0 | +| Inline `next(self._inner)` passed straight into `rebind_var[downcast[...]](...)` | Pattern elsewhere: `var elem = next(...)` then discard `elem^` | +| `_TakeIterator` / `_SkipIterator` conformance to `IterableOwned` | Other adapters conform; consider adding an owned overload | + +#### G. Benchmarks (sources: `std/benchmark/`, `mojo-stdlib-contributing` benchmark section) + +| Claim type | Verify against | +|---|---| +| Inputs and results in the hot loop are not `black_box()`/`keep()` | Required, or the compiler may optimize the bench away | +| Uses `@parameter`-style closure | New code should use unified closures: `def call_fn() raises {mut f}:` | +| `keep(x)` used to silence unused warnings | Forbidden -- use `_ = value^` | +| Setup inside the hot loop | Setup goes outside `call_fn`; use `iter_with_setup` for destructive benches | +| Bench data doesn't match the docstring/comment | Read setup carefully (e.g. "element in the middle" must actually be in the middle) | +| Performance PR with no benchmarks, or only one input size | Demand small AND large input coverage | +| Bench utilities and their usage land in the same PR | Should be split into two PRs | + +#### H. Optimizations (sources: `compile_info` IR diff + benchmark numbers) + +| Claim type | Verify against | +|---|---| +| "Adds a fast path" / "Avoids a branch" without IR diff | Demand `compile_info[..., emission_kind="llvm-opt"]` evidence in the PR description | +| Manual ASCII fast path before `_utf8_first_byte_sequence_length(b)` (or similar) | LLVM already optimizes this -- the IR is identical OR the "fast path" produces WORSE codegen via `is_zero_poison=false` | +| Cache-line / alignment claim | Demand benchmark numbers, not speculation | +| Performance PR without before/after benchmark numbers in the body | Required | +| "Adds branch to skip computation" where the skip is cheap | Often slower than the branchless version; demand IR + bench | + +#### I. Code reuse / DRY (sources: `std/format/_utils`, existing primitives) + +| Claim type | Verify against | +|---|---| +| Hand-rolls struct repr with manual `", "` separators | Use `FormatStruct(writer, "Name").params(Named(...)).fields(Named(...), Named(...))` | +| `write_to` / `write_repr_to` allocates via `repr()` or `String(x)` | Forbidden -- these methods must not allocate | +| Reimplements byte scan, clamp, etc. | Reuse `_memchr` / `_memchr_impl` / `clamp` / `simd_width_of[DType.bool]()` | +| Bulk copy/move/destroy in trivial-type code paths | Use `uninit_copy_n`, `uninit_move_n`, `destroy_n` | +| New fast path on `List` (not `Span`) | Implement on `Span`, then have `List` delegate upward | +| Two write/repr methods duplicate field-formatting logic | Factor into a helper, or both call the same `FormatStruct` chain | + +#### J. Memory / lifetime / move semantics (sources: surrounding code patterns) + +| Claim type | Verify against | +|---|---| +| `.copy()` in performance-sensitive paths | Use move `^` | +| `Dict[...]` parameter taken by value when only iterating | Take by `ref` / borrowed; by-value either errors or defeats the optimization | +| Test creates `String("...")` and calls a slice-returning method on the temporary | Bind to `var s = String("...")` first | +| ARC/Weak test with `as_any_origin` | This extends lifetimes via `AnyOrigins`; destruction-order assertions may be vacuous. Restructure to a single shared counter with a concrete origin. | +| Pre-condition assert lives on `__len__` (or another read accessor) | Move it to where the invariant is *established* (e.g. capacity assignment), not where it's read | + +#### K. Asserts & safety (sources: `mojo-stdlib-contributing` assertion section, style-guide) + +| Claim type | Verify against | +|---|---| +| Downgrades `debug_assert[assert_mode="safe"]` to default / removes it | Forbidden -- these are intentional safety invariants. Users who need to bypass should use `.as_bytes().unsafe_get(idx)` instead. | +| Adds `debug_assert(...)` (default `assert_mode="none"`) expecting it to run in release | Default does NOT run in release; only `-D ASSERT=all` enables it | +| Pointer arithmetic with unclamped `[start, end]` | Clamp into `[0, len]` before `memcmp`/SIMD load | +| Reverse SIMD scan without negative `pos_in_block` check | Use `count_leading_zeros` and handle the no-match case | +| `vectorized_end` doesn't account for full SIMD block width | OOB read risk; verify | + +#### L. Style / patterns (sources: `style-guide.md`, surrounding code) + +| Claim type | Verify against | +|---|---| +| Uses `fn` for new stdlib code | Prefer `def` (per recurring reviewer feedback) | +| Uses `@parameter` in new code | Prefer `comptime` | +| Imports from deep submodule path (e.g. `std.benchmark.benchmark.Batch`) | Use stable re-export (`from std.benchmark import Batch`) | +| Section header formatting inconsistent with the rest of the file | Match existing style exactly (spacing of `# ===---===#`) | +| Error message constructed with `+ String(...)` concatenation | Use t-strings (`t"... {expr} ..."`) -- avoids intermediate allocations | +| SIMD test comment hard-codes a width (e.g. "64-byte SIMD") | Make platform-agnostic: "exercises both SIMD path and scalar tail" | + +#### M. String byte indexing (source: `StringSlice.__getitem__` signature) + +| Claim type | Verify against | +|---|---| +| Slice via `self[:idx]` / `self[idx + n:]` on a `StringSlice[byte=...]` | Required: `self[byte = :idx]` / `self[byte = idx + n :]` (`byte` is a required keyword) | + +#### N. Renames / key references (use Grep) + +| Claim type | Verify against | +|---|---| +| Renames a dict key, env var, string identifier, or constant | Grep the OLD name across the whole repo. Every read AND write site must be updated; silent default lookups otherwise. | + +### Step 3: Verify EVERY claim against source of truth + +For each claim from Step 2, find and read the actual source file. Do NOT trust the PR's +self-description. Use Grep + Read. + +### Step 3b: Analyze removed code for downstream side effects + +**MANDATORY**: For every removed/replaced line: + +1. Identify all effects of the removed code -- side effects, return value usage, state changes. +2. Trace the downstream code path WITHOUT the removed call. Check every branch. +3. If the PR (or a comment) claims "another code path handles this": READ that path. Verify it + calls the same function under the same conditions, with no intervening flag that re-routes. +4. Flag ordering dependencies: if the removed call ran BEFORE a state change but the + "replacement" runs AFTER, they take different paths. +5. "Pre-existing behavior" is NOT an excuse -- the removed call may have been the only place + that action happened for this scenario. + +### Step 4: Check completeness + +- Missing test cases (Unicode, empty, edge widths, abort paths) +- Missing changelog entry for a user-facing change (or unwanted entry for NFC) +- Public API without docstring example +- New API without GH issue link + +### Step 5: Check consistency + +- Internal: does the docstring match the implementation? does the PR description match the diff? +- External: does the new method's error type, signature shape, and naming match its siblings? +- Style: does it follow `style-guide.md` and the conventions of the surrounding file? + +### Step 6: Double-check before commenting + +Before emitting ANY finding: + +1. Re-read the specific line in the PR you're about to flag. +2. Re-read the source you're citing as evidence. +3. Confirm the discrepancy is real, not a misunderstanding of Mojo semantics. +4. For optimization claims: if you don't have IR evidence, mark the finding as a *question*, + not a *factual error*. + +False positives destroy credibility. If you're not sure, mark it as a "question" rather than +asserting an error. + +### Step 7: Classify findings + +- **Critical**: would cause a compile error, runtime abort outside intended assertions, data + loss, or memory unsafety +- **Factual error**: docstring/PR/comment claim contradicts source of truth +- **Completeness gap**: missing test, missing changelog, missing edge case, undocumented behavior +- **Inconsistency**: violates `mojo-stdlib-contributing` rule, sibling API symmetry, or + style-guide +- **Question**: design tradeoff worth raising but not a clear defect (naming, scope, "should + this be `Scalar[dtype]`?") +- **Minor**: typos, formatting, comment style + +### Step 8: Emit review + +{EMIT_SECTION} + +Structure the review body as: + +```markdown +## Adversarial Stdlib Review: #{PR_NUMBER} + +### Methodology +Verified N claims against source files: [list key stdlib files checked] +Cross-referenced against `mojo-stdlib-contributing/SKILL.md`. + +### Issues Found (N total) + +#### Critical (N) +1. **[file:line]** -- Description. Source: `path/to/source:line` shows X, but PR claims Y. + +#### Factual Errors (N) +1. **[file:line]** -- ... + +#### Completeness Gaps (N) +1. **[file]** -- Missing X. Reference: `mojo-stdlib-contributing` section on Y, or sibling at + `path:line`. + +#### Inconsistencies (N) +1. **[file:line]** -- ... + +#### Questions (N) +1. **[file:line]** -- ... + +#### Minor (N) +1. **[file:line]** -- ... + +### Verified Correct +N claims verified accurate. Key sources checked: [list] +``` + +IMPORTANT: +- Be thorough. Read every changed file. Check every claim. +- Cite the specific source file and line that contradicts the PR. +- Do NOT post findings you haven't double-checked. +- If everything checks out, say so clearly in the "Verified Correct" section. +```` + +### Emit blocks + +Inject exactly one of these as `{EMIT_SECTION}` based on `$ARGUMENTS`: + +**Post mode (default):** + +``` +Post the review to the PR using a heredoc to avoid shell quoting issues: + +\`\`\`bash +gh pr review {PR_NUMBER} --repo modular/modular --comment --body "$(cat <<'EOF' +REVIEW_BODY +EOF +)" +\`\`\` +``` + +**Local mode:** + +``` +Do NOT call `gh pr review`. Write the REVIEW_BODY markdown to +`.specs/stdlib-review-{PR_NUMBER}.md` (create the parent directory if needed) and return the +absolute path plus the issue counts (Critical/Factual/Completeness/Inconsistency/Question/Minor) +to the orchestrator. The author will read this file, resolve findings, and re-run this skill to +confirm the PR is clean before flipping the draft to ready-for-review. +``` + +--- + +## Phase 3: Monitor and Collect Results + +### Step 5: Wait for agents + +All agents run in background. You will be notified as each completes. + +As each agent completes, record its results: + +- Number of issues by category (Critical / Factual / Completeness / Inconsistency / Question / Minor) +- Number of claims verified correct +- Whether the review was posted (post mode) or written to `.specs/` (local mode) + +Report each completion to the user with a brief summary. + +### Step 6: Handle failures + +If an agent fails (e.g. `gh` auth issue, missing file, hit token budget): + +- Report the failure to the user with the error +- Offer to retry the specific PR (spawn a new agent using the same prompt) + +--- + +## Phase 4: Consolidated Report + +### Step 7: Scorecard + +```markdown +## Adversarial Stdlib Review Scorecard + +| # | PR | Title | Total | Crit | Factual | Completeness | Inconsistency | Question | Minor | +|---|-----|-------|-------|------|---------|--------------|---------------|----------|-------| +| 1 | #6276 | ... | N | X | Y | Z | W | Q | M | +| | **Total** | | **N** | **X** | **Y** | **Z** | **W** | **Q** | **M** | + +### Top Patterns +- [Recurring issues across PRs] +- [`mojo-stdlib-contributing` rules being violated most often] +- [Suggestions to fold new rules into `mojo-stdlib-contributing`] +``` + +In **post mode**, end with: "All review comments have been posted on each PR." +In **local mode**, end with the list of `.specs/stdlib-review-.md` paths. + +### Step 8: Follow-up options + +Ask the user via `AskUserQuestion`: + +**Next steps** (header: "Next steps"): + +- **Done**: Reviews are in place, nothing more needed. +- **Fix locally**: For local-mode runs, read each `.specs/stdlib-review-.md` and apply fixes + on the local branch. +- **Re-review**: Run another pass on specific PRs (e.g. after the author pushes fixes). +- **Update `mojo-stdlib-contributing`**: If a recurring pattern surfaced that isn't in the + contributing skill yet, propose an edit. + +--- + +## Important Notes + +- **Single domain**: This skill is Mojo-stdlib-only. Use the general `adversarial-review` skill + for non-stdlib PRs. +- **Agent type**: Always `general-purpose`. Stdlib expertise comes from the prompt (and from + reading `mojo-stdlib-contributing` as the agent's first move), not the agent type. +- **Parallelism**: All agents MUST be launched in a single message for maximum parallelism. +- **False-positive prevention**: Step 6 (double-check) is critical. A review that posts wrong + comments costs the author trust and review-bandwidth. When in doubt, classify as Question. +- **Scope**: This skill emits reviews; it does NOT modify code. Author + caller fix issues. +- **Idempotent in local mode**: re-running overwrites `.specs/stdlib-review-.md`. In post + mode, re-running posts a fresh review comment -- warn the user before re-reviewing a PR you + already reviewed. +- **Cost**: Each agent reads heavily (PR files + stdlib source + sibling APIs + the + contributing skill). For N PRs, expect proportional reads. Local mode amortizes this: run it + once before flipping to ready-for-review and you avoid triggering the paid CI reviewer on + every push. From 8b535397d2dfc1053cdc7ff0411b42da73937b4b Mon Sep 17 00:00:00 2001 From: Manuel Saelices Date: Wed, 13 May 2026 09:45:53 +0200 Subject: [PATCH 6/7] [Skills] Document `mojo-stdlib-review` in README Adds the README entry for the new skill alongside `mojo-stdlib-contributing`. Assisted-by: AI --- README.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/README.md b/README.md index 100ce0e..0f767c8 100644 --- a/README.md +++ b/README.md @@ -91,6 +91,18 @@ semantics (`assert_mode="safe"` is intentional), benchmark patterns, testing requirements, SIMD/memory safety, and common reviewer feedback. Use when modifying code under `mojo/stdlib/`. +### `mojo-stdlib-review` + +[This skill](mojo-stdlib-review/SKILL.md) performs an adversarial (red-team) +review of one or more Mojo stdlib PRs. It spawns parallel agents that +fact-check every claim in the diff against the actual stdlib source and the +rules in `mojo-stdlib-contributing`, classify findings (Critical / Factual / +Completeness / Inconsistency / Question / Minor), and either post review +comments on the PR (default) or write a local findings file for the author to +fix before flipping a draft PR to ready-for-review. Use when you want a +thorough self-review pass on a stdlib contribution, or to fact-check someone +else's stdlib PR. + ## Examples Once these skills are installed, you can use them for many common tasks. From f30f715d85ed08c39fb01af5b73d244ae76e6a9a Mon Sep 17 00:00:00 2001 From: Manuel Saelices Date: Wed, 13 May 2026 09:48:37 +0200 Subject: [PATCH 7/7] [Skills] Default `mojo-stdlib-review` to local mode Flips the default output mode so findings are written to `.specs/stdlib-review-.md` unless the caller opts into posting on the PR via "in post mode" / "post comments" / "post to PR". Local mode is the typical use case (self-review of one's own draft before flipping to ready-for-review); post mode now requires an explicit opt-in. Also reorders the emit blocks in the agent prompt template (local first), updates the README entry, and tweaks the follow-up options so the local-mode "Fix locally" choice is the default. Assisted-by: AI --- README.md | 10 ++--- mojo-stdlib-review/SKILL.md | 90 ++++++++++++++++++++----------------- 2 files changed, 53 insertions(+), 47 deletions(-) diff --git a/README.md b/README.md index 0f767c8..f2e5d99 100644 --- a/README.md +++ b/README.md @@ -97,11 +97,11 @@ modifying code under `mojo/stdlib/`. review of one or more Mojo stdlib PRs. It spawns parallel agents that fact-check every claim in the diff against the actual stdlib source and the rules in `mojo-stdlib-contributing`, classify findings (Critical / Factual / -Completeness / Inconsistency / Question / Minor), and either post review -comments on the PR (default) or write a local findings file for the author to -fix before flipping a draft PR to ready-for-review. Use when you want a -thorough self-review pass on a stdlib contribution, or to fact-check someone -else's stdlib PR. +Completeness / Inconsistency / Question / Minor), and write a local findings +file for the author to fix before flipping a draft PR to ready-for-review. +Pass `"in post mode"` in the arguments to post review comments on the PR via +`gh pr review` instead. Use when you want a thorough self-review pass on a +stdlib contribution, or to fact-check someone else's stdlib PR. ## Examples diff --git a/mojo-stdlib-review/SKILL.md b/mojo-stdlib-review/SKILL.md index 20d1a2a..76e8c30 100644 --- a/mojo-stdlib-review/SKILL.md +++ b/mojo-stdlib-review/SKILL.md @@ -3,13 +3,14 @@ name: mojo-stdlib-review description: > Red-team adversarial review of Mojo stdlib PRs. Spawns parallel agents that deep-read every changed file, extract factual claims, verify each against the actual stdlib source of truth and - the rules in `mojo-stdlib-contributing`, and either post review comments on the PR (default "post - mode") or write findings to `.specs/` ("local mode", triggered by "in local mode" / "no post" in - the arguments). Use when the user mentions "adversarial review", "red team review", "deep review", - "fact-check PR", or wants thorough verification of a Mojo stdlib PR before requesting maintainer - review. Distilled from 30+ reviewed PRs to `modular/modular`. + the rules in `mojo-stdlib-contributing`, and write findings to `.specs/` by default ("local + mode") so the author can self-review before flipping a draft PR to ready. Pass "in post mode" / + "post comments" / "post to PR" in the arguments to post review comments on the PR via + `gh pr review` instead. Use when the user mentions "adversarial review", "red team review", + "deep review", "fact-check PR", or wants thorough verification of a Mojo stdlib PR before + requesting maintainer review. Distilled from 30+ reviewed PRs to `modular/modular`. user-invocable: true -argument-hint: "[PR numbers or URLs, space-separated] [\"in local mode\"]" +argument-hint: "[PR numbers or URLs, space-separated] [\"in post mode\"]" --- Red-team adversarial review of Mojo stdlib PRs: $ARGUMENTS @@ -29,21 +30,23 @@ The skill spawns one `general-purpose` agent per PR in parallel. Each agent: 3. Extracts every verifiable claim and every removed line 4. Verifies each claim against the source of truth 5. Double-checks findings before reporting -6. Either posts review comments on the PR (default) or writes findings to a local file for the - author to auto-fix (local mode) +6. Writes findings to a local file for the author to auto-fix (default), or posts review + comments on the PR (opt-in via "in post mode") ## Output modes -Default is **post mode**: findings are posted on the PR via `gh pr review --comment`. Use this -when reviewing someone else's PR. +Default is **local mode**: write findings to `.specs/stdlib-review-.md` and return the path +plus per-category counts to the caller. Do NOT post PR comments. This is the canonical +"self-review gate" used by stdlib contributors before flipping a draft PR to ready-for-review. +It runs on the local Claude Code subscription and avoids spamming the PR with comments that the +author will resolve before any maintainer sees them. -If `$ARGUMENTS` contains "local", "local mode", "local fix", "no post", or equivalent phrasing, -switch to **local mode**: write findings to `.specs/stdlib-review-.md` and return the path -plus per-category counts to the caller. Do NOT post PR comments. +If `$ARGUMENTS` contains "post", "post mode", "post comments", "post to PR", or equivalent +phrasing, switch to **post mode**: findings are posted on the PR via `gh pr review --comment`. +Use this when reviewing someone else's PR rather than self-reviewing your own draft. -Local mode is the canonical "self-review gate" used by stdlib contributors before flipping a -draft PR to ready-for-review. It runs on the local Claude Code subscription and avoids spamming -the PR with comments that the author will resolve before any maintainer sees them. +Phrases like "in local mode" / "local fix" / "no post" also explicitly request the default +local mode and are accepted (no-op). ## Phase 1: Parse Input @@ -96,9 +99,10 @@ For each PR, spawn a `general-purpose` agent using the Agent tool with `run_in_b Use `stdlib-review-{pr_number}` as the agent `description` (e.g., `"stdlib-review-6276"`). Capture the returned agent ID from each Agent tool call. -Before spawning, choose **one** emit block (post mode default OR local mode) based on +Before spawning, choose **one** emit block (local mode default OR post mode) based on `$ARGUMENTS` and inline it as `{EMIT_SECTION}` so the agent receives a single unambiguous -instruction. +instruction. Default is local mode; switch to post mode only if `$ARGUMENTS` explicitly opts +in via "post", "post mode", "post comments", "post to PR", or equivalent phrasing. ### Agent Prompt Template @@ -392,7 +396,17 @@ IMPORTANT: Inject exactly one of these as `{EMIT_SECTION}` based on `$ARGUMENTS`: -**Post mode (default):** +**Local mode (default):** + +``` +Do NOT call `gh pr review`. Write the REVIEW_BODY markdown to +`.specs/stdlib-review-{PR_NUMBER}.md` (create the parent directory if needed) and return the +absolute path plus the issue counts (Critical/Factual/Completeness/Inconsistency/Question/Minor) +to the orchestrator. The author will read this file, resolve findings, and re-run this skill to +confirm the PR is clean before flipping the draft to ready-for-review. +``` + +**Post mode (opt-in via "in post mode"):** ``` Post the review to the PR using a heredoc to avoid shell quoting issues: @@ -405,16 +419,6 @@ EOF \`\`\` ``` -**Local mode:** - -``` -Do NOT call `gh pr review`. Write the REVIEW_BODY markdown to -`.specs/stdlib-review-{PR_NUMBER}.md` (create the parent directory if needed) and return the -absolute path plus the issue counts (Critical/Factual/Completeness/Inconsistency/Question/Minor) -to the orchestrator. The author will read this file, resolve findings, and re-run this skill to -confirm the PR is clean before flipping the draft to ready-for-review. -``` - --- ## Phase 3: Monitor and Collect Results @@ -427,7 +431,7 @@ As each agent completes, record its results: - Number of issues by category (Critical / Factual / Completeness / Inconsistency / Question / Minor) - Number of claims verified correct -- Whether the review was posted (post mode) or written to `.specs/` (local mode) +- Whether the review was written to `.specs/` (local mode, default) or posted (post mode) Report each completion to the user with a brief summary. @@ -458,8 +462,8 @@ If an agent fails (e.g. `gh` auth issue, missing file, hit token budget): - [Suggestions to fold new rules into `mojo-stdlib-contributing`] ``` +In **local mode** (default), end with the list of `.specs/stdlib-review-.md` paths. In **post mode**, end with: "All review comments have been posted on each PR." -In **local mode**, end with the list of `.specs/stdlib-review-.md` paths. ### Step 8: Follow-up options @@ -467,9 +471,9 @@ Ask the user via `AskUserQuestion`: **Next steps** (header: "Next steps"): +- **Fix locally** (default for local-mode runs): Read each `.specs/stdlib-review-.md` and + apply fixes on the local branch. - **Done**: Reviews are in place, nothing more needed. -- **Fix locally**: For local-mode runs, read each `.specs/stdlib-review-.md` and apply fixes - on the local branch. - **Re-review**: Run another pass on specific PRs (e.g. after the author pushes fixes). - **Update `mojo-stdlib-contributing`**: If a recurring pattern surfaced that isn't in the contributing skill yet, propose an edit. @@ -483,13 +487,15 @@ Ask the user via `AskUserQuestion`: - **Agent type**: Always `general-purpose`. Stdlib expertise comes from the prompt (and from reading `mojo-stdlib-contributing` as the agent's first move), not the agent type. - **Parallelism**: All agents MUST be launched in a single message for maximum parallelism. -- **False-positive prevention**: Step 6 (double-check) is critical. A review that posts wrong - comments costs the author trust and review-bandwidth. When in doubt, classify as Question. -- **Scope**: This skill emits reviews; it does NOT modify code. Author + caller fix issues. -- **Idempotent in local mode**: re-running overwrites `.specs/stdlib-review-.md`. In post - mode, re-running posts a fresh review comment -- warn the user before re-reviewing a PR you - already reviewed. +- **False-positive prevention**: Step 6 (double-check) is critical. A review that surfaces + wrong findings wastes the author's time and, in post mode, costs trust and review-bandwidth. + When in doubt, classify as Question. +- **Scope**: This skill emits reviews; it does NOT modify code. The author (or caller) fixes + issues based on the findings file (local mode) or PR comments (post mode). +- **Default is local mode**: re-running overwrites `.specs/stdlib-review-.md` (idempotent). + In post mode, re-running posts a fresh review comment each time -- warn the user before + re-reviewing a PR you already reviewed in post mode. - **Cost**: Each agent reads heavily (PR files + stdlib source + sibling APIs + the - contributing skill). For N PRs, expect proportional reads. Local mode amortizes this: run it - once before flipping to ready-for-review and you avoid triggering the paid CI reviewer on - every push. + contributing skill). For N PRs, expect proportional reads. Local mode (default) amortizes + this: run it once before flipping to ready-for-review and you avoid triggering the paid CI + reviewer on every push.