cc: byte-out narrowing + skip dead pinned-register saves #449
Open
bboe wants to merge 2 commits into
5bc600b to
857e618
Compare
After peephole_fold_byte_immediate_through_local rewrites the
*(uint8_t *)&local byte-load idiom into a full-width mov eax, <imm>,
the only AX-touching consumer is out dx, al — which reads AL only.
Add peephole_narrow_acc_immediate_for_byte_out that walks forward to
confirm out dx, al is the consumer and that EAX upper bits are dead
until the next full {acc} clobber, then narrows the load.
Saves 3 bytes per site in 32-bit (mov eax, imm32 is 5 bytes; mov al,
imm8 is 2 bytes), 1 byte in 16-bit. Phase 2 (the dead-upper-bits walk)
bails conservatively at labels, ret, and jumps, so loop-tail and
trailing-port sites where the function may return without clobbering
EAX stay unnarrowed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
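The two-phase walk described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the code in cc.py: it assumes one instruction per string in NASM-like syntax, 32-bit mode, decimal immediates, and it recognises only `mov eax, <src>` and `pop eax` as full clobbers. The names `narrow_byte_out_loads`, `is_barrier`, and `is_full_clobber` are hypothetical, and treating `call` as a barrier is an extra conservative assumption that the commit message does not state.

```python
import re

# Hypothetical, simplified model of the peephole: 32-bit mode, decimal immediates.
LOAD_RE = re.compile(r"^mov eax, (\d+)$")
ACC_RE = re.compile(r"\b(eax|ax|ah|al)\b")


def is_barrier(line):
    """Labels, ret, and jumps end the walk (call is added here as an assumption)."""
    parts = line.split()
    if not parts:
        return False
    head = parts[0]
    return line.endswith(":") or head in ("ret", "call") or head.startswith("j")


def is_full_clobber(line):
    """True if the line overwrites all of EAX without reading any part of it."""
    if line == "pop eax":
        return True
    prefix = "mov eax, "
    return line.startswith(prefix) and not ACC_RE.search(line[len(prefix):])


def narrow_byte_out_loads(lines):
    out = list(lines)
    for i, line in enumerate(lines):
        match = LOAD_RE.match(line)
        if not match or int(match.group(1)) > 0xFF:
            continue
        # Phase 1: the next accumulator-touching instruction must be `out dx, al`.
        j = i + 1
        while j < len(lines) and not ACC_RE.search(lines[j]) and not is_barrier(lines[j]):
            j += 1
        if j == len(lines) or lines[j] != "out dx, al":
            continue
        # Phase 2: EAX upper bits must stay dead until the next full clobber.
        for later in lines[j + 1:]:
            if is_barrier(later):
                break  # control flow may escape with EAX still live: bail
            if is_full_clobber(later):
                out[i] = f"mov al, {match.group(1)}"
                break
            if ACC_RE.search(later):
                break  # some part of the accumulator is read first: bail
    return out


narrowed = narrow_byte_out_loads(
    ["mov dx, 1016", "mov eax, 65", "out dx, al", "mov eax, 0", "ret"]
)
trailing = narrow_byte_out_loads(["mov eax, 65", "out dx, al", "ret"])
```

In this sketch the first sequence is narrowed because a full `mov eax, 0` follows the `out`, while the trailing-port sequence hits `ret` first and is left at full width, matching the conservative behaviour described above.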
_pinned_registers_to_save was saving every pinned register in the builtin's clobber set unconditionally, even when the pinned local hadn't been written yet, which preserved garbage from caller-supplied state.

Add a per-function pre-pass over the IR that computes the may-defined set of pinned-register values at each builtin call site:

- Initial set: registers held by parameters (loaded by the prologue).
- On each ir.Copy / ir.BinaryOperation / ir.Index / ir.Call(dest=...) whose destination is a pinned local, add that local's register.
- On each ir.Block, peek at the wrapped AST node: VarDecl with init and Assign are stores; MemberAssign / IndexAssign / opaque AST go through pointers or are escape-hatched, so skip them.
- For each loop region (Label..backward-Jump), pre-merge body stores into the loop's entry set so the back-edge sees in-loop stores on every iteration past the first.

Scope is intentionally limited to builtin calls. User function calls go through a separate save-set path that the IR-only analysis can't fully model: Block-wrapped statements that fall back to AST codegen, pointer-aliased pinned locals, and ir.CarryBranch wrapping carry-return callees all stay on the conservative save-everything path.

Saves: -120 byte kasm reduction kernel-wide. Pre-loop kernel_outb / kernel_inb call sites in drivers (ata_init, etc.) no longer wrap with push/pop edx when the EDX-pinned local hasn't been initialised yet. ping user program shrinks 1544 -> 1522 bytes (archive table updated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
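The may-defined pre-pass described in this commit message can be sketched as below. This is a minimal model under stated assumptions, not the actual cc.py code: the `Store`, `BuiltinCall`, `Label`, and `Jump` classes are hypothetical stand-ins for the real IR nodes (`Store` covers any op whose destination is a local), the `pinned` mapping and the clobber sets are illustrative, and the ir.Block / AST peeking step is omitted.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the real IR node types.
@dataclass(frozen=True)
class Store:
    dest: str


@dataclass(frozen=True)
class BuiltinCall:
    name: str
    clobbers: frozenset


@dataclass(frozen=True)
class Label:
    name: str


@dataclass(frozen=True)
class Jump:
    target: str


def saves_at_builtin_calls(body, pinned, params):
    """Map each BuiltinCall index to the clobbered pinned registers that may hold a value."""
    label_index = {op.name: i for i, op in enumerate(body) if isinstance(op, Label)}

    # Pre-merge: for every backward jump, collect registers stored inside the loop
    # body and attach them to the loop's entry label.
    loop_stores = {}
    for i, op in enumerate(body):
        if isinstance(op, Jump) and label_index.get(op.target, i) < i:
            start = label_index[op.target]
            regs = {
                pinned[o.dest]
                for o in body[start:i]
                if isinstance(o, Store) and o.dest in pinned
            }
            loop_stores.setdefault(start, set()).update(regs)

    # Initial set: registers held by parameters.
    defined = {pinned[p] for p in params if p in pinned}
    result = {}
    # The set only ever grows, so a single linear walk over-approximates every
    # forward path; the pre-merged loop sets cover the back-edges.
    for i, op in enumerate(body):
        defined |= loop_stores.get(i, set())
        if isinstance(op, Store) and op.dest in pinned:
            defined.add(pinned[op.dest])
        elif isinstance(op, BuiltinCall):
            result[i] = op.clobbers & defined
    return result


outb = BuiltinCall("kernel_outb", frozenset({"edx"}))
pinned = {"port": "edx"}  # assumed pinning, for illustration only

straight = saves_at_builtin_calls([outb, Store("port"), outb], pinned, params=[])
looped = saves_at_builtin_calls(
    [Label("top"), outb, Store("port"), Jump("top")], pinned, params=[]
)
```

In the straight-line case the first call needs no save because `port` has not been written yet; in the loop case the call precedes the store textually but still saves `edx`, because the back-edge can carry the stored value into the next iteration.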
857e618 to
5f0215e
Compare
Summary
Two cc.py optimizations, one commit each.

Commit 1: narrow mov eax, imm → mov al, imm for byte out.

After peephole_fold_byte_immediate_through_local rewrites the byte-load idiom into a full-width immediate load, the only consumer is out dx, al, which reads AL only. New peephole proves EAX upper bits are dead between the load and the next full {acc} clobber, then narrows. Saves 3 bytes per site in 32-bit (5 → 2), 1 byte in 16-bit. Phase 2 bails conservatively at labels, ret, and any jump, so loop-tail and trailing-port sites stay unnarrowed.

Commit 2: skip dead pinned-register saves around builtin calls.

_pinned_registers_to_save was saving every pinned register in the clobber set unconditionally, even when the pinned local hadn't been written yet, which preserved garbage. New per-function pre-pass over the IR computes the may-defined set of pinned-register values at each builtin call site (loops pre-merge body stores into the entry set so the back-edge sees in-loop writes). Block-wrapped VarDecl init / Assign are recognised so the IR escape hatch doesn't leak. Scope intentionally limited to builtin calls; user-function paths stay on the conservative save-everything path.

Combined: 42286 → 42158 bytes (-128 bytes kernel-wide). ping user program shrinks 1544 → 1522 bytes.

Test plan

- tests/unit/test_cc_codegen.py: 358 PASS (added 4 new tests: narrow / kept-wider / liveness-skipped / liveness-VarDecl).
- tests/test_cc_casts.py: 6/6.
- tests/test_cc_bitfields.py: 10/10.
- tests/test_cc_local_structs.py: 12/12.
- tests/test_cc_bits.py: 110/110.
- tests/test_cc_compatibility.py: 57/57.
- tests/test_asm.py: 42/42 (previously failed macro_sm.asm; fixed by scoping commit 2 to builtin calls only).
- tests/test_archive.py: 12/12 (updated ping row 1544 → 1522 to reflect the new size).
- tests/test_kernel_archive.py: 12/12.
- tests/test_pipeline_basic.py: PASS.
- make_os.sh succeeds; os.bin 42286 → 42158.

🤖 Generated with Claude Code
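The per-site savings quoted for commit 1 follow from the standard x86 encodings of an immediate load into the accumulator. The snippet below is a small sanity check of that arithmetic; `encode_mov_acc_imm` is a hypothetical helper, and it assumes the wide form in 16-bit code is `mov ax, imm16` at the native operand size (no 0x66 prefix).

```python
import struct


def encode_mov_acc_imm(imm, width):
    """Encode mov al/ax/eax, imm at the mode's native operand size (no prefix)."""
    if width == 8:
        return b"\xB0" + struct.pack("<B", imm)  # mov al, imm8
    if width == 16:
        return b"\xB8" + struct.pack("<H", imm)  # mov ax, imm16 (16-bit mode)
    if width == 32:
        return b"\xB8" + struct.pack("<I", imm)  # mov eax, imm32 (32-bit mode)
    raise ValueError(f"unsupported width: {width}")


narrow = encode_mov_acc_imm(0x41, 8)
saved_32 = len(encode_mov_acc_imm(0x41, 32)) - len(narrow)  # 5 - 2
saved_16 = len(encode_mov_acc_imm(0x41, 16)) - len(narrow)  # 3 - 2
```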