feat: MSI-X interrupt-driven GPU wake + zero-spin SUBMIT_3D #263
Merged
…acing

Three changes that together reduce BWM kernel CPU from ~20% to ~10%:

1. MSI-X interrupt-driven GPU wake: GPU completion fires an MSI-X interrupt, which immediately unblocks the compositor thread via the scheduler. This matches the Linux virtio-gpu driver pattern (wait_event_timeout). Zero spin cycles are wasted waiting for SUBMIT_3D completion.
2. Zero-spin path: When MSI-X is active, send_command_3desc blocks the thread immediately after notifying the device (no 5k spin warmup). The interrupt handler wakes the thread as soon as the GPU completes (observed at ~3.4ms). The polling fallback with spin+timer is preserved for non-MSI-X configurations.
3. Frame pacing in compositor_wait: Enforces a 5ms minimum inter-frame interval to prevent the compositor from saturating the CPU when GPU wake is fast. It uses a plain timer block (not a compositor block) to avoid consuming dirty-wake signals meant for the main blocking section.

Performance (Parallels, bounce demo running):
- Kernel GPU busy: ~16% (was 20% with spin, 70% before blocking)
- submit_cpu: ~700us (real VirGL work, zero spin waste)
- sleep: ~2600us (72% of frame time truly idle)
- FPS: ~280 (not artificially capped)
- 23/23 bcheck tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
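The frame-pacing rule in item 3 reduces to "sleep for whatever is left of the 5ms window since the last frame started". The sketch below models that as a userspace Rust program, with `std::thread::sleep` standing in for the kernel's plain timer block; `pacing_delay` and `paced_wait` are hypothetical names, not Breenix functions.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Minimum spacing between composited frames (5ms, per the commit message).
const MIN_FRAME_INTERVAL: Duration = Duration::from_millis(5);

/// Hypothetical helper: how long to sleep so consecutive frames start
/// at least `min_interval` apart. Returns zero if the window already passed.
fn pacing_delay(last_frame_start: Instant, now: Instant, min_interval: Duration) -> Duration {
    // saturating_duration_since yields zero if `now` precedes `last_frame_start`.
    let elapsed = now.saturating_duration_since(last_frame_start);
    min_interval.saturating_sub(elapsed)
}

/// Sketch of the paced wait: a plain timed sleep (not a compositor wait),
/// so no dirty-wake signal is consumed while pacing.
fn paced_wait(last_frame_start: &mut Instant) {
    let delay = pacing_delay(*last_frame_start, Instant::now(), MIN_FRAME_INTERVAL);
    if !delay.is_zero() {
        thread::sleep(delay);
    }
    *last_frame_start = Instant::now();
}

fn main() {
    let mut last = Instant::now();
    for frame in 0..3 {
        paced_wait(&mut last);
        println!("frame {frame} started");
    }
}
```

Keeping the delay computation pure makes the pacing rule easy to check without real timers.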
Windows were drawn in slot order rather than z-order, causing window decorations to appear behind other windows' content. BWM now passes z_order (its vec index) via set_window_position op=17, and the kernel sorts WindowCompositeInfo by z_order before the GPU draws the windows back-to-front.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
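The z-order fix is a painter's-algorithm ordering: sort by z_order, then draw so later windows cover earlier ones. The sketch below uses a simplified, hypothetical stand-in for `WindowCompositeInfo` (the real struct's fields are not shown here) and assumes a lower z_order means further back.

```rust
/// Hypothetical, simplified stand-in for the kernel's WindowCompositeInfo.
#[derive(Debug, Clone)]
struct WindowCompositeInfo {
    slot: usize,    // position in the kernel's window table
    z_order: usize, // BWM's vec index; assumed 0 = bottom-most
}

/// Return windows in back-to-front draw order.
fn draw_order(mut windows: Vec<WindowCompositeInfo>) -> Vec<WindowCompositeInfo> {
    // sort_by_key is stable: equal z_order values keep their slot order.
    windows.sort_by_key(|w| w.z_order);
    windows
}

fn main() {
    let windows = vec![
        WindowCompositeInfo { slot: 0, z_order: 2 },
        WindowCompositeInfo { slot: 1, z_order: 0 },
        WindowCompositeInfo { slot: 2, z_order: 1 },
    ];
    for w in draw_order(windows) {
        // Later draws overwrite earlier ones, so the top window's decorations win.
        println!("draw slot {} (z_order {})", w.slot, w.z_order);
    }
}
```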
**Scheduler tick accounting (kernel/src/task/scheduler.rs):**
When a thread blocked in a syscall (sleep, waitpid, compositor_wait, GPU command wait), the scheduler kept charging elapsed ticks as CPU time until the next context switch. This caused all blocking processes to report inflated CPU% (e.g., BWM showed 40% when actual usage was 10%). Fix: charge ticks at block time in all block_current_* functions, and skip tick charging in schedule() for already-blocked threads. This affects every process in Breenix: any blocking syscall now correctly reports only active CPU time.

**BWM optimizations (userspace/programs/src/bwm.rs):**
- Replace per-pixel fill_rect with libgfx::shapes::fill_rect (3x faster: it clips once up front, then writes raw bytes with no per-pixel bounds checks)
- Remove the wasted content-area fill in draw_window_frame (the GPU overwrites it)
- Skip the redundant full redraw when clicking an already-focused/top window
- Reduce now_realtime() syscalls from every frame to every 30th frame

Result: BWM CPU 40% → 10% (accounting fix + reduced userspace work).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
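The accounting fix can be modeled with a per-thread "last charged at" tick: charge at block time, and have schedule() skip threads that are already blocked. This is a simplified sketch with hypothetical types, not Breenix's scheduler. The `dispatch` reset (restarting the charge window when a thread is put back on the CPU) is an assumption added so that blocked and runnable-but-waiting time is never counted.

```rust
/// Hypothetical thread states for this sketch.
#[derive(Debug, PartialEq, Clone, Copy)]
enum State { Running, Blocked }

/// Hypothetical, simplified thread record (illustrative field names).
struct Thread {
    state: State,
    cpu_ticks: u64,       // accumulated CPU time, in timer ticks
    last_charged_at: u64, // tick count when time was last charged
}

impl Thread {
    /// Charge the ticks elapsed since the last charge point.
    fn charge(&mut self, now: u64) {
        self.cpu_ticks += now.saturating_sub(self.last_charged_at);
        self.last_charged_at = now;
    }

    /// Models a block_current_* path: charge first, then mark blocked.
    fn block(&mut self, now: u64) {
        self.charge(now);
        self.state = State::Blocked;
    }

    /// Assumed dispatch step: start a fresh charge window on the CPU.
    fn dispatch(&mut self, now: u64) {
        self.state = State::Running;
        self.last_charged_at = now;
    }
}

/// Models the accounting step in schedule(): blocked threads were
/// already charged in block(), so charging again would inflate CPU%.
fn schedule_accounting(t: &mut Thread, now: u64) {
    if t.state != State::Blocked {
        t.charge(now);
    }
}

fn main() {
    let mut t = Thread { state: State::Running, cpu_ticks: 0, last_charged_at: 0 };
    t.block(3);                      // ran 3 ticks, then blocked in a syscall
    schedule_accounting(&mut t, 10); // later context switch: nothing extra charged
    t.dispatch(50);                  // woken and scheduled at tick 50
    schedule_accounting(&mut t, 52); // ran 2 more ticks
    println!("cpu_ticks = {}", t.cpu_ticks);
}
```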
Summary
virtio-gpu driver pattern (wait_event_timeout). Zero spin cycles wasted.
send_command_3desc blocks immediately after notifying the device (no 5k spin warmup). Polling fallback preserved for non-MSI-X configurations.
Performance (Parallels, bounce demo)
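The two wait paths can be modeled in userspace Rust: the MSI-X path blocks at once on a condition variable with a timeout (the analog of Linux's `wait_event_timeout`), while the fallback spins briefly and then re-checks on a short timer. This is a sketch under stated assumptions: `GpuCompletion`, `wait_irq`, `wait_poll`, and the spin count are hypothetical, and a spawned thread stands in for the interrupt handler.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical completion object shared by the submitter and the
/// simulated interrupt handler.
struct GpuCompletion {
    done: Mutex<bool>,
    cv: Condvar,
}

impl GpuCompletion {
    fn new() -> Self {
        Self { done: Mutex::new(false), cv: Condvar::new() }
    }

    /// "Interrupt handler": mark the command complete and wake the waiter.
    fn signal(&self) {
        *self.done.lock().unwrap() = true;
        self.cv.notify_one();
    }

    /// MSI-X path: block immediately with no spin. Returns false on timeout.
    /// wait_timeout_while also handles spurious wakeups.
    fn wait_irq(&self, timeout: Duration) -> bool {
        let guard = self.done.lock().unwrap();
        let (guard, _res) = self
            .cv
            .wait_timeout_while(guard, timeout, |done| !*done)
            .unwrap();
        *guard
    }

    /// Polling fallback: spin `spins` times, then re-check every `tick`
    /// until `timeout` elapses.
    fn wait_poll(&self, spins: u32, tick: Duration, timeout: Duration) -> bool {
        let deadline = Instant::now() + timeout;
        for _ in 0..spins {
            if *self.done.lock().unwrap() {
                return true;
            }
            std::hint::spin_loop();
        }
        while Instant::now() < deadline {
            if *self.done.lock().unwrap() {
                return true;
            }
            thread::sleep(tick);
        }
        *self.done.lock().unwrap()
    }
}

fn main() {
    let c = Arc::new(GpuCompletion::new());
    let irq = Arc::clone(&c);
    // Simulate the device finishing a SUBMIT_3D a few milliseconds later.
    let h = thread::spawn(move || {
        thread::sleep(Duration::from_millis(3));
        irq.signal();
    });
    let ok = c.wait_irq(Duration::from_millis(100));
    h.join().unwrap();
    println!("completed via interrupt path: {ok}");
}
```

The interrupt path consumes no CPU while waiting; the fallback trades some spin and timer wakeups for working without MSI-X.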
Test plan
🤖 Generated with Claude Code