
feat: MSI-X interrupt-driven GPU wake + zero-spin SUBMIT_3D #263

Merged
ryanbreen merged 3 commits into main from feat/gpu-msix-zero-spin on Mar 13, 2026

@ryanbreen
Owner

Summary

  • MSI-X interrupt-driven GPU wake: GPU completion fires an MSI-X interrupt → the scheduler unblocks the compositor thread immediately. This matches the Linux virtio-gpu driver pattern (wait_event_timeout), and no cycles are wasted spinning.
  • Zero-spin SUBMIT_3D path: when MSI-X is active, send_command_3desc blocks immediately after notifying the device, skipping the 5k spin warmup. The polling fallback is preserved for non-MSI-X configurations.
  • Frame pacing in compositor_wait: a 5ms minimum inter-frame interval keeps the compositor from saturating the CPU when GPU wake is fast. It uses a plain timer block so that it does not consume dirty-wake signals.
  • gpu-phases instrumentation: per-frame timing breakdown (bg upload, window upload, submit wall/cpu, sleep), reported every 500 frames.
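The wake path in the first two bullets follows the familiar "block with a timeout until a completion flag is set" shape. Below is a minimal userspace sketch of that shape. It relies on two assumptions:

- std's `Mutex` + `Condvar` stand in for Breenix's scheduler wait queue.
- A sleeping thread stands in for the MSI-X handler.

`GpuCompletion`, `signal`, and `wait_timeout` are hypothetical names, not the kernel's API. The flag is checked under the lock before sleeping. As a result, a completion that fires before the submitter blocks is not lost.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical completion object shared by the "interrupt handler" and the
/// submitting thread; a stand-in for a kernel wait queue, not Breenix's type.
struct GpuCompletion {
    done: Mutex<bool>,
    cond: Condvar,
}

impl GpuCompletion {
    fn new() -> Self {
        GpuCompletion { done: Mutex::new(false), cond: Condvar::new() }
    }

    /// Called from the (simulated) MSI-X handler: mark completion, wake the waiter.
    fn signal(&self) {
        let mut done = self.done.lock().unwrap();
        *done = true;
        self.cond.notify_one();
    }

    /// Block until completion or timeout, in the spirit of wait_event_timeout.
    /// Returns true if the GPU completed, false if the timeout expired.
    fn wait_timeout(&self, timeout: Duration) -> bool {
        let guard = self.done.lock().unwrap();
        // Re-checks the flag on every (possibly spurious) wakeup.
        let (mut done, _res) = self
            .cond
            .wait_timeout_while(guard, timeout, |done| !*done)
            .unwrap();
        let completed = *done;
        *done = false; // re-arm for the next submission
        completed
    }
}

fn main() {
    let completion = Arc::new(GpuCompletion::new());
    let irq = Arc::clone(&completion);
    // Simulated GPU: finishes after ~3 ms and "fires the interrupt".
    let gpu = thread::spawn(move || {
        thread::sleep(Duration::from_millis(3));
        irq.signal();
    });
    // Submitter blocks right after notifying the device: no spin warmup.
    let ok = completion.wait_timeout(Duration::from_millis(100));
    gpu.join().unwrap();
    println!("completed: {ok}");
}
```

A polling fallback would instead spin briefly and then fall back to timed sleeps. It would not rely on `signal` ever being called.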

Performance (Parallels, bounce demo)

| Metric | Before | After |
|---|---|---|
| Kernel GPU busy | ~20% | ~16% |
| submit_cpu | ~650us (spin) | ~700us (real work, zero spin) |
| Sleep during GPU wait | ~5000us (timer) | ~2600us (MSI-X wake) |
| Frame time | ~6ms | ~3.6ms |
| FPS | ~165 | ~280 |
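As a quick consistency check on the table:

- Frame rate is the reciprocal of frame time.
- The idle share of a frame is sleep divided by frame time.

The helper names below are illustrative. The inputs are the approximate figures from the table.

```rust
/// Frames per second implied by a per-frame time in microseconds.
fn fps_from_frame_us(frame_us: f64) -> f64 {
    1_000_000.0 / frame_us
}

/// Fraction of each frame spent blocked (sleeping) rather than on the CPU.
fn idle_fraction(sleep_us: f64, frame_us: f64) -> f64 {
    sleep_us / frame_us
}

fn main() {
    // Approximate values from the Parallels bounce-demo table.
    println!("before: {:.0} FPS", fps_from_frame_us(6_000.0));
    println!("after:  {:.0} FPS", fps_from_frame_us(3_600.0));
    println!("idle:   {:.0}%", 100.0 * idle_fraction(2_600.0, 3_600.0));
}
```

This prints 167 FPS, 278 FPS, and 72% idle. Those values line up with the ~165/~280 FPS rows and the "72% of frame time truly idle" figure in the commit message.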

Test plan

  • 23/23 bcheck tests passing
  • DNS resolution working (38ms)
  • HTTP fetch working (302 from duke.edu)
  • Bounce demo rendering smoothly
  • btop showing reduced CPU usage

🤖 Generated with Claude Code

ryanbreen and others added 3 commits March 13, 2026 13:48
…acing

Three changes that together reduce BWM kernel CPU from ~20% to ~10%:

1. MSI-X interrupt-driven GPU wake: GPU completion fires an MSI-X
   interrupt, which immediately unblocks the compositor thread via the
   scheduler. This matches the Linux virtio-gpu driver pattern
   (wait_event_timeout); no spin cycles are wasted waiting for SUBMIT_3D
   completion.

2. Zero-spin path: When MSI-X is active, send_command_3desc blocks the
   thread immediately after notifying the device (no 5k spin warmup).
   The interrupt handler wakes the thread as soon as the command
   completes (~3.4ms as measured). The polling fallback with spin+timer
   is preserved for non-MSI-X configurations.

3. Frame pacing in compositor_wait: Enforces 5ms minimum inter-frame
   interval to prevent the compositor from saturating the CPU when GPU
   wake is fast. Uses plain timer block (not compositor block) to avoid
   consuming dirty-wake signals meant for the main blocking section.
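Here is a minimal sketch of the pacing rule in item 3. It assumes the compositor records when each frame started. The delay is whatever remains of the 5ms budget, or zero if the frame already ran longer.

- `pacing_delay` is a hypothetical helper.
- `std::thread::sleep` stands in for the kernel's plain timer block. What matters is that this sleep does not consume the compositor's dirty-wake signal.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Minimum inter-frame interval from the commit message (5 ms).
const MIN_FRAME_INTERVAL: Duration = Duration::from_millis(5);

/// Remaining time to sleep before the next frame may start.
/// Returns Duration::ZERO when the frame already took at least 5 ms.
fn pacing_delay(frame_start: Instant, now: Instant) -> Duration {
    let elapsed = now.saturating_duration_since(frame_start);
    MIN_FRAME_INTERVAL.saturating_sub(elapsed)
}

fn main() {
    for frame in 0..3 {
        let frame_start = Instant::now();
        // ... composite, submit SUBMIT_3D, wait for the GPU ...
        let delay = pacing_delay(frame_start, Instant::now());
        // Plain timed sleep: stands in for a timer block that leaves the
        // compositor's dirty-wake signal untouched.
        thread::sleep(delay);
        println!("frame {frame}: paced by {delay:?}");
    }
}
```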

Performance (Parallels, bounce demo running):
- Kernel GPU busy: ~16% (was 20% with spin, 70% before blocking)
- submit_cpu: ~700us (real VirGL work, zero spin waste)
- sleep: ~2600us (72% of frame time truly idle)
- FPS: ~280 (not artificially capped)
- 23/23 bcheck tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Windows were drawn in slot order rather than z-order, causing window
decorations to appear behind other windows' content. BWM now passes
each window's z_order (its vec index) via set_window_position op=17,
and the kernel sorts WindowCompositeInfo entries by z_order so the GPU
draws them back-to-front.
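Back-to-front drawing is the painter's algorithm: sort by z_order, then draw in ascending order so later windows land on top. The sketch below rests on these assumptions:

- Only the `z_order` field comes from the commit. `slot` and `draw_order` are illustrative.
- A lower vec index means a window further back.

```rust
/// Hypothetical stand-in for the kernel's per-window composite record.
#[derive(Debug)]
struct WindowCompositeInfo {
    slot: usize,  // slot the window occupies in the kernel's table (illustrative)
    z_order: u32, // BWM vec index; assumed 0 = bottom-most
}

/// Return slots in draw order: lowest z_order first, so later draws
/// paint over earlier ones (back-to-front).
fn draw_order(windows: &mut [WindowCompositeInfo]) -> Vec<usize> {
    // sort_by_key is stable: equal z_order values keep their input order.
    windows.sort_by_key(|w| w.z_order);
    windows.iter().map(|w| w.slot).collect()
}

fn main() {
    let mut windows = vec![
        WindowCompositeInfo { slot: 0, z_order: 2 },
        WindowCompositeInfo { slot: 1, z_order: 0 },
        WindowCompositeInfo { slot: 2, z_order: 1 },
    ];
    // Slot order would draw [0, 1, 2]; z-order yields the correct stacking.
    println!("{:?}", draw_order(&mut windows)); // [1, 2, 0]
}
```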

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
**Scheduler tick accounting (kernel/src/task/scheduler.rs):**
When a thread blocked in a syscall (sleep, waitpid, compositor_wait,
GPU command wait), the scheduler kept charging elapsed ticks as CPU
time until the next context switch. This caused all blocking processes
to report inflated CPU% (e.g., BWM showed 40% when actual CPU was 10%).

Fix: charge ticks at block time in all block_current_* functions, and
skip tick charging in schedule() for already-blocked threads. This
affects every process in Breenix — any blocking syscall now correctly
reports only active CPU time.
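The accounting fix can be reduced to two rules:

- Charge the ticks a thread actually ran at the moment it blocks.
- Skip charging in the scheduler when switching away from a thread that is already blocked.

The following is a heavily simplified sketch. `Thread`, `ThreadState`, `block_current`, and `schedule_out` are hypothetical analogues of the `block_current_*` functions and `schedule()`. They are not Breenix's actual code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ThreadState {
    Running,
    Blocked,
}

/// Hypothetical, simplified thread record.
struct Thread {
    state: ThreadState,
    cpu_ticks: u64, // ticks charged as CPU time
    run_start: u64, // tick when the thread last began running
}

impl Thread {
    /// Analogue of a block_current_* function: charge the ticks actually
    /// spent running, then mark the thread blocked.
    fn block_current(&mut self, now: u64) {
        self.cpu_ticks += now - self.run_start;
        self.run_start = now;
        self.state = ThreadState::Blocked;
    }
}

/// Analogue of the accounting step in schedule() when switching away.
fn schedule_out(t: &mut Thread, now: u64) {
    if t.state == ThreadState::Blocked {
        // Already charged at block time; ticks since then were spent asleep.
        return;
    }
    t.cpu_ticks += now - t.run_start;
    t.run_start = now;
}

fn main() {
    // Runs from tick 5, blocks in a syscall at tick 10,
    // and the next context switch happens at tick 50.
    let mut t = Thread { state: ThreadState::Running, cpu_ticks: 0, run_start: 5 };
    t.block_current(10);
    schedule_out(&mut t, 50);
    println!("cpu_ticks = {}", t.cpu_ticks); // 5 (the old behavior charged 45)
}
```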

**BWM optimizations (userspace/programs/src/bwm.rs):**
- Replace per-pixel fill_rect with libgfx::shapes::fill_rect (3x faster,
  pre-clips once then writes raw bytes with no per-pixel bounds checks)
- Remove wasted content area fill in draw_window_frame (GPU overwrites it)
- Skip redundant full redraw when clicking already-focused/top window
- Reduce now_realtime() syscalls from every frame to every 30th frame
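The fill_rect change relies on a common trick: clip the rectangle to the buffer once, then fill whole row slices so the inner loop carries no per-pixel bounds checks. This sketch makes the following assumptions:

- It uses 32-bit pixels in a row-major `u32` buffer rather than raw bytes.
- `Rect` and this `fill_rect` signature are illustrative and are not libgfx's actual API.

```rust
/// Illustrative rectangle type (not libgfx's).
struct Rect {
    x: i32,
    y: i32,
    w: i32,
    h: i32,
}

/// Fill `r` in a row-major buffer of `width * height` 32-bit pixels.
/// Clips once, then fills each row as a slice: no per-pixel bounds checks.
fn fill_rect(buf: &mut [u32], width: usize, height: usize, r: &Rect, color: u32) {
    assert!(buf.len() >= width * height);
    // Clip [lo, lo + len) to [0, max); i64 avoids overflow in lo + len.
    let clip = |lo: i32, len: i32, max: usize| {
        let start = (lo as i64).clamp(0, max as i64) as usize;
        let end = (lo as i64 + len as i64).clamp(0, max as i64) as usize;
        (start, end)
    };
    let (x0, x1) = clip(r.x, r.w, width);
    let (y0, y1) = clip(r.y, r.h, height);
    if x0 >= x1 || y0 >= y1 {
        return; // empty or entirely off-screen
    }
    for row in y0..y1 {
        let base = row * width;
        buf[base + x0..base + x1].fill(color);
    }
}

fn main() {
    let (w, h) = (4, 3);
    let mut fb = vec![0u32; w * h];
    // Hangs off the left and bottom edges; clips to columns 0..2, rows 1..3.
    fill_rect(&mut fb, w, h, &Rect { x: -1, y: 1, w: 3, h: 5 }, 7);
    for row in fb.chunks(w) {
        println!("{:?}", row);
    }
    // [0, 0, 0, 0]
    // [7, 7, 0, 0]
    // [7, 7, 0, 0]
}
```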

Result: BWM CPU 40% → 10% (accounting fix + reduced userspace work).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ryanbreen merged commit 6c451d9 into main on Mar 13, 2026
1 of 4 checks passed
ryanbreen deleted the feat/gpu-msix-zero-spin branch on March 13, 2026 at 20:38