diff --git a/docs/planning/gpu-compositing-attack-plan.md b/docs/planning/gpu-compositing-attack-plan.md new file mode 100644 index 00000000..f8a5c537 --- /dev/null +++ b/docs/planning/gpu-compositing-attack-plan.md @@ -0,0 +1,719 @@ +# GPU Compositing Attack Plan + +## Post-Mortem and Implementation Guide + +This document catalogs every attempt made to implement per-window GPU compositing, +identifies the specific root causes of failure, and provides a step-by-step plan +to implement it correctly. + +--- + +## 1. What Was Tried: Complete Catalog + +### Attempt 1: Pre-Allocated Texture Pool (8 display-sized textures at init) + +**What was done:** +- Created 8 TEXTURE_2D resources (IDs 10-17) at display size (1280x960) during + `virgl_init()`, before any SUBMIT_3D +- Each texture: `RESOURCE_CREATE_3D(TEXTURE_2D, B8G8R8X8_UNORM, BIND_SAMPLER_VIEW|BIND_SCANOUT)` +- Heap-allocated backing (4.9MB each, 39MB total), paged scatter-gather ATTACH_BACKING +- Primed each with TRANSFER_TO_HOST_3D +- `virgl_composite_gpu_batch()` built single SUBMIT_3D: + - Pipeline setup: create_sub_ctx(1), set_sub_ctx(1), tweaks, surface(10), blend(11), + DSA(12), rasterizer(13), VS(14), FS(15), VE(16), sampler_state(18) + - Background quad: sampler_view(17) on COMPOSITE_TEX(res 5), draw_vbo + - Per-window quads: sampler_view(40+i) on window tex(res 10+slot), draw_vbo each + +**Result:** BLACK screen on first boot. `prlctl capture` showed black even after +90 seconds. Later proved this was capture timing -- after 17+ minutes the display +was actually working. Incorrectly diagnosed as "ATTACH_BACKING poisons pipeline." 
+ +**What was actually proven:** +- The display DID work after sufficient time (message 149, 154) +- 4 window textures with ATTACH_BACKING did NOT poison the pipeline (message 286 agent) +- Content was visible: bounce spheres, bcheck 23/23, btop, terminal (message 142) + +**What went wrong:** +- Premature conclusion that "ATTACH_BACKING on secondary textures poisons VirGL" +- `prlctl capture` timing issue misinterpreted as rendering failure +- 39MB heap allocation caused OOM on some boots (heap exhaustion, message 143) +- Z-order issue: window content quads drawn at same depth covered window frames + +### Attempt 2: Lazy Per-Window Texture Creation + +**What was done:** +- Textures created lazily via `init_window_texture()` when windows register +- Various bind flag combinations tested: + - `BIND_SAMPLER_VIEW` only + - `BIND_SAMPLER_VIEW | BIND_SCANOUT` + - `BIND_RENDER_TARGET | BIND_SAMPLER_VIEW | BIND_SCANOUT | BIND_SHARED` (0x14000A) + +**Result:** Window content BLACK. Background from COMPOSITE_TEX rendered correctly. +Per-window textured quads rendered as invisible/black. + +**What was proven:** +- When per-window quads sampled from COMPOSITE_TEX (instead of their own texture), + they rendered correctly (message 129) -- proving multi-quad, NDC coords, shaders, + and UV mapping all work +- `copy_window_pages_to_backing()` confirmed working -- first_pixel values showed + real application content (0x00648CDC, 0x000A0A19, etc.) not zeros (message 489-492) +- TRANSFER_TO_HOST_3D returned success for per-window textures +- Test colors (red/green/blue/yellow) injected at `init_window_texture()` time were + overwritten by actual app content before Phase A2 ran -- proving data flow works + +**Hypothesis that emerged:** "TRANSFER_TO_HOST_3D only works for resources created +before the first SUBMIT_3D" (message 135). This was tested: a 64x64 test texture +created during init showed RED. Pool textures created during init all worked. 
But +this hypothesis was DISPROVEN by the Linux probe VM tests (message 2707). + +### Attempt 3: Interleaved Z-Order Rendering + +**What was done:** +- For each window back-to-front: (a) frame quad from COMPOSITE_TEX at full window + bounds, (b) content quad from per-window texture at content area +- Cursor overlay quad at end (sampling cursor area from COMPOSITE_TEX) + +**Result:** Z-order fixed for the pre-allocated pool path where textures worked, +but the lazily-created textures still showed BLACK content. + +### Attempt 4: Background-Only in gpu_batch (Isolation Test) + +**What was done:** +- Disabled all per-window quads in `virgl_composite_gpu_batch()` -- drew ONLY the + background quad from COMPOSITE_TEX (message 307) +- This should produce identical output to `virgl_composite_single_quad()` + +**Result:** BLACK even with background only! (message 929) + +**Critical finding:** `virgl_composite_single_quad()` worked perfectly, but +`virgl_composite_gpu_batch()` with IDENTICAL background-only code produced BLACK. + +**What was tested to explain this:** +- Delegation: `gpu_batch()` calling `single_quad()` internally -- result unknown + due to build caching (message 2947-2959) +- Stack overflow: ruled out, 2MB stack vs ~12KB usage (message 2955) +- Code inlining: attempted to inline single_quad code into gpu_batch body (message 2959) +- Build caching: cargo builds completing in 0.06s when real recompilation takes + 5-6s, suggesting stale binaries were deployed (message 3011) + +**Root cause: ALMOST CERTAINLY BUILD CACHING** + +The 0.06-0.07s build times prove that cargo was not recompiling gpu_pci.rs. +Multiple test iterations deployed the SAME stale binary with the original +gpu_batch code, regardless of source changes. This explains why: +- "Identical" code in gpu_batch still produced BLACK (old binary still running) +- Delegation to single_quad appeared not to work (old binary still running) +- Changes to bind flags, handle IDs, etc. 
had no effect (old binary still running) + +### Attempt 5: Multi-Draw Test (Positive Control) + +**What was done:** +- Modified `virgl_composite_single_quad()` to add ONE extra draw_vbo after the + fullscreen background quad (message 290) +- Second quad: top-right corner with flipped UV mapping + +**Result:** WORKED (message 293). Both quads visible -- normal background and +upside-down test rectangle. This proved multiple draw_vbo calls in one SUBMIT_3D +batch work on Parallels. + +### Attempt 6: Revert + +All per-window texture code was removed. Reverted to CPU blit via +`virgl_composite_single_quad()`. + +--- + +## 2. Linux Probe VM Evidence + +Eight C test programs were run on the Linux probe VM (Parallels ARM64, Ubuntu 24.04.4, +kernel 6.8.0, virtio_gpu DRM card1, 3D accel highest). + +### Definitive Test: `poison_fixed.c` + +1. Create display TEXTURE_2D (B8G8R8X8_UNORM, RT|SV|SCANOUT, 1024x768) +2. Map + fill with BLUE via CPU + TRANSFER_TO_HOST +3. Establish DRM scanout (AddFB + SetCrtc) +4. Create N extra TEXTURE_2D (SV|SCANOUT, 128x128), map, fill, TRANSFER_TO_HOST +5. VirGL SUBMIT_3D: create_sub_ctx + CLEAR to RED +6. Re-display (AddFB + SetCrtc) + +**Results:** +- 0 extra textures: RED (pass) +- 1 extra texture: RED (pass) +- 8 extra textures: RED (pass) +- 32 extra textures: RED (pass) + +### EGL Multi-Texture Test: `gl_multi_texture_test.c` + +Used Mesa's EGL surfaceless + GBM + GLES2 pipeline on `/dev/dri/renderD128`: +- Created 2 FBO textures, rendered different colors into each +- Composited both as textured quads onto a third surface + +**Result:** Multi-texture VirGL rendering CONFIRMED WORKING on Parallels. + +### Key Insight from Linux + +Linux's DRM driver uses the exact same protocol: `DRM_IOCTL_VIRTGPU_RESOURCE_CREATE` + +`DRM_IOCTL_VIRTGPU_MAP` + `DRM_IOCTL_VIRTGPU_TRANSFER_TO_HOST`. For each resource: +1. RESOURCE_CREATE_3D: target=TEXTURE_2D(2), format=B8G8R8X8_UNORM(2), + bind=SV|SCANOUT(0x40008) +2. 
ATTACH_BACKING: automatic via GEM BO, per-page scatter-gather +3. TRANSFER_TO_HOST_3D: box={0,0,0,w,h,1}, level=0 + +Resources can be created at ANY time -- there is no requirement that they be +created before the first SUBMIT_3D. The "must create before first SUBMIT_3D" +hypothesis was a misinterpretation of a build caching artifact. + +--- + +## 3. Identified Root Causes + +### Root Cause 1: Build Caching (CONFIRMED -- High Confidence) + +**Evidence:** Cargo build times of 0.06-0.07s vs 5-6s for real recompilation. +Multiple iterations of "change code, rebuild, deploy, test" were deploying the +exact same stale binary. This made it appear that: +- Changes to gpu_batch had no effect (stale binary) +- gpu_batch with "identical" code to single_quad still failed (stale binary) +- Bind flag changes didn't help (stale binary) +- Handle changes didn't help (stale binary) + +**Why:** `gpu_pci.rs` is 4262 lines. Touch detection may not propagate through +the dependency chain correctly, or Parallels VM deployment may reuse a cached +disk image. + +**Fix:** Always `touch kernel/src/drivers/virtio/gpu_pci.rs` before building. +Always verify build time is >3 seconds. Always check the `.elf` timestamp. +Always use `run.sh --parallels` which handles the full pipeline including +userspace rebuild and fresh VM creation. 
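The two manual checks in the fix above (build duration and `.elf` timestamp) could be automated on the host side. The following is a minimal sketch under stated assumptions: `check_build`, `BuildVerdict`, and `MIN_REAL_BUILD` are hypothetical names chosen for illustration, not part of the existing build scripts, and the 3-second threshold is simply the rule of thumb quoted above.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical threshold: the text above treats builds shorter than
/// 3 seconds as "cargo did not really recompile gpu_pci.rs".
const MIN_REAL_BUILD: Duration = Duration::from_secs(3);

/// Outcome of the staleness check (illustrative type, not in the repo).
#[derive(Debug, PartialEq)]
enum BuildVerdict {
    Fresh,
    SuspiciouslyFast,
    ArtifactOlderThanSource,
}

/// Classify a build from its wall-clock duration and two modification times.
/// An artifact older than its source can never contain the latest edits,
/// so that case is reported first.
fn check_build(
    build_time: Duration,
    source_mtime: SystemTime,
    artifact_mtime: SystemTime,
) -> BuildVerdict {
    if artifact_mtime < source_mtime {
        return BuildVerdict::ArtifactOlderThanSource;
    }
    if build_time < MIN_REAL_BUILD {
        return BuildVerdict::SuspiciouslyFast;
    }
    BuildVerdict::Fresh
}

fn main() {
    let source = SystemTime::UNIX_EPOCH + Duration::from_secs(1_000);
    let newer = source + Duration::from_secs(10);
    let older = source - Duration::from_secs(10);

    // A 0.06 s "build" is flagged even though the artifact is newer.
    println!("{:?}", check_build(Duration::from_millis(60), source, newer));
    // An artifact that predates the source is flagged regardless of timing.
    println!("{:?}", check_build(Duration::from_secs(6), source, older));
    // A 6 s build with a newer artifact passes.
    println!("{:?}", check_build(Duration::from_secs(6), source, newer));
}
```

In practice the two timestamps would come from `std::fs::metadata(path)?.modified()` on `gpu_pci.rs` and the kernel `.elf`. A check like this complements the serial-log build canary but does not replace it, since only the canary proves which binary actually booted.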
+ +### Root Cause 2: Per-Window Texture Sampling (UNRESOLVED -- Needs Investigation) + +**What we know:** +- Per-window quads sampling from COMPOSITE_TEX: WORKS (same batch, same shader) +- Per-window quads sampling from their own texture: BLACK +- TRANSFER_TO_HOST_3D returns success for per-window textures +- Backing data is confirmed non-zero (real app content in first_pixel) + +**What we DON'T know (because of build caching):** +- Whether any of the "fixes" tried would have actually worked if properly deployed +- Whether the 64x64 test texture created at init worked because of timing or + because of size +- Whether the handle allocation scheme actually collided + +**Possible sub-causes (all plausible, none definitively confirmed or eliminated):** + +**2a. Missing CTX_ATTACH_RESOURCE for per-window textures** + +If `virgl_ctx_attach_resource_cmd()` was not called for per-window texture +resources, the VirGL context would not have access to them. The sampler_view +would reference a resource the context cannot see, producing BLACK. + +The `init_composite_texture()` function (WORKS) calls: +``` +virgl_ctx_attach_resource_cmd(state, VIRGL_CTX_ID, RESOURCE_COMPOSITE_TEX_ID) +``` + +The per-window `init_window_texture()` code DOES call this. So this is likely +not the issue, but MUST be verified in the new implementation. + +**2b. create_sampler_view format encoding error for per-window textures** + +Memory note: "create_sampler_view format MUST include texture target -- bits +[24:31] must contain PIPE_TEXTURE_2D << 24. Without it, host creates +BUFFER-targeted sampler view -> black." + +The `create_sampler_view` function in virgl.rs correctly encodes: +```rust +let fmt_target = (format & 0x00FF_FFFF) | ((target & 0xFF) << 24); +``` + +And the call in the batch used `pipe::TEXTURE_2D` for target. This encoding +is correct. However, if the wrong format constant was passed (e.g., a raw +number instead of the pipe constant), it would fail silently. 
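The packing described above can be checked in isolation from the driver. This is a minimal sketch, assuming the numeric values quoted elsewhere in this document (format `B8G8R8X8_UNORM` = 2, target `TEXTURE_2D` = 2). The function and helper names are illustrative and are not the actual `virgl.rs` API.

```rust
/// Values as quoted in this document: B8G8R8X8_UNORM = 2, TEXTURE_2D = 2.
const B8G8R8X8_UNORM: u32 = 2;
const PIPE_TEXTURE_2D: u32 = 2;

/// Pack the format into bits [0:23] and the texture target into bits [24:31],
/// mirroring the `fmt_target` expression shown above.
fn sampler_view_fmt_target(format: u32, target: u32) -> u32 {
    (format & 0x00FF_FFFF) | ((target & 0xFF) << 24)
}

/// Hypothetical debug helper: true if the encoded DWORD carries a non-zero
/// target. A zero target is the "BUFFER-targeted sampler view" failure mode.
fn has_texture_target(dword: u32) -> bool {
    (dword >> 24) != 0
}

fn main() {
    let good = sampler_view_fmt_target(B8G8R8X8_UNORM, PIPE_TEXTURE_2D);
    let bad = sampler_view_fmt_target(B8G8R8X8_UNORM, 0);

    println!("good = {:#010x}, has target: {}", good, has_texture_target(good));
    println!("bad  = {:#010x}, has target: {}", bad, has_texture_target(bad));
}
```

Logging the encoded DWORD for both the background and the per-window sampler views makes a mismatch visible directly in the serial output, without having to dump the whole SUBMIT_3D buffer.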
+ +The background sampler_view(17) uses `vfmt::B8G8R8X8_UNORM, pipe::TEXTURE_2D`. +The per-window sampler_view(40+i) should use the same. If they used a different +format (e.g., B8G8R8A8_UNORM for the texture resource but B8G8R8X8_UNORM for the +sampler_view, or vice versa), the host could reject it silently. + +**2c. Texture not primed (TRANSFER_TO_HOST_3D before first sample)** + +COMPOSITE_TEX is primed during init. Per-window textures ARE primed during +`init_window_texture()`. But if the priming TRANSFER_TO_HOST_3D was called +AFTER the first SUBMIT_3D that tried to sample from the texture (due to +timing), the host might have cached the texture as empty. + +On the other hand, per-frame TRANSFER_TO_HOST_3D uploads should override +this. So this is unlikely but should be verified. + +**2d. TRANSFER_TO_HOST_3D stride mismatch** + +If the backing buffer has a different stride (row pitch) than what's passed +to TRANSFER_TO_HOST_3D, the host reads garbage. For COMPOSITE_TEX, stride = +`tex_w * 4` (correct). For per-window textures, stride should be `win_w * 4` +if the backing is exactly win_w*win_h*4 bytes. If pool textures are +display-sized but the transfer uses window dimensions with window stride, +the host would read the right data. But if the stride is wrong, the upload +silently corrupts. + +**2e. VirGL batch rejection by a bad command** + +If ANY command in a SUBMIT_3D batch is malformed, virglrenderer may reject +the ENTIRE batch silently. A bad `create_sampler_view` for a per-window +texture could poison the whole batch, causing even the background quad to +go BLACK. + +This would explain the "even background-only gpu_batch is BLACK" finding -- +IF the gpu_batch code had additional commands (even unreachable ones) that +were malformed. However, the "background-only" test was supposed to remove +all per-window commands. 
+ +**Since build caching was occurring, we cannot know whether the +background-only test actually ran the modified code.** + +### Root Cause 3: prlctl capture Timing (CONFIRMED) + +`prlctl capture` returns BLACK for 60-90 seconds after boot with VirGL GPU +compositing. This caused multiple false "BLACK screen" diagnoses. The display +WAS rendering correctly -- it just wasn't visible to the capture API. + +**Fix:** Always wait at least 90 seconds before capturing. Take multiple captures +5 seconds apart. Use VNC or direct visual inspection when possible. + +--- + +## 4. Handle Allocation Analysis + +### VirGL Object Handles (within SUBMIT_3D, single hash table per sub-context) + +These are VirGL object handles -- NOT resource IDs. They share one namespace: + +| Handle | Object Type | Used By | +|--------|------------|---------| +| 10 | surface | Render target surface on RESOURCE_3D_ID(2) | +| 11 | blend | Simple blend (dither, RGBA colormask) | +| 12 | DSA | Default depth-stencil-alpha | +| 13 | rasterizer | Default rasterizer | +| 14 | shader (VS) | Texture vertex shader | +| 15 | shader (FS) | Texture fragment shader | +| 16 | vertex_elements | 2-attribute VE (pos + texcoord) | +| 17 | sampler_view | Background sampler view on COMPOSITE_TEX(res 5) | +| 18 | sampler_state | Nearest filter, clamp-to-edge | +| 40+i | sampler_view | Per-window sampler view on window tex(res 10+i) | + +### GPU Resource IDs (global, outside SUBMIT_3D) + +| Resource ID | Type | Purpose | +|-------------|------|---------| +| 2 (RESOURCE_3D_ID) | TEXTURE_2D | VirGL render target (scanout) | +| 3 (RESOURCE_VB_ID) | BUFFER | Vertex buffer (INLINE_WRITE) | +| 5 (RESOURCE_COMPOSITE_TEX_ID) | TEXTURE_2D | Compositor background texture | +| 10-17 | TEXTURE_2D | Per-window texture slots | + +### Collision Analysis + +VirGL object handles (surface, blend, DSA, etc.) live in a SEPARATE namespace +from GPU resource IDs. Handle 10 (surface) does NOT collide with Resource ID 10 +(window texture). 
These go through different code paths: +- Object handles: `create_surface(handle=10, ...)` inside SUBMIT_3D +- Resource IDs: `virgl_resource_create_3d_cmd(res_id=10, ...)` outside SUBMIT_3D + +Within the VirGL object namespace, there are NO collisions in the scheme above: +- Handles 10-18 for pipeline objects +- Handles 40+ for per-window sampler_views + +**HOWEVER:** If per-window sampler_view handles collide with ANY pipeline object +handle, virglrenderer replaces the existing object. Handle 17 (bg sampler_view) +is recreated per-frame, and handle 17 is also used for frame-quad sampler_views +in the z-order loop. Recreating handle 17 multiple times per batch (once for bg, +once per frame quad) is fine -- virglrenderer replaces the previous object. + +### Recommended Handle Scheme for New Implementation + +Use explicit, well-separated ranges: + +``` +Pipeline objects (created once per batch): + 100: surface (render target) + 101: blend + 102: DSA + 103: rasterizer + 104: VS shader + 105: FS shader + 106: vertex_elements + 107: sampler_state + +Sampler views (re-created per draw): + 200: background sampler_view (COMPOSITE_TEX) + 201+i: per-window sampler_view (window texture i) +``` + +Resource IDs (unchanged): +``` + 2: render target + 3: vertex buffer + 5: compositor texture + 10+i: per-window texture i +``` + +--- + +## 5. Exact VirGL Command Sequence for Multi-Texture Compositing + +### Per-Window Texture Resource Setup (outside SUBMIT_3D, during init or window register) + +For each window texture slot i (resource ID = 10+i): + +``` +1. RESOURCE_CREATE_3D: + resource_id = 10 + i + target = TEXTURE_2D (2) + format = B8G8R8X8_UNORM (2) + bind = BIND_SAMPLER_VIEW (0x8) | BIND_SCANOUT (0x40000) + width = window_width (or display_width for pool) + height = window_height (or display_height for pool) + depth = 1, array_size = 1, last_level = 0, nr_samples = 0 + +2. 
ATTACH_BACKING (paged scatter-gather): + resource_id = 10 + i + nr_entries = num_pages + entries = [{page_phys_addr, 4096}, ...] for each page + +3. CTX_ATTACH_RESOURCE: + ctx_id = VIRGL_CTX_ID (1) + resource_id = 10 + i + +4. TRANSFER_TO_HOST_3D (priming): + resource_id = 10 + i + box = {0, 0, 0, width, height, 1} + level = 0 + stride = width * 4 +``` + +### Per-Frame Upload (outside SUBMIT_3D, for each dirty window) + +``` +1. Cache clean backing memory (ARM64: DC CIVAC on dirty range) +2. TRANSFER_TO_HOST_3D: + resource_id = 10 + slot + box = {0, 0, 0, win_width, win_height, 1} + stride = win_width * 4 (or pool_width * 4 if pool-sized backing) +``` + +### Per-Frame SUBMIT_3D Batch + +``` +// Pipeline setup (same as working virgl_composite_single_quad) +create_sub_ctx(1) +set_sub_ctx(1) +set_tweaks(1, 1) +set_tweaks(2, display_width) +create_surface(100, RESOURCE_3D_ID, B8G8R8X8_UNORM, 0, 0) +set_framebuffer_state(zsurf=0, cbufs=[100]) +create_blend_simple(101) +bind_object(101, BLEND) +create_dsa_default(102) +bind_object(102, DSA) +create_rasterizer_default(103) +bind_object(103, RASTERIZER) + +// Shaders (num_tokens=300 required by Parallels) +create_shader(104, VERTEX, 300, TEX_VS_TGSI) +bind_shader(104, VERTEX) +create_shader(105, FRAGMENT, 300, TEX_FS_TGSI) +bind_shader(105, FRAGMENT) + +// Vertex elements: 2 attributes (position vec4 + texcoord vec4) +create_vertex_elements(106, [(0,0,0,R32G32B32A32_FLOAT), (16,0,0,R32G32B32A32_FLOAT)]) +bind_object(106, VERTEX_ELEMENTS) + +// Sampler state (shared by all draws) +create_sampler_state(107, CLAMP_TO_EDGE, CLAMP_TO_EDGE, CLAMP_TO_EDGE, + NEAREST, MIPFILTER_NONE, NEAREST) +bind_sampler_states(FRAGMENT, 0, [107]) +set_min_samples(1) +set_viewport(display_width, display_height) + +// CLEAR (optional -- background quad will cover entire screen) +clear_color(0.0, 0.0, 0.0, 1.0) + +// === Draw 0: Background quad (fullscreen, from COMPOSITE_TEX) === +create_sampler_view(200, RESOURCE_COMPOSITE_TEX_ID, 
B8G8R8X8_UNORM, + TEXTURE_2D, 0, 0, 0, 0, IDENTITY_SWIZZLE) +set_sampler_views(FRAGMENT, 0, [200]) +resource_inline_write(VB_RES_ID, 0, 128, bg_quad_verts) // fullscreen NDC +set_vertex_buffers([(32, 0, VB_RES_ID)]) +draw_vbo(0, 4, TRIANGLE_FAN, 3) + +// === Draw 1..N: Per-window quads (back-to-front z-order) === +for each window (back to front): + // Frame quad: from COMPOSITE_TEX at window bounds (includes frame/decorations) + create_sampler_view(200, RESOURCE_COMPOSITE_TEX_ID, B8G8R8X8_UNORM, + TEXTURE_2D, 0, 0, 0, 0, IDENTITY_SWIZZLE) + set_sampler_views(FRAGMENT, 0, [200]) + resource_inline_write(VB_RES_ID, offset, 128, frame_quad_verts) + set_vertex_buffers([(32, 0, VB_RES_ID)]) + draw_vbo(start, 4, TRIANGLE_FAN, 3) + + // Content quad: from per-window texture at content area + create_sampler_view(201 + i, window_res_id, B8G8R8X8_UNORM, + TEXTURE_2D, 0, 0, 0, 0, IDENTITY_SWIZZLE) + set_sampler_views(FRAGMENT, 0, [201 + i]) + resource_inline_write(VB_RES_ID, offset, 128, content_quad_verts) + set_vertex_buffers([(32, 0, VB_RES_ID)]) + draw_vbo(start, 4, TRIANGLE_FAN, 3) + +// === Final: Cursor overlay (from COMPOSITE_TEX cursor area) === +create_sampler_view(200, RESOURCE_COMPOSITE_TEX_ID, B8G8R8X8_UNORM, + TEXTURE_2D, 0, 0, 0, 0, IDENTITY_SWIZZLE) +set_sampler_views(FRAGMENT, 0, [200]) +resource_inline_write(VB_RES_ID, offset, 128, cursor_quad_verts) +set_vertex_buffers([(32, 0, VB_RES_ID)]) +draw_vbo(start, 4, TRIANGLE_FAN, 3) +``` + +After SUBMIT_3D: +``` +SET_SCANOUT(scanout=0, resource=RESOURCE_3D_ID) +RESOURCE_FLUSH(resource=RESOURCE_3D_ID, rect=0,0,display_w,display_h) +``` + +### TGSI Shaders (Proven Working) + +Vertex shader: +``` +VERT +DCL IN[0] +DCL IN[1] +DCL OUT[0], POSITION +DCL OUT[1], GENERIC[0] + 0: MOV OUT[0], IN[0] + 1: MOV OUT[1], IN[1] + 2: END +``` + +Fragment shader: +``` +FRAG +PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1 +DCL IN[0], GENERIC[0], LINEAR +DCL OUT[0], COLOR +DCL SAMP[0] +DCL SVIEW[0], 2D, FLOAT + 0: TEX OUT[0], IN[0], SAMP[0], 
2D + 1: END +``` + +### NDC Coordinate Conversion + +Screen pixel (px, py) to NDC: +``` +ndc_x = (px / display_w) * 2.0 - 1.0 +ndc_y = 1.0 - (py / display_h) * 2.0 // Y flipped +``` + +Texture UV for per-window content: +``` +u_max = win_width / tex_width // <1.0 if tex is pool-sized +v_max = win_height / tex_height +``` + +Quad vertices (4 verts, TRIANGLE_FAN, each 8 floats = 32 bytes): +``` +top-left: (ndc_x0, ndc_y0, 0, 1, u0, v0, 0, 0) +bottom-left: (ndc_x0, ndc_y1, 0, 1, u0, v1, 0, 0) +bottom-right: (ndc_x1, ndc_y1, 0, 1, u1, v1, 0, 0) +top-right: (ndc_x1, ndc_y0, 0, 1, u1, v0, 0, 0) +``` + +--- + +## 6. Step-by-Step Implementation Plan + +### Phase 0: Eliminate Build Caching (MANDATORY FIRST STEP) + +Before any code changes, establish a reliable build+deploy+verify loop: + +1. **Add a build canary:** At the top of `virgl_composite_single_quad()`, add: + ```rust + static BUILD_ID: AtomicU32 = AtomicU32::new(0); + let id = BUILD_ID.fetch_add(1, Ordering::Relaxed); + if id == 0 { + // Option A: env!() fails the build unless build.rs exports the variable + // (cargo:rustc-env=...), so it stays commented out until that exists: + // crate::serial_println!("[BUILD] gpu_pci.rs build={}", env!("BUILD_TIMESTAMP_PLACEHOLDER")); + // Option B (simpler): manually increment a constant each rebuild + crate::serial_println!("[BUILD] gpu_pci.rs version=42"); + } + ``` + Change the version number with EVERY rebuild. If the serial log shows the + wrong version number, the build cache is stale. + +2. **Always touch before building:** + ```bash + touch kernel/src/drivers/virtio/gpu_pci.rs + ``` + +3. **Verify build time:** Real recompilation of gpu_pci.rs takes >3 seconds. + If cargo finishes in <1 second, the build is cached/stale. + +4. **Always use `run.sh --parallels`:** This handles the full pipeline: userspace + rebuild, ext2 disk creation, fresh VM, serial log truncation. + +5. **Wait 90+ seconds** before `prlctl capture`. Take 3 captures 5s apart. + +### Phase 1: Prove Multi-Texture Sampling Works (Minimal Test) + +**Goal:** Two textures, two quads, one SUBMIT_3D batch. All in virgl_init(). + +**Steps:** + +1. 
Create a second test texture (resource ID 20, small: 64x64) during virgl_init, + AFTER COMPOSITE_TEX but BEFORE the Step 9 SUBMIT_3D: + ``` + RESOURCE_CREATE_3D(res=20, TEXTURE_2D, B8G8R8X8_UNORM, SV|SCANOUT, 64, 64) + ATTACH_BACKING(res=20, paged scatter-gather) + CTX_ATTACH_RESOURCE(ctx=1, res=20) + Fill backing with solid GREEN (0x0000FF00 in BGRX) + cache clean + TRANSFER_TO_HOST_3D(res=20, box=0,0,0,64,64,1, stride=256) + ``` + +2. In Step 9 SUBMIT_3D batch, add a second draw_vbo AFTER the existing red quad: + ``` + // Draw 1: existing fullscreen colored quad (constant buffer FS) + // Draw 2: small textured quad at top-right from test texture res 20 + create_sampler_view(200, res=20, B8G8R8X8_UNORM, TEXTURE_2D, ...) + set_sampler_views(FRAGMENT, 0, [200]) + // Switch to texture FS (create_shader + bind_shader) + resource_inline_write(VB_RES_ID, 128, 128, small_quad_verts) + set_vertex_buffers([(32, 0, VB_RES_ID)]) + draw_vbo(4, 4, TRIANGLE_FAN, 7) + ``` + +3. **Verification:** Display should show dark blue background (CLEAR) with + fullscreen red quad AND a small green rectangle at top-right (from the test + texture). The test texture is green rather than red so that it is + distinguishable from the fullscreen red quad underneath it. If the test + texture quad is BLACK, the texture sampling is broken. If it's GREEN, it + works and we can proceed. + +4. **If BLACK:** Compare the create_sampler_view encoding byte-for-byte with + the working COMPOSITE_TEX sampler_view. Check: + - Is the format DWORD identical? (bits [0:23] = format, [24:31] = target) + - Is CTX_ATTACH_RESOURCE called for resource 20? + - Is the TRANSFER_TO_HOST_3D box correct? + - Print the raw hex of the SUBMIT_3D buffer and diff the two sampler_view + commands + +### Phase 2: Per-Window Texture in Production Pipeline + +Only proceed after Phase 1 passes. + +1. 
**Add a `virgl_create_window_texture()` function** (not a pool -- one per window): + ```rust + fn virgl_create_window_texture( + slot: usize, width: u32, height: u32 + ) -> Result<u32, &'static str> { + let res_id = 10 + slot as u32; + // Same pattern as init_composite_texture: + // 1. Heap allocate backing (page-aligned) + // 2. RESOURCE_CREATE_3D(res_id, TEXTURE_2D, B8G8R8X8_UNORM, SV|SCANOUT, w, h) + // 3. virgl_attach_backing_paged(res_id, ptr, size) + // 4. virgl_ctx_attach_resource_cmd(VIRGL_CTX_ID, res_id) + // 5. cache clean + TRANSFER_TO_HOST_3D (prime) + Ok(res_id) + } + ``` + +2. **Call from graphics.rs** when a window registers (lazy init). + +3. **Per-frame upload:** In `virgl_composite_windows()`, for each dirty window: + - Copy MAP_SHARED pages to contiguous backing (`copy_window_pages_to_backing`) + - Cache clean the backing + - TRANSFER_TO_HOST_3D + +4. **Modify `virgl_composite_single_quad()` to accept window quads:** + + Rather than creating a new gpu_batch function, EXTEND the existing working + function. This eliminates the "two functions, one works, one doesn't" problem. + + Add a parameter for optional per-window quads: + ```rust + fn virgl_composite_single_quad_with_windows( + windows: &[WindowQuadInfo], + ) -> Result<(), &'static str> + ``` + + Start with ZERO windows (identical to current single_quad). Add windows + one at a time, testing after each addition. + +5. **Verification at each step:** + - 0 windows: identical to current display (regression test) + - 1 window: background + one window content quad (should show window content) + - N windows: background + all windows in z-order + +### Phase 3: Z-Order Interleaving + +Once per-window texture sampling is confirmed working: + +1. For each window (back to front): + - Draw frame quad from COMPOSITE_TEX (window bounds including title bar + border) + - Draw content quad from per-window texture (content area only) +2. Final cursor overlay quad from COMPOSITE_TEX + +### Phase 4: Remove CPU Blit from BWM + +1. 
BWM stops calling `blit_client_pixels()` for windows with GPU textures +2. BWM only composites background, decorations, and cursor into COMPOSITE_TEX +3. Per-window content goes directly from MAP_SHARED pages to GPU texture backing + +--- + +## 7. Diagnostic Checklist + +When per-window texture sampling produces BLACK, check these in order: + +1. **Build canary:** Does the serial log show the expected build version number? + If not, STOP -- you are running stale code. + +2. **CTX_ATTACH_RESOURCE:** Was `virgl_ctx_attach_resource_cmd(1, res_id)` called + for this texture resource? Check serial log for the attach message. + +3. **create_sampler_view format DWORD:** Print the raw u32 value of the fmt_target + parameter. It MUST be `(B8G8R8X8_UNORM & 0x00FFFFFF) | (TEXTURE_2D << 24)` = + `0x02000002`. If it's `0x00000002`, the texture target is missing -> BLACK. + +4. **TRANSFER_TO_HOST_3D box:** Print the box parameters. Width and height must + match the resource dimensions, not zero. Stride must be `width * 4`. + +5. **Backing data:** Print first_pixel of the backing buffer after copy. If it's + zero, the copy failed. If it's non-zero, the data is present. + +6. **Batch rejection:** If EVEN the background quad is BLACK, a command earlier + in the batch is poisoning it. Binary search: comment out the last half of + commands, test, narrow down. + +7. **prlctl capture timing:** Wait 90 seconds. Take 3 captures. If all 3 are + black, it's a real rendering issue. If the 3rd shows content, it's timing. + +--- + +## 8. What NOT to Do + +1. **Do NOT create a separate gpu_batch function.** Extend the working + `virgl_composite_single_quad()` incrementally. The original failure was + partly caused by having two "identical" functions where one worked and + one didn't -- a situation made impossible to debug by build caching. + +2. **Do NOT pre-allocate a pool of 8 display-sized textures.** Each texture + is 4.9MB of heap. 8 textures = 39MB. This caused OOM on some boots. 
+ Create textures at actual window dimensions, lazily. + +3. **Do NOT conclude "ATTACH_BACKING poisons the pipeline"** without first + verifying the build canary. This was a false conclusion caused by stale + binaries. + +4. **Do NOT change multiple variables at once.** Change one thing, verify the + build canary, wait 90 seconds, capture. One variable per test cycle. + +5. **Do NOT use `prlctl capture` as the sole verification method.** Also check + serial output for SUBMIT_3D success/failure, check for error responses, + and use visual inspection via the VM window when possible. diff --git a/kernel/src/drivers/virtio/gpu_pci.rs b/kernel/src/drivers/virtio/gpu_pci.rs index b0695b25..4e764435 100644 --- a/kernel/src/drivers/virtio/gpu_pci.rs +++ b/kernel/src/drivers/virtio/gpu_pci.rs @@ -444,6 +444,11 @@ const TEST_TEX_BYTES: usize = (TEST_TEX_DIM * TEST_TEX_DIM * 4) as usize; /// Resource ID for the compositor texture (BWM uploads pixel buffers here) const RESOURCE_COMPOSITE_TEX_ID: u32 = 5; +/// Resource ID for the GPU cursor texture (12x18 arrow, uploaded once at init) +const RESOURCE_CURSOR_TEX_ID: u32 = 6; +/// Cursor bitmap dimensions +const CURSOR_TEX_W: u32 = 12; +const CURSOR_TEX_H: u32 = 18; // VirtIO standard feature bits const VIRTIO_F_VERSION_1: u64 = 1 << 32; @@ -557,6 +562,133 @@ static COMPOSITE_TEX_H: AtomicU32 = AtomicU32::new(0); /// Whether the compositor texture resource has been initialized. static COMPOSITE_TEX_READY: AtomicBool = AtomicBool::new(false); +/// Whether the cursor GPU texture has been initialized. +static CURSOR_TEX_READY: AtomicBool = AtomicBool::new(false); + +// ============================================================================= +// Per-Window GPU Textures +// ============================================================================= + +/// Base resource ID for per-window textures. Window slot N gets resource (10 + N). +const RESOURCE_WIN_TEX_BASE: u32 = 10; +/// Maximum number of per-window texture slots. 
+const MAX_WIN_TEX_SLOTS: usize = 8; + +/// Per-slot backing buffer pointer and length. +static mut WIN_TEX_BACKING: [(*mut u8, usize); MAX_WIN_TEX_SLOTS] = [(core::ptr::null_mut(), 0); MAX_WIN_TEX_SLOTS]; +/// Width/height of each slot's texture. +static mut WIN_TEX_DIMS: [(u32, u32); MAX_WIN_TEX_SLOTS] = [(0, 0); MAX_WIN_TEX_SLOTS]; +/// Whether each slot has been initialized. +static mut WIN_TEX_INITIALIZED: [bool; MAX_WIN_TEX_SLOTS] = [false; MAX_WIN_TEX_SLOTS]; + +/// Create a per-window VirGL texture for GPU compositing. +/// +/// Same resource creation pattern as COMPOSITE_TEX (proven working): +/// RESOURCE_CREATE_3D -> ATTACH_BACKING -> CTX_ATTACH -> TRANSFER_TO_HOST_3D +pub fn create_window_texture( + slot: usize, + width: u32, + height: u32, +) -> Result<u32, &'static str> { + use super::virgl::{format as vfmt, pipe}; + + if slot >= MAX_WIN_TEX_SLOTS { + return Err("window texture slot out of range"); + } + + let res_id = RESOURCE_WIN_TEX_BASE + slot as u32; + + // Already initialized -- return existing + if unsafe { WIN_TEX_INITIALIZED[slot] } { + return Ok(res_id); + } + + let tex_size = (width as usize) * (height as usize) * 4; + let layout = alloc::alloc::Layout::from_size_align(tex_size, 4096) + .map_err(|_| "invalid window texture layout")?; + let ptr = unsafe { alloc::alloc::alloc_zeroed(layout) }; + if ptr.is_null() { + return Err("failed to allocate window texture backing"); + } + + // RESOURCE_CREATE_3D -- same bind flags as COMPOSITE_TEX + with_device_state(|state| { + virgl_resource_create_3d_cmd( + state, res_id, pipe::TEXTURE_2D, vfmt::B8G8R8X8_UNORM, + pipe::BIND_SAMPLER_VIEW | pipe::BIND_SCANOUT, + width, height, 1, 1, + ) + })?; + + // ATTACH_BACKING (paged scatter-gather) + with_device_state(|state| { + virgl_attach_backing_paged(state, res_id, ptr, tex_size) + })?; + + // CTX_ATTACH_RESOURCE + with_device_state(|state| { + virgl_ctx_attach_resource_cmd(state, VIRGL_CTX_ID, res_id) + })?; + + // Prime with TRANSFER_TO_HOST_3D + dma_cache_clean(ptr, 
tex_size); + with_device_state(|state| { + transfer_to_host_3d(state, res_id, 0, 0, width, height, width * 4) + })?; + + unsafe { + WIN_TEX_BACKING[slot] = (ptr, tex_size); + WIN_TEX_DIMS[slot] = (width, height); + WIN_TEX_INITIALIZED[slot] = true; + } + + crate::serial_println!( + "[virgl-win-tex] Created: slot={} res_id={} {}x{} ({}B)", + slot, res_id, width, height, tex_size + ); + Ok(res_id) +} + +/// Upload dirty window pixels to GPU texture via TRANSFER_TO_HOST_3D. +/// Copies scattered MAP_SHARED pages to contiguous backing, then uploads. +fn upload_window_texture( + slot: usize, + width: u32, + height: u32, + page_phys_addrs: &[u64], + total_size: usize, +) -> Result<(), &'static str> { + if slot >= MAX_WIN_TEX_SLOTS { return Err("slot out of range"); } + let (backing_ptr, backing_len) = unsafe { WIN_TEX_BACKING[slot] }; + if backing_ptr.is_null() { return Err("backing not allocated"); } + + let win_bytes = (width as usize) * (height as usize) * 4; + let copy_len = win_bytes.min(total_size).min(backing_len); + + // Copy scattered pages to contiguous backing. + // page_phys_addrs contains PHYSICAL addresses — convert to kernel virtual. + let mut copied = 0usize; + for &page_phys in page_phys_addrs { + if copied >= copy_len { break; } + let chunk = 4096usize.min(copy_len - copied); + let virt = phys_to_kern_virt(page_phys); + unsafe { + core::ptr::copy_nonoverlapping( + virt as *const u8, + backing_ptr.add(copied), + chunk, + ); + } + copied += chunk; + } + + let res_id = RESOURCE_WIN_TEX_BASE + slot as u32; + dma_cache_clean(backing_ptr, copy_len); + with_device_state(|state| { + transfer_to_host_3d(state, res_id, 0, 0, width, height, width * 4) + }) +} + /// Allocate and initialize the compositor texture resource for GPU compositing. /// Creates a TEXTURE_2D resource with SAMPLER_VIEW bind, attaches heap-allocated /// backing, and primes it with TRANSFER_TO_HOST_3D. 
@@ -615,48 +747,149 @@ fn init_composite_texture(width: u32, height: u32) -> Result<(), &'static str> { COMPOSITE_TEX_READY.store(true, Ordering::Release); crate::serial_println!("[virgl-composite] Texture resource initialized (id={})", RESOURCE_COMPOSITE_TEX_ID); - // ── Pre-allocate per-window texture pool ── - // Parallels requires resources to be created BEFORE the first SUBMIT_3D. - // Resources created after SUBMIT_3D has been called don't get their - // TRANSFER_TO_HOST_3D data. Pre-allocate all slots now with display-sized - // backing so they're ready when windows appear. - let pool_w = width; - let pool_h = height; - let pool_size = (pool_w as usize) * (pool_h as usize) * 4; - let mut pool_count = 0usize; + // Pre-allocate per-window texture pool at init time. + // TRANSFER_TO_HOST_3D only works for resources created before first SUBMIT_3D. for slot in 0..MAX_WIN_TEX_SLOTS { + let max_w: u32 = 1024; + let max_h: u32 = 768; + let tex_size = (max_w as usize) * (max_h as usize) * 4; let res_id = RESOURCE_WIN_TEX_BASE + slot as u32; - let layout = alloc::alloc::Layout::from_size_align(pool_size, 4096) - .map_err(|_| "win texture pool: layout error")?; + + let layout = alloc::alloc::Layout::from_size_align(tex_size, 4096) + .map_err(|_| "invalid pre-alloc texture layout")?; let ptr = unsafe { alloc::alloc::alloc_zeroed(layout) }; if ptr.is_null() { - crate::serial_println!("[virgl-pool] slot {} alloc failed, pool stopped at {}", slot, slot); - break; + return Err("failed to allocate pre-alloc texture backing"); + } + + // Fill slot 0 with red for testing + if slot == 0 { + unsafe { + let px = ptr as *mut u32; + let count = (max_w as usize) * (max_h as usize); + for i in 0..count { *px.add(i) = 0x00FF0000; } // B8G8R8X8 as little-endian u32 = 0xXXRRGGBB: red + } } with_device_state(|state| { - virgl_resource_create_3d_cmd( - state, res_id, pipe::TEXTURE_2D, vfmt::B8G8R8X8_UNORM, - pipe::BIND_SAMPLER_VIEW | pipe::BIND_SCANOUT, - pool_w, pool_h, 1, 1, - ) + virgl_resource_create_3d_cmd(state,
res_id, pipe::TEXTURE_2D, vfmt::B8G8R8X8_UNORM, + pipe::BIND_SAMPLER_VIEW | pipe::BIND_SCANOUT, max_w, max_h, 1, 1) })?; with_device_state(|state| { - virgl_attach_backing_paged(state, res_id, ptr, pool_size) + virgl_attach_backing_paged(state, res_id, ptr, tex_size) })?; with_device_state(|state| { virgl_ctx_attach_resource_cmd(state, VIRGL_CTX_ID, res_id) })?; - dma_cache_clean(ptr, pool_size); + dma_cache_clean(ptr, tex_size); with_device_state(|state| { - transfer_to_host_3d(state, res_id, 0, 0, pool_w, pool_h, pool_w * 4) + transfer_to_host_3d(state, res_id, 0, 0, max_w, max_h, max_w * 4) })?; - unsafe { WIN_TEX_BACKING[slot] = (ptr, pool_size); } - pool_count += 1; + unsafe { + WIN_TEX_BACKING[slot] = (ptr, tex_size); + WIN_TEX_DIMS[slot] = (max_w, max_h); + WIN_TEX_INITIALIZED[slot] = true; + } + crate::serial_println!("[virgl-pool] Pre-allocated slot={} res_id={} {}x{}", slot, res_id, max_w, max_h); } - crate::serial_println!("[virgl-pool] Pre-allocated {}/{} window texture slots ({}x{}, {}KB each)", - pool_count, MAX_WIN_TEX_SLOTS, pool_w, pool_h, pool_size / 1024); + + // Initialize cursor GPU texture (12x18 arrow bitmap, uploaded once) + init_cursor_texture()?; + + Ok(()) +} + +/// Initialize a small GPU texture containing the cursor arrow bitmap. +/// +/// The cursor is rendered as a GPU quad in `virgl_composite_single_quad()`, +/// sampling from this texture. This avoids stamping the cursor into COMPOSITE_TEX +/// (which caused ghost trails when the saved background was stale). 
+fn init_cursor_texture() -> Result<(), &'static str> { + use super::virgl::{format as vfmt, pipe}; + + // Arrow cursor bitmap: 1=white, 2=black outline, 0=transparent (12x18) + const CURSOR_BITMAP: [[u8; 12]; 18] = [ + [2,0,0,0,0,0,0,0,0,0,0,0], + [2,2,0,0,0,0,0,0,0,0,0,0], + [2,1,2,0,0,0,0,0,0,0,0,0], + [2,1,1,2,0,0,0,0,0,0,0,0], + [2,1,1,1,2,0,0,0,0,0,0,0], + [2,1,1,1,1,2,0,0,0,0,0,0], + [2,1,1,1,1,1,2,0,0,0,0,0], + [2,1,1,1,1,1,1,2,0,0,0,0], + [2,1,1,1,1,1,1,1,2,0,0,0], + [2,1,1,1,1,1,1,1,1,2,0,0], + [2,1,1,1,1,1,1,1,1,1,2,0], + [2,1,1,1,1,1,2,2,2,2,2,0], + [2,1,1,1,1,2,0,0,0,0,0,0], + [2,1,1,2,1,1,2,0,0,0,0,0], + [2,1,2,0,2,1,1,2,0,0,0,0], + [2,2,0,0,2,1,1,2,0,0,0,0], + [2,0,0,0,0,2,1,2,0,0,0,0], + [0,0,0,0,0,2,2,0,0,0,0,0], + ]; + + let w = CURSOR_TEX_W; + let h = CURSOR_TEX_H; + let size = (w as usize) * (h as usize) * 4; + + // Allocate page-aligned backing (single 4KB page is sufficient for 12*18*4=864 bytes) + let layout = alloc::alloc::Layout::from_size_align(4096, 4096) + .map_err(|_| "invalid cursor texture layout")?; + let ptr = unsafe { alloc::alloc::alloc_zeroed(layout) }; + if ptr.is_null() { + return Err("failed to allocate cursor texture backing"); + } + + // Rasterize cursor bitmap into BGRA pixels. + // Transparent pixels (0) are fully transparent black (alpha=0). + // White fill (1) and black outline (2) are fully opaque (alpha=0xFF). 
+ unsafe { + let pixels = ptr as *mut u32; + for row in 0..h as usize { + for col in 0..w as usize { + let idx = row * w as usize + col; + *pixels.add(idx) = match CURSOR_BITMAP[row][col] { + 1 => 0xFF_FF_FF_FF, // white (B8G8R8A8: BGRA = FF,FF,FF,FF) + 2 => 0xFF_00_00_00, // black with alpha=FF + _ => 0x00_00_00_00, // transparent (alpha=0) + }; + } + } + } + + // Create texture resource (SAMPLER_VIEW only, never used as render target) + with_device_state(|state| { + virgl_resource_create_3d_cmd( + state, + RESOURCE_CURSOR_TEX_ID, + pipe::TEXTURE_2D, + vfmt::B8G8R8A8_UNORM, + pipe::BIND_SAMPLER_VIEW, + w, h, 1, 1, + ) + })?; + + // Attach backing memory + with_device_state(|state| { + virgl_attach_backing_paged(state, RESOURCE_CURSOR_TEX_ID, ptr, 4096) + })?; + + // Attach to VirGL context + with_device_state(|state| { + virgl_ctx_attach_resource_cmd(state, VIRGL_CTX_ID, RESOURCE_CURSOR_TEX_ID) + })?; + + // Prime with TRANSFER_TO_HOST_3D + dma_cache_clean(ptr, size); + with_device_state(|state| { + transfer_to_host_3d(state, RESOURCE_CURSOR_TEX_ID, 0, 0, w, h, w * 4) + })?; + + CURSOR_TEX_READY.store(true, Ordering::Release); + crate::serial_println!("[virgl-cursor] Cursor texture initialized (id={}, {}x{})", + RESOURCE_CURSOR_TEX_ID, w, h); Ok(()) } @@ -711,6 +944,14 @@ fn virt_to_phys(addr: u64) -> u64 { } } +/// Inverse of virt_to_phys: convert a physical address to a kernel-accessible +/// virtual address. Uses physical_memory_offset (HHDM base) when available. +#[inline(always)] +fn phys_to_kern_virt(phys: u64) -> u64 { + let offset = crate::memory::physical_memory_offset().as_u64(); + offset + phys +} + /// Clean (flush) a range of memory from CPU caches to physical RAM. /// /// On ARM64, CPU writes to WB-cacheable BSS memory stay in L1/L2 cache. @@ -2269,96 +2510,6 @@ fn virgl_attach_backing_from_pages( Ok(()) } -/// Base resource ID for per-window VirGL textures. Window slot N → resource (10 + N). 
-const RESOURCE_WIN_TEX_BASE: u32 = 10; -const MAX_WIN_TEX_SLOTS: usize = 8; - -/// Per-window contiguous backing buffers for VirGL textures. -/// Parallels requires contiguous physical backing for TRANSFER_TO_HOST_3D to work. -/// We allocate a contiguous heap buffer per window and copy MAP_SHARED pixels into it -/// before uploading. -static mut WIN_TEX_BACKING: [(*mut u8, usize); MAX_WIN_TEX_SLOTS] = - [(core::ptr::null_mut(), 0); MAX_WIN_TEX_SLOTS]; - -/// Initialize a VirGL TEXTURE_2D resource for a window buffer. -/// -/// Creates the 3D resource, attaches CONTIGUOUS heap-allocated backing, -/// attaches to the VirGL context, and primes with TRANSFER_TO_HOST_3D. -/// Returns the resource ID on success. -pub fn init_window_texture( - slot_index: usize, - width: u32, - height: u32, - _page_phys_addrs: &[u64], - _total_len: usize, -) -> Result<u32, &'static str> { - - if slot_index >= MAX_WIN_TEX_SLOTS { - return Err("init_window_texture: slot_index out of range"); - } - - let resource_id = RESOURCE_WIN_TEX_BASE + slot_index as u32; - - // Pool was pre-allocated at init time (before first SUBMIT_3D). - // Just verify the slot exists and return the resource ID. - let (existing_ptr, existing_len) = unsafe { WIN_TEX_BACKING[slot_index] }; - if existing_ptr.is_null() || existing_len == 0 { - return Err("init_window_texture: slot not pre-allocated"); - } - - crate::serial_println!( - "[virgl-win] init_window_texture: slot={} using pre-allocated res={} ({}x{}, backing={:#x})", - slot_index, resource_id, width, height, existing_ptr as u64 - ); - Ok(resource_id) -} - -/// Blit window content from MAP_SHARED pages directly into COMPOSITE_TEX at (x, y). -/// This composites window pixels into the single compositor texture, giving correct -/// z-order when called bottom-to-top. The cursor is drawn AFTER this, so it appears on top.
-fn blit_window_to_compositor( - win_x: u32, win_y: u32, - win_w: u32, win_h: u32, - page_phys_addrs: &[u64], - tex_w: u32, tex_h: u32, -) { - let phys_offset = crate::memory::physical_memory_offset().as_u64(); - let row_bytes = (win_w as usize) * 4; - let tex_stride = (tex_w as usize) * 4; - let tex_ptr = unsafe { COMPOSITE_TEX_PTR }; - - for row in 0..win_h as usize { - let dst_y = (win_y as usize) + row; - if dst_y >= tex_h as usize { break; } - let dst_x = win_x as usize; - let copy_w = (win_w as usize).min((tex_w as usize).saturating_sub(dst_x)); - if copy_w == 0 { continue; } - let copy_bytes = copy_w * 4; - - let src_offset = row * row_bytes; - let dst_offset = dst_y * tex_stride + dst_x * 4; - - // Copy from scattered pages, handling page boundaries - let mut copied = 0usize; - while copied < copy_bytes { - let linear_pos = src_offset + copied; - let page_idx = linear_pos / 4096; - let page_off = linear_pos % 4096; - if page_idx >= page_phys_addrs.len() { break; } - let chunk = (4096 - page_off).min(copy_bytes - copied); - let src_ptr = (phys_offset + page_phys_addrs[page_idx] + page_off as u64) as *const u8; - unsafe { - core::ptr::copy_nonoverlapping( - src_ptr, - tex_ptr.add(dst_offset + copied), - chunk, - ); - } - copied += chunk; - } - } -} - /// Flush a specific resource to the display (SET_SCANOUT must point at it first). fn resource_flush_3d(state: &mut GpuPciDeviceState, resource_id: u32) -> Result<(), &'static str> { unsafe { @@ -3411,16 +3562,35 @@ pub fn virgl_composite_frame_textured( Ok(()) } -/// Build and submit a single fullscreen textured quad from COMPOSITE_TEX. +/// Render the composited frame: fullscreen background from COMPOSITE_TEX, +/// then per-window content quads from individual GPU textures. /// -/// COMPOSITE_TEX already contains the fully-composited frame: background, window -/// frames/decorations, window content (blitted in z-order), and cursor. 
-fn virgl_composite_single_quad() -> Result<(), &'static str> { +/// COMPOSITE_TEX contains background, window frames/decorations. +/// Per-window textures contain the actual window content (pixels from clients). +/// The cursor is rendered as a GPU quad from a dedicated cursor texture (last draw). +/// When a window has no GPU texture, its content was already blitted into +/// COMPOSITE_TEX by BWM, so the background quad covers it. +fn virgl_composite_single_quad( + windows: &[crate::syscall::graphics::WindowCompositeInfo], +) -> Result<(), &'static str> { use super::virgl::{CommandBuffer, format as vfmt, pipe, swizzle}; + // BWM frame decoration constants (must match bwm.rs) + const TITLE_BAR_HEIGHT: i32 = 32; + const BORDER_WIDTH: i32 = 2; + + // ── Build canary — detect stale binary deployment ── + static FRAME_COUNT: AtomicU32 = AtomicU32::new(0); + let frame = FRAME_COUNT.fetch_add(1, Ordering::Relaxed); + if frame == 0 { + crate::serial_println!("[BUILD-CANARY] gpu_pci.rs version=8 gpu-cursor-quad"); + } + let tex_w = COMPOSITE_TEX_W.load(Ordering::Relaxed); let tex_h = COMPOSITE_TEX_H.load(Ordering::Relaxed); let (display_w, display_h) = dimensions().ok_or("GPU not initialized")?; + let dw = display_w as f32; + let dh = display_h as f32; let mut cmdbuf = CommandBuffer::new(); cmdbuf.create_sub_ctx(1); @@ -3457,25 +3627,182 @@ fn virgl_composite_single_quad() -> Result<(), &'static str> { cmdbuf.set_min_samples(1); cmdbuf.set_viewport(display_w as f32, display_h as f32); + // ── Create sampler views for all textures upfront ── + // Handle 17: COMPOSITE_TEX (background + frames + decorations) cmdbuf.create_sampler_view(17, RESOURCE_COMPOSITE_TEX_ID, vfmt::B8G8R8X8_UNORM, pipe::TEXTURE_2D, 0, 0, 0, 0, swizzle::IDENTITY); - cmdbuf.set_sampler_views(pipe::SHADER_FRAGMENT, 0, &[17]); + // Handles 40+i: per-window textures + for (i, win) in windows.iter().enumerate() { + if !win.virgl_initialized || win.virgl_resource_id == 0 { continue; } + let sv_handle = 40 + i 
as u32; + cmdbuf.create_sampler_view(sv_handle, win.virgl_resource_id, vfmt::B8G8R8X8_UNORM, + pipe::TEXTURE_2D, 0, 0, 0, 0, swizzle::IDENTITY); + } let u_max = (tex_w.min(display_w) as f32) / (tex_w as f32); let v_max = (tex_h.min(display_h) as f32) / (tex_h as f32); - let bg_verts: [u32; 32] = [ - (-1.0f32).to_bits(), (1.0f32).to_bits(), 0f32.to_bits(), 1.0f32.to_bits(), - 0f32.to_bits(), 0f32.to_bits(), 0f32.to_bits(), 0f32.to_bits(), - (-1.0f32).to_bits(), (-1.0f32).to_bits(), 0f32.to_bits(), 1.0f32.to_bits(), - 0f32.to_bits(), v_max.to_bits(), 0f32.to_bits(), 0f32.to_bits(), - 1.0f32.to_bits(), (-1.0f32).to_bits(), 0f32.to_bits(), 1.0f32.to_bits(), - u_max.to_bits(), v_max.to_bits(), 0f32.to_bits(), 0f32.to_bits(), - 1.0f32.to_bits(), (1.0f32).to_bits(), 0f32.to_bits(), 1.0f32.to_bits(), - u_max.to_bits(), 0f32.to_bits(), 0f32.to_bits(), 0f32.to_bits(), - ]; - cmdbuf.resource_inline_write(RESOURCE_VB_ID, 0, 128, &bg_verts); - cmdbuf.set_vertex_buffers(&[(32, 0, RESOURCE_VB_ID)]); - cmdbuf.draw_vbo(0, 4, pipe::PRIM_TRIANGLE_FAN, 3); + + // Helper: build a textured quad's 4 vertices (TRIANGLE_FAN) from pixel coords + UV + let make_quad = |px0: f32, py0: f32, px1: f32, py1: f32, + u0: f32, v0: f32, u1: f32, v1: f32| -> [u32; 32] { + let nx0 = px0 / dw * 2.0 - 1.0; + let ny0 = 1.0 - py0 / dh * 2.0; + let nx1 = px1 / dw * 2.0 - 1.0; + let ny1 = 1.0 - py1 / dh * 2.0; + [ + nx0.to_bits(), ny0.to_bits(), 0f32.to_bits(), 1.0f32.to_bits(), + u0.to_bits(), v0.to_bits(), 0f32.to_bits(), 0f32.to_bits(), + nx0.to_bits(), ny1.to_bits(), 0f32.to_bits(), 1.0f32.to_bits(), + u0.to_bits(), v1.to_bits(), 0f32.to_bits(), 0f32.to_bits(), + nx1.to_bits(), ny1.to_bits(), 0f32.to_bits(), 1.0f32.to_bits(), + u1.to_bits(), v1.to_bits(), 0f32.to_bits(), 0f32.to_bits(), + nx1.to_bits(), ny0.to_bits(), 0f32.to_bits(), 1.0f32.to_bits(), + u1.to_bits(), v0.to_bits(), 0f32.to_bits(), 0f32.to_bits(), + ] + }; + + // Helper: emit a textured quad draw (inline write + draw_vbo) + let mut 
vb_offset: u32 = 0; + let mut draw_idx: u32 = 0; + let emit_quad = |cmdbuf: &mut CommandBuffer, verts: &[u32; 32], + vb_off: &mut u32, di: &mut u32| { + cmdbuf.resource_inline_write(RESOURCE_VB_ID, *vb_off, 128, verts); + cmdbuf.set_vertex_buffers(&[(32, 0, RESOURCE_VB_ID)]); + cmdbuf.draw_vbo(*di, 4, pipe::PRIM_TRIANGLE_FAN, 3); + *vb_off += 128; + *di += 4; + }; + + // ── Draw 0: Fullscreen background quad from COMPOSITE_TEX ── + // Contains background, window frames/decorations, taskbar, appbar. + cmdbuf.set_sampler_views(pipe::SHADER_FRAGMENT, 0, &[17]); + let bg_verts = make_quad(0.0, 0.0, dw, dh, 0.0, 0.0, u_max, v_max); + emit_quad(&mut cmdbuf, &bg_verts, &mut vb_offset, &mut draw_idx); + + // ── Per-window interleaved draws (back to front for correct z-order) ── + // For each window: + // 1. Content quad from per-window texture (covers content area) + // 2. Title bar strip from COMPOSITE_TEX (covers title bar area on top of content) + // 3. Left/right/bottom border strips from COMPOSITE_TEX + // Front windows draw last, naturally covering back windows. + let tw = tex_w as f32; + let th = tex_h as f32; + + for (i, win) in windows.iter().enumerate() { + if !win.virgl_initialized || win.virgl_resource_id == 0 { continue; } + + // Window content position (set by BWM via set_window_position) + let cx = win.x as f32; + let cy = win.y as f32; + let cw = win.width as f32; + let ch = win.height as f32; + + // Frame bounds (content is inset by BORDER_WIDTH left/right/bottom and TITLE_BAR_HEIGHT top) + let fx0 = cx - BORDER_WIDTH as f32; + let fy0 = cy - TITLE_BAR_HEIGHT as f32; + let fx1 = cx + cw + BORDER_WIDTH as f32; + let fy1 = cy + ch + BORDER_WIDTH as f32; + + // 1. 
Content quad from per-window texture + let slot = (win.virgl_resource_id as usize).saturating_sub(RESOURCE_WIN_TEX_BASE as usize); + let (tex_alloc_w, tex_alloc_h) = if slot < MAX_WIN_TEX_SLOTS { + unsafe { WIN_TEX_DIMS[slot] } + } else { + (win.width, win.height) + }; + let wu = win.width as f32 / tex_alloc_w as f32; + let wv = win.height as f32 / tex_alloc_h as f32; + + let sv_handle = 40 + i as u32; + cmdbuf.set_sampler_views(pipe::SHADER_FRAGMENT, 0, &[sv_handle]); + let content_verts = make_quad(cx, cy, cx + cw, cy + ch, 0.0, 0.0, wu, wv); + emit_quad(&mut cmdbuf, &content_verts, &mut vb_offset, &mut draw_idx); + + // 2. Frame strips from COMPOSITE_TEX (drawn ON TOP of content for z-order) + cmdbuf.set_sampler_views(pipe::SHADER_FRAGMENT, 0, &[17]); + + // Title bar: full width of frame, from frame top to content top + if fy0 < cy { + let tu0 = fx0.max(0.0) / tw; + let tv0 = fy0.max(0.0) / th; + let tu1 = fx1.min(dw) / tw; + let tv1 = cy.min(dh) / th; + let title_verts = make_quad(fx0.max(0.0), fy0.max(0.0), fx1.min(dw), cy.min(dh), + tu0, tv0, tu1, tv1); + emit_quad(&mut cmdbuf, &title_verts, &mut vb_offset, &mut draw_idx); + } + // Left border: from content top to frame bottom + if fx0 < cx { + let lu0 = fx0.max(0.0) / tw; + let lv0 = cy.max(0.0) / th; + let lu1 = cx / tw; + let lv1 = fy1.min(dh) / th; + let left_verts = make_quad(fx0.max(0.0), cy.max(0.0), cx, fy1.min(dh), + lu0, lv0, lu1, lv1); + emit_quad(&mut cmdbuf, &left_verts, &mut vb_offset, &mut draw_idx); + } + // Right border: from content top to frame bottom + if fx1 > cx + cw { + let ru0 = (cx + cw) / tw; + let rv0 = cy.max(0.0) / th; + let ru1 = fx1.min(dw) / tw; + let rv1 = fy1.min(dh) / th; + let right_verts = make_quad(cx + cw, cy.max(0.0), fx1.min(dw), fy1.min(dh), + ru0, rv0, ru1, rv1); + emit_quad(&mut cmdbuf, &right_verts, &mut vb_offset, &mut draw_idx); + } + // Bottom border: between left and right borders + if fy1 > cy + ch { + let bu0 = cx / tw; + let bv0 = (cy + ch) / th; + let bu1 
= (cx + cw) / tw; + let bv1 = fy1.min(dh) / th; + let bot_verts = make_quad(cx, cy + ch, cx + cw, fy1.min(dh), + bu0, bv0, bu1, bv1); + emit_quad(&mut cmdbuf, &bot_verts, &mut vb_offset, &mut draw_idx); + } + + if frame < 3 { + crate::serial_println!( + "[GPU-WIN] frame={} win[{}] res={} content=({},{})-({}x{}) frame=({:.0},{:.0})-({:.0},{:.0})", + frame, i, win.virgl_resource_id, win.x, win.y, win.width, win.height, + fx0, fy0, fx1, fy1 + ); + } + } + + // ── Draw cursor as GPU quad (rendered LAST, on top of everything) ── + // The cursor lives in a dedicated GPU texture (RESOURCE_CURSOR_TEX_ID, 12x18, + // B8G8R8A8_UNORM with alpha for transparency). Alpha blending ensures transparent + // pixels don't overwrite the content underneath. + if CURSOR_TEX_READY.load(Ordering::Acquire) { + let (mouse_x, mouse_y) = if crate::drivers::virtio::input_mmio::is_tablet_initialized() { + crate::drivers::virtio::input_mmio::mouse_position() + } else { + crate::drivers::usb::hid::mouse_position() + }; + let mx = mouse_x as f32; + let my = mouse_y as f32; + let cw = CURSOR_TEX_W as f32; + let ch = CURSOR_TEX_H as f32; + + // Only draw if cursor is within display bounds + if mx < dw && my < dh { + // Switch to alpha-blending blend state for cursor transparency + cmdbuf.create_blend_alpha(19); + cmdbuf.bind_object(19, super::virgl::OBJ_BLEND); + + // Create sampler view for cursor texture (B8G8R8A8_UNORM with alpha) + cmdbuf.create_sampler_view(20, RESOURCE_CURSOR_TEX_ID, vfmt::B8G8R8A8_UNORM, + pipe::TEXTURE_2D, 0, 0, 0, 0, swizzle::IDENTITY); + cmdbuf.set_sampler_views(pipe::SHADER_FRAGMENT, 0, &[20]); + + // Cursor quad: position at (mx, my), size = cursor bitmap dimensions + let cursor_verts = make_quad(mx, my, (mx + cw).min(dw), (my + ch).min(dh), + 0.0, 0.0, 1.0, 1.0); + emit_quad(&mut cmdbuf, &cursor_verts, &mut vb_offset, &mut draw_idx); + } + } virgl_submit_sync(cmdbuf.as_slice())?; with_device_state(|state| set_scanout_resource(state, RESOURCE_3D_ID))?; @@ -3584,58 
+3911,17 @@ pub fn virgl_composite_windows( } } - // Step 2: Blit window content from MAP_SHARED pages into COMPOSITE_TEX. - // Windows are composited in z-order (bottom first in the array, top last) - // so higher-z windows correctly overwrite lower-z windows where they overlap. - // This must happen BEFORE cursor drawing so the cursor appears on top. - if bg_dirty || any_window_dirty { - for win in windows.iter() { - if win.page_phys_addrs.is_empty() || win.width == 0 || win.height == 0 { - continue; - } - blit_window_to_compositor( - win.x as u32, win.y as u32, - win.width, win.height, - &win.page_phys_addrs, - tex_w, tex_h, - ); - } - } + // Step 2: Window content is blitted by BWM in z-order (with occluded optimization). + // BWM writes directly into COMPOSITE_TEX via MAP_SHARED. No kernel-level blit needed. - // ── Step 3: Cursor rendering ──────────────────────────────────────────── - // Draw the mouse cursor directly into COMPOSITE_TEX so it appears in the - // composited output without requiring a full 4.9MB upload from userspace. - // Track cursor state to erase the old position and detect cursor-only moves. + // ── Step 3: Cursor position tracking ──────────────────────────────────── + // Read mouse position and detect movement for early-out optimization. + // The cursor is rendered as a GPU quad in virgl_composite_single_quad(), + // NOT stamped into COMPOSITE_TEX (which caused ghost trails with per-window textures). 
use core::sync::atomic::AtomicI32; static CURSOR_PREV_X: AtomicI32 = AtomicI32::new(-1); static CURSOR_PREV_Y: AtomicI32 = AtomicI32::new(-1); - // Arrow cursor bitmap: 1=white, 2=black outline, 0=transparent (12x18) - const CURSOR_W: u32 = 12; - const CURSOR_H: u32 = 18; - const CURSOR_BITMAP: [[u8; 12]; 18] = [ - [2,0,0,0,0,0,0,0,0,0,0,0], - [2,2,0,0,0,0,0,0,0,0,0,0], - [2,1,2,0,0,0,0,0,0,0,0,0], - [2,1,1,2,0,0,0,0,0,0,0,0], - [2,1,1,1,2,0,0,0,0,0,0,0], - [2,1,1,1,1,2,0,0,0,0,0,0], - [2,1,1,1,1,1,2,0,0,0,0,0], - [2,1,1,1,1,1,1,2,0,0,0,0], - [2,1,1,1,1,1,1,1,2,0,0,0], - [2,1,1,1,1,1,1,1,1,2,0,0], - [2,1,1,1,1,1,1,1,1,1,2,0], - [2,1,1,1,1,1,2,2,2,2,2,0], - [2,1,1,1,1,2,0,0,0,0,0,0], - [2,1,1,2,1,1,2,0,0,0,0,0], - [2,1,2,0,2,1,1,2,0,0,0,0], - [2,2,0,0,2,1,1,2,0,0,0,0], - [2,0,0,0,0,2,1,2,0,0,0,0], - [0,0,0,0,0,2,2,0,0,0,0,0], - ]; - // Saved background pixels under the cursor (BGRA packed u32) - static mut CURSOR_SAVED_BG: [u32; 12 * 18] = [0; 12 * 18]; - let (mouse_x, mouse_y) = if crate::drivers::virtio::input_mmio::is_tablet_initialized() { crate::drivers::virtio::input_mmio::mouse_position() } else { @@ -3646,101 +3932,7 @@ pub fn virgl_composite_windows( let prev_cx = CURSOR_PREV_X.load(Ordering::Relaxed); let prev_cy = CURSOR_PREV_Y.load(Ordering::Relaxed); let cursor_moved = cur_x != prev_cx || cur_y != prev_cy; - - // Erase old cursor by restoring background pixels. - // - // With MAP_SHARED (bg_pixels=None), BWM writes directly to COMPOSITE_TEX. - // On full_redraw (dirty_rect=None), BWM fills entire background + blits all windows, - // overwriting the old cursor area — skip erase. - // Otherwise, use saved_bg to restore the old cursor area. - // With non-MAP_SHARED (bg_pixels=Some), use bg_pixels for partial mode. 
- let full_bg_copy = bg_dirty && dirty_rect.is_none() && bg_pixels.is_some(); - let map_shared_full_redraw = bg_dirty && dirty_rect.is_none() && bg_pixels.is_none(); - if prev_cx >= 0 && prev_cy >= 0 && !full_bg_copy && !map_shared_full_redraw - && (cursor_moved || any_window_dirty) - { - let tex_ptr = unsafe { COMPOSITE_TEX_PTR as *mut u32 }; - let tw = tex_w as usize; - - if bg_dirty && bg_pixels.is_some() { - // Partial mode with bg_pixels: read correct background from BWM's buffer - if let Some(pixels) = bg_pixels { - let src_w = bg_width.min(tex_w) as usize; - for row in 0..CURSOR_H as usize { - let py = prev_cy as usize + row; - if py >= tex_h as usize || py >= (bg_height as usize) { break; } - for col in 0..CURSOR_W as usize { - let px = prev_cx as usize + col; - if px >= tw || px >= src_w { break; } - if CURSOR_BITMAP[row][col] != 0 { - unsafe { - *tex_ptr.add(py * tw + px) = *pixels.as_ptr().add(py * src_w + px); - } - } - } - } - } - } else { - // MAP_SHARED partial or cursor-only: restore from saved_bg. - // BWM already wrote correct content to COMPOSITE_TEX for dirty regions; - // saved_bg captures the pre-cursor state for the cursor area. 
- for row in 0..CURSOR_H as usize { - let py = prev_cy as usize + row; - if py >= tex_h as usize { break; } - for col in 0..CURSOR_W as usize { - let px = prev_cx as usize + col; - if px >= tw { break; } - if CURSOR_BITMAP[row][col] != 0 { - unsafe { - let saved = CURSOR_SAVED_BG[row * CURSOR_W as usize + col]; - *tex_ptr.add(py * tw + px) = saved; - } - } - } - } - } - } - - // After bg/window blits may have changed pixels under the old cursor, - // re-read if content was re-blitted over the old cursor area - // (bg_dirty or window blit already wrote fresh pixels, so saved_bg is stale — that's fine, - // we just restored stale pixels that got immediately overwritten by the blit above) - - // Save background under new cursor position, then draw cursor - let draw_cursor = cur_x >= 0 && cur_y >= 0 - && (cur_x as u32) < tex_w && (cur_y as u32) < tex_h; - if draw_cursor { - let tex_ptr = unsafe { COMPOSITE_TEX_PTR as *mut u32 }; - let tw = tex_w as usize; - // Save - for row in 0..CURSOR_H as usize { - let py = cur_y as usize + row; - if py >= tex_h as usize { break; } - for col in 0..CURSOR_W as usize { - let px = cur_x as usize + col; - if px >= tw { break; } - if CURSOR_BITMAP[row][col] != 0 { - unsafe { - CURSOR_SAVED_BG[row * CURSOR_W as usize + col] = - *tex_ptr.add(py * tw + px); - } - } - } - } - // Draw - for row in 0..CURSOR_H as usize { - let py = cur_y as usize + row; - if py >= tex_h as usize { break; } - for col in 0..CURSOR_W as usize { - let px = cur_x as usize + col; - if px >= tw { break; } - match CURSOR_BITMAP[row][col] { - 1 => unsafe { *tex_ptr.add(py * tw + px) = 0x00FFFFFF; }, // white (BGRX) - 2 => unsafe { *tex_ptr.add(py * tw + px) = 0x00000000; }, // black - _ => {} - } - } - } + if cursor_moved { CURSOR_PREV_X.store(cur_x, Ordering::Relaxed); CURSOR_PREV_Y.store(cur_y, Ordering::Relaxed); } @@ -3782,15 +3974,6 @@ pub fn virgl_composite_windows( crate::tracing::providers::counters::GPU_PARTIAL_UPLOADS.increment(); 
crate::tracing::providers::counters::GPU_BYTES_UPLOADED.add((uw as u64) * (uh as u64) * 4); } - // Upload cursor areas: old position (erased) and new position (drawn) - if cursor_moved || any_window_dirty { - if prev_cx >= 0 && prev_cy >= 0 { - upload_rect(prev_cx as u32, prev_cy as u32, CURSOR_W, CURSOR_H)?; - } - if draw_cursor { - upload_rect(cur_x as u32, cur_y as u32, CURSOR_W, CURSOR_H)?; - } - } } else if bg_dirty { // Full background upload dma_cache_clean(unsafe { COMPOSITE_TEX_PTR }, tex_bytes_total); @@ -3799,25 +3982,28 @@ pub fn virgl_composite_windows( })?; crate::tracing::providers::counters::GPU_FULL_UPLOADS.increment(); crate::tracing::providers::counters::GPU_BYTES_UPLOADED.add(tex_bytes_total as u64); - } else { - // Cursor moved and/or windows dirty — upload cursor bounding boxes + dirty windows - if prev_cx >= 0 && prev_cy >= 0 { - upload_rect(prev_cx as u32, prev_cy as u32, CURSOR_W, CURSOR_H)?; - } - // Upload new cursor area - if draw_cursor { - upload_rect(cur_x as u32, cur_y as u32, CURSOR_W, CURSOR_H)?; - } - // Note: kernel does NOT blit client windows from MAP_SHARED pages. - // BWM composites all windows at correct z-order and sends bg_dirty=2 - // with a dirty rect when client content changes. + } + // Note: cursor-only moves (no bg_dirty, no window_dirty) still trigger SUBMIT_3D + // below to redraw the GPU cursor quad at the new position. No COMPOSITE_TEX upload needed. + + // ========================================================================= + // Phase A2: Upload per-window GPU textures + // Per-window textures pre-allocated at init, TRANSFER_TO_HOST_3D proven working. + // Uploads dirty window content from MAP_SHARED pages to GPU textures. 
+ // ========================================================================= + for win in windows.iter() { + if !win.virgl_initialized || !win.dirty { continue; } + if win.page_phys_addrs.is_empty() { continue; } + let slot = (win.virgl_resource_id as usize).saturating_sub(RESOURCE_WIN_TEX_BASE as usize); + if slot >= MAX_WIN_TEX_SLOTS { continue; } + let _ = upload_window_texture(slot, win.width, win.height, &win.page_phys_addrs, win.size); } // ========================================================================= - // Phase B+C: Single fullscreen SUBMIT_3D quad + display + // Phase B+C: GPU compositing + display // ========================================================================= - // Window content was already blitted into COMPOSITE_TEX in z-order (step 2), - // so a single textured quad correctly displays everything including cursor. + // Background + decorations from COMPOSITE_TEX, per-window content from + // individual GPU textures, all in one SUBMIT_3D batch. // Perf: timestamp before display phase #[cfg(target_arch = "aarch64")] @@ -3827,7 +4013,7 @@ pub fn virgl_composite_windows( v }; - virgl_composite_single_quad()?; + virgl_composite_single_quad(windows)?; // Perf: end of frame #[cfg(target_arch = "aarch64")] diff --git a/kernel/src/drivers/virtio/virgl.rs b/kernel/src/drivers/virtio/virgl.rs index 73435e2f..c7293b19 100644 --- a/kernel/src/drivers/virtio/virgl.rs +++ b/kernel/src/drivers/virtio/virgl.rs @@ -266,6 +266,30 @@ impl CommandBuffer { } } + /// Create a blend state with SRC_ALPHA / INV_SRC_ALPHA alpha blending. + /// + /// Used for rendering quads with per-pixel transparency (e.g., cursor texture + /// where alpha=0 means transparent and alpha=0xFF means opaque). 
+    pub fn create_blend_alpha(&mut self, handle: u32) {
+        // S2[0] encoding (per virgl_hw.h):
+        //   bit 0: blend_enable = 1
+        //   bits 1-3: rgb_func = PIPE_BLEND_ADD (0)
+        //   bits 4-8: rgb_src_factor = PIPE_BLENDFACTOR_SRC_ALPHA (0x03)
+        //   bits 9-13: rgb_dst_factor = PIPE_BLENDFACTOR_INV_SRC_ALPHA (0x13)
+        //   bits 14-16: alpha_func = PIPE_BLEND_ADD (0)
+        //   bits 17-21: alpha_src_factor = PIPE_BLENDFACTOR_ONE (0x01)
+        //   bits 22-26: alpha_dst_factor = PIPE_BLENDFACTOR_ZERO (0x11)
+        //   bits 27-30: colormask = 0xF (write RGBA)
+        self.push(Self::cmd0(ccmd::CREATE_OBJECT, obj::BLEND, 11));
+        self.push(handle);
+        self.push(0x00000004); // S0: dither enabled
+        self.push(0); // S1: logicop_func = 0
+        self.push(0x7C42_2631); // S2[0]: alpha blend enabled
+        for _ in 0..7 {
+            self.push(0);
+        }
+    }
+
     /// Create a depth-stencil-alpha state matching Mesa exactly.
     /// Mesa sends DSA with S0=0x00000000, length=5.
     pub fn create_dsa_default(&mut self, handle: u32) {
diff --git a/kernel/src/syscall/graphics.rs b/kernel/src/syscall/graphics.rs
index af99cf6b..54bda828 100644
--- a/kernel/src/syscall/graphics.rs
+++ b/kernel/src/syscall/graphics.rs
@@ -663,54 +663,33 @@ fn handle_virgl_op(cmd: &FbDrawCmd) -> SyscallResult {
             } else {
                 &[]
             };
-            // Extract window info under lock, then drop lock before VirGL init
-            let win_info = {
+            let registered = {
                 let mut reg = WINDOW_REGISTRY.lock();
-                // Find slot index first (immutable borrow)
-                let slot_idx = reg.buffers.iter().position(|s| {
-                    s.as_ref().map_or(false, |b| b.id == buffer_id)
-                });
-                match (slot_idx, reg.find_mut(buffer_id)) {
-                    (Some(idx), Some(buf)) => {
+                match reg.find_mut(buffer_id) {
+                    Some(buf) => {
                         buf.registered = true;
                         buf.title_len = title.len().min(MAX_TITLE_LEN);
                         buf.title[..buf.title_len].copy_from_slice(&title[..buf.title_len]);
-                        Some((idx, buf.width, buf.height, buf.page_phys_addrs.clone(), buf.size))
+                        true
                     }
-                    _ => None,
+                    None => false,
                 }
             };
-            match win_info {
-                Some((slot_idx, w, h, pages, size)) => {
-                    // Initialize VirGL texture for this window (outside registry lock)
-                    if crate::drivers::virtio::gpu_pci::is_virgl_enabled() {
-                        match crate::drivers::virtio::gpu_pci::init_window_texture(slot_idx, w, h, &pages, size) {
-                            Ok(res_id) => {
-                                let mut reg = WINDOW_REGISTRY.lock();
-                                if let Some(buf) = reg.find_mut(buffer_id) {
-                                    buf.virgl_resource_id = res_id;
-                                    buf.virgl_initialized = true;
-                                }
-                            }
-                            Err(e) => {
-                                crate::serial_println!("[window] VirGL texture init failed for buffer {}: {}", buffer_id, e);
-                            }
-                        }
-                    }
-                    // Bump registry generation + wake compositor so it discovers the new window
-                    #[cfg(target_arch = "aarch64")]
-                    {
-                        REGISTRY_GENERATION.fetch_add(1, core::sync::atomic::Ordering::Relaxed);
-                        let compositor_tid = COMPOSITOR_WAITING_THREAD.load(core::sync::atomic::Ordering::Acquire);
-                        if compositor_tid != 0 {
-                            crate::task::scheduler::with_scheduler(|sched| {
-                                sched.unblock(compositor_tid);
-                            });
-                        }
+            if registered {
+                // Bump registry generation + wake compositor so it discovers the new window
+                #[cfg(target_arch = "aarch64")]
+                {
+                    REGISTRY_GENERATION.fetch_add(1, core::sync::atomic::Ordering::Relaxed);
+                    let compositor_tid = COMPOSITOR_WAITING_THREAD.load(core::sync::atomic::Ordering::Acquire);
+                    if compositor_tid != 0 {
+                        crate::task::scheduler::with_scheduler(|sched| {
+                            sched.unblock(compositor_tid);
+                        });
                     }
-                    SyscallResult::Ok(0)
                 }
-                None => SyscallResult::Err(super::ErrorCode::InvalidArgument as u64),
+                SyscallResult::Ok(0)
+            } else {
+                SyscallResult::Err(super::ErrorCode::InvalidArgument as u64)
             }
         }
         13 => {
@@ -1330,30 +1309,30 @@ fn handle_composite_windows(desc_ptr: u64) -> SyscallResult {
         if !buf.registered { continue; }
         if buf.width == 0 || buf.height == 0 { continue; }

-        // Lazy VirGL texture init: create per-window GPU texture on first composite
+        let dirty = buf.generation > buf.last_uploaded_gen;
+
+        // Lazy-init per-window GPU texture on first composite
         if !buf.virgl_initialized && !buf.page_phys_addrs.is_empty() &&
             matches!(crate::graphics::compositor_backend(), crate::graphics::CompositorBackend::VirGL)
         {
-            let slot_idx = (buf.id as usize).saturating_sub(1) % 16;
-            match crate::drivers::virtio::gpu_pci::init_window_texture(
-                slot_idx, buf.width, buf.height, &buf.page_phys_addrs, buf.size
+            let slot_idx = (buf.id as usize).saturating_sub(1) % 8;
+            match crate::drivers::virtio::gpu_pci::create_window_texture(
+                slot_idx, buf.width, buf.height,
             ) {
                 Ok(res_id) => {
                     buf.virgl_resource_id = res_id;
                     buf.virgl_initialized = true;
-                    crate::serial_println!("[composite] Window {} got VirGL texture (res={})",
-                        buf.id, res_id);
                 }
                 Err(e) => {
-                    crate::serial_println!("[composite] Window {} texture init failed: {}",
-                        buf.id, e);
+                    crate::serial_println!(
+                        "[composite] GPU texture init failed for buf {}: {}",
+                        buf.id, e
+                    );
                 }
             }
         }

-        let dirty = buf.generation > buf.last_uploaded_gen;
-
         result.push(WindowCompositeInfo {
             virgl_resource_id: buf.virgl_resource_id,
             virgl_initialized: buf.virgl_initialized,
diff --git a/userspace/programs/src/bwm.rs b/userspace/programs/src/bwm.rs
index f3b6628c..37d793a9 100644
--- a/userspace/programs/src/bwm.rs
+++ b/userspace/programs/src/bwm.rs
@@ -187,16 +187,6 @@ struct Window {
     minimized: bool,
     /// Stable ordering for appbar (assigned at discovery time, never changes)
     creation_order: u32,
-    /// Direct-mapped pointer to client window's pixel buffer (read-only, MAP_SHARED)
-    /// Stored for future per-window direct blit (currently compositor uses bulk composite).
-    #[allow(dead_code)]
-    mapped_ptr: *const u32,
-    /// Client window buffer width (from map_window_buffer)
-    #[allow(dead_code)]
-    mapped_w: u32,
-    /// Client window buffer height (from map_window_buffer)
-    #[allow(dead_code)]
-    mapped_h: u32,
 }

 impl Window {
@@ -614,15 +604,6 @@ fn discover_windows(windows: &mut Vec<Window>, screen_w: usize, screen_h: usize,
                 core::str::from_utf8(&title[..title_len]).unwrap_or("?"),
                 info.buffer_id, info.width, info.height, cascade_x, cascade_y);

-            // Map client window buffer into our address space for zero-copy reads
-            let (map_ptr, map_w, map_h) = match graphics::map_window_buffer(info.buffer_id) {
-                Ok(result) => result,
-                Err(_) => {
-                    print!("[bwm] WARNING: failed to map window {} buffer\n", info.buffer_id);
-                    (core::ptr::null(), 0, 0)
-                }
-            };
-
             // Tell kernel where the client content goes on screen (for GPU compositing)
             let content_x = cascade_x + BORDER_WIDTH as i32;
             let content_y = cascade_y + TITLE_BAR_HEIGHT as i32 + BORDER_WIDTH as i32;
@@ -636,7 +617,6 @@ fn discover_windows(windows: &mut Vec<Window>, screen_w: usize, screen_h: usize,
                 owner_pid: info.owner_pid,
                 minimized: false,
                 creation_order: order,
-                mapped_ptr: map_ptr, mapped_w: map_w, mapped_h: map_h,
             });
             added = true;
         }
@@ -652,7 +632,7 @@ fn redraw_all_windows(fb: &mut FrameBuf, windows: &[Window], focused_win: usize,
     for i in 0..windows.len() {
         if windows[i].minimized { continue; }
         draw_window_frame(fb, &windows[i], i == focused_win);
-        // GPU compositing handles client content — don't blit here
+        // Window content rendered by GPU from per-window textures — no CPU blit needed
     }
     draw_appbar(fb, windows, focused_win);
 }
@@ -709,8 +689,7 @@ fn compose_partial_redraw(
             let end = row * screen_w + dx1;
             sbuf[start..end].copy_from_slice(&bg[start..end]);
         }
-        // 2. Redraw UI elements that intersect dirty region
-        // GPU compositing handles client content — only draw frames/decorations
+        // 2. Redraw UI elements (frames only — content rendered by GPU)
         if dy0 < TASKBAR_HEIGHT {
             draw_taskbar(sfb, clock);
         }
@@ -733,8 +712,7 @@ fn compose_partial_redraw(
             vram[start..end].copy_from_slice(&sbuf[start..end]);
         }
     } else {
-        // Non-shadow path: restore bg region, redraw affected windows
-        // GPU compositing handles client content — only draw frames/decorations
+        // Non-shadow path: restore bg region, redraw affected windows (frames only)
         for row in dy0..dy1 {
             let start = row * screen_w + dx0;
             let end = row * screen_w + dx1;
@@ -861,6 +839,7 @@ fn main() {
     let mut dragging: Option<(usize, i32, i32)> = None;
     let mut full_redraw = true;
     let mut content_dirty = false;
+    let mut windows_dirty = false;

     // Clock state
     let mut last_clock_sec: i64 = -1;
@@ -1134,22 +1113,15 @@ fn main() {
             }
         }

-        // ── 5. GPU compositing handles window content — just check which are dirty ──
-        // Skip entirely if compositor_wait didn't report dirty content
+        // ── 5. Window content handled by GPU ──
+        // Per-window textures are uploaded by the kernel directly from MAP_SHARED
+        // pages and composited via VirGL SUBMIT_3D with z-order interleaved
+        // frame strips. No CPU blit needed. Mark windows_dirty so the composite
+        // syscall triggers per-window GPU upload + render WITHOUT re-uploading
+        // the full COMPOSITE_TEX (which contains only frames/decorations).
         if ready & graphics::COMPOSITOR_READY_DIRTY != 0 {
-            for i in 0..windows.len().min(16) {
-                if windows[i].window_id != 0 && !windows[i].minimized {
-                    if graphics::check_window_dirty(windows[i].window_id).unwrap_or(false) {
-                        content_dirty = true;
-                        let (bx0, by0, bx1, by1) = windows[i].bounds();
-                        dirty_x0 = dirty_x0.min(bx0);
-                        dirty_y0 = dirty_y0.min(by0);
-                        dirty_x1 = dirty_x1.max(bx1);
-                        dirty_y1 = dirty_y1.max(by1);
-                    }
-                }
-            }
-        } // end if DIRTY
+            windows_dirty = true;
+        }

         // ── 5b. Update clock (once per second) ──
         if let Ok(ts) = libbreenix::time::now_realtime() {
@@ -1179,6 +1151,7 @@ fn main() {
             );
             full_redraw = false;
             content_dirty = false;
+            windows_dirty = false;
         } else if content_dirty {
             let sw = screen_w as i32;
             let sh = screen_h as i32;
@@ -1191,12 +1164,16 @@ fn main() {
                 2, dx, dy, dw, dh,
             );
             content_dirty = false;
-        } else if mouse_moved_this_frame {
-            // Mouse-only update: no content changed, but kernel draws cursor
+            windows_dirty = false;
+        } else if windows_dirty || mouse_moved_this_frame {
+            // Window content and/or mouse-only update: no COMPOSITE_TEX change,
+            // but kernel uploads per-window textures and draws cursor via SUBMIT_3D.
+            // dirty_mode=0 tells kernel bg_dirty=false → skip COMPOSITE_TEX upload.
             let _ = graphics::virgl_composite_windows_rect(
                 cbuf, cw, ch, 0, 0, 0, 0, 0,
             );
+            windows_dirty = false;
         }

         // No sleep — compositor_wait handles blocking
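
Reviewer note on the `bwm.rs` main-loop hunks above: the three dirty flags appear to form a priority chain, with each branch clearing its own flag and every lower-priority one. The sketch below restates that chain as a standalone function so the ordering can be checked in isolation. `CompositeAction`, `FrameFlags`, and `dispatch` are hypothetical names that do not exist in the patch. The first branch's condition is not visible in the hunk and is assumed to be `full_redraw`. Reading the literal `2` passed in the `content_dirty` branch as a dirty mode is also an assumption; only `dirty_mode=0` is named in the diff's comments.

```rust
/// Which composite call the window manager would issue for one frame.
/// Hypothetical enum: the real loop calls
/// graphics::virgl_composite_windows_rect directly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompositeAction {
    /// Full COMPOSITE_TEX upload plus render.
    FullRedraw,
    /// Dirty-rect COMPOSITE_TEX upload (the call passing `2` in the diff).
    PartialRedraw,
    /// dirty_mode = 0: skip COMPOSITE_TEX, upload per-window textures, draw cursor.
    WindowsOnly,
    /// Nothing changed this frame.
    Idle,
}

/// The three flags kept across loop iterations in the diff.
#[derive(Debug, Default)]
struct FrameFlags {
    full_redraw: bool,
    content_dirty: bool,
    windows_dirty: bool,
}

impl FrameFlags {
    /// Pick the highest-priority action and clear the flags that branch covers.
    fn dispatch(&mut self, mouse_moved_this_frame: bool) -> CompositeAction {
        if self.full_redraw {
            // Assumed condition; the hunk only shows this branch's flag resets.
            self.full_redraw = false;
            self.content_dirty = false;
            self.windows_dirty = false;
            CompositeAction::FullRedraw
        } else if self.content_dirty {
            self.content_dirty = false;
            self.windows_dirty = false;
            CompositeAction::PartialRedraw
        } else if self.windows_dirty || mouse_moved_this_frame {
            self.windows_dirty = false;
            CompositeAction::WindowsOnly
        } else {
            CompositeAction::Idle
        }
    }
}

fn main() {
    // A client window changed while frames/decorations stayed the same.
    let mut flags = FrameFlags { windows_dirty: true, ..Default::default() };
    println!("{:?}", flags.dispatch(false));
    // The flag was consumed, so the next frame has nothing to do.
    println!("{:?}", flags.dispatch(false));
}
```

If this reading is right, a window update that coincides with a frame redraw is absorbed by the higher-priority branch, which is why both upper branches also reset `windows_dirty`.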