vello_hybrid: Avoid rendering into an intermediate framebuffer#1546
Conversation
```rust
// WebGL framebuffers are y-up, so we need to invert the rows.
let row_bytes = width as usize * 4;
let height = height as usize;
for y in 0..height / 2 {
    let (a, b) = pixels.split_at_mut((height - 1 - y) * row_bytes);
    a[y * row_bytes..(y + 1) * row_bytes].swap_with_slice(&mut b[..row_bytes]);
}
```
From my understanding, this code should have been needed all along, and the fact that we didn't need it was purely accidental. Here, we were setting `READ_FRAMEBUFFER` to the view framebuffer:
vello/sparse_strips/vello_hybrid/src/render/webgl.rs, lines 193 to 195 in 5796226
But we were never setting it to the final blitted framebuffer. Therefore, the tests were reading the non-flipped version, and when reading it back to the CPU pixmap it therefore wasn't necessary to flip it again.
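For reference, the row-swap from the diff above can be exercised in isolation. This is a self-contained sketch (using a tiny 1×3 RGBA buffer and a hypothetical `flip_rows` helper), not the actual PR or test code:

```rust
/// Flip an RGBA pixel buffer vertically in place, mirroring the loop in
/// the diff above: rows are swapped pairwise from the outside in.
fn flip_rows(pixels: &mut [u8], width: usize, height: usize) {
    let row_bytes = width * 4;
    for y in 0..height / 2 {
        // Split so `a` holds rows [0, height-1-y) and `b` starts at row height-1-y.
        let (a, b) = pixels.split_at_mut((height - 1 - y) * row_bytes);
        a[y * row_bytes..(y + 1) * row_bytes].swap_with_slice(&mut b[..row_bytes]);
    }
}

fn main() {
    // A 1x3 buffer: each row holds a single RGBA pixel tagged by its row index.
    let mut pixels = vec![
        0, 0, 0, 255, // row 0
        1, 1, 1, 255, // row 1
        2, 2, 2, 255, // row 2
    ];
    flip_rows(&mut pixels, 1, 3);
    // Rows 0 and 2 swap; the middle row stays put.
    assert_eq!(pixels, vec![2, 2, 2, 255, 1, 1, 1, 255, 0, 0, 0, 255]);
    println!("ok");
}
```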
Before I dive into review, what's the level of effort to support build-time shader flags so we can avoid the runtime branching? (E.g. using wesl.) I suspect build-time flags may prove useful for other endeavours too (e.g. debug info, only supporting bilinear sampling to avoid low/med/high branching, etc.).
I’m not sure because I’ve never looked into it, but it seems to me like this is something orthogonal to this PR, no? I don’t think a single select in the vertex shader and a blending/layer branch is going to kill performance for us. 😄 But if you want, I can certainly look into it.
Definitely not going to kill performance. But build-time flags reduce complexity in the driver code. BUT, I just took a closer look at the code, and this isn't a build-time flag: it varies at runtime. In the current code, if you're rendering to a slot texture, we don't invert NDC; if you're rendering to the viewport, we do invert NDC. So, because this varies at runtime, it should remain a runtime flag. I'll take a closer look at the code soon.
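As a rough illustration of why this must stay a runtime flag (hypothetical names, not the actual vello_hybrid shader or driver code), the branch amounts to optionally negating the y coordinate when mapping pixel space to NDC, and the choice depends on which target is bound for the current draw:

```rust
/// Convert a pixel-space y coordinate to normalized device coordinates.
/// `ndc_y_negate` stands in for the runtime config flag discussed above:
/// set when rendering to the viewport, unset for a slot texture.
fn pixel_to_ndc_y(y_px: f32, height_px: f32, ndc_y_negate: bool) -> f32 {
    // Map [0, height] to [-1, 1].
    let ndc_y = 2.0 * y_px / height_px - 1.0;
    // Flip when the render target's y axis points the other way.
    if ndc_y_negate { -ndc_y } else { ndc_y }
}

fn main() {
    // Top of a 100px-tall target maps to -1.0, or +1.0 when flipped.
    assert_eq!(pixel_to_ndc_y(0.0, 100.0, false), -1.0);
    assert_eq!(pixel_to_ndc_y(0.0, 100.0, true), 1.0);
    println!("ok");
}
```

Because both kinds of draws can happen within a single frame, a build-time flag could not select between the two cases.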
taj-p left a comment:
Nice, looks good! So good to save this huge copy and allocation. Some comments 🙏
```rust
/// See [`Config::ndc_y_negate`].
pub ndc_y_negate: u32,
```
This field is never read in the shader. I recommend keeping it as `_padding`.
```rust
    max_texture_dimension_2d: u32,
    new_render_size: &RenderSize,
) {
    if self.render_size != *new_render_size {
```
This means that we only update the config when the render size has changed, which assumes the viewport and the atlas texture are never the same size. If they are the same size, we'll never update `ndc_y_negate`.
Oh damn, very sorry for missing that. 🙏 Will fix this.
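One possible shape of the fix (a sketch with simplified, hypothetical types, not the actual PR code) is to include the flag in the change check instead of keying the update only off the render size:

```rust
/// Simplified stand-ins for the real config/render-size types.
#[derive(Clone, Copy, PartialEq, Eq)]
struct RenderSize { width: u32, height: u32 }

struct Renderer {
    render_size: RenderSize,
    ndc_y_negate: u32,
}

impl Renderer {
    /// Re-upload the config when *either* the size or the NDC flag changes,
    /// so same-sized viewport and atlas targets still get distinct configs.
    fn maybe_update_config(&mut self, new_size: RenderSize, new_ndc_y_negate: u32) -> bool {
        if self.render_size != new_size || self.ndc_y_negate != new_ndc_y_negate {
            self.render_size = new_size;
            self.ndc_y_negate = new_ndc_y_negate;
            return true; // config (and uniform buffer) would be rewritten here
        }
        false
    }
}

fn main() {
    let size = RenderSize { width: 256, height: 256 };
    let mut r = Renderer { render_size: size, ndc_y_negate: 0 };
    // Same size, different flag: the update must still happen.
    assert!(r.maybe_update_config(size, 1));
    // Nothing changed: skip.
    assert!(!r.maybe_update_config(size, 1));
    println!("ok");
}
```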
```rust
slot_height: u32::from(Tile::HEIGHT),
texture_height: u32::from(Tile::HEIGHT) * total_slots,
_padding: 0,
ndc_y_negate: 0,
```
```diff
- ndc_y_negate: 0,
+ _padding: 0,
```
`ndc_y_negate` isn't read in `clear_slots.wgsl`.
```rust
/// coordinate of WebGPU and the y-up coordinate system of WebGL framebuffers.
///
/// However, for the native WebGL backend, we want to _avoid_ this transform so that we
/// can write directly into the user-provided framebuffer, but still ensure that the scene
```
Might be worth mentioning how, without this optimisation, we need to allocate an extra framebuffer, render into that, and then pay a blit cost from that texture to the screen.
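To make that cost concrete (back-of-the-envelope numbers, not measurements from this PR): an extra RGBA8 framebuffer at 1080p is roughly 8 MB of texture memory, and the same amount of data is blitted every frame:

```rust
/// Bytes needed for an extra RGBA8 (4 bytes per pixel) framebuffer.
fn extra_framebuffer_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}

fn main() {
    // ~8.3 MB at 1920x1080, allocated once and re-blitted every frame.
    assert_eq!(extra_framebuffer_bytes(1920, 1080), 8_294_400);
    println!("ok");
}
```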
```diff
 let gl = canvas
-    .get_context_with_context_options("webgl2", &context_options)
+    .get_context("webgl2")
```
I think we want to keep AA disabled here:

- The sparse strips pipeline already does per-pixel AA.
- A 4x MSAA default framebuffer uses 4x more memory.
- Unsure how double AA will interact.
- Are we getting any value from enabling AA here?

I think to keep this PR contained, let's not consider this for now and keep the default of AA disabled. (Perhaps update the comment to note that AA is disabled for performance, memory, and because we handle our own AA.)
Confirmed that this gives a good speed boost on my Samsung Galaxy Tab A6 for low-complexity scenes! I also ran the `native_webgl` and `wgpu_webgl` examples and they seem to run fine.

In my view, there were two ways of solving this problem:

1. Requiring the user to handle the y-flip of the `Scene` when rendering with native WebGL. While this would require fewer code changes, it's IMO pretty ugly because it's not something that the user should have to handle manually.