Skip to content

vello_hybrid: Avoid rendering into an intermediate framebuffer#1546

Open
LaurenzV wants to merge 1 commit intomainfrom
laurenz/avoid
Open

vello_hybrid: Avoid rendering into an intermediate framebuffer#1546
LaurenzV wants to merge 1 commit intomainfrom
laurenz/avoid

Conversation

@LaurenzV
Copy link
Copy Markdown
Collaborator

@LaurenzV LaurenzV commented Mar 27, 2026

Confirmed that this gives a good speed boost on my Samsung Galaxy Tab A6 for low-complexity scenes! I also ran the native_webgl and wgpu_webgl examples and they seem to run fine.

In my view, there were two ways of solving this problem:

  • Forcing the user to provide a root transform on Scene when rendering with native WebGL. While this would require less code changes, it's IMO pretty ugly because it's not something that the user should have to handle manually.
  • Basically what this PR implements, while it requires a bit more code changes, it completely abstracts it away from the user, which IMO is much better!

Comment on lines +824 to +832

// WebGL framebuffers are y-up, so we need to invert the rows.
let row_bytes = width as usize * 4;
let height = height as usize;
for y in 0..height / 2 {
let (a, b) = pixels.split_at_mut((height - 1 - y) * row_bytes);
a[y * row_bytes..(y + 1) * row_bytes].swap_with_slice(&mut b[..row_bytes]);
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, this code would have been needed in the first place, and the fact we didn't need it is purely accidental. Here, we were setting READ_FRAMEBUFFER to the view framebuffer:

self.gl.bind_framebuffer(
WebGl2RenderingContext::READ_FRAMEBUFFER,
Some(&self.programs.resources.view_framebuffer),

But we were never setting it to the final blitted framebuffer. Therefore, the tests were reading the non-flipped version, and when reading it back to the CPU pixmap it therefore wasn't necessary to flip it again.

@taj-p
Copy link
Copy Markdown
Contributor

taj-p commented Mar 27, 2026

Before I dive into review, what's the level of effort to support build time shader flags so we can avoid the runtime branching? (E.g. using wesl)

I suspect build time flags may prove useful for other endeavours too (e.g. debug info, only supporting bilinear sampling to avoid low/med/high branching etc)

@LaurenzV
Copy link
Copy Markdown
Collaborator Author

I’m not sure because I’ve never looked into it, but it seems to me like this is something orthogonal to this PR, no? I don’t think a single select in the vertex shader and blending/layer branch is gonna kill performance for us. 😄 But if you want to I can certainly look into it.

@taj-p
Copy link
Copy Markdown
Contributor

taj-p commented Mar 28, 2026

I’m not sure because I’ve never looked into it, but it seems to me like this is something orthogonal to this PR, no? I don’t think a single select in the vertex shader and blending/layer branch is gonna kill performance for us. 😄 But if you want to I can certainly look into it.

Definitely not going to kill performance. But build time flags reduce complexity in the driver code (webgl.rs) and are likely the better way to handle build-time shader configuration (for maintenance, future evolution, etc).

BUT, I just took a closer look at the code and this isn't a build-time flag. It varies at runtime. In the current code, if you're rendering to a slot texture, we don't invert NDC. If you're rendering to the viewport, do you invert NDC. So, because this varies at runtime, it should remain a runtime flag.

I'll take a closer look at the code soon.

Copy link
Copy Markdown
Contributor

@taj-p taj-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice looks good! So good to save this huge copy and allocation. Some comments 🙏

Comment on lines +731 to +732
/// See [`Config::ndc_y_negate`].
pub ndc_y_negate: u32,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is never read in the shader. I recommend keeping it as _padding

max_texture_dimension_2d: u32,
new_render_size: &RenderSize,
) {
if self.render_size != *new_render_size {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that we only update the config when the render size has changed - that assumes the viewport and the atlas texture are not the same size. If they are the same size, we'll never update negate_ndc

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh damn, very sorry for missing that. 🙏 Will fix this.

slot_height: u32::from(Tile::HEIGHT),
texture_height: u32::from(Tile::HEIGHT) * total_slots,
_padding: 0,
ndc_y_negate: 0,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ndc_y_negate: 0,
_padding: 0,

ndc_y_negate isn't read in clear_slots.wgsl

/// coordinate of WebGPU and the y-up coordinate system of WebGL framebuffers.
///
/// However, for the native WebGL backend, we want to _avoid_ this transform so that we
/// can write directly into the user-provided framebuffer, but still ensure that the scene
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth mentioning how, without this optimisation, we need to allocate an extra frame buffer, render into that, and then pay a blit cost from that texture to the screen.


let gl = canvas
.get_context_with_context_options("webgl2", &context_options)
.get_context("webgl2")
Copy link
Copy Markdown
Contributor

@taj-p taj-p Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to keep the AA false here.

  • The sparse strips pipeline already does per pixel AA
  • A 4x MSAA default framebuffer uses 4x more memory
  • Unsure how double AA will interact
  • Are we getting any value from enabling AA here?

I think to keep this PR contained, let's not consider this for now and keep the default of AA disabled. (Perhaps update the comment that AA is disabled for performance, memory, and because we handle our own AA)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants