vello_hybrid: Avoid rendering into an intermediate framebuffer#1546
Conversation
```rust
// WebGL framebuffers are y-up, so we need to invert the rows.
let row_bytes = width as usize * 4;
let height = height as usize;
for y in 0..height / 2 {
    let (a, b) = pixels.split_at_mut((height - 1 - y) * row_bytes);
    a[y * row_bytes..(y + 1) * row_bytes].swap_with_slice(&mut b[..row_bytes]);
}
```
From my understanding, this code should have been needed all along, and the fact that we didn't need it was purely accidental. Here, we were setting `READ_FRAMEBUFFER` to the view framebuffer:
vello/sparse_strips/vello_hybrid/src/render/webgl.rs, lines 193 to 195 in 5796226
But we were never setting it to the final blitted framebuffer. Therefore, the tests were reading the non-flipped version, and when reading it back to the CPU pixmap it therefore wasn't necessary to flip it again.
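For reference, the row-swap from the diff above can be exercised in isolation. This is a self-contained sketch (using a tiny 1×3 RGBA buffer and a hypothetical `flip_rows` helper), not the actual PR or test code:

```rust
/// Flip an RGBA pixel buffer vertically in place, mirroring the loop in
/// the diff above: rows are swapped pairwise from the outside in.
fn flip_rows(pixels: &mut [u8], width: usize, height: usize) {
    let row_bytes = width * 4;
    for y in 0..height / 2 {
        // Split so `a` holds rows [0, height-1-y) and `b` starts at row height-1-y.
        let (a, b) = pixels.split_at_mut((height - 1 - y) * row_bytes);
        a[y * row_bytes..(y + 1) * row_bytes].swap_with_slice(&mut b[..row_bytes]);
    }
}

fn main() {
    // A 1x3 buffer: each row holds a single RGBA pixel tagged by its row index.
    let mut pixels = vec![
        0, 0, 0, 255, // row 0
        1, 1, 1, 255, // row 1
        2, 2, 2, 255, // row 2
    ];
    flip_rows(&mut pixels, 1, 3);
    // Rows 0 and 2 swap; the middle row stays put.
    assert_eq!(pixels, vec![2, 2, 2, 255, 1, 1, 1, 255, 0, 0, 0, 255]);
    println!("ok");
}
```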
Before I dive into review, what's the level of effort to support build-time shader flags so we can avoid the runtime branching? (E.g. using wesl.) I suspect build-time flags may prove useful for other endeavours too (e.g. debug info, only supporting bilinear sampling to avoid low/med/high branching, etc.).
I’m not sure because I’ve never looked into it, but it seems to me like this is something orthogonal to this PR, no? I don’t think a single select in the vertex shader and a blending/layer branch is going to kill performance for us. 😄 But if you want, I can certainly look into it.
Definitely not going to kill performance. But build-time flags reduce complexity in the driver code. BUT, I just took a closer look at the code, and this isn't a build-time flag: it varies at runtime. In the current code, if you're rendering to a slot texture, we don't invert NDC; if you're rendering to the viewport, we do invert NDC. So, because this varies at runtime, it should remain a runtime flag. I'll take a closer look at the code soon.
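As a rough illustration of why this must stay a runtime flag (hypothetical names, not the actual vello_hybrid shader or driver code), the branch amounts to optionally negating the y coordinate when mapping pixel space to NDC, and the choice depends on which target is bound for the current draw:

```rust
/// Convert a pixel-space y coordinate to normalized device coordinates.
/// `ndc_y_negate` stands in for the runtime config flag discussed above:
/// set when rendering to the viewport, unset for a slot texture.
fn pixel_to_ndc_y(y_px: f32, height_px: f32, ndc_y_negate: bool) -> f32 {
    // Map [0, height] to [-1, 1].
    let ndc_y = 2.0 * y_px / height_px - 1.0;
    // Flip when the render target's y axis points the other way.
    if ndc_y_negate { -ndc_y } else { ndc_y }
}

fn main() {
    // Top of a 100px-tall target maps to -1.0, or +1.0 when flipped.
    assert_eq!(pixel_to_ndc_y(0.0, 100.0, false), -1.0);
    assert_eq!(pixel_to_ndc_y(0.0, 100.0, true), 1.0);
    println!("ok");
}
```

Because both kinds of draws can happen within a single frame, a build-time flag could not select between the two cases.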
taj-p left a comment:
Nice, looks good! So good to save this huge copy and allocation. Some comments 🙏
```rust
/// See [`Config::ndc_y_negate`].
pub ndc_y_negate: u32,
```
This field is never read in the shader. I recommend keeping it as `_padding`.
```rust
    max_texture_dimension_2d: u32,
    new_render_size: &RenderSize,
) {
    if self.render_size != *new_render_size {
```
This means that we only update the config when the render size has changed, which assumes the viewport and the atlas texture are never the same size. If they are the same size, we'll never update `ndc_y_negate`.
Oh damn, very sorry for missing that. 🙏 Will fix this.
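One possible shape of the fix (a sketch with simplified, hypothetical types, not the actual PR code) is to include the flag in the change check instead of keying the update only off the render size:

```rust
/// Simplified stand-ins for the real config/render-size types.
#[derive(Clone, Copy, PartialEq, Eq)]
struct RenderSize { width: u32, height: u32 }

struct Renderer {
    render_size: RenderSize,
    ndc_y_negate: u32,
}

impl Renderer {
    /// Re-upload the config when *either* the size or the NDC flag changes,
    /// so same-sized viewport and atlas targets still get distinct configs.
    fn maybe_update_config(&mut self, new_size: RenderSize, new_ndc_y_negate: u32) -> bool {
        if self.render_size != new_size || self.ndc_y_negate != new_ndc_y_negate {
            self.render_size = new_size;
            self.ndc_y_negate = new_ndc_y_negate;
            return true; // config (and uniform buffer) would be rewritten here
        }
        false
    }
}

fn main() {
    let size = RenderSize { width: 256, height: 256 };
    let mut r = Renderer { render_size: size, ndc_y_negate: 0 };
    // Same size, different flag: the update must still happen.
    assert!(r.maybe_update_config(size, 1));
    // Nothing changed: skip.
    assert!(!r.maybe_update_config(size, 1));
    println!("ok");
}
```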
```rust
slot_height: u32::from(Tile::HEIGHT),
texture_height: u32::from(Tile::HEIGHT) * total_slots,
_padding: 0,
ndc_y_negate: 0,
```
```diff
- ndc_y_negate: 0,
+ _padding: 0,
```
`ndc_y_negate` isn't read in `clear_slots.wgsl`.
```rust
/// coordinate of WebGPU and the y-up coordinate system of WebGL framebuffers.
///
/// However, for the native WebGL backend, we want to _avoid_ this transform so that we
/// can write directly into the user-provided framebuffer, but still ensure that the scene
```
Might be worth mentioning how, without this optimisation, we need to allocate an extra framebuffer, render into that, and then pay a blit cost from that texture to the screen.
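To make that cost concrete (back-of-the-envelope numbers, not measurements from this PR): an extra RGBA8 framebuffer at 1080p is roughly 8 MB of texture memory, and the same amount of data is blitted every frame:

```rust
/// Bytes needed for an extra RGBA8 (4 bytes per pixel) framebuffer.
fn extra_framebuffer_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}

fn main() {
    // ~8.3 MB at 1920x1080, allocated once and re-blitted every frame.
    assert_eq!(extra_framebuffer_bytes(1920, 1080), 8_294_400);
    println!("ok");
}
```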
```diff
 let gl = canvas
-    .get_context_with_context_options("webgl2", &context_options)
+    .get_context("webgl2")
```
I think we want to keep AA disabled here:

- The sparse strips pipeline already does per-pixel AA.
- A 4x MSAA default framebuffer uses 4x more memory.
- Unsure how double AA will interact.
- Are we getting any value from enabling AA here?

I think to keep this PR contained, let's not consider this for now and keep the default of AA disabled. (Perhaps update the comment to note that AA is disabled for performance, memory, and because we handle our own AA.)
Confirmed that this gives a good speed boost on my Samsung Galaxy Tab A6 for low-complexity scenes! I also ran the `native_webgl` and `wgpu_webgl` examples and they seem to run fine.

In my view, there were two ways of solving this problem:

1. Requiring the user to handle the y-flip of the `Scene` when rendering with native WebGL. While this would require fewer code changes, it's IMO pretty ugly because it's not something that the user should have to handle manually.