Fix the context lifecycle #339

Open
Cyberax wants to merge 3 commits into wcandillon:main from Cyberax:feat/resizing-works

@Cyberax
Contributor

@Cyberax Cyberax commented Mar 31, 2026

Sorry for the big PR. This is still somewhat of a work in progress, but it's now in a state where it doesn't crash my Android emulator and renders all the examples correctly.

The handling of onscreen/offscreen transitions was broken in all kinds of interesting ways:

  1. On Android, the rendering surface can go away at any point from the
    point of view of the JavaScript thread. There's no way to control it.
  2. On iOS/macOS the surfaces never go away, but we can't resize them
    outside the main thread.

The previous commits add an example of a virtualized list that exercises
the surface lifecycle quite vigorously.

Oh, and to add to the fun: the Dawn WebGPU device is not thread-safe. So it
easily crashes even if all we want to do is copy an offscreen texture
to the screen while the JavaScript thread is rendering.

This commit adds a GPU-level lock to all NativeObject-based wrappers,
so it is automatically taken on any operation. And the UI-thread based
logic takes this lock manually. This serializes all the GPU operations.
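
A minimal sketch of the locking idea described above, not the code from this PR. It assumes a single process-wide recursive mutex; `gpuLock`, `LockedWrapper`, `FakeDevice` and `runTwoThreads` are hypothetical names made up for illustration. Every wrapper call takes the lock automatically, so two threads hitting the same non-thread-safe object are serialized.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical process-wide lock guarding every call into the GPU device.
// Recursive, so a wrapper method may call another wrapper method.
inline std::recursive_mutex &gpuLock() {
  static std::recursive_mutex lock;
  return lock;
}

// Hypothetical stand-in for a NativeObject-based wrapper: every operation
// goes through call(), which takes the GPU lock before touching the object.
template <typename T>
class LockedWrapper {
public:
  explicit LockedWrapper(T object) : object_(std::move(object)) {}

  template <typename Fn>
  auto call(Fn &&fn) {
    std::lock_guard<std::recursive_mutex> guard(gpuLock());
    return std::forward<Fn>(fn)(object_);
  }

private:
  T object_;
};

// Toy "device" whose counter is deliberately not thread-safe on its own.
struct FakeDevice {
  int submitted = 0;
};

// Simulates the JS thread and the UI thread using the same device.
inline int runTwoThreads(int iterations) {
  LockedWrapper<FakeDevice> device{FakeDevice{}};
  auto work = [&device, iterations] {
    for (int i = 0; i < iterations; ++i) {
      device.call([](FakeDevice &d) { ++d.submitted; });
    }
  };
  std::thread js(work);
  std::thread ui(work);
  js.join();
  ui.join();
  return device.call([](FakeDevice &d) { return d.submitted; });
}
```

With the lock in place, `runTwoThreads(n)` counts every increment from both threads. The trade-off is the one stated above: all GPU operations are serialized.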

And because the iOS and Android implementations have diverged quite a
bit, the SurfaceInfo is now an abstract class SurfaceBridge that has
platform-specific implementations.
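
A rough sketch of what an abstract bridge with platform-specific subclasses could look like. Only the name `SurfaceBridge` comes from this PR. The methods, the `Size` struct and the two subclasses are assumptions for illustration, and thread synchronization is left out to keep it short. The Android variant models a surface that can appear and disappear; the Apple variant defers resizes until the main thread applies them.

```cpp
#include <cassert>
#include <memory>

struct Size {
  int width;
  int height;
};

// Abstract bridge; the interface below is hypothetical.
class SurfaceBridge {
public:
  virtual ~SurfaceBridge() = default;
  virtual void resize(Size size) = 0;
  virtual bool hasSurface() const = 0;
  Size size() const { return size_; }

protected:
  Size size_{0, 0};
};

// Android: the hardware surface may be created and destroyed at any time.
class AndroidSurfaceBridge : public SurfaceBridge {
public:
  void onSurfaceCreated(Size size) {
    attached_ = true;
    size_ = size;
  }
  void onSurfaceDestroyed() { attached_ = false; }
  void resize(Size size) override { size_ = size; }
  bool hasSurface() const override { return attached_; }

private:
  bool attached_ = false;
};

// iOS/macOS: the surface never goes away, but a resize requested from
// another thread is only recorded and applied later on the main thread.
class AppleSurfaceBridge : public SurfaceBridge {
public:
  void resize(Size size) override {
    pending_ = size;
    hasPending_ = true;
  }
  void applyPendingResizeOnMainThread() {
    if (hasPending_) {
      size_ = pending_;
      hasPending_ = false;
    }
  }
  bool hasSurface() const override { return true; }

private:
  Size pending_{0, 0};
  bool hasPending_ = false;
};
```

Callers hold a `std::unique_ptr<SurfaceBridge>` and never need to know which platform they are on.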

Cyberax added 3 commits March 26, 2026 10:00
This fixes the overlapping 3-button navigation toolbar on Android.
Well, this was not a small one-line fix. The handling of
onscreen/offscreen transitions was broken in all kinds of interesting
ways:

1. On Android, the rendering surface can go away at any point from the
   point of view of the JavaScript thread. There's no way to control it.
2. On iOS/macOS the surfaces never go away, but we can't resize them
   outside the main thread.

The previous commits add an example of a virtualized list that exercises
the surface lifecycle quite vigorously.

Oh, and to add to the fun: Dawn WebGPU device is not threadsafe. So it
easily crashes even if all we want to do is copy an offscreen texture
to the screen, while the JavaScript thread is rendering.

This commit adds a GPU-level lock to all `NativeObject`-based wrappers,
so it is automatically taken on any operation. And the UI-thread based
logic takes this lock manually. This serializes all the GPU operations.

And because the iOS and Android implementations have diverged quite a
bit, the `SurfaceInfo` is now an abstract class `SurfaceBridge` that has
platform-specific implementations.
@wcandillon
Owner

Thanks a lot for making this PR. I was thinking about redesigning the current model to work properly, and I need to think about it some more. I think we can do a single-threaded model with an API à la transferControlToOffscreen like on the Web, and make the resource "transferable" (i.e. available to only one thread at a time). The issue now is that by default people write WebGPU code on the JS thread (which works on the web because it is the UI thread, but in our case it is not). Hopefully we can find a good design there.

I also need to check how to transfer textures from one thread/device to another. This may be how we go from the JS thread canvas (seemingly offscreen) to the onscreen UI thread canvas.

Do I have the right intuition there? I'm excited for us to fix this.

@Cyberax
Contributor Author

Cyberax commented Mar 31, 2026

> Thanks a lot for making this PR. I was thinking about redesigning the current model to work properly, and I need to think about it some more. I think we can do a single-threaded model with an API à la transferControlToOffscreen like on the Web, and make the resource "transferable" (i.e. available to only one thread at a time).

I don't think this will work with JavaScript and complicated scenes (e.g. anything with Three.js). You'll need to serialize everything to the UI thread. That was my initial plan, but our code uses WGPU for complex 2D renderings and serializing them is a pain.

But... Hmm... What if we made the device itself "borrowable"? To get it, you "lock" it to borrow the device from the main thread. Then you do the rendering and put it back. This can be represented as a closure, like the Web Locks API ( https://developer.mozilla.org/en-US/docs/Web/API/Web_Locks_API ), to cope with exceptions.

And this nicely integrates with worklets.
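
A small native-side sketch of the closure-based "borrow" idea, under stated assumptions: `BorrowableDevice`, `borrow` and the toy `Device` struct are hypothetical and do not exist in the library. The point is that the device is reachable only inside the closure, and the lock is released even if the closure throws.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <utility>

// Toy device; a real one would wrap the GPU device handle.
struct Device {
  int frames = 0;
};

// Hypothetical owner of the device. Callers never hold a reference outside
// of borrow(), so at most one thread can use the device at a time.
class BorrowableDevice {
public:
  template <typename Fn>
  auto borrow(Fn &&fn) {
    // lock_guard unlocks in its destructor, so an exception thrown by the
    // closure still returns the device to its owner.
    std::lock_guard<std::mutex> guard(mutex_);
    return std::forward<Fn>(fn)(device_);
  }

private:
  std::mutex mutex_;
  Device device_;
};

// Example render pass: borrows the device, renders one frame, gives it back.
inline int renderOneFrame(BorrowableDevice &owner) {
  return owner.borrow([](Device &d) { return ++d.frames; });
}
```

This mirrors the shape of `navigator.locks.request(name, callback)` in the Web Locks API, where the lock is held only for the duration of the callback.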

If we want to preserve the current semantics, then in my patch, Android now always renders into an offscreen texture buffer. I don't think there's any other choice for robust Android apps. But the upside is that the JS thread does not need to concern itself with the synchronization. The native surface then picks up the last presented texture when the view becomes attached to a window and gets the hardware surface.

This works reasonably well and was easy to implement. The downside is that it requires the overhead of doing one texture-to-texture copy and one additional back buffer for the whole window.
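
To make the trade-off concrete, here is a simplified model of the "render offscreen, copy on attach" flow. It is a sketch only: `OffscreenPresenter`, `Texture` and `Surface` are hypothetical stand-ins, and the copy is modelled as recording a texture id instead of a real texture-to-texture copy.

```cpp
#include <cassert>
#include <mutex>

// Stand-in for an offscreen GPU texture.
struct Texture {
  int id;
};

// Stand-in for the hardware surface; copyFrom models the extra copy.
struct Surface {
  int shownTextureId = -1;
  int copies = 0;
  void copyFrom(const Texture &texture) {
    shownTextureId = texture.id;
    ++copies;
  }
};

class OffscreenPresenter {
public:
  // Called from the rendering (JS) thread; works with or without a surface.
  void present(Texture frame) {
    std::lock_guard<std::mutex> guard(mutex_);
    lastFrame_ = frame;
    hasFrame_ = true;
    if (surface_ != nullptr) {
      surface_->copyFrom(lastFrame_);
    }
  }

  // Called from the UI thread when the view gets its hardware surface.
  void onSurfaceAttached(Surface *surface) {
    std::lock_guard<std::mutex> guard(mutex_);
    surface_ = surface;
    if (hasFrame_) {
      surface_->copyFrom(lastFrame_);  // pick up the last presented frame
    }
  }

  // Called from the UI thread when the surface goes away.
  void onSurfaceDetached() {
    std::lock_guard<std::mutex> guard(mutex_);
    surface_ = nullptr;
  }

private:
  std::mutex mutex_;
  Texture lastFrame_{-1};
  bool hasFrame_ = false;
  Surface *surface_ = nullptr;
};
```

The rendering side only ever calls `present`, which is why it does not need to know whether a surface currently exists.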

I went a little bit off the deep end with iOS to implement zero-overhead rendering. I'm not sure it's worth it...

@wcandillon
Owner

Is there a simple example that we could add to the repo that makes it really easy to show the crash issue?

@Cyberax
Contributor Author

Cyberax commented Mar 31, 2026

> Is there a simple example that we could add to the repo that makes it really easy to show the crash issue?

Yes, the first commit adds a new example: a WebGPU widget inside a virtualized list ("MultiContext"). It's here: 30116d9

Without the fix, it crashes quickly on Android during scrolling or when switching from the main screen to the screen with the list.
