SpawnDev.ILGPU ships a unified canvas rendering API that presents an ILGPU pixel buffer to an HTML `<canvas>` element with the lowest possible overhead on every backend.
```
ICanvasRenderer
├── WebGPUCanvasRenderer — zero-copy fullscreen-triangle render pass, no CPU readback
├── WebGLCanvasRenderer  — ImageBitmap blit from the GL worker, draw inside callback
└── CPUCanvasRenderer    — reused ImageData object, fast Uint8Array copy (Wasm / desktop CPU)
```

`CanvasRendererFactory.Create(accelerator)` returns the right implementation automatically.
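The selection amounts to a type check on the active accelerator. A minimal sketch of what the factory might do; the concrete accelerator and renderer type names here are assumptions, and only `CanvasRendererFactory.Create(accelerator)` is the documented entry point:

```csharp
// Hypothetical sketch of the factory's dispatch; the library's actual
// implementation may differ.
public static ICanvasRenderer Create(Accelerator accelerator) =>
    accelerator switch
    {
        WebGPUAccelerator webGpu => new WebGPUCanvasRenderer(webGpu),
        WebGLAccelerator webGl   => new WebGLCanvasRenderer(webGl),
        _ => new CPUCanvasRenderer(accelerator), // Wasm and desktop CPU fallback
    };
```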
```csharp
using SpawnDev.ILGPU.Rendering;
using SpawnDev.BlazorJS.JSObjects;

// Create the best renderer for the active accelerator (call once)
ICanvasRenderer _renderer = CanvasRendererFactory.Create(accelerator);

// Attach to the canvas element (call once, or again when the canvas changes)
using var canvas = new HTMLCanvasElement(_canvasRef);
_renderer.AttachCanvas(canvas);

// Each frame: run kernel → present
_kernel(_outputBuffer.IntExtent, _outputBuffer.View /*, ...args */);
await accelerator.SynchronizeAsync();
await _renderer.PresentAsync(_outputBuffer);
```

`PresentAsync` accepts both `MemoryBuffer2D<uint, Stride2D.DenseX>` and `MemoryBuffer2D<int, Stride2D.DenseX>`. Pixels are packed RGBA little-endian: R in bits 0–7, G in 8–15, B in 16–23, A in 24–31.
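Because the layout is plain little-endian RGBA, individual channels can be recovered with shifts and masks. A small pair of helpers, useful when debugging pixel values; the helper names are illustrative, not part of the library's API:

```csharp
// Illustrative helpers for the RGBA layout PresentAsync expects:
// R in bits 0–7, G in 8–15, B in 16–23, A in 24–31.
static uint Pack(byte r, byte g, byte b, byte a) =>
    (uint)r | ((uint)g << 8) | ((uint)b << 16) | ((uint)a << 24);

static (byte R, byte G, byte B, byte A) Unpack(uint pixel) =>
    ((byte)pixel, (byte)(pixel >> 8), (byte)(pixel >> 16), (byte)(pixel >> 24));
```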
No CPU readback at all. On every `PresentAsync` call:

- `FlushPendingCommands()` ensures all queued kernel dispatches have been submitted.
- A cached fullscreen-triangle render pipeline reads the pixel buffer directly from a `read-only-storage` binding.
- A `GPURenderPass` rasterises a 3-vertex triangle that covers the entire viewport, with the fragment shader unpacking each `uint32` pixel into RGBA.
- The result is blitted from an off-DOM internal canvas to the user-visible canvas via `CanvasRenderingContext2D.drawImage`.
The render pipeline and bind-group layout are built once in `AttachCanvas` and reused every frame. The uniform buffer (width/height) is only re-uploaded when the resolution changes.
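For reference, the fragment stage of such a pipeline can unpack the `uint32` pixels roughly as follows. This is a hedged WGSL sketch embedded as a C# raw string; the binding indices, struct layout, and names are assumptions, not the library's actual shader:

```csharp
// Hypothetical WGSL fragment shader: reads the kernel's pixel buffer from a
// read-only storage binding and unpacks each uint32 into normalized RGBA.
const string FragmentWgsl = """
    struct Uniforms { width : u32, height : u32 }
    @group(0) @binding(0) var<storage, read> pixels : array<u32>;
    @group(0) @binding(1) var<uniform> dims : Uniforms;

    @fragment
    fn fs_main(@builtin(position) pos : vec4<f32>) -> @location(0) vec4<f32> {
        let x = u32(pos.x);
        let y = u32(pos.y);
        let p = pixels[y * dims.width + x];
        return vec4<f32>(
            f32(p & 0xFFu) / 255.0,
            f32((p >> 8u) & 0xFFu) / 255.0,
            f32((p >> 16u) & 0xFFu) / 255.0,
            f32((p >> 24u) & 0xFFu) / 255.0);
    }
    """;
```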
```
kernel output buffer (GPUBuffer)
        │ storage read
        ▼
fullscreen-triangle render pass   ← no CPU round-trip
        │
internal WebGPU canvas
        │ drawImage (zero-copy GPU blit)
        ▼
display canvas (2d context)
```
The WebGL backend runs in a dedicated Web Worker. Blitting to a visible canvas therefore requires getting an `ImageBitmap` across the worker boundary. The implementation avoids a race where the browser could clear the canvas between the blit and the draw:

- `BlitAndDrawAsync` posts a `blit` message to the GL worker.
- The worker renders the texture to an offscreen framebuffer and calls `transferToImageBitmap()`.
- The `ImageBitmap` is transferred back to the main thread.
- Synchronously inside the message-handler callback, before any JS event-loop turn can run, `ctx.drawImage(bitmap)` paints the bitmap onto the canvas.
- Only after the draw does `BlitAndDrawAsync` resolve its `Task`.
The synchronous draw is the critical detail. Without it, Blazor's render cycle can overwrite the canvas between frames.
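The difference is easy to see side by side. A hedged sketch of the handler, assuming the worker message delivers the bitmap to a callback on the main thread; the wiring names here are illustrative:

```csharp
// WRONG (illustrative): awaiting between receiving the bitmap and drawing it
// yields to the event loop, where Blazor's render cycle may clear the canvas:
//   async onBitmap(bitmap) { await Task.Yield(); ctx.DrawImage(bitmap); }

// RIGHT: draw inside the message-handler callback, then complete the Task.
void OnBlitMessage(ImageBitmap bitmap, TaskCompletionSource tcs)
{
    ctx.DrawImage(bitmap);   // synchronous: same event-loop turn as the message
    bitmap.Dispose();        // release the transferred bitmap promptly
    tcs.SetResult();         // only now does BlitAndDrawAsync's Task resolve
}
```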
```
kernel output buffer (WebGL texture in worker)
        │ texelFetch + offscreen FBO
        ▼
worker: transferToImageBitmap()
        │ postMessage (structured clone, zero-copy)
        ▼
main thread callback: ctx.drawImage(bitmap)   ← synchronous, in the handler
        │
display canvas (2d context)
```
Used for any accelerator that is neither WebGPU nor WebGL (Wasm in the browser; CPU on desktop). It reuses a single `ImageData` object to avoid GC churn:

- If the buffer is browser-backed (`IBrowserMemoryBuffer`), which is true for Wasm buffers, it calls `CopyToHostUint8ArrayAsync` for a fast JS-side copy with no managed allocation.
- Otherwise it falls back to synchronous `CopyToCPU` into a pooled `uint[]` array.
- `ctx.putImageData` writes the `ImageData` to the canvas.
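Put together, the CPU present path looks roughly like this. A simplified sketch: the field names and the exact copy into the `ImageData` bytes are assumptions; only `IBrowserMemoryBuffer`, `CopyToHostUint8ArrayAsync`, and `CopyToCPU` are named by the docs above:

```csharp
// Simplified sketch of the CPU renderer's per-frame present (names illustrative).
async Task PresentSketchAsync(MemoryBuffer2D<uint, Stride2D.DenseX> buffer)
{
    if (buffer is IBrowserMemoryBuffer browserBuffer)
    {
        // Wasm: JS-side copy into the reused ImageData's backing Uint8Array,
        // no managed allocation involved.
        await browserBuffer.CopyToHostUint8ArrayAsync(_imageDataBytes);
    }
    else
    {
        // Desktop CPU: synchronous readback into a pooled array.
        buffer.View.BaseView.CopyToCPU(_pooledPixels);
        // ...then copy _pooledPixels into the ImageData's bytes...
    }
    _ctx.PutImageData(_imageData, 0, 0); // reused ImageData, no per-frame alloc
}
```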
Kernels should pack pixels as `uint` little-endian RGBA:

```csharp
static void PixelKernel(Index2D idx, ArrayView2D<uint, Stride2D.DenseX> output)
{
    byte r = /* red 0–255 */;
    byte g = /* green 0–255 */;
    byte b = /* blue 0–255 */;
    byte a = 255;

    // Little-endian: R in byte 0, A in byte 3
    output[idx] = (uint)((a << 24) | (b << 16) | (g << 8) | r);
}
```

The same packing works identically on all three renderer implementations.
```razor
@page "/mypage"
@implements IAsyncDisposable
@inject IJSRuntime JS

<canvas @ref="_canvasRef" width="800" height="600" />

@code {
    private ElementReference _canvasRef;
    private Context? _context;
    private Accelerator? _accelerator;
    private MemoryBuffer2D<uint, Stride2D.DenseX>? _output;
    private Action<Index2D, ArrayView2D<uint, Stride2D.DenseX>>? _kernel;
    private ICanvasRenderer? _renderer;
    private bool _running;

    protected override async Task OnAfterRenderAsync(bool firstRender)
    {
        if (!firstRender) return;

        _context = await Context.CreateAsync(b => b.AllAcceleratorsAsync());
        _accelerator = await _context.CreatePreferredAcceleratorAsync();
        _output = _accelerator.Allocate2DDenseX<uint>(new LongIndex2D(800, 600));
        _kernel = _accelerator.LoadAutoGroupedStreamKernel<
            Index2D, ArrayView2D<uint, Stride2D.DenseX>>(PixelKernel);

        _renderer = CanvasRendererFactory.Create(_accelerator);
        using var canvas = new HTMLCanvasElement(_canvasRef);
        _renderer.AttachCanvas(canvas);

        _running = true;
        _ = RenderLoop();
    }

    private async Task RenderLoop()
    {
        while (_running)
        {
            _kernel!(_output!.IntExtent, _output.View);
            await _accelerator!.SynchronizeAsync();
            await _renderer!.PresentAsync(_output);
            await Task.Yield(); // yield to keep the browser responsive
        }
    }

    static void PixelKernel(Index2D idx, ArrayView2D<uint, Stride2D.DenseX> output)
    {
        byte r = (byte)(255 * idx.X / 800);
        byte g = (byte)(255 * idx.Y / 600);
        output[idx] = 0xFF000000u | (128u << 16) | ((uint)g << 8) | r;
    }

    public ValueTask DisposeAsync()
    {
        _running = false;
        _renderer?.Dispose();
        _output?.Dispose();
        _accelerator?.Dispose();
        _context?.Dispose();
        return ValueTask.CompletedTask;
    }
}
```

| | WebGPU | WebGL | Wasm (+ desktop CPU) |
|---|---|---|---|
| CPU readback | ❌ None | ❌ None (ImageBitmap) | ✅ Required |
| Extra allocations per frame | Bind group only | None | None (cached ImageData) |
| GPU stall | None | None | Sync on CPU path |
| Main-thread work | `drawImage` | `drawImage` | `putImageData` |
The WebGPU and WebGL paths avoid all CPU ↔ GPU data transfers during rendering. The pixel data stays GPU-resident from kernel output to the canvas.
```csharp
// Create once alongside the accelerator
ICanvasRenderer renderer = CanvasRendererFactory.Create(accelerator);

// Attach whenever the canvas element is available/changes
renderer.AttachCanvas(canvas);

// Call every frame
await renderer.PresentAsync(buffer);

// Dispose alongside the accelerator
renderer.Dispose();
```

`AttachCanvas` can be called multiple times (e.g., when the Blazor component re-renders and the `ElementReference` changes). The previous context is disposed and replaced.
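In a Blazor component this typically means re-running the attach step whenever the canvas element may have been replaced. A minimal sketch; the `_canvasChanged` flag is illustrative, tracked however fits your component:

```csharp
// Re-attach after a re-render if the canvas element was replaced.
// AttachCanvas disposes and replaces the previous rendering context.
protected override Task OnAfterRenderAsync(bool firstRender)
{
    if (_renderer is not null && _canvasChanged)
    {
        using var canvas = new HTMLCanvasElement(_canvasRef);
        _renderer.AttachCanvas(canvas); // safe to call again
        _canvasChanged = false;
    }
    return Task.CompletedTask;
}
```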
If you need direct control over the blit pipeline, or need to support a buffer layout that does not fit `MemoryBuffer2D<uint>`, you can replicate what the renderers do:
WebGPU:

```csharp
// Flush batched dispatches first
webGpuAccelerator.FlushPendingCommands();

// Then execute your own render pass using NativeAccelerator.NativeDevice
var gpuBuffer = ((WebGPUMemoryBuffer)rawBuffer).NativeBuffer.NativeBuffer;
```

WebGL:

```csharp
// BlitAndDrawAsync posts to the worker and calls your callback synchronously
// with the ImageBitmap before resolving, keeping the draw in the same event-loop turn.
await webGlAccelerator.BlitAndDrawAsync(memBuf, width, height, bitmap =>
{
    ctx.DrawImage(bitmap);
    // Additional compositing can go here
});
```