Add Vulkan rendering backend #7233

Open
laanwj wants to merge 18 commits into scp-fs2open:master from laanwj:vulkan-pr

@laanwj
Contributor

@laanwj laanwj commented Feb 16, 2026

Implement a Vulkan 1.1 renderer that replaces the previous stub with a fully functional backend, mostly matching the OpenGL backend's rendering capabilities. The game should be playable with minimal divergence from OpenGL rendering.

This is, most likely, too big to go in all at once, but just filing it here for reference because it's reached a testable state.

Core rendering infrastructure. The code lives under code/graphics/vulkan:

  • VulkanMemory: Custom allocator with sub-allocation from device-local and host-visible memory pools
  • VulkanBuffer: Per-frame bump allocator for streaming uniform/vertex/index data (persistently mapped, double-buffered, auto-growing)
  • VulkanTexture: Full texture management including 2D, 2D-array, 3D, and cubemap types with automatic mipmap generation and sampler caching
  • VulkanPipeline: Lazy pipeline creation from hashed render state, with persistent VkPipelineCache
  • VulkanShader: SPIR-V shader loading (main, deferred, effects, post-processing, shadows, decals, fog, MSAA resolve, etc.)
  • VulkanDescriptorManager: 3-set descriptor layout (Global/Material/PerDraw) with per-frame pool allocation, auto-grow, and batched updates
  • VulkanDeletionQueue: Deferred resource destruction synchronized to frame-in-flight fences
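
The deferred destruction idea behind VulkanDeletionQueue can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name `DeletionQueue`, the frame counter, and the callback type are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: destruction callbacks are parked until the frame that
// last used the resource is known to have finished on the GPU.
class DeletionQueue {
public:
    // Queue `destroy` to run once frame `lastUsedFrame` has completed.
    void enqueue(uint64_t lastUsedFrame, std::function<void()> destroy) {
        m_pending.push_back({lastUsedFrame, std::move(destroy)});
    }

    // Call after waiting on a frame fence; `completedFrame` is the newest
    // frame number whose fence has signalled.
    void flush(uint64_t completedFrame) {
        std::vector<Entry> keep;
        for (auto& e : m_pending) {
            if (e.frame <= completedFrame) {
                e.destroy();
            } else {
                keep.push_back(std::move(e));
            }
        }
        m_pending = std::move(keep);
    }

    size_t pending() const { return m_pending.size(); }

private:
    struct Entry {
        uint64_t frame;
        std::function<void()> destroy;
    };
    std::vector<Entry> m_pending;
};
```

In a real backend the callback would wrap calls such as destroying a buffer or image view; here it is left generic so the sketch builds without Vulkan headers.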

Design choices:

  • Two frames in flight with fence-based synchronization
  • Asynchronous texture upload, no waitIdle or other CPU-on-GPU blocking in hot path
  • Single command buffer per frame; render passes begun/ended as needed for the multi-pass deferred pipeline
  • Per-frame descriptor pools
  • All descriptor bindings pre-initialized with fallback resources (zero UBO + 1x1 white texture) so partial updates never leave undefined state
  • Streaming data (such as immediates) uses a bump allocator (one large VkBuffer per frame)
  • Pipeline cache persisted to disk for fast startup on subsequent runs
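
As an illustration of the bump-allocator design choice listed above, here is a minimal sketch that tracks offsets only. The name `FrameBumpSketch` is hypothetical, and the auto-growing and persistent mapping mentioned in the description are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-frame bump allocator over one large buffer. Only offsets
// are tracked here; a real backend would pair this with a mapped VkBuffer.
class FrameBumpSketch {
public:
    explicit FrameBumpSketch(size_t capacity) : m_capacity(capacity) {}

    // Writes the offset of a block of `size` bytes aligned to `alignment`
    // (which must be a power of two) into `outOffset`.
    // Returns false when the buffer cannot hold the block.
    bool allocate(size_t size, size_t alignment, size_t& outOffset) {
        const size_t start = (m_head + alignment - 1) & ~(alignment - 1);
        if (start > m_capacity || size > m_capacity - start) {
            return false;
        }
        m_head = start + size;
        outOffset = start;
        return true;
    }

    // Called at the start of a frame, once the fence guarding this frame
    // slot has signalled and the GPU no longer reads the old contents.
    void reset() { m_head = 0; }

private:
    size_t m_capacity;
    size_t m_head = 0;
};
```

Allocation is a single add and mask, which is why this pattern suits immediate-mode data that is rewritten every frame.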

Some notable Vulkan vs OpenGL differences are:

  • Because shaders are pre-compiled to SPIR-V, shader variants are less feasible in Vulkan. Preprocessing directives have been converted to run-time uniform based branching.
  • Depth range is [0,1] not [-1,1]: shadow projection matrices adjusted, shaders that linearize depth need isinf/zero guards at depth boundaries where OpenGL gives finite values
  • Vulkan render target is "upside down", y-flip for render target is handled through negative viewport height, as is common
  • gl_ClipDistance is always evaluated: must write 1.0 when clipping is disabled (OpenGL allows leaving it uninitialized)
  • Texture addressing for AABITMAP/INTERFACE/CUBEMAP forced to clamp (OpenGL's sampler state happens to do this implicitly)
  • Render pass architecture requires explicit transitions between G-buffer, shadow, decal, light accumulation, fog, and post-processing passes (OpenGL just switches FBO bindings)
  • No geometry shaders. They're possible with Vulkan, but less common. Currently they're not used.
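
Two of the differences above, the flipped viewport and the depth range, come down to small pieces of arithmetic. The sketch below uses a plain stand-in struct (`ViewportSketch`, a hypothetical name mirroring the fields of `VkViewport`) so it compiles without Vulkan headers; negative viewport heights are part of core Vulkan 1.1.

```cpp
#include <cassert>

// Plain stand-in for VkViewport so the sketch compiles without Vulkan headers.
struct ViewportSketch {
    float x, y, width, height, minDepth, maxDepth;
};

// Build a viewport that flips Y: the origin moves to the bottom edge of the
// rectangle and the height is negated.
inline ViewportSketch flipped_viewport(float x, float y, float w, float h) {
    return ViewportSketch{x, y + h, w, -h, 0.0f, 1.0f};
}

// Remap an OpenGL NDC depth in [-1, 1] to the Vulkan range [0, 1].
inline float gl_depth_to_vk(float zNdcGl) {
    return 0.5f * zNdcGl + 0.5f;
}
```

The same depth remap can be folded into a projection matrix instead of being applied per value; which approach the PR takes for each matrix is not shown here.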

Preparation patches to common game code (these commits need to go in first):

  • Extract sphere and cylinder mesh generation into shared graphics utility: Needed in both GL and Vulkan
  • Route ImGui calls through gr_screen function pointers: Makes it possible for the Vulkan backend to provide its own ImGui implementation
  • Free bitmaps before destroying graphics backend: Fix shutdown order issue
  • Use float shader input instead of SCREEN_POS in gr_flash_internal: Compatibility with Vulkan shaders
  • Remove now-unused SCREEN_POS vertex format: Cleanup after previous commit
  • Add dds_block_size and dds_compressed_mip_size utilities: Factor out utility code to be used in Vulkan backend
  • Add CAPABILITY_QUERIES_REUSABLE for GPU queries: Vulkan needs different lifecycle for GPU queries
  • Fix gr_flip debug output ordering: Prevent immediate buffer from being overwritten
  • Fix gr_end_2d_matrix viewport for render-to-texture: Fix RTT for Vulkan
  • Fix undefined gl_ClipDistance and use uint for std140 bool: Shader compatibility with Vulkan
  • Fix shader build MAIN_DEPENDENCY and add conditional GLSL/struct generation: Build system change for OpenGL/Vulkan shader split
  • Add missing memcpy_if_trivial_else_error for void *, const void*

What's possibly left to be done:

  • Unify OpenGL and Vulkan shaders where possible: the only shader shared with OpenGL (defined in the build system's SHADERS_GL_SHARED) is still the default material. Although the Vulkan backend does some things differently, it would definitely be possible to share more code. But i didn't want to accidentally break OpenGL in some way.

  • Integrate VMA (Vulkan Memory Allocator). Some of the memory handling could be simplified by importing this dependency.

  • OpenXR anything. This is currently not implemented at all.

Build steps:

cmake -B build -DCMAKE_BUILD_TYPE=Debug -DFSO_BUILD_WITH_VULKAN=ON -DFSO_BUILD_WITH_OPENXR=OFF
cmake --build build

To run (with maximum debugging and Vulkan layer validation):

export VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation
export VK_LOADER_DEBUG=all

build/bin/fs2_open_25_1_0_x64_AVX2-DEBUG -vulkan -gr_debug -stdout_log -profile_frame_time

Full disclosure: i used Claude Opus 4.6 while developing this. However, the overall direction and design is my own, and i've paid careful attention to the code.

@BMagnu
Member

BMagnu commented Feb 17, 2026

Thanks for the PR!
I'll be looking at it and playing around with it soon.
Please be aware, this being as big as it is, that it might be a while until we get through it.

@BMagnu
Member

BMagnu commented Feb 17, 2026

Okay, played around with it a little.
Got it to run, though with a slew of visual artifacts and some crashes on some mods.
Still, a great first step to see it running in vulkan, at quite impressive performance numbers.
I'd love to discuss some of the design decisions in more detail. Are you on the discord, or somewhere else sensible for extended discussion?

@The-E
Member

The-E commented Feb 19, 2026

I played around with it a little bit as well, and using nSight I could at least get as far as seeing that rendering a background with a skybox causes some amount of corruption to get into the main framebuffer - I haven't yet been able to see where it's coming from (as rendering a skybox should be one of the simpler things, just some basic geo and a couple textures, no lighting), but, well, there it is.

@laanwj
Contributor Author

laanwj commented Feb 21, 2026

> Are you on the discord, or somewhere else sensible for extended discussion?

i'm on the hard-light discord server; i'm "mara" there. i'm not very active on discord, but happy to discuss.

> Got it to run, though with a slew of visual artifacts and some crashes on some mods.

To be honest i've only been replaying the retail campaign with it. So code paths not exercised there will be less (or even not) tested. Please let me know which mods this happens with!

> I played around with it a little bit as well, and using nSight I could at least get as far as seeing that rendering a background with a skybox causes some amount of corruption to get into the main framebuffer - I haven't yet been able to see where it's coming from (as rendering a skybox should be one of the simpler things, just some basic geo and a couple textures, no lighting), but, well, there it is.

Thanks for trying. i've installed NVidia Coresight but couldn't get it to report any issues in the level i tried (and i'd been using Vulkan's validation layer as well as RenderDoc during development and it should be clean). Can you send me the level file this happens with? And the messages that i should watch for?

@The-E
Member

The-E commented Feb 21, 2026

> Thanks for trying. i've installed NVidia Coresight but couldn't get it to report any issues in the level i tried (and i'd been using Vulkan's validation layer as well as RenderDoc during development and it should be clean). Can you send me the level file this happens with? And the messages that i should watch for?

My testing was done using the latest mediaVPs mod, running both the first mission and using the lab environment.
image

RenderDoc capture available here: https://drive.google.com/file/d/1ficdGUP-e8xfmUjZWAzWZmwpa9aAtFmt/view?usp=drive_link

@BMagnu
Member

BMagnu commented Feb 21, 2026

I was similarly running the MediaVPs' first mission (where I got the artifacts), and I was testing the "Icarus" Cutscene from Blue Planet (which crashes on trying to render the opening movie, skippable with -nomovies)

@Shivansps
Contributor

Im not in any position to ask, but instead of getting the vulkan lib and headers currently installed in the host system, maybe its better to use a glad2 loader for vulkan in the same way as it is for OpenGL?

@SamuelCho
Contributor

Wow, nice work. Pretty straightforward design, nothing surprising. I have a local WIP DX12 implementation I've been working on and off on in my spare time for general practice and I see a lot of similar decisions you've made here in your VK implementation.

I kind of wonder if we need to double buffer the immediate buffer so that we leave alone the one that's in-flight. But maybe it doesn't matter if the fence in the command buffer submission and flip takes care of everything. Or is it the buffer manager that keeps track of the frame num?

Surprised that my batching code made it out intact. Also surprised that my render primitives immediate code also made it out intact. Sorry if it caused any headaches.

@Shivansps
Contributor

Shivansps commented Feb 28, 2026

Ill put this in here for reference in case anyone is interested.

I did try to see if I could change it to use the glad2 loader instead. As I expected, since it is using vulkan.hpp, it is using the Vulkan C++ bindings, while the glad2 loader has the C bindings, exactly as with the OpenGL version. So it's not a huge amount of work to change it, but it is still considerable work to change all the bindings (around 2-3 days). That is a lot of work just to get it to compile again, without knowing whether it will still work afterwards.

I also got the current PR version to compile for Android by just adding the missing .hpp Vulkan headers to the Android NDK. It's not elegant, as I'm adding stuff to the toolchain, but it will do for now. However, it does not compile for 32-bit targets (x86/arm32), only for x86_64/arm64; I'm not sure if this is also the case for regular builds.

On my phone with a Mali-G57
fs2_open.log
Crashes during shader compilation
0000000001911d04 /vendor/lib64/egl/libGLES_mali.so (cmpbe_v2_compile_multiple_shaders+2372) (BuildId: 747cc1a89e3838ab)
02-28 11:41:37.638 7140 7140 F DEBUG : Cause: null pointer dereference

On my Retroid G2 Handheld with a Qualcomm G2 and an Adreno 22 GPU, it fails to init vulkan because it lacks a transfer queue. I guess it is VK_QUEUE_TRANSFER_BIT? So its not completely 1.1 it uses an optional extension/feature.
fs2_open.log.txt

@laanwj
Contributor Author

laanwj commented Mar 1, 2026

@The-E

> My testing was done using the latest mediaVPs mod, running both the first mission and using the lab environment.

Thanks. The renderdoc capture should be helpful for reproduction.
(had to send a request to access it)

@SamuelCho

> I kind of wonder if we need to double buffer the immediate buffer so that we leave alone the one that's in-flight. But maybe it doesn't matter if the fence in the command buffer submission and flip takes care of everything. Or is it the buffer manager that keeps track of the frame num?

It does. This is handled purely in the Vulkan layer. The buffer manager does a double buffering of all dynamic and streaming buffers in FrameBumpAllocator m_frameAllocs[MAX_FRAMES_IN_FLIGHT]. So it should have the same behavior as the GL backend with regard to orphaned buffers.
(It's also enforced that dynamic and streaming buffer content isn't reused between frames, by throwing a failure in that case)
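
A rough sketch of the behaviour described in this reply: one allocation head per frame in flight, plus a check that streaming data is only consumed in the frame that wrote it. Only `MAX_FRAMES_IN_FLIGHT` is taken from the comment; `StreamBuffers`, `StreamSlice`, and the exception type are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

constexpr size_t MAX_FRAMES_IN_FLIGHT = 2;

// Hypothetical handle for streamed data: it remembers the frame it was written in.
struct StreamSlice {
    uint64_t frame;
    size_t offset;
};

class StreamBuffers {
public:
    // Advance to the next frame and recycle its slot. The caller is assumed
    // to have waited on the fence of the frame that used this slot before.
    void begin_frame() {
        ++m_frame;
        m_heads[slot()] = 0;
    }

    // Reserve `size` bytes in the current frame's slot.
    StreamSlice write(size_t size) {
        StreamSlice s{m_frame, m_heads[slot()]};
        m_heads[slot()] += size;
        return s;
    }

    // Streaming data may only be consumed in the frame that wrote it.
    size_t resolve(const StreamSlice& s) const {
        if (s.frame != m_frame) {
            throw std::logic_error("streaming data reused across frames");
        }
        return s.offset;
    }

    size_t slot() const { return static_cast<size_t>(m_frame % MAX_FRAMES_IN_FLIGHT); }

private:
    uint64_t m_frame = 0;
    size_t m_heads[MAX_FRAMES_IN_FLIGHT] = {};
};
```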

> Surprised that my batching code made it out intact. Also surprised that my render primitives immediate code also made it out intact. Sorry if it caused any headaches.

Hahah it wasn't too bad!

> Im not in any position to ask, but instead of getting the vulkan lib and headers currently installed in the host system, maybe its better to use a glad2 loader for vulkan in the same way as it is for OpenGL?

Will look into it. It seems it would be way easier to vendor vulkan.hpp instead of switching to using C bindings, so i'll go for that first.

@laanwj
Contributor Author

laanwj commented Mar 1, 2026

Okay. i've bundled the Vulkan and Vulkan-CPP headers in lib/vulkan-headers and updated the build system for this. Function loading was already happening dynamically through SDL, except for ImGui, which now does so too. i did not need to use glad2.

With this, it should be possible to build it on (or for) platforms without the Vulkan library and headers installed.

@laanwj
Contributor Author

laanwj commented Mar 1, 2026

Trying to get it to pass the CI now. Will squash all these changes into the main (or otherwise original) commit when done.

@laanwj
Contributor Author

laanwj commented Mar 1, 2026

i'm not happy where clang-tidy is taking some of these. It first wants to make these functions static (because it could), and now it wants to refer to them by fully qualified class name instead of instance:

-	auto* texSlot = texManager->getTextureSlot(handle);
+	auto* texSlot = graphics::vulkan::VulkanTextureManager::getTextureSlot(handle);
-	drawManager->stencilClear();
+	graphics::vulkan::VulkanDrawManager::stencilClear();

Which is strictly correct but it's also less readable, and asymmetric with the rest of the API. Will see if (void)this works.

Edit: it did. Will look into rendering issues next.
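
For readers unfamiliar with the trick: referencing `this` inside a member function that touches no members keeps it an instance method in clang-tidy's eyes, so call sites can keep using the instance. A minimal sketch, with a hypothetical class and method name:

```cpp
#include <cassert>

class DrawManagerSketch {
public:
    // Touches no members today, so clang-tidy would suggest making it static
    // and then calling it through the class name. Referencing `this` keeps it
    // an instance method.
    int stencilClearValue() {
        (void)this;
        return 0;
    }
};
```

The cast has no runtime effect; it only changes what the static analysis sees.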

@laanwj laanwj force-pushed the vulkan-pr branch 2 times, most recently from e5c9a34 to 61f890e Compare March 1, 2026 22:36
@GamingCity

GamingCity commented Mar 2, 2026

Hi, Shivansps here, I'm on a different account. I think I know why it says there is no transfer queue on the Adreno driver.

I think this if here is wrong
https://github.com/laanwj/fs2open.github.com/blob/61f890e2966bbce9650d94eee9249ce13cae864b/code/graphics/vulkan/VulkanRenderer.cpp#L104

if (!values.transferQueueIndex.initialized && queue.queueFlags & vk::QueueFlagBits::eTransfer) {
    // False if no eTransfer (optional)
} else if (queue.queueFlags & vk::QueueFlagBits::eTransfer && !(queue.queueFlags & vk::QueueFlagBits::eGraphics)) {
    // False if no eTransfer (optional)
}

According to the documentation:
https://registry.khronos.org/VulkanSC/specs/1.0-extensions/man/html/VkQueueFlagBits.html

"All commands that are allowed on a queue that supports transfer operations are also allowed on a queue that supports either graphics or compute operations. Thus, if the capabilities of a queue family include VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT, then reporting the VK_QUEUE_TRANSFER_BIT capability separately for that queue family is optional."

Queue families with eGraphics (or eCompute) support transfer operations implicitly, but may not report eTransfer.
So i think " & vk::QueueFlagBits::eTransfer" should be removed from the first if and assume it is. (and maybe make sure it is not eCompute? im not sure about that)
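
One way to express the selection rule implied by the spec quote is sketched below. It uses plain integer stand-ins for the queue flag bits so it builds without Vulkan headers; `pick_transfer_family` and the enum are hypothetical names, and this is not the fix that was actually pushed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and
// VK_QUEUE_TRANSFER_BIT, using the same bit values.
enum QueueBits : uint32_t { kGraphics = 0x1, kCompute = 0x2, kTransfer = 0x4 };

// Returns the index of the family to use for transfers, or -1 if none.
// Prefers a family that reports transfer without graphics; otherwise accepts
// any graphics or compute family, which supports transfer commands implicitly.
inline int pick_transfer_family(const std::vector<uint32_t>& families) {
    int fallback = -1;
    for (size_t i = 0; i < families.size(); ++i) {
        const uint32_t f = families[i];
        const bool canTransfer = (f & (kTransfer | kGraphics | kCompute)) != 0;
        if (!canTransfer) {
            continue;
        }
        if ((f & kTransfer) && !(f & kGraphics)) {
            return static_cast<int>(i); // separate transfer-capable family
        }
        if (fallback < 0) {
            fallback = static_cast<int>(i);
        }
    }
    return fallback;
}
```

With this rule, a device that exposes a single graphics and compute family without the transfer bit still yields a usable index.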

@laanwj
Contributor Author

laanwj commented Mar 2, 2026

> So i think " & vk::QueueFlagBits::eTransfer" should be removed from the first if and assume it is. (and maybe make sure it is not eCompute? im not sure about that)

Good catch. Yes, the logic there is wrong. "It worked on NVidia" 😊 Will fix.

Edit: Mind that the transfer queue is currently unused, as this makes the upload code simpler, due to there being no cross-queue synchronization requirement. In the current design there wouldn't be a benefit to using it, just overhead, as there's (AFAIK) no way to exploit parallelism here. So we could even decide to completely remove checking for it.

@laanwj
Contributor Author

laanwj commented Mar 2, 2026

i've pushed a few rendering corruption fixes. Some wrong assumptions about renderpass state, and Vulkan vs GL differences. The cubemap corruption and random framebuffer noise should be solved now.

@Shivansps
Contributor

Shivansps commented Mar 3, 2026

Just reporting back here, the change to the transfer queue selection did work. Now the Adreno GPU works and can get into the game.
The Mali GPU still crashes while compiling the default material shader, but no matter; I guess that will be something to look at after the PR is merged.

@The-E
Member

The-E commented Mar 3, 2026

Alright, your latest changes definitely fixed the framebuffer corruption, but I have more:
image
Not exactly sure what's going on here, but it seems there's something going weird when post processing is enabled: without post processing, the frame renders normally
One thing to examine would be wireframe rendering: I think there might be some options not being set correctly here.
image
Note that shutting off post processing in the lab also turns off imgui rendering.

Transparency is not rendered correctly
image

Particle and glowpoint blending modes are not set correctly:
image
Note the black halo around the glowpoint attached to the chin fin (or whatever that thing is called....)

textures appear to be downsampled in the lab:
image

I would also recommend running through the Blue Planet: War in Heaven intro - it shows a couple instances of textures rendered as pure white for some reason

@GamingCity

Today i saw two things:
Again, I'll remind you that Android is not a working platform yet, and I'll work on the Android PR after this and SDL3 are merged. I just mention these things to keep track of them and to see whether they can be fixed or whether they cause problems on other platforms. That said, I don't know why the 32-bit CI does not complain about this and why it is only a problem with the NDK toolchain.

  1. I discovered why I was unable to compile 32-bit builds with the android-ndk: there is a mix of C and C++ types here, in VulkanMemory.h:

    struct VulkanAllocation {
        VkDeviceMemory memory = VK_NULL_HANDLE;
        VkDeviceSize offset = 0;
        VkDeviceSize size = 0;
        void* mappedPtr = nullptr; // Non-null if memory is mapped
        uint32_t memoryTypeIndex = 0;
        bool dedicated = false; // True if this is a dedicated allocation
    };

Changing to C++ types fixes 32-bit compilation:

    struct VulkanAllocation {
        vk::DeviceMemory memory = VK_NULL_HANDLE;
        vk::DeviceSize offset = 0;
        vk::DeviceSize size = 0;
        void* mappedPtr = nullptr; // Non-null if memory is mapped
        uint32_t memoryTypeIndex = 0;
        bool dedicated = false; // True if this is a dedicated allocation
    };

While this compiles, it is either not going to work or going to have additional issues, as VulkanPipeline.cpp has shifts that go out of range for 32-bit types.
shift warnings.txt
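
The contents of the attached warnings are not shown, so the following is only an assumption about the usual cause: shifting a value whose type is 32 bits wide on the target (for example `size_t` on arm32) by 32 or more. A common remedy is to widen to a fixed 64-bit type before shifting. The function name `pack_key` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packing of two 32-bit state words into one 64-bit key.
// Widening to uint64_t before shifting keeps the shift count in range even
// where size_t or unsigned long is only 32 bits wide.
inline uint64_t pack_key(uint32_t hi, uint32_t lo) {
    return (static_cast<uint64_t>(hi) << 32) | lo;
}
```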

@BMagnu
Member

BMagnu commented Mar 3, 2026

While not an immediate priority, I'd love to question the following design goal:
"Because shaders are pre-compiled to SPIR-V, shader variants are less feasible in Vulkan. Preprocessing directives have been converted to run-time uniform based branching."

Long / Medium term, I would like for FSO to ship with shadertool or something to allow it to compile to SPIR-V itself. This gets rid of a lot of issues here. First, we'd be able to keep text-based shaders that can be dual-use for OpenGL and Vulkan. Any incompatibilities can just be put in preprocessor blocks like main-f's prereplace, allowing full dual-use of all shaders. Furthermore, it'd allow table-able postprocessing and shader changes. While currently a full shader replace is necessary for custom shaders, I eventually want this to be properly modular, so being able to modify parts of shaders is a goal, and that for sure requires compilation on-the-fly.

Shipping with shadertool and then compiling on load (ideally after game-settings.tbl, especially since the recent Z-Compress changes) all available shaderfiles to SPIR-V shouldn't be that hard either.
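
With text-based shaders compiled at load time, a variant is typically just a block of `#define` lines placed ahead of the shared source before it is handed to the compiler. The sketch below shows only that string-building step; the flag names and `variant_preamble` are hypothetical, and the GLSL-to-SPIR-V compile itself is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant flags; the names are illustrative only.
enum VariantFlags : uint32_t { kVariantFog = 1u << 0, kVariantShadows = 1u << 1 };

// Build the "#define" block for one variant. It would be placed after the
// shader's #version line and before the shared shader body.
inline std::string variant_preamble(uint32_t flags) {
    static const std::vector<std::pair<uint32_t, const char*>> table = {
        {kVariantFog, "FLAG_FOG"},
        {kVariantShadows, "FLAG_SHADOWS"},
    };
    std::string out;
    for (const auto& entry : table) {
        if (flags & entry.first) {
            out += "#define ";
            out += entry.second;
            out += "\n";
        }
    }
    return out;
}
```

Because the preamble is plain text, the same shader body can serve both an OpenGL and a Vulkan backend, with backend-specific parts guarded by further preprocessor blocks.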

@laanwj
Contributor Author

laanwj commented Mar 4, 2026

@Shivansps
Good!
i could in principle test Android + Adreno on my Ayn Thor. i don't have any device with a Mali GPU. But one thing at a time. i've never really done android development so it'll be some things to figure out.

@GamingCity
Ah yes, you're right. It's better to be consistent about using the C++ vulkan types instead of the C ones. Will switch it over. Though i'm very surprised that it makes a difference in practice.

@The-E
Thanks for the reports. At least the rendering issues are getting more subtle.

@BMagnu
Yes. i think it would be fine to make FSO depend on a GLSL-to-SPIR-V compiler library, and then do the compilation at runtime instead of compile-time. i can look into it.
i was just trying to be careful here to not introduce any big dependencies.
In principle, modular shaders can also be done with simpler SPIR-V level linking. But that'd be incompatible with the goal of unifying with the OpenGL backend.

@Shivansps
Contributor

Please, I dont want to make you waste time on android testing, its not even a working platform yet. Ill post if i can find out something.

If you want to have a look, I have Fso_Android_Wrapper (https://github.com/Shivansps/Fso_Android_Wrapper) as the Android test app, Fso-Android-Prebuilts, where I have the script and instructions to build the FSO dependencies and FSO itself, and an "android-build-vulkan" branch on my fork where I added this PR to my previous Android work.

I did find one problem with Android in VulkanRenderer:

    createInfo.preTransform = deviceValues.surfaceCapabilities.currentTransform;

It seems that if you leave it at that and use

    SDL_SetHint(SDL_HINT_ORIENTATIONS, "LandscapeLeft LandscapeRight");

which I'm using to force landscape mode on Android, it will (if I understood right) get a surface that is already rotated and then rotate it again, making it render in portrait mode.

I changed it to this, which did work:

    auto supported = deviceValues.surfaceCapabilities.supportedTransforms;
    if (supported & vk::SurfaceTransformFlagBitsKHR::eIdentity) {
        createInfo.preTransform = vk::SurfaceTransformFlagBitsKHR::eIdentity;
    } else {
        createInfo.preTransform = deviceValues.surfaceCapabilities.currentTransform;
    }

I don't know if that's the right fix; it does not seem to do anything on Windows. Keep in mind I used an AI to point me to this and to the potential fix, as I did not know whether anything in Vulkan could cause this. It told me to check where preTransform and the surface capabilities are set for the transform, and that I should use the eIdentity flag.

https://docs.vulkan.org/refpages/latest/refpages/source/VkSurfaceTransformFlagBitsKHR.html

@BMagnu
Member

BMagnu commented Mar 5, 2026

Fair enough re: compiling shaders and large dependencies, but I think it is worth it here.
Even just having a unified backend to maintain (where the shader compiler likely needs little to no continued maintenance) is worth it alone IMO, but with tableable variants, it is for sure.

@laanwj laanwj force-pushed the vulkan-pr branch 2 times, most recently from 4cd808a to 25d4c43 Compare March 6, 2026 19:58
@laanwj
Contributor Author

laanwj commented Mar 6, 2026

The most recent commit unifies the GL and Vulkan shaders (as much as possible). It also uses shader variants throughout, just like GL.

Apparently i need to rebase. i think i'll first squash back the Vulkan backend into one commit, to avoid back-and-forth changes (e.g. the changes to uniform_structs.h turn out to be unnecessary, and adding the compiled shaders was unnecessary, etc).

Edit: i could also extract some common shader compiler logic between Vulkan and OpenGL backends, which i'll put in a commit before introducing the Vulkan backend. Still working on this.

Edit/2: sorry i messed up the rebase. i have it working again locally, but ran into some new rendering issues to do with the z-compression commit (#7245). Will push the new branch only when i have that sorted out.

Edit/3: this should be back to the functional state before the shader unification and rebase. The depth compression for the deferred shading position buffer works in Vulkan too. There's still the same rendering issues with particles, but no other regressions AFAIK.

laanwj added 15 commits March 8, 2026 10:39
Move the pure-math vertex/index generation out of `gropengldeferred.cpp`
into graphics/util/primitives so it can be reused by the Vulkan backend.
Modernize to use `SCP_vector` instead of `vm_malloc`/`vm_free` for automatic
memory management.

Replace direct `ImGui_ImplOpenGL3` calls in game code with
backend-agnostic `gr_imgui_new_frame` and `gr_imgui_render_draw_data`
function pointers, matching the pattern used by all other `gr_*`
functions. This makes it possible for the Vulkan backend
to provide its own ImGui implementation.

`bm_close` calls `gf_bm_free_data` for each bitmap slot, which needs the
graphics backend (Vulkan texture manager, OpenGL context) to still be
alive. Move `bm_close` before the backend cleanup switch in `gr_close`.

`gr_flash_internal` used int vertices with `SCREEN_POS` (`VK_FORMAT_R32G32_SINT`)
but the default-material vertex shader expects vec4 float at location 0.
OpenGL silently converts via glVertexAttribPointer; Vulkan requires exact
type matching. Use float vertices with `POSITION2` format instead. There
should be no difference in behavior.

The `SCREEN_POS` vertex format is no longer used after the only use in
`gr_flash` was removed. Remove it entirely.

Deduplicate compressed texture block-size mapping and mip-size
calculation into two inline helpers in `ddsutils.h`, replacing
repeated inline formulas in `ddsutils.cpp` and `gropengltexture.cpp`.
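
For reference, the block-size and mip-size arithmetic that such helpers encapsulate looks roughly like this. The signatures of the real `ddsutils.h` helpers are not shown in this thread, so the enum and the `_sketch` function names below are hypothetical; the block sizes themselves are the standard ones for these formats.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical format tags; the real helpers take the engine's own DDS format values.
enum class BlockFormat { BC1, BC2, BC3, BC7 };

// Bytes per 4x4 block: 8 for BC1 (DXT1), 16 for BC2/BC3 (DXT3/DXT5) and BC7.
inline size_t block_size_sketch(BlockFormat f) {
    return f == BlockFormat::BC1 ? 8 : 16;
}

// Size in bytes of one mip level; dimensions round up to whole 4x4 blocks.
inline size_t compressed_mip_size_sketch(size_t width, size_t height, BlockFormat f) {
    const size_t blocksX = std::max<size_t>(1, (width + 3) / 4);
    const size_t blocksY = std::max<size_t>(1, (height + 3) / 4);
    return blocksX * blocksY * block_size_sketch(f);
}
```

Rounding up matters for the small tail of a mip chain: a 1x1 level still occupies one full block.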
Add a render system capability to indicate whether GPU timestamp query
handles can be immediately reused after reading.

When queries are not reusable, `free_query_object` returns handles to the
backend via `gr_delete_query_object` instead of the tracing free list,
letting the backend manage its own reset lifecycle. This greatly
simplifies query management for Vulkan.

Also change shutdown to discard gpu_events for backends where queries
aren't reusable (no more frames will be submitted to make them
available).

Move `output_uniform_debug_data` before `gr_reset_immediate_buffer` so
debug text is rendered while the immediate buffer still contains valid
data. The previous ordering read from a buffer that was already reset to
offset 0, which is logically wrong for any backend and a hard failure
for deferred-submission backends.

`gr_set_proj_matrix` already branches on rendering_to_texture to choose
top-left (RTT) vs bottom-left (screen) viewport origin. `gr_end_2d_matrix`
should match, but it unconditionally used the bottom-left formula. Add
the same `rendering_to_texture` branch so the viewport is restored
correctly when rendering to a texture.

Change `bool clipEnabled` to `uint clipEnabled` in the default-material
shader UBO. GLSL bool has implementation-defined std140 layout; uint is
portable and matches the SPIR-V decompiled output.

Add an else-branch writing `gl_ClipDistance[0] = 1.0` when clipping is
disabled. Without this, gl_ClipDistance is undefined and some drivers
cull geometry unexpectedly.
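
On the CPU side, the `uint` choice also makes the mirror struct straightforward, since `sizeof(bool)` in C++ is implementation-defined while a std140 `uint` is always 4 bytes. A sketch of such a mirror with compile-time layout checks follows; the struct and every member other than `clipEnabled` are hypothetical and do not reflect the real default-material UBO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side mirror of a small std140 UBO. Members are illustrative only.
struct DefaultMaterialUboSketch {
    float clipEquation[4];  // vec4, offset 0
    uint32_t clipEnabled;   // uint, offset 16 (was bool in GLSL)
    uint32_t pad[3];        // pad the struct to a multiple of 16 bytes
};

static_assert(sizeof(uint32_t) == 4, "std140 uint is 4 bytes");
static_assert(offsetof(DefaultMaterialUboSketch, clipEnabled) == 16, "uint follows the vec4");
static_assert(sizeof(DefaultMaterialUboSketch) == 32, "block size rounds up to 16");
```

Checks like these fail at compile time if a member is added without adjusting the padding.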
Memcpy from a `const void*` to `void*` is trivial enough. However, this
case was missing, resulting in a false positive compilation error.

Extract shader loading and preprocessing (include/predefine
expansion) into code/graphics/shader_preprocess.cpp, so it
can be shared with the Vulkan backend.

Bundle Vulkan headers (v1.4.309).

Bundle Vulkan Memory Allocator (v3.2.1).
@laanwj
Contributor Author

laanwj commented Mar 8, 2026

i managed to narrow down and fix the particle/laser background problem. It was really tricky!

Apparently, ddsutils had a fixed dependency on GLAD, and queried OpenGL extensions to see if compressed textures should be decompressed before returning them. This always returned 0 for Vulkan. Ideally this would only affect performance. However, there was a sneaky interaction with additive/alpha blend modes based on bpp.

It's been changed to use the CAPABILITY_BPTC render system capability instead, and add a new capability CAPABILITY_S3TC for BC1/BC2/BC3.
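
The shape of that change can be sketched as a backend-neutral query. The capability names S3TC and BPTC follow the comment above; the enums, the callback, and `needs_cpu_decompression` are hypothetical stand-ins for the engine's real types.

```cpp
#include <cassert>
#include <functional>

// Capability names follow the comment above; everything else is illustrative.
enum class Capability { S3TC, BPTC };
enum class Compression { None, BC1, BC2, BC3, BC7 };

// Decide whether a DDS must be decompressed on the CPU, asking the active
// backend through `isCapable` instead of reading OpenGL extension globals.
inline bool needs_cpu_decompression(Compression c,
                                    const std::function<bool(Capability)>& isCapable) {
    switch (c) {
    case Compression::None:
        return false;
    case Compression::BC1:
    case Compression::BC2:
    case Compression::BC3:
        return !isCapable(Capability::S3TC);
    case Compression::BC7:
        return !isCapable(Capability::BPTC);
    }
    return false;
}
```

Routing the question through the backend means a renderer that never initializes OpenGL state no longer falls into the "unsupported" path by accident.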

There were some other bugs in Vulkan compressed texture handling as well. With these fixed, not only do the lasers etc. look as they should, loading is a lot faster too.

Edit: With MVPS, there's still a render issue where it shows contours around ships when backgrounded against particles like the engine exhaust. But we're getting there.

@laanwj laanwj force-pushed the vulkan-pr branch 2 times, most recently from 95f5285 to 299a2e5 Compare March 8, 2026 23:37
@notimaginative
Contributor

Tested this on my Mac and after a slight struggle to get things going (libs and shadertool primarily) it runs pretty well. Some visual glitches, but nothing terrible.

I only did a few quick tests so I can't say for sure how well everything is working. But from what I saw the average FPS jumped up by 40-50 FPS over what OpenGL gives me. Full neb missions were quite slow in comparison, but this appeared to be the case with OpenGL too and may be related to MVPS 5.0.x as this was my first run with that update as well. The massive battle missions that often run at 7-8 FPS at the start with OpenGL were at 20+ FPS with vulkan.

One issue that hit me early on was error handling. If vulkan failed to init properly for some reason then it would always hit an assert or memory error during deinit. This only appears to happen during init failure, as a proper init will deinit just fine.

Another issue was memory usage. It was consuming 2x-3x the memory compared to using OpenGL. I also only ever saw the memory usage increase, never decrease. With OpenGL the memory usage would fluctuate up or down based on the loaded mission. With vulkan, after loading 3 or 4 different missions, it appeared to be using roughly 16 GB of memory (~10 for app, ~6 for graphics, according to metal overlay). When using OpenGL it typically appears to use 4-5 GB of memory total. I'm not sure if that's a platform issue, a code issue, or just a renderer difference.

@laanwj
Contributor Author

laanwj commented Mar 9, 2026

> Tested this on my Mac and after a slight struggle to get things going (libs and shadertool primarily) it runs pretty well. Some visual glitches, but nothing terrible.

Thanks for testing!

FWIW: You shouldn't need shadertool. It's only used to generate default-material_structs.*.h. Which are checked into the repository, and stable unless the UBO is changed inside the GLSL shader. No other build-time shader compilation is done anymore.
i'll remove the -DSHADERS_ENABLE_COMPILATION=ON from the OP example.

> One issue that hit me early on was error handling. If vulkan failed to init properly for some reason then it would always hit an assert or memory error during deinit. This only appears to happen during init failure, as a proper init will deinit just fine.

Agree that ideally, deinit shouldn't hit asserts. Any problem during init will already have been reported by then, so it's redundant.

It's absolutely set to be somewhat paranoid on the side of error handling right now because i didn't want bugs to silently creep in.

> Another issue was memory usage. It was consuming 2x-3x the memory compared to using OpenGL.

Did you test before or after the texture compression fix? If before, please retry, as that alone makes textures, a big part of memory use, much smaller.

In any case, thanks for the report. I haven't looked into optimizing memory usage at all, yet.
Most notably, I remember the OpenGL backend has a callback to flush caches that's called between missions. This is not currently done for Vulkan.

@notimaginative
Contributor

> Agree that ideally, deinit shouldn't hit asserts. Any problem during init will already have been reported by then, so it's redundant.
>
> It's absolutely set to be somewhat paranoid on the side of error handling right now because I didn't want bugs to silently creep in.

I'll do some better testing later this week or early next week so I can point you to the exact areas I was hitting. Most of that appeared to be due to library issues, which we'd fix with a new set of prebuilt libs, so people generally shouldn't have those issues anyway. But I'll document what I can in case there does happen to be an actual code issue hiding in there somewhere.

> Did you test before or after the texture compression fix? If before, please retry, as that alone makes textures, a big part of memory use, much smaller.

After. And I double-checked that to be sure before submitting that previous comment since I figured that might be an issue. I confirmed that both S3TC and BPTC were actually supported too in case that was the problem.

That might be part of the issue as well: OpenGL on the Mac does not support BPTC, and the decompression routine limits the mipmap size, which could artificially lower OpenGL memory requirements on the larger compressed textures. So it's an unfair comparison to make on the Mac, and it should really be looked at on Windows/Linux with better OpenGL drivers to confirm whether there is a problem or not.

Still though, it's pretty awesome to jump from 30-40 avg fps on the Mac to 90-100 avg fps. With a debug build no less. So I'm pretty thrilled with the progress here and the great work that you're doing!

laanwj added 3 commits March 10, 2026 15:38
ddsutils.cpp checked OpenGL-specific GLAD globals to decide whether to
decompress DXT textures. When the Vulkan backend was active these
variables were never set, so all DXT textures were decompressed to
32bpp RGBA.

Replace the GLAD checks with gr_is_capable() queries for the new
CAPABILITY_S3TC and existing CAPABILITY_BPTC, making ddsutils
backend-agnostic. Add the S3TC capability handler to the OpenGL backend.
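
To make the change in this commit concrete, here is a minimal sketch of a backend-neutral capability check driving the decompress decision. It is not the code in ddsutils.cpp: the `Capability` enum, the `Backend` struct, `DdsFormat`, and `must_decompress` are hypothetical stand-ins, and the real `gr_is_capable()` signature may differ.

```cpp
#include <cassert>

// Hypothetical stand-in for the engine's capability enum.
enum class Capability { S3TC, BPTC };

// Hypothetical backend description: each renderer reports what it supports.
struct Backend {
    bool s3tc;
    bool bptc;

    bool is_capable(Capability c) const {
        switch (c) {
            case Capability::S3TC: return s3tc;
            case Capability::BPTC: return bptc;
        }
        assert(false && "unknown capability");
        return false;
    }
};

// Hypothetical subset of DDS pixel formats.
enum class DdsFormat { DXT1, DXT5, BC7 };

// Decide whether a compressed texture has to be expanded to 32bpp RGBA on
// the CPU. The loader only asks the active backend; it never looks at
// OpenGL-specific globals, so a Vulkan backend gets the same treatment.
bool must_decompress(const Backend& backend, DdsFormat fmt) {
    switch (fmt) {
        case DdsFormat::DXT1:
        case DdsFormat::DXT5:
            return !backend.is_capable(Capability::S3TC);
        case DdsFormat::BC7:
            return !backend.is_capable(Capability::BPTC);
    }
    return true;  // unknown format: take the safe, uncompressed path
}
```

With a backend that reports S3TC but not BPTC, `must_decompress` returns false for `DXT1` and true for `BC7`, so only the unsupported format pays the 32bpp cost.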

Extract shader type tables (filenames, descriptions) and
variant tables (type, flag, define, description) into shared
code/graphics/shader_types.{h,cpp}.

Also move FXAA quality preset defines into shader_types so both
backends can share a single implementation.

Implement a Vulkan 1.1 renderer that replaces the previous stub with a
fully functional backend, mostly matching the OpenGL backend's rendering
capabilities.

Core rendering infrastructure:

- `VulkanMemory`: Custom allocator with sub-allocation from device-local and
  host-visible memory pools
- `VulkanBuffer`: Per-frame bump allocator for streaming uniform/vertex/index
  data (persistently mapped, double-buffered, auto-growing)
- `VulkanTexture`: Full texture management including 2D, 2D-array, 3D, and
  cubemap types with automatic mipmap generation and sampler caching
- `VulkanPipeline`: Lazy pipeline creation from hashed render state, with
  persistent VkPipelineCache
- `VulkanShader`: GLSL shader loading. Shader code and metadata are
  shared with OpenGL, with differences guarded by preprocessor
  conditions
- `VulkanDescriptorManager`: 3-set descriptor layout (Global/Material/PerDraw)
  with per-frame pool allocation, auto-grow, and batched updates
- `VulkanDeletionQueue`: Deferred resource destruction synchronized to
  frame-in-flight fences
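
As an illustration of the deferred-destruction idea in the last bullet, here is a minimal sketch, not the PR's `VulkanDeletionQueue`. It assumes frames are numbered with a monotonically increasing counter and that the caller knows which frame's fence has signaled; the `DeletionQueue` class and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical deferred-deletion queue. Resources are not destroyed when the
// CPU lets go of them, only once the GPU has finished the last frame that
// could still reference them.
class DeletionQueue {
public:
    // Queue a destroy callback for a resource last used by frame `frame`.
    // Frames must be enqueued in non-decreasing order.
    void enqueue(std::uint64_t frame, std::function<void()> destroy) {
        assert(pending_.empty() || pending_.back().frame <= frame);
        pending_.push_back({frame, std::move(destroy)});
    }

    // Call after the fence for `completed_frame` has signaled. Runs every
    // callback queued for that frame or an earlier one and returns the count.
    std::size_t flush(std::uint64_t completed_frame) {
        std::size_t destroyed = 0;
        while (!pending_.empty() && pending_.front().frame <= completed_frame) {
            pending_.front().destroy();
            pending_.pop_front();
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t size() const { return pending_.size(); }

private:
    struct Entry {
        std::uint64_t frame;
        std::function<void()> destroy;
    };
    std::deque<Entry> pending_;
};
```

In a real backend the callbacks would wrap calls such as `vkDestroyBuffer`, and `flush` would run right after the per-frame fence wait at the start of a frame.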

Design choices:

- Two frames in flight with fence-based synchronization
- Asynchronous texture upload, no `waitIdle` in hot path
- Single command buffer per frame; render passes begun/ended as needed
  for the multi-pass deferred pipeline
- Per-frame descriptor pools
- All descriptor bindings pre-initialized with fallback resources (zero
  UBO + 1x1 white texture) so partial updates never leave undefined state
- Streaming data uses a bump allocator (one large VkBuffer per frame)
- Pipeline cache persisted to disk for fast startup on subsequent runs
- Use VMA (Vulkan Memory Allocator) for buffer management
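
The per-frame bump allocator mentioned in the list above can be sketched as follows. This is an assumption-laden illustration rather than the PR's `VulkanBuffer`: `FrameBumpAllocator` is a hypothetical name, it hands out byte offsets into an imagined buffer instead of touching a `VkBuffer`, and it reports failure where the real allocator is described as auto-growing.

```cpp
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bump allocator for one frame's streaming data.
class FrameBumpAllocator {
public:
    // Returned by allocate() when the request does not fit.
    static constexpr std::size_t kFailed = static_cast<std::size_t>(-1);

    explicit FrameBumpAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Reserve `size` bytes at the requested alignment and return the byte
    // offset of the reservation, or kFailed if the buffer is full.
    std::size_t allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t start = align_up(offset_, alignment);
        if (start > capacity_ || size > capacity_ - start) {
            return kFailed;
        }
        offset_ = start + size;
        return start;
    }

    // Called once the GPU is done with this frame: everything is freed at
    // once by rewinding the offset.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```

With two frames in flight, one such allocator per frame would be reset only after that frame's fence has signaled. For uniform data the alignment would typically come from a device limit such as `minUniformBufferOffsetAlignment`.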

Some notable Vulkan vs OpenGL differences are:

- Depth range is [0,1] not [-1,1]: shadow projection matrices adjusted,
  shaders that linearize depth need isinf/zero guards at depth boundaries
  where OpenGL gives finite values
- In Vulkan, all shader outputs must be initialized. Leaving them
  uninitialized can cause random corruption, whereas OpenGL
  tolerates uninitialized outputs in some cases
- Swap chain is B8G8R8A8: screenshot/save_screen paths swizzle to RGBA
- Vulkan render targets are "upside down" relative to OpenGL; the y-flip
  is handled through a negative viewport height, as is common
- Texture addressing for AABITMAP/INTERFACE/CUBEMAP forced to clamp
  (OpenGL's sampler state happens to do this implicitly)
- Render pass architecture requires explicit transitions between G-buffer,
  shadow, decal, light accumulation, fog, and post-processing passes
  (OpenGL just switches FBO bindings)
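
Three of the differences listed above reduce to small pieces of arithmetic, sketched here under stated assumptions. The `Viewport` struct only mirrors the fields of `VkViewport` so the example compiles without Vulkan headers, and the function names are hypothetical rather than taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Same field layout as VkViewport, redeclared to keep the sketch standalone.
struct Viewport {
    float x, y, width, height, minDepth, maxDepth;
};

// Y-flip through a negative viewport height: the origin moves to the bottom
// edge and the height is negated, so clip-space +Y points up as in OpenGL.
Viewport make_flipped_viewport(float width, float height) {
    return Viewport{0.0f, height, width, -height, 0.0f, 1.0f};
}

// Map an OpenGL-style NDC depth in [-1, 1] to the Vulkan range [0, 1].
float ndc_depth_gl_to_vk(float z_gl) {
    return 0.5f * z_gl + 0.5f;
}

// Swap the B and R channels in place for tightly packed 8-bit BGRA pixels,
// as a screenshot path reading back a B8G8R8A8 image would need to.
void bgra_to_rgba(std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 4 == 0);
    for (std::size_t i = 0; i + 3 < pixels.size(); i += 4) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

For an 800x600 target, `make_flipped_viewport` yields `y = 600` and `height = -600`. The depth mapping sends the OpenGL near plane at -1 to 0 and the far plane at 1 to 1.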
Labels

discussion, feature, graphics
