Add GL_EXT_mesh_shader #640

yuq · 2024-12-16T02:35:56Z

This is an OpenGL extension forking VK_EXT_mesh_shader to provide OpenGL mesh shader functionality.

Numbers in the spec haven't been allocated, so use fake numbers for now. No header/XML updates in the PR either.

This extension was requested by nvidium users who want OpenGL mesh shader support in drivers for GPUs other than NVIDIA's:

oddhack · 2024-12-16T05:30:16Z

Re discussion on the mesa issue, Khronos does not "approve" new vendor extensions, though we do try and consistency-check them and make sure they're following the extension guidelines before we include them in the extension registry and hand out enum allocations. It's true that GL spec activity is very minimal within Khronos, but vendor and EXT extension development do not have to happen inside Khronos.

So the first thing to ask is whether there is a commitment to implement this on the part of someone actually writing Mesa drivers. There's no point in publishing an extension spec in the registry if nobody has implemented it. Then, how and why does it differ from the NV extension? I see a slight signature change on one of the APIs but haven't tried to review the whole thing. Because of the close relationship between them, there should at least be a section down around the "Interactions" discussing the things that are the same, and those that had to be changed, and why.

Would it be possible to implement the NV extension as it stands today on your target GPUs, and then add a really small extension on top of that to accommodate the changed signature, rather than duplicate so much of that language?

BTW, when promoting an extension we keep the enum values unchanged so long as they are indistinguishable semantically from the point of view of the driver they are passed to. Only if there's a need to behave differently depending on which extension is being used would the enum value need to change.

yuq · 2024-12-16T06:54:30Z

> Re discussion on the mesa issue, Khronos does not "approve" new vendor extensions, though we do try and consistency-check them and make sure they're following the extension guidelines before we include them in the extension registry and hand out enum allocations. It's true that GL spec activity is very minimal within Khronos, but vendor and EXT extension development do not have to happen inside Khronos.

Thanks for the explanation.

> So the first thing to ask is whether there is a commitment to implement this on the part of someone actually writing Mesa drivers. There's no point in publishing an extension spec in the registry if nobody has implemented it.

Yeah, I'm going to implement it in mesa if it's accepted.

> Then, how and why does it differ from the NV extension? I see a slight signature change on one of the APIs but haven't tried to review the whole thing. Because of the close relationship between them, there should at least be a section down around the "Interactions" discussing the things that are the same, and those that had to be changed, and why.

The differences from the NV extension are listed in the Issues Q&A of the spec:

> Would it be possible to implement the NV extension as it stands today on your target GPUs, and then add a really small extension on top of that to accommodate the changed signature, rather than duplicate so much of that language?

It's not possible to stack a new extension on top of the NV one, because the NV extension interface (mostly the GLSL part) is not suitable for other GPU vendors; that is why Vulkan created VK_EXT_mesh_shader. We could implement the NV extension with many ugly workarounds in the driver, but that would hurt performance:

https://gitlab.freedesktop.org/mesa/mesa/-/issues/7192#note_1822130

> BTW, when promoting an extension we keep the enum values unchanged so long as they are indistinguishable semantically from the point of view of the driver they are passed to. Only if there's a need to behave differently depending on which extension is being used would the enum value need to change.

The runtime API part mostly comes from VK_EXT_mesh_shader, to leverage the existing agreement between GPU vendors. I can keep the enum values that are the same as in the NV extension and assign fake values to the new ones.

Venemo · 2025-01-06T16:46:27Z

Hi,

> So the first thing to ask is whether there is a commitment to implement this on the part of someone actually writing Mesa drivers.

Yes, there is interest in implementing it in RadeonSI as well as in Zink (which would work on top of the Vulkan EXT_mesh_shader exposed by the underlying Vulkan driver).

> Then, how and why does it differ from the NV extension?

> there should at least be a section down around the "Interactions" discussing the things that are the same, and those that had to be changed, and why

> Would it be possible to implement the NV extension as it stands today on your target GPUs

Same as the Vulkan NV vs. EXT extensions. In a nutshell, the NV extension makes it impossible to implement mesh shaders with reasonable performance on other vendors' HW, and EXT fixes that. Furthermore, EXT is better aligned with D3D12 mesh shaders and therefore benefits developers by providing a more familiar programming model.

If you are interested in the exact details, they have been discussed in the Vulkan EXT_mesh_shader blog post and also on the spec MR here, among other places. This comment in the Mesa repo goes through the main issues with implementing the NV extension on HW that wasn't designed for it.

zmike · 2025-01-07T20:17:23Z

I'd like to see the mesa implementation at least well underway before this is released, but I think it's a great addition to the ecosystem. Bringing cross-vendor support to GL mesh shading will enable things like nvidium to finally run on more platforms.

yuq · 2025-01-08T01:28:44Z

Then can the numbers be allocated first, so that I can update headers and start implementation?

zmike · 2025-01-08T01:38:40Z

Yeah that seems good. @oddhack do you take care of that or am I supposed to do something?

oddhack · 2025-01-08T08:37:15Z

> Then can the numbers be allocated first, so that I can update headers and start implementation?

@yuq how many do you need? We allocate in blocks of 16. I think I counted 62 enums in the spec as it stands, so I can give you a block of 64 if you need that many (the new bit values not included in that total since they are semantically in a different namespace).

yuq · 2025-01-08T08:47:52Z

I need 22, which rounds up to 32 when aligned to 16. I reused some enum numbers from GL_NV_mesh_shader; only those beginning with 0xF need to be allocated.

What about the extension serial number (I faked it as 1024)? Does it have to be allocated at release?

oddhack · 2025-01-08T08:56:56Z

> I need 22, which rounds up to 32 when aligned to 16. I reused some enum numbers from GL_NV_mesh_shader; only those beginning with 0xF need to be allocated.

> What about the extension serial number (I faked it as 1024)? Does it have to be allocated at release?

Done, see d8fdb8d (enums 0x9740-0x975F).

The extension number is assigned when we publish. It isn't actually used as anything but an ordering mechanism.
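
The arithmetic in this exchange can be checked with a small sketch. The block size of 16, the request for 22 enums, and the range 0x9740-0x975F come from the comments above; the helper names `round_up_to_block` and `range_size` are hypothetical and exist only for illustration.

```python
BLOCK_SIZE = 16  # enums are allocated in blocks of 16, per the comment above


def round_up_to_block(count: int, block: int = BLOCK_SIZE) -> int:
    """Round a requested enum count up to a whole number of blocks."""
    return -(-count // block) * block  # ceiling division, then scale back up


def range_size(first: int, last: int) -> int:
    """Number of values in an inclusive enum range."""
    return last - first + 1


requested = 22                          # enums yuq asked for
reserved = round_up_to_block(requested)  # size of the block that covers them
allocated = range_size(0x9740, 0x975F)   # size of the range oddhack reserved

print(f"requested={requested} reserved={reserved} allocated={allocated}")
```

Under these assumptions the rounded-up request and the reserved range are the same size, which matches the "22, aligned to 16 is 32" figure in the request.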

yuq · 2025-01-08T09:12:47Z

OK, thanks.

Leadwerks · 2025-05-03T01:55:27Z

We are 100% in support of this proposal and are excited about working with this functionality.

yuq · 2025-07-02T09:13:32Z

I've done the implementation in Mesa for AMD GPUs. Next I'm going to upstream the code while giving it more testing. https://gitlab.freedesktop.org/yuq825/mesa/-/commits/topic/mesh-shader

Could this MR be merged now? @oddhack @zmike

zmike · 2025-07-02T15:37:17Z

@yuq WG has some review comments pending. Also will wait to ship until nvidium is at least semi-working with this (pending) to ensure things are usable as expected.

extensions/EXT/EXT_mesh_shader.txt

Headcrabed · 2025-07-02T16:52:12Z

> @yuq WG has some review comments pending. Also will wait to ship until nvidium is at least semi-working with this (pending) to ensure things are usable as expected.

I think nvidium only works on NVIDIA cards? Is switching from NV's mesh shader implementation to this enough to make it run on AMD cards?

xml/gl.xml

MCRcortex

Main thing is probably rasterization order and if possible having driver preferences on fast path payload sizes

MCRcortex · 2025-07-04T07:43:46Z

extensions/EXT/EXT_mesh_shader.txt

+
+    * MESH_PREFERS_COMPACT_PRIMITIVE_OUTPUT_EXT, TRUE if the implementation
+      will perform best if there are no unused primitives in the output array.
+


It may be useful to have MAX_PREFERRED_PAYLOAD_SIZE_EXT etc. for the optimal task-to-mesh payload size, e.g. on NVIDIA this is 128 bytes for the fast path, if I remember correctly.

yuq · 2025-07-07

Usually such tuning parameters live in the application or game engine. I added these parameters only because this extension forks VK_EXT_mesh_shader. Of course MAX_PREFERRED_PAYLOAD_SIZE_EXT can be added if required, but VK_EXT_mesh_shader does not have this parameter, so a GL-on-VK translation layer would always have to return the max payload size for it.

Venemo · 2025-07-07

I don't think that a MAX_PREFERRED_PAYLOAD_SIZE_EXT makes much sense here. All GPU manufacturers suggest using as little payload as possible, so the "preferred" amount would be zero.

I also don't think it's a good idea to deviate from Vulkan's VkPhysicalDeviceMeshShaderPropertiesEXT. We don't want to reinvent the wheel here.

> e.g. on NVIDIA this is 128 bytes for the fast path, if I remember correctly

That sounds like an implementation detail for NVidia and even they didn't feel like it's worth including that in the NV extension.
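
As a rough illustration of what MESH_PREFERS_COMPACT_PRIMITIVE_OUTPUT_EXT is hinting at, the sketch below models the two output strategies on the host in Python. It is not shader code and not taken from the spec; `cull_in_place` and `compact` are hypothetical names, and the triangle data is made up.

```python
from typing import List, Tuple

Triangle = Tuple[int, int, int]  # three vertex indices


def cull_in_place(prims: List[Triangle], culled: List[bool]):
    """Keep every slot; culled primitives stay in the array but are flagged."""
    return list(prims), list(culled)


def compact(prims: List[Triangle], culled: List[bool]) -> List[Triangle]:
    """Drop culled primitives so the output array has no unused slots."""
    return [p for p, c in zip(prims, culled) if not c]


tris = [(0, 1, 2), (2, 1, 3), (3, 1, 4)]
flags = [False, True, False]  # the middle triangle is culled

slots, cull_flags = cull_in_place(tris, flags)
packed = compact(tris, flags)
print(len(slots), len(packed))
```

An application that sees TRUE for this property would lean towards the second strategy, emitting only the surviving primitives and a correspondingly smaller primitive count.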

extensions/EXT/EXT_mesh_shader.txt

MCRcortex · 2025-07-07T10:04:55Z

Are there any guarantees about how/where the mesh shader workgroups are launched? I know it would be hardware dependent, but as an example, does launching many mesh tasks from one task shader and few from other task shaders perform significantly worse than a roughly even distribution of mesh tasks?
I do have a project for which I would be interested in implementing a renderer using this extension; the main thing is translucency ordering, I think.

Are there also any guarantees about how work is dispatched from the mesh shader to the remaining raster pipeline? That is, if a single mesh shader workgroup from a set of mesh shader workgroups (dispatched from a task shader) takes significantly longer to complete, would this block the other workgroups from dispatching work to the raster pipeline? (Asking because of the possibility of doing compute raster in the mesh shader while still dispatching larger tris to the HW raster pipeline, which would also enable higher GPU silicon utilization and hopefully maximize throughput.)
I guess, though, that this is also completely implementation dependent and

Venemo · 2025-07-07T14:51:43Z

> WG has some review comments pending. Also will wait to ship until nvidium is at least semi-working with this (pending) to ensure things are usable as expected.

> I think nvidium only works on NVIDIA cards? Is switching from NV's mesh shader implementation to this enough to make it run on AMD cards?

@Headcrabed As far as I see the author of nvidium @MCRcortex is here with us, so I will interpret his presence as being interested in porting nvidium to use the EXT mesh shaders instead of the NV extension. Assuming no other NVidia specifics are used by nvidium, it should then work on other GPUs too.

> I'm not that familiar with radeonsi internals, so maybe I am missing something obvious like a GFX10+ feature used in the code, but I am curious: can you shed some info on what is preventing this from being used on GFX9 as well (code has a requirement of >=GFX10_3)?

@Ristovski Previous GPUs don't have the hardware capability to implement mesh shaders. Most notably, only GFX10.3 and newer support per-primitive outputs.

Venemo · 2025-07-07T15:07:04Z

> Are there any guarantees about how/where the mesh shader workgroups are launched?

@MCRcortex Not sure if this thread is the right one to discuss GPU specific implementation details, but I'm happy to answer your questions about how it works at least on AMD HW. I reached out to you on your Discord.

Venemo · 2025-07-09T15:26:45Z

With regards to rasterization order: it seems that both AMD and NVidia do guarantee the "strict" rasterization order in their currently released GPUs. I haven't got any info about other hardware vendors yet.

Considering that the Vulkan spec is different from the proposal as well as different from the D3D12 spec, I asked for clarification on the Vulkan spec to make sure the current wording is what was intended. For consistency between OpenGL and Vulkan, I suggest waiting until that is resolved before we move forward here.

MCRcortex · 2025-07-13T11:24:42Z

I am doing some work on mesh shaders now and have some other questions. Would it be possible to get a gl_DrawID as described in ARB_shader_draw_parameters for multidraw commands? It is possible to get the "inner ID" (roughly equivalent to gl_InstanceID) from gl_WorkGroupID. NV_mesh_shader has something roughly equivalent to this by having a 'first' argument in the draw commands. Quote:

> The x component of gl_WorkGroupID of the first active stage will be within the range of [<first> , <first + count - 1>]

I am doubtful of getting a 'first' argument, but would it be possible to have gl_DrawID work as a replacement (as is already described in ARB_shader_draw_parameters)?
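
The range quoted from NV_mesh_shader, and the idea of recovering it through a draw ID, can be sketched on the host side as follows. This assumes that workgroup IDs start at zero for each draw when there is no 'first' argument, and that the application keeps its own per-draw offset table; `nv_workgroup_ids`, `emulated_workgroup_ids`, and `first_per_draw` are hypothetical names, not part of either extension.

```python
from typing import List


def nv_workgroup_ids(first: int, count: int) -> List[int]:
    """IDs in the quoted NV range [first, first + count - 1]."""
    return list(range(first, first + count))


def emulated_workgroup_ids(draw_id: int, group_count: int,
                           first_per_draw: List[int]) -> List[int]:
    """Rebuild the same IDs from zero-based local IDs plus a per-draw offset.

    first_per_draw is an application-side table (an assumption of this
    sketch) indexed by the draw ID of each draw in a multidraw call.
    """
    base = first_per_draw[draw_id]
    return [base + local_id for local_id in range(group_count)]


# Two draws: the first covers groups 0..3, the second covers groups 4..6.
offsets = [0, 4]
print(nv_workgroup_ids(4, 3))
print(emulated_workgroup_ids(1, 3, offsets))
```

In a shader the same lookup would be an indexed read from a buffer using the draw ID, with the local workgroup ID added on top.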

yuq · 2025-07-14T02:40:22Z

> would it be possible to get a gl_DrawID as described in ARB_shader_draw_parameters for multidraw commands?

Yes. GLSL spec described it: https://github.com/KhronosGroup/GLSL/blob/main/extensions/ext/GLSL_EXT_mesh_shader.txt#L964

MCRcortex · 2025-07-14T02:54:13Z

Oh awesome, ty, missed that in the spec

MCRcortex · 2025-07-14T02:58:01Z

I have a project that should make it easier to test this extension (unfortunately I am missing the hardware to do so; I don't have any AMD hardware).

zmike · 2025-09-02T15:37:50Z

How does this interact with ARB_separate_shader_objects? There are no explicit notes in the spec here for it.

yuq · 2025-09-03T03:15:40Z

> How does this interact with ARB_separate_shader_objects? There are no explicit notes in the spec here for it.

ARB_separate_shader_objects was added to core in OpenGL 4.1 and this extension is written against the OpenGL 4.6 spec, so there are no explicit notes on ARB_separate_shader_objects interaction. There are some notes that add the mesh shader stages to the spec language introduced by ARB_separate_shader_objects:

OpenGL-Registry/extensions/EXT/EXT_mesh_shader.txt, line 196 in 95b430e:

Accepted by the <stages> parameter of UseProgramStages:

OpenGL-Registry/extensions/EXT/EXT_mesh_shader.txt, line 375 in 95b430e:

Modify Section 7.4, Program Pipeline Objects, p. 122

OpenGL-Registry/extensions/EXT/EXT_mesh_shader.txt, line 390 in 95b430e:

(modify the first error in "Errors" for UseProgramStages, p. 124 to allow

zmike · 2025-09-03T03:33:17Z

Ah, thanks, I missed that.

MCRcortex · 2025-09-16T10:25:16Z

I talked a bit about this with @zmike, but it should also be raised here: how do mesh shaders interact with GL_RASTERIZER_DISCARD? In my opinion it should be ignored, as it does not make much sense for mesh shaders; however, I know there is a spec clarification issue open for this on the Vulkan extension. It might be good either to wait for the clarification or to make a decision now, so that later revisions to the spec aren't needed for this issue.

Venemo · 2025-09-16T13:33:42Z

> how do mesh shaders interact with GL_RASTERIZER_DISCARD? In my opinion it should be ignored, as it does not make much sense for mesh shaders

The way I understand it, it should work the same as for any other pre-rasterization stage. Why should it be ignored?

EDIT: I'd like it to be consistent with Vulkan. Does Vulkan ignore it?

MCRcortex · 2025-09-16T13:52:41Z

It's not defined in the Vulkan spec, from what I understand (hence zmike requesting clarification).
Personally, I think it should be ignored, as it should be equivalent to mesh shaders emitting a size of zero. If rasterization discard is not included, it could be an optimization possibility for drivers.
Transform feedback is not applied either; however, reading the specification again, it does suggest that rasterization discard is applied:

> After programmable mesh processing, the same fixed-function operations are
> applied to vertices of the resulting primitives as above, except the
> transform feedback (see section 13.3), primitive queries (see section 13.4)
> and transform feedback overflow queries (see section 13.5) are replaced by
> mesh primitive queries (see section 13.Y).

Furthermore, if rasterization discard is enabled, a fragment shader is not strictly required to even be attached (from what I have read; this is probably incorrect, however).

pdaniell-nv · 2025-10-01T15:11:41Z

With Vulkan the rasterization discard state still applies when drawing with mesh shaders.

zmike

@oddhack please merge

yuq marked this pull request as draft December 16, 2024 02:58

yuq force-pushed the topic/mesh-shader branch from f7c5c61 to ab5160e Compare January 2, 2025 07:02

oddhack added a commit that referenced this pull request Jan 8, 2025

Reserve enums for GL_EXT_mesh_shader (#640)

d8fdb8d

yuq force-pushed the topic/mesh-shader branch from ab5160e to d5abc46 Compare January 13, 2025 09:02

rdb mentioned this pull request Feb 5, 2025

Mesh Shader Support panda3d/panda3d#1565

Open

yuq force-pushed the topic/mesh-shader branch from fc7aafc to fbd56ad Compare February 10, 2025 07:14

RogueLogix mentioned this pull request Jun 21, 2025

[1.21.6][B3D][Breaking] Extend enums to cover most or all GL values neoforged/NeoForge#2359

Draft

yuq force-pushed the topic/mesh-shader branch from 779d416 to 7a22eab Compare July 1, 2025 06:21

yuq changed the title ~~Draft: Add GL_EXT_mesh_shader~~ Add GL_EXT_mesh_shader Jul 2, 2025

yuq marked this pull request as ready for review July 2, 2025 09:09

zmike reviewed Jul 2, 2025

rlocatti-nv reviewed Jul 3, 2025

MCRcortex reviewed Jul 4, 2025

yuq force-pushed the topic/mesh-shader branch from 7a22eab to ce7a626 Compare July 7, 2025 09:25

yuq force-pushed the topic/mesh-shader branch from ce7a626 to 76d91ba Compare July 9, 2025 08:28

yuq force-pushed the topic/mesh-shader branch from 76d91ba to b161756 Compare July 11, 2025 05:48

BoyBaykiller mentioned this pull request Sep 2, 2025

Align meshlet indices for each meshlet to start on 4b boundary BoyBaykiller/IDKEngine#9

Merged

yuq force-pushed the topic/mesh-shader branch from b161756 to 44c5648 Compare September 16, 2025 07:41

yuq added 4 commits October 9, 2025 10:38

Add GL_EXT_mesh_shader

d017137

This is an OpenGL extension forking VK_EXT_mesh_shader to provide OpenGL mesh shader functionality.

EXT_mesh_shader: allocate enum number

4c02f92

EXT_mesh_shader: add xml registry

97af609

EXT_mesh_shader: not alias some enum values with NV_mesh_shader

0a19fa1

Required by NVIDIA.

yuq force-pushed the topic/mesh-shader branch from 44c5648 to 0a19fa1 Compare October 9, 2025 02:39

zmike approved these changes Oct 9, 2025

zmike assigned oddhack Oct 9, 2025

zmike added this to the Approved to Merge milestone Oct 9, 2025

oddhack merged commit 3839177 into KhronosGroup:main Oct 9, 2025

