Skip to content

Comments

Allow creating mappable buffer with more usages as optional features#5108

Open
Jiawei-Shao wants to merge 6 commits intogpuweb:mainfrom
Jiawei-Shao:add-buffer-map-extended-usages
Open

Allow creating mappable buffer with more usages as optional features#5108
Jiawei-Shao wants to merge 6 commits intogpuweb:mainfrom
Jiawei-Shao:add-buffer-map-extended-usages

Conversation

@Jiawei-Shao
Copy link
Contributor

@Jiawei-Shao Jiawei-Shao commented Mar 18, 2025

Detailed investigation can be found here.

Fixed: #2388

@github-actions
Copy link
Contributor

github-actions bot commented Mar 18, 2025

Previews, as seen when this build job started (44266f8):
WebGPU webgpu.idl | Explainer | Correspondence Reference
WGSL grammar.js | wgsl.lalr.txt

spec/index.bs Outdated
Changes to the returned {{ArrayBuffer}} will be stored in the buffer after
{{GPUBuffer/unmap()}} is called.

Note: Write-only mapping will never return values produced by the GPU.
Copy link

@mwyrzykowski mwyrzykowski Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this restriction necessary? Won't it result in an extra copy of the data for write only mappings on UMA systems?

I think it would be preferable to state that reading the contents of a write only mappings are undefined (or zero-ed?) in this case.

Copy link
Contributor Author

@Jiawei-Shao Jiawei-Shao Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it result in an extra copy of the data for write only mappings on UMA systems?

It is mainly because currently in Chromium a copy is always needed because we cannot access the mapped pointer from GPU resources directly in the JS side. We can discuss more about this issue in the WG meeting.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the same restriction in WebKit too, perhaps this is a non-issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be preferable to state that reading the contents of a write only mappings are undefined (or zero-ed?) in this case.

We couldn't make it fully undefined, at most we could add a third option for what gets returned. Hard to say if the performance would be any better better if we allowed it. It would require cache flushes or something to make sure you don't get unsafe undefined values.

And of course for portability reasons we really don't want anyone to rely on data getting read back on write-only mappings. But that's probably not a huge concern.

@Kangz
Copy link
Contributor

Kangz commented Mar 24, 2025

GPU Web WG 2025-03-19 Atlantic-time
  • And explainer for the proposal
  • Two special cases, contents of the mapping in certain situations, and …
  • CW: things we want from this extension? One thing, any GPU-based validation cannot be attacked by someone writing randomly in the mappable buffer still visible in the JavaScript process. From this, can we keep indirect in the list of mappable usages or not? Same thing for index, depending on how robustness works in various impls.
  • CW: specific questions? Maybe we can come up with requirements?
  • (No questions from the working group)
  • CW: things we want from this extension? First, do we want this always, and emulated on discrete GPUs?
  • KR: think shouldn't - hard/impossible to get good performance all the time for all applications.
  • AB: ARM has strong opinions. Important for the reasons listed. For us, imp't that it works on our GPUs. Can't require host cache and coherent memory. Our best practices are written down, but host memory interface is implemented by the OEM, so you rarely get that. Nice to avoid flushing/invalidation, but it's easy to do in the impl. We aren't exposing persistently mapped buffers. Still have to map/unmap so that's where you flush/invalidate.
  • CW: problem with just coherent + non-cached is: we're hoping to triply map buffers. Create shmem in JS process, send to GPU process, use API-specific method to import that memory into GPU address space. VK_external_memory_host (_fd?). Also Metal and D3D12. If we do this, JS has an ArrayBuffer that becomes uncacheable. Seems weird to expose uncacheable memory to JS. [surprising performance properties]
  • AB: that's fine. Invalidate/flush on map/unmap's all that's required?
  • CW: yes.
  • CW: more feedback for Jiawei?
  • (None in the meeting)

spec/index.bs Outdated
{{ArrayBuffer}} containing the buffer's current values. Changes to the returned
{{ArrayBuffer}} will be stored in the {{GPUBuffer}} after {{GPUBuffer/unmap()}} is called.
{{ArrayBuffer}} containing the default initialized data (zeros) or data written by the
webpage during a previous mapping.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, this is allowing either of two behaviors?

For better portability we could clear the map region every time. I wonder, would that be too expensive? (How much of the benefit would we lose from avoiding the copyB2B in today's map-write-then-copy pattern?) The region could even be cleared before mapping by the GPU (which might have higher memory bandwidth than the CPU).

Maybe if there are really cases where effectively READ|WRITE would be more efficient than WRITE we could have the browser provide a hint about which one to use.

Copy link
Contributor Author

@Jiawei-Shao Jiawei-Shao Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, this is allowing either of two behaviors?

This comes from a note in the current SPEC. Here I mean for the first time buffer.getMappedData() is called after a write-only mapping, the data in the returned array buffer should be all zeros, and since the second time the array buffer will contain the data written by the webpage during a previous mapping.

or better portability we could clear the map region every time. I wonder, would that be too expensive?

Obviously clearing the map region every time is expensive and unnecessary. With the mentioned behavior we don't need to either clear or read the data back from the GPU when non-triply mapping is used.

Maybe if there are really cases where effectively READ|WRITE would be more efficient than WRITE we could have the browser provide a hint about which one to use.

When triply mapping is supported on CPU-cached UMA (e.g. on Intel iGPUs), we can directly get the GPU data through buffer.getMappedData() without any other operations. So I add the feature "buffer-map-write-with-extended-usages-and-gpu-data" for the best performance of data uploading on this architecture.

The feature "buffer-map-write-with-extended-usages" also works for CPU-cached UMA with non-triply mapping, and it is for the best performance of data uploading on non-triply mapping on non-CPU-cached UMA and ReBAR, where only write in sequence or memcpy is much more performant compared with randomly write.

Maybe if there are really cases where effectively READ|WRITE would be more efficient than WRITE we could have the browser provide a hint about which one to use.

I feel it strange to use READ together with WRITE because MAP_READ should keep data on GPU unchanged. For such use case I decide to use "MAP_WRITE with current GPU data in the array buffer" instead.

Copy link

@mwyrzykowski mwyrzykowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, maybe name could change but fine with any reasonable name.

@Kangz
Copy link
Contributor

Kangz commented Mar 27, 2025

GPU Web WG 2025-03-25/26 Pacific-time
  • JB: TOCTOU concerns
  • JB: GPU based validation is all within a command buffer, won't be interrupted by anything else.
  • KN: we consider the possibility of a compromised JS process. If the process still has the mapping, possibility that changes could be made.
  • JB: so from the standpoint of Web IDL, we still have a strict boundary of when JS has access and when the GPU does.
  • Concerns about the implications of this.
  • KG: 1) do we need to figure this out now, for this issue?
  • KN: only affects usages for this direct mapping. Maybe don't want to do indirect and index buffers.
  • KG: seems reasonable constraint, could loosen it later
  • KN: think we should go with this restriction, for both of them
  • KG: would be nice to have a high-level comment about what this is supposed to do. Want to make sure this doesn't accidentally enable things at the high level we didn't intend to. What do we want to do with this for now?
  • JS: received many comments after posting this - thanks for them. Mainly 4 issues left before we can complete this. First, almost everyone thinks this should be an extension, not a core feature. I also prefer WebGPU to do fewer implicit things, so this should be an extension. Second, security concern. Should disallow index and indirect usages here. Third, behavior of map-write only. Map-read is clear: while immutable ArrayBuffer is still in Stage 2.7 at TC39, can't do the triple-mapping technique. Always need CPU cache for the snapshot. map-write-only doesn't need to read the data back from the GPU side. With triply mapped buffers come questions. map-write-only, still can read the data. Reading from non-HOST_CACHED memory is very slow. Writing data randomly is very slow. Non-triply implementation, have to show its behavior. Always keep the data consistent between CPU and GPU? Fourth question, based on the third - should we split this into 2 extensions? Only allow map-read (... triple-mapping on UMA only...) and map-write, can be implemented on UMA and discrete GPUs too.
  • KG: sounds like still work left to do. Let's let this wait until it complete.
  • AB: one concern about allowing triply mapped buffers on map-read-and-write, but not map-read (or map-write alone? didn't catch this) would be confusing for developers, who would want to only put the usages they want. Read+Write, if it makes them faster, then read the data even though I don't need it.
  • KN: agree this is a concern. Should avoid surprising behavior like this.

@Jiawei-Shao Jiawei-Shao changed the title Add optional feature "buffer-map-extended-usages" Allow creating mappable buffer with more usages as optional features Apr 8, 2025
@kainino0x kainino0x added the api WebGPU API label Apr 25, 2025
@Kangz Kangz added this to the Milestone 2 milestone Oct 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api WebGPU API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cannot upload/download to UMA storage buffer without an unnecessary copy and unnecessary memory use

5 participants