Skip to content

Conversation

@cgutman
Copy link
Collaborator

@cgutman cgutman commented Dec 30, 2022

Description

The first commit in this PR implements support for detection of DXGI format at the time AcquireNextFrame() is called rather than during display::init(). This required reworking a lot of logic around the initialization of img_t objects to separate format-dependent and format-independent attributes and handle creating dummy images where we don't know the capture format yet. I also did a little bit of P010 format prep work while I was in there.

The second commit is a cherry-pick of @psyke83 's IDXGIOutput5 implementation which now works properly with the dynamic capture format support implemented in the first commit.

The third commit implements support for the display_ram_t and display_vram_t backends to influence the selected capture formats. display_vram_t is pretty flexible (it even works with DXGI_FORMAT_R16G16B16A16_FLOAT), but display_ram_t has some specific pixel format requirements due to the code that handles blending the cursor into the final image. In this commit, I also opted display_vram_t into DXGI_FORMAT_R8G8B8A8_UNORM support to hopefully improve capture performance for Vulkan and full-screen DX11 apps.

Note for future testing: It is possible to request a specific format for testing by returning a single entry from get_supported_sdr_capture_formats(). This allows testing things like DXGI_FORMAT_R16G16B16A16_FLOAT even without HDR enabled. DWM will convert the desktop surface to the requested format.

@psyke83 : I'd appreciate your review and testing before merging if you have time, since you did a lot of this initial work on IDXGIOutput5.

Screenshot

HEVC Main 10 works with NVENC (though video is still SDR):
image

Issues Fixed or Closed

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Dependency update (updates to dependencies)
  • Documentation update (changes to documentation)
  • Repository update (changes to repository files, e.g. .github/...)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Branch Updates

LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch
must be updated before it can be merged. You must also
Allow edits from maintainers.

  • I want maintainers to keep my branch updated


// Now that we know the capture format, we can finish creating the image
if(complete_img(img, false)) {
return capture_e::error;
Copy link
Contributor

@psyke83 psyke83 Dec 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem with hardware capture on my RX 6600. Using Gravity Mark in Vulkan mode, clicking "Demo" sometimes (but not always) triggers the stream to end with:

[2022:12:30:11:12:38]: Error: display_vram_t::complete_img() called with unknown capture format!

The failure happen when snapshot() is called with format set to DXGI_FORMAT_UNKNOWN, but the format doesn't get updated due to frame_update_flag being false (which seems to trigger easily with Gravity Mark's Initializing screen presumably due it being low framerate with a static background).

The issue can be resolved by moving the complete_img() check into the frame_update_flag condition directly above OR changing the error condition to a timeout or OK.

Any of those changes avoids the error, but it brings back a bug that existed before (and was fixed by) your earlier format selection fix. In Gravity Mark, if I move the mouse on the "Initializing" screen, the mouse pointer will have a ghosting effect, perhaps suggesting a format mismatch for the pointer. I don't know precisely how to fix this properly, but if you change the check here to if (update_flag) { instead of if (frame_update_flag) {, it fixes the pointer ghosting.

Edit: to clarify, the ghosting effect occurs only on the "Initializing" screen, and not when the demo itself start running. I also saw the ghosting issue sometimes when entering fullscreen in games, but it didn't happen consistently.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know precisely how to fix this properly, but if you change the check here to if (update_flag) { instead of if (frame_update_flag) {, it fixes the pointer ghosting.

This was actually the correct fix. We've been breaking the rules for lifetime of the desktop image we get from the Desktop Duplication API. I'm not sure why this change specifically exposed it (or why my original format change "fixed" it).

The ReleaseFrame() docs say:

After the frame is released, the surface that contains the desktop bitmap becomes invalid; you will not be able to use the surface in a DirectX graphics operation.

Yet we're holding on to that ID3D11Texture2D to copy into our buffer when AcquireNextFrame() says the desktop image wasn't updated. We depend on the CopyResource() call to initialize the destination texture with an updated image before we draw the mouse on top. However, if the old source texture is invalid (which would be the case for a mouse-only update), that copy silently fails (leaving the old contents of the image intact, including the old mouse cursor drawn into it) and we draw the mouse again at the new location.

I pushed a commit to always use the surface we get back from AcquireNextFrame() and never hold it beyond the snapshot() call.

@cgutman
Copy link
Collaborator Author

cgutman commented Dec 30, 2022

@psyke83 I wasn't able to reproduce that issue locally, but I think I understand the cause. Can you try the latest commit?

@psyke83
Copy link
Contributor

psyke83 commented Dec 30, 2022

This is better but issues are still not fully corrected.

With the hw encoder: stream no longer closes, but the cursor ghosting issue remains present.
With the sw encoder: launching Gravity Mark doesn't close the stream, but the DXGI_FORMAT_R8G8B8A8_UNORM mode shows as a black screen (but cursor is visible).

Log shows no errors during black screen:

[2022:12:30:18:06:04]: Info: Desktop resolution [1600x900]
[2022:12:30:18:06:04]: Info: Desktop format [DXGI_FORMAT_R8G8B8A8_UNORM]
[2022:12:30:18:06:04]: Info: Capture format [DXGI_FORMAT_R8G8B8A8_UNORM]
[2022:12:30:18:06:04]: Info: Color coding [Rec. 601]
[2022:12:30:18:06:04]: Info: Color range: [MPEG]

I won't be able to troubleshoot further for a few more hours.

@cgutman
Copy link
Collaborator Author

cgutman commented Dec 30, 2022

Thanks for testing.

Can you post a screenshot of the cursor ghosting issue? I'm not sure that I understand exactly what you mean by that. Is it like pointer trails are enabled and never go away, or something else?

@psyke83
Copy link
Contributor

psyke83 commented Dec 30, 2022

This is the cursor issue (which happens only on the hardware encoder):

Screencast.from.2022-12-30.19-59-21.webm

The cursor issue can't be reproduced in GravityMark without this PR. This is the log without this PR applied:

[2022:12:30:20:05:28]: Info: Desktop resolution [1600x900]
[2022:12:30:20:05:28]: Info: Desktop format [DXGI_FORMAT_R8G8B8A8_UNORM]
[2022:12:30:20:05:28]: Info: Capture format [DXGI_FORMAT_B8G8R8A8_UNORM]
[2022:12:30:20:05:28]: Info: Color coding [Rec. 601]
[2022:12:30:20:05:28]: Info: Color range: [MPEG]

The new log shows the capture format also as DXGI_FORMAT_R8G8B8A8_UNORM; could the problem have something to do with the texture format for the cursor always being set to DXGI_FORMAT_B8G8R8A8_UNORM?

With the D3D11 debug runtime loaded, the logs are full of:
D3D11 ERROR: ID3D11DeviceContext::CopyResource: Cannot invoke CopyResource while a Subresource of the destination is mapped.Destination subresource (0) is mapped. [ RESOURCE_MANIPULATION ERROR LizardByte#285: COPYRESOURCE_INVALIDDESTINATIONSTATE]
D3D11 ERROR: ID3D11DeviceContext::CopyResource: Cannot invoke CopyResource while a Subresource of the source is mapped.Source subresource (0) is mapped. [ RESOURCE_MANIPULATION ERROR LizardByte#286: COPYRESOURCE_INVALIDSOURCESTATE]
…SetRenderTargets and PSSetShaderResources

The img texture is already bound as an input parameter from hwdevice_t::convert(), so it triggers a debug runtime warning to bind it again as a render target
@cgutman
Copy link
Collaborator Author

cgutman commented Dec 31, 2022

@psyke83 ok, this PR is ready for one more test. I enabled the D3D11 debug runtime (D3D11_CREATE_DEVICE_DEBUG) and found a ton of bugs in both display_vram_t and display_ram_t. The most severe were causing images to be encoded with missing or outdated desktop contents (which was causing the mouse ghosting issue).

There also turned out to be a case where AcquireNextFrame() could change output formats between calls on the same IDXGIOutputDuplication object, which I had previously thought to be impossible (because mode changes should cause the IDXGIOutputDuplication object to be lost). We handle that now by reinitializing capture.

There is one case that is still broken on my Intel system with software encoding where DuplicateOutput1() and DuplicateOutput() are failing when using Gravity Mark in OpenGL mode. The D3D11 debug runtime again catches the bug, but it's actually inside DuplicateOutput() where the erroneous CreateTexture2D() call (with DXGI_FORMAT_UNKNOWN) is happening. I don't see how we could be causing this, so it's likely a GPU driver or OS bug. It also works fine on my Nvidia system.

D3D11 ERROR: ID3D11Device::CreateTexture2D: Invalid format.  The Format(0, UNKNOWN) is not supported as a D3D11_RESOURCE_MISC_SHARED_NTHANDLE format. [ STATE_CREATION ERROR #92: CREATETEXTURE2D_UNSUPPORTEDFORMAT]

0:020> k
 # Child-SP          RetAddr               Call Site
00 000000a1`3d5fbcd8 00007ff9`42262f54     KERNELBASE!OutputDebugStringA
01 000000a1`3d5fbce0 00007ff9`20d278ee     DXGIDebug!CInfoQueue::AddMessage+0xa24
02 000000a1`3d5fca30 00007ff9`20c9beba     D3D11_3SDKLayers!CInfoQueue::AddMessage+0x12e
03 000000a1`3d5fcab0 00007ff9`20d09d4d     D3D11_3SDKLayers!NDebug::CInterfaceSentinel::CFunctionSentinel::ReportMessage+0x10e
04 000000a1`3d5feb30 00007ff9`20cc6db5     D3D11_3SDKLayers!CCreateTexture2DValidator::Validate+0x1549
05 000000a1`3d5fef00 00007ff9`20cc6a36     D3D11_3SDKLayers!NDebug::CDevice::CreateTexture2D1PreValidation+0x365
06 000000a1`3d5fefb0 00007ff9`20cc7119     D3D11_3SDKLayers!NDebug::CDevice::CreateTexture2DPreValidation+0x7e
07 000000a1`3d5ff030 00007ff9`743bf43a     D3D11_3SDKLayers!NDebug::CDevice::CreateTexture2D+0xc9
08 000000a1`3d5ff090 00007ff9`743c9ee8     dxgi!CDXGIOutputDuplication::Initialize+0x516
09 000000a1`3d5ff440 00007ff9`743c9c82     dxgi!CDXGIOutput::DuplicateOutputInternal+0x21c
0a 000000a1`3d5ff550 00007ff7`0dbc7b0f     dxgi!CDXGIOutput::DuplicateOutput1+0x22
0b 000000a1`3d5ff5a0 00007ff7`0dbd3549     Sunshine+0x97b0f
0c 000000a1`3d5ffae0 00007ff7`0dbc2678     Sunshine+0xa3549
0d 000000a1`3d5ffb10 00007ff7`0db93b9a     Sunshine+0x92678
0e 000000a1`3d5ffb60 00007ff7`0db94ce3     Sunshine+0x63b9a
0f 000000a1`3d5ffbf0 00007ff7`0efa51ee     Sunshine+0x64ce3
10 000000a1`3d5ffdc0 00007ff7`0eff1f9f     Sunshine+0x14751ee
11 000000a1`3d5ffe00 00007ff7`0ed9932b     Sunshine+0x14c1f9f
12 000000a1`3d5ffe40 00007ff9`7a35e634     Sunshine+0x126932b
13 000000a1`3d5ffe80 00007ff9`7a35e70c     msvcrt!_callthreadstartex+0x28
14 000000a1`3d5ffeb0 00007ff9`78f026bd     msvcrt!_threadstartex+0x7c
15 000000a1`3d5ffee0 00007ff9`7a56dfb8     KERNEL32!BaseThreadInitThunk+0x1d
16 000000a1`3d5fff10 00000000`00000000     ntdll!RtlUserThreadStart+0x28

@psyke83
Copy link
Contributor

psyke83 commented Dec 31, 2022

This looks very promising so far.

With the hardware encoder, I can confirm that the cursor ghosting issue is definitely resolved, and initial testing doesn't show any obvious issues. I think I should take some more time to test more generally, and not just focus on the Vulkan fullscreen issue.

Software encoding is definitely improved, but I think the issue may not be 100% resolved. I am not experiencing any stream closes, and GravityMark's demo now renders correctly, but it's still possible to trigger a black screen that now occurs only on GravityMark's "Initializing" screen. When connected to the host via a VNC client at the same time, I can confirm that the "Initializing" image is rendering properly while Sunshine is showing a black screen + cursor.

Here's a sample log when the black screen issue manifests: https://gist.github.com/psyke83/52f967939ab6ed3fb03999673cc228cd

It seems from the log that a capture format change to DXGI_FORMAT_B8G8R8A8_UNORM is detected, but the subsequent re-initialization still shows the old capture format (DXGI_FORMAT_R8G8B8A8_UNORM) and/or reannounces the same capture format change successively.

Could this issue be related to the software encoder using a different (async) capture method? See here and here. As a troubleshooting measure, I tried removing the PARALLEL_ENCODING flag, and I could no longer trigger the black screen during the GravityMark "Initializing" stage. Of course, removing the flag is not the solution, as the sync capture causes the software encoder to run much slower (e.g. browsing https://www.vsynctester.com/ throttles to ~35fps), but I just thought I'd bring it to your attention in case the issue is related to parallel threads running with mixed capture formats.

Edit: example video showing the black screen during "Initializing" on the software encoder:

Screencast.from.2022-12-31.11-45-35.webm

@psyke83
Copy link
Contributor

psyke83 commented Dec 31, 2022

Speaking of PARALLEL_ENCODING; I noticed since you fixed fullscreen Vulkan capture (via bb092c0), capture is slower than normal in the specific case of GravityMark. With or without this PR, capture is limited to 30fps (according to Moonlight's encoder statistics), but if I change the amdvce encoder to use the PARALLEL_ENCODING flag with this PR, the capture rate is now full speed. The Initializing screen also doesn't exhibit the black screen problem seen with software encoding (but there are graphical glitches which are especially noticeable if the "dwmflush" workaround is disabled).

Apologies if I'm polluting the PR, but it seems at least tangentially related given that these modes weren't working in the past.

The format of the texture can change until then (and even return formats not provided in the array passed to DuplicateOutput1())
@cgutman
Copy link
Collaborator Author

cgutman commented Dec 31, 2022

It seems from the log that a capture format change to DXGI_FORMAT_B8G8R8A8_UNORM is detected, but the subsequent re-initialization still shows the old capture format (DXGI_FORMAT_R8G8B8A8_UNORM) and/or reannounces the same capture format change successively.

It might be a bug with DuplicateOutput1() because DXGI_FORMAT_R8G8B8A8_UNORM isn't even a format we told it we could accept, but I've fixed it by encoding dummy images until we get a real frame (LastPresentTime != 0).

Speaking of PARALLEL_ENCODING; I noticed since you fixed fullscreen Vulkan capture (via bb092c0), capture is slower than normal in the specific case of GravityMark. With or without this PR, capture is limited to 30fps (according to Moonlight's encoder statistics), but if I change the amdvce encoder to use the PARALLEL_ENCODING flag with this PR, the capture rate is now full speed. The Initializing screen also doesn't exhibit the black screen problem seen with software encoding (but there are graphical glitches which are especially noticeable if the "dwmflush" workaround is disabled).

Yeah, the capture and encode threads race with each other using the ID3D11DeviecContext when using PARALLEL_ENCODING with display_vram_t. The debug runtime catches that issue:

D3D11 CORRUPTION: ID3D11DeviceContext::CopyResource: Two threads were found to be executing functions associated with the same Device[Context] at the same time. This will cause corruption of memory. Appropriate thread synchronization needs to occur external to the Direct3D API (or through the ID3D10Multithread interface). 31160 and 26900 are the implicated thread ids. [ MISCELLANEOUS CORRUPTION #28: CORRUPTED_MULTITHREADING]

I think we can support PARALLEL_ENCODING with display_vram_t. We just need to refactor the code to support passing a lock back to dxgi_make_hwdevice_ctx() and hold that lock while touching the device context in the capture thread. FFmpeg will acquire the lock in the encoding path via the lock and unlock functions in the AVD3D11VADeviceContext. I can investigate this in a follow up PR. I don't want to make this PR even more complex than it already is.

@psyke83
Copy link
Contributor

psyke83 commented Dec 31, 2022

I didn't mean to try and expand the scope of this PR with talk of PARALLEL_ENCODING in the vram instance, but the explanation is much appreciated.

As of the latest commit you pushed, everything seems to be working perfectly with both software and amdvce encoders on my system. I see none of the previous problems when starting GravityMark via software encoding or any other unwanted side-effects.

@ReenigneArcher ReenigneArcher merged commit 0c6d0ed into LizardByte:nightly Dec 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants