Fix NRE with UnicodeEncoding when target is an empty span by manandre · Pull Request #97950 · dotnet/runtime

manandre · 2024-02-04T20:24:52Z

ghost · 2024-02-04T20:25:02Z

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #89931

Author:	manandre
Assignees:	-
Labels:	`area-System.Text.Encoding`
Milestone:	-

tarekgh · 2024-02-04T20:55:36Z

src/libraries/System.Private.CoreLib/src/System/Text/UnicodeEncoding.cs


                    // Valid surrogate pair, add our lastChar (will need 2 chars)
-                    if (chars >= charEnd - 1)
+                    if (chars >= charEnd - 1 || chars == charEnd)


I am uncertain about the rationale behind this modification. When chars == charEnd, the condition chars >= charEnd - 1 is already true, so the added part does not change anything. Am I missing something?

charEnd can be null. charEnd - 1 will underflow in that case, and the first condition won't be true.
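To make the underflow concrete, here is a small illustrative model, not runtime code: it represents the two pointers as raw nuint addresses (an assumption made only so the wraparound can be shown without unsafe code) and evaluates the old and the fixed condition for an empty destination. The helper names OneCharBack, OldCheck and NewCheck are hypothetical.

```csharp
using System;

// Illustrative model only: pointers are represented as raw nuint addresses so the
// unsigned wraparound can be shown without unsafe code. This is not runtime code.
static class UnderflowModel
{
    // "charEnd - 1" on a char* moves back one char, i.e. sizeof(char) == 2 bytes.
    static nuint OneCharBack(nuint charEnd) => unchecked(charEnd - (nuint)sizeof(char));

    // Original check: is there room for fewer than two more chars?
    public static bool OldCheck(nuint chars, nuint charEnd) => chars >= OneCharBack(charEnd);

    // Fixed check: also treat an exhausted buffer (chars == charEnd) as full.
    public static bool NewCheck(nuint chars, nuint charEnd) =>
        chars >= OneCharBack(charEnd) || chars == charEnd;

    public static void Main()
    {
        // Empty destination: start == end. Address 1 mirrors the (char*)1 case
        // mentioned in the review; 0 (null) behaves the same way.
        nuint chars = 1, charEnd = 1;

        Console.WriteLine($"charEnd - 1 wraps to 0x{(ulong)OneCharBack(charEnd):X}");
        Console.WriteLine($"old check: {OldCheck(chars, charEnd)}"); // False: falls through to the write
        Console.WriteLine($"new check: {NewCheck(chars, charEnd)}"); // True: takes the overflow path
    }
}
```

Because pointer comparisons are unsigned, the wrapped value of charEnd - 1 is larger than any real address, so the old condition is false exactly when the buffer is empty; the extra chars == charEnd term catches that case.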

If this is the case, should we add a simple check at the top of the method and just throw if there are any bytes to decode?

I see charEnd - chars is used in other places in the method too.

> If this is the case, should we have a simple check in the top of the method to check that and just throw if we have any bytes to decode?

It would not be correct, and it would be a breaking change. Just because we have some input bytes does not mean that we end up outputting any chars.
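A quick way to see why an up-front throw would be wrong: a lone byte is only half of a UTF-16 code unit, so a stateful decoder holds it back and produces zero chars, which means an empty destination is perfectly valid for that call. This sketch uses only the public System.Text.Decoder API; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Shows that input bytes do not always produce output chars: a lone byte is only
// half of a UTF-16 code unit, so the decoder keeps it as state and emits nothing.
static class ZeroOutputDemo
{
    public static int CountHalfCodeUnit()
    {
        Decoder decoder = Encoding.Unicode.GetDecoder();
        byte[] halfCodeUnit = { 0x41 };
        // flush: false means "more input may follow", so the byte is held back.
        return decoder.GetCharCount(halfCodeUnit, 0, halfCodeUnit.Length, flush: false);
    }

    public static int DecodeHalfCodeUnitIntoEmpty()
    {
        Decoder decoder = Encoding.Unicode.GetDecoder();
        byte[] halfCodeUnit = { 0x41 };
        // Nothing needs to be written, so an empty destination is acceptable here.
        return decoder.GetChars(halfCodeUnit, Span<char>.Empty, flush: false);
    }

    public static void Main()
    {
        Console.WriteLine(CountHalfCodeUnit());           // 0
        Console.WriteLine(DecodeHalfCodeUnitIntoEmpty()); // 0
    }
}
```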

The code looks clear, but it is tricky if anyone changes it in the future without being aware of the char* chars = (char*)1; case or of possible underflow.

Ok, so what would you like to comment to say? "If you are changing this code, be careful about integer overflows and underflows."?

FWIW, this is a general concern with any unmanaged pointer arithmetic.

If it is not worth the comment, that is ok. It was not obvious to me that char* can be null or 1, as I did not expect this to happen in this code base. The comment I had in mind is something like: "Exercise caution when manipulating pointers to prevent potential underflow. In certain situations, the char* can be equal to 1 and the length can be zero."

I'll leave it to you to decide. I am ok either way.

@manandre Do you have a preference?

I agree. This is a generic concern in the unsafe world. We should not repeat it at each occurrence.

src/libraries/System.Private.CoreLib/src/System/Text/UnicodeEncoding.cs

manandre · 2024-02-04T23:39:47Z

I have added the same fix for UTF32Encoding.

jkotas · 2024-02-05T19:41:37Z

...System.Runtime/tests/System.Text.Encoding.Tests/UnicodeEncoding/UnicodeEncodingGetDecoder.cs

        }
+
+        [Fact]
+        public void GetDecoder_NegativeTests()


These negative tests should be added to NegativeEncodingTests Encoder_Convert_Invalid instead. NegativeEncodingTests theories will run it over all encodings.

Test moved to the NegativeEncodingTests.Decoder_Convert_Invalid function.

The test was failing for the UTF7 encoding, so I have fixed the corresponding code.
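The idea behind running the negative test over all encodings can be sketched outside xunit as a plain loop. This is a hypothetical helper, not the actual NegativeEncodingTests code: it decodes a supplementary-plane character (which needs a surrogate pair, i.e. two chars of output) into an empty destination with the public Decoder.Convert span overload and reports which exception, if any, was thrown. On a runtime that includes this fix, each line is expected to report ArgumentException; per the PR title, runtimes without it could fail with a NullReferenceException on the UTF-16 path. UTF-7 is left out only to avoid its obsoletion warning.

```csharp
using System;
using System.Text;

static class EmptyDestinationCheck
{
    // Returns the type of the exception thrown by Decoder.Convert when decoding
    // `text` into an empty destination, or null if nothing was thrown.
    public static Type? ConvertIntoEmpty(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        Decoder decoder = encoding.GetDecoder();
        try
        {
            decoder.Convert(bytes, Span<char>.Empty, flush: true, out _, out _, out _);
            return null;
        }
        catch (Exception ex)
        {
            return ex.GetType();
        }
    }

    public static void Main()
    {
        // U+1F600 decodes to a surrogate pair, which needs two chars of output.
        const string Emoji = "\U0001F600";
        Encoding[] encodings =
        {
            Encoding.UTF8, Encoding.Unicode, Encoding.BigEndianUnicode, Encoding.UTF32,
        };

        foreach (Encoding e in encodings)
        {
            Type? thrown = ConvertIntoEmpty(e, Emoji);
            Console.WriteLine($"{e.WebName}: {thrown?.Name ?? "no exception"}");
        }
    }
}
```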

src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs

jkotas · 2024-02-06T07:48:22Z

src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs

                        _bytes -= numBytes;                                        // Didn't encode these bytes
-                        _enc.ThrowCharsOverflow(_decoder, _bytes <= _byteStart);    // Throw?
+                        _enc.ThrowCharsOverflow(_decoder, _chars == _charStart);    // Throw?
                        return false;                                           // No throw, but no store either


We have the same pattern here:

runtime/src/libraries/System.Text.Encoding.CodePages/src/System/Text/EncodingCharBuffer.cs

Lines 52 to 56 in 89bba37

if (_chars >= _charEnd)
{
    // Throw maybe
    _bytes -= numBytes;                                         // Didn't encode these bytes
    _enc.ThrowCharsOverflow(_decoder, _bytes <= _byteStart);    // Throw?

Does it have the same problem? Are we missing coverage for this path?

Yes, we have the same issue here. There is no test coverage for this path, but despite multiple attempts, I have not managed to make a test fail before applying the fix :/
These encodings are quite complex and unfamiliar to me. I am not sure I can prove the fix is required, but I would recommend keeping the EncodingCharBuffer implementations aligned.
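The difference between the old and new second argument to ThrowCharsOverflow can be shown with a toy model. As we understand it, that argument means "nothing decoded", and when it is true the overflow throws even in Convert mode. In this sketch, plain ints stand in for the _bytes/_byteStart and _chars/_charStart pointers (the names and the scenario are assumptions for illustration): if earlier input bytes were absorbed into decoder state without emitting a char, as can happen for example in a UTF-7 base64 run, the byte-based test says something was decoded while the char-based test correctly says nothing was.

```csharp
using System;

// Illustrative model only (hypothetical names): positions are plain ints standing in
// for the _bytes/_byteStart and _chars/_charStart pointers of EncodingCharBuffer.
static class NothingDecodedModel
{
    // Old argument: "nothing decoded" inferred from the input position after rewinding.
    public static bool OldNothingDecoded(int bytesPos, int byteStart) => bytesPos <= byteStart;

    // New argument: "nothing decoded" means no char has been written to the output yet.
    public static bool NewNothingDecoded(int charsPos, int charStart) => charsPos == charStart;

    public static void Main()
    {
        // Assumed scenario: earlier input bytes were absorbed into decoder state
        // without emitting a char, then the first char does not fit in an empty buffer.
        int byteStart = 0, bytesPos = 3; // position after "_bytes -= numBytes"
        int charStart = 0, charsPos = 0; // no chars written

        Console.WriteLine(OldNothingDecoded(bytesPos, byteStart)); // False: overflow is not reported as fatal
        Console.WriteLine(NewNothingDecoded(charsPos, charStart)); // True: overflow is reported as fatal
    }
}
```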

jeffhandley · 2024-03-08T10:48:29Z

@tarekgh Could you re-review when you have a chance to ensure your feedback has been addressed and this is ready for merge?

Fix NRE with UnicodeEncoding when target is an empty span

fd403d9

ghost added the area-System.Text.Encoding and community-contribution labels Feb 4, 2024

tarekgh reviewed Feb 4, 2024

jkotas reviewed Feb 4, 2024

src/libraries/System.Private.CoreLib/src/System/Text/UnicodeEncoding.cs (outdated)

manandre added 2 commits February 5, 2024 00:09

Faster check

1f74f4b

Same fix for UTF32Encoding

9cf2842

jkotas reviewed Feb 5, 2024

Move test to NegativeEncodingTests

f520b4b

build-analysis bot mentioned this pull request Feb 6, 2024

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

jkotas reviewed Feb 6, 2024

src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs

jkotas reviewed Feb 6, 2024

src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs

jkotas reviewed Feb 6, 2024

Apply same fixes in System.Text.Encoding.Codepages

2c6b547

jkotas approved these changes Feb 14, 2024

Merge branch 'main' into fix-unicode-nre

26e8a4c

stephentoub approved these changes Mar 7, 2024

jeffhandley assigned tarekgh Mar 8, 2024

tarekgh approved these changes Mar 8, 2024

tarekgh merged commit 861164c into dotnet:main Mar 8, 2024

manandre deleted the fix-unicode-nre branch March 8, 2024 19:35

github-actions bot locked and limited conversation to collaborators Apr 9, 2024


5 participants
