Fix NRE with UnicodeEncoding when target is an empty span#97950
Fix NRE with UnicodeEncoding when target is an empty span#97950tarekgh merged 6 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-text-encoding Issue DetailsFixes #89931
|
|
|
||
| // Valid surrogate pair, add our lastChar (will need 2 chars) | ||
| if (chars >= charEnd - 1) | ||
| if (chars >= charEnd - 1 || chars == charEnd) |
There was a problem hiding this comment.
I am uncertain about the rationale behind this modification. when chars == charEnd, this implies that the condition chars >= charEnd - 1 remains true. The added part is not changing anything. am I missing something?
There was a problem hiding this comment.
charEnd can be null. charEnd - 1 will underflow in that case, and the first condition won't be true.
There was a problem hiding this comment.
If this is the case, should we have a simple check in the top of the method to check that and just throw if we have any bytes to decode?
There was a problem hiding this comment.
I am seeing charEnd - chars is used in some other place in the method too.
There was a problem hiding this comment.
If this is the case, should we have a simple check in the top of the method to check that and just throw if we have any bytes to decode?
It would not be correct, and it would be a breaking change. Just because we have some input bytes does not mean that we end up outputting any chars.
There was a problem hiding this comment.
The code looks clear but it is tricky if anyone change it in the future and not aware about the case char* chars = (char*)1; or possible underflow.
There was a problem hiding this comment.
Ok, so what would you like to comment to say? "If you are changing this code, be careful about integer overflows and underflows."?
FWIW, this is a general concern with any unmanaged pointer arithmetic.
There was a problem hiding this comment.
If it is not worth the comment, that is ok. It was not obvious to me we can get char* is null or 1 value as I am not expecting this to happen in this code base. The comment in my mind is something like Exercise caution when manipulating pointers to prevent potential underflow issues. In certain situations, the char* can be equal to 1 and may also have a length of zero.
I'll leave it to you to decide. I am ok either way.
There was a problem hiding this comment.
I agree. This is a generic concern in unsafe world. We should not repeat it on each occurrence.
src/libraries/System.Private.CoreLib/src/System/Text/UnicodeEncoding.cs
Outdated
Show resolved
Hide resolved
|
I have added the same fix for |
| } | ||
|
|
||
| [Fact] | ||
| public void GetDecoder_NegativeTests() |
There was a problem hiding this comment.
These negative tests should be added to NegativeEncodingTests Encoder_Convert_Invalid instead. NegativeEncodingTests theories will run it over all encodings.
There was a problem hiding this comment.
Test moved to the NegativeEncodingTests.Decoder_Convert_Invalid function.
There was a problem hiding this comment.
The test was failing for the UTF7 encoding, so I have fixed the corresponding code.
| _bytes -= numBytes; // Didn't encode these bytes | ||
| _enc.ThrowCharsOverflow(_decoder, _bytes <= _byteStart); // Throw? | ||
| _enc.ThrowCharsOverflow(_decoder, _chars == _charStart); // Throw? | ||
| return false; // No throw, but no store either |
There was a problem hiding this comment.
We have the same pattern here:
Does it have the same problem - are we missing coverage for this path?
There was a problem hiding this comment.
Yes, we have the same issue here. There is no test coverage for this path, but, despite multiple attempts, I do not manage to make the test failing before applying the fix :/
These encodings are quite complex and unknown to me. I am not sure to be able to prove the fix is required but I would recommend to keep the EncodingCharBuffers implementations aligned.
|
@tarekgh Could you re-review when you have a chance to ensure your feedback has been addressed and this is ready for merge? |
Fixes #89931