Description
Background and Motivation
Currently there is no way to get a view of a JSON string property value without allocating a string, except when the string property is in fact a number, a DateTime, or anything else that System.Buffers.Text.Utf8Parser supports.
But many converters, even inside System.Text.Json, need the string representation of a string property only to parse it and do not use it any further. Examples of such converters are:
- VersionConverter, which uses the allocated string from the GetString() method only to pass it to the TryParse method, which accepts ReadOnlySpan<char> in one of its overloads;
- CharConverter, which allocates a string only to get the first char;
- EnumConverter, which uses the allocated string from the GetString() method only to pass it to the TryParse method, which has a ReadOnlySpan<char> overload as of "Add overloads for Enum.Parse/TryParse with ReadOnlySpan<char> #43255".
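To make the allocation pattern concrete, here is a minimal sketch of how a Version value can be read today without a string allocation, using only existing APIs. It works only when the value is unescaped, contiguous, and short, and otherwise falls back to GetString(). The helper name ReadVersion and the 64-char limit are assumptions made for illustration; they are not part of System.Text.Json.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not part of System.Text.Json.
static Version? ReadVersion(ref Utf8JsonReader reader)
{
    const int MaxStackChars = 64; // arbitrary limit chosen for this sketch

    // Escaped, multi-segment, or long values take the allocating path.
    if (reader.ValueIsEscaped || reader.HasValueSequence || reader.ValueSpan.Length > MaxStackChars)
    {
        return Version.TryParse(reader.GetString(), out Version? slow) ? slow : null;
    }

    // Unescaped UTF-8 never transcodes to more chars than it has bytes,
    // so a buffer of ValueSpan.Length chars is always large enough.
    Span<char> chars = stackalloc char[MaxStackChars];
    int written = Encoding.UTF8.GetChars(reader.ValueSpan, chars);
    return Version.TryParse(chars.Slice(0, written), out Version? fast) ? fast : null;
}

var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes("\"1.2.3.4\""));
reader.Read(); // advance to the string token
Version? version = ReadVersion(ref reader);
```

The fallback branch is exactly the case this proposal targets: once the value contains escape sequences, there is no public way to unescape it into a caller-provided buffer.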
Outside of System.Text.Json itself, a non-allocating view of string properties could be used to write a custom StringConverter backed by, for example, a custom StringPool that operates on a small set of strings that is not known at compile time.
My proposal is to add methods to Utf8JsonReader that accept a buffer of chars into which the value of the string property is written.
Proposed API
namespace System.Text.Json
{
public ref partial struct Utf8JsonReader
{
/* Existing APIs */
public ReadOnlySpan<byte> ValueSpan { get; }
public ReadOnlySequence<byte> ValueSequence { get; }
public bool ValueIsEscaped { get; } // Whether the JSON string contains escaped characters
public bool HasValueSequence { get; } // The string can either be stored in a span or a ReadOnlySequence
public string? GetString(); // How we currently decode JSON strings
/* Proposed new APIs */
public void GetString(scoped Span<byte> utf8Destination, out int bytesWritten);
public void GetString(scoped Span<char> destination, out int charsWritten);
}
public partial class JsonEncodedText
{
public static void Unescape(ReadOnlySpan<byte> utf8Value, Span<byte> utf8Destination, out int bytesWritten);
}
}
Usage Examples
Get an allocation-free view of the unescaped UTF8 string
Span<byte> buffer = stackalloc byte[SomeUpperBound];
reader.GetString(buffer, out int bytesWritten); // handles both ValueSpan and ValueSequence representations,
// throws if source buffer length exceeds that of the target buffer.
ReadOnlySpan<byte> unescapedUtf8Value = buffer.Slice(0, bytesWritten);
Handling of ValueSpan representations only:
Debug.Assert(!reader.HasValueSequence);
ReadOnlySpan<byte> unescapedBuffer = stackalloc byte[0];
if (reader.ValueIsEscaped)
{
Span<byte> buffer = stackalloc byte[SomeUpperBound];
JsonEncodedText.Unescape(reader.ValueSpan, buffer, out int bytesWritten);
unescapedBuffer = buffer.Slice(0, bytesWritten);
}
else
{
// avoid copying to an intermediate buffer if unescaping is not needed
unescapedBuffer = reader.ValueSpan;
}
Copying to char buffers
char[] buffer = ArrayPool<char>.Shared.Rent(maxLength);
// buffer length needs to be at least as long as reader.ValueSpan/ValueSequence to succeed
reader.GetString(buffer, out int charsWritten);
// consume & return the buffer as usual
Alternative Designs
Can't think of any.
Risks
The name GetChars can be confusing for some users; maybe there is another, better-fitting name for such a method?
Notes
What should happen when the provided buffer is not of sufficient length? Should an exception be thrown, or should the buffer be filled to capacity, with the method returning once it is full?
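Whichever behavior is chosen, a caller can avoid the short-buffer case by sizing the destination from the raw token length: unescaping never produces more bytes than the escaped input, and UTF-8 never transcodes to more chars than it has bytes. The sketch below uses only existing APIs. The helper name GetMaxUnescapedLength is hypothetical, and the proposed GetString overload appears only in a comment because it does not exist yet.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

// Hypothetical helper: upper bound, in bytes or chars, on the unescaped
// length of the current string token.
static int GetMaxUnescapedLength(ref Utf8JsonReader reader) =>
    reader.HasValueSequence
        ? checked((int)reader.ValueSequence.Length)
        : reader.ValueSpan.Length;

// The JSON text is "a\u0062c", which unescapes to the 3-character value abc.
var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes("\"a\\u0062c\""));
reader.Read(); // advance to the string token

int maxLength = GetMaxUnescapedLength(ref reader); // 8 raw bytes between the quotes
char[] rented = ArrayPool<char>.Shared.Rent(maxLength);
try
{
    // With the proposed API, the call would go here:
    // reader.GetString(rented, out int charsWritten);
}
finally
{
    ArrayPool<char>.Shared.Return(rented);
}
```

The bound is deliberately loose (8 versus 3 in this example), which trades some wasted buffer space for never having to retry with a larger buffer.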