Skip to content

Conversation

@lezzi
Copy link
Contributor

@lezzi lezzi commented Dec 30, 2020

Deserialization of dictionaries from Stream always causes keys boxing for value types because ReadStackFrame.DictionaryKey is an object. Saving key in the state (boxing) is only needed when value can't be fully read (Read call returns false).

In the best-case scenario when an entire dictionary fits in a buffer - it should completely remove unnecessary heap allocations for keys.

@lezzi lezzi force-pushed the json-dict-key-boxing branch 2 times, most recently from da5acaa to 43ef1d7 Compare December 30, 2020 14:11
@lezzi lezzi marked this pull request as ready for review December 30, 2020 16:00
@ahsonkhan
Copy link
Contributor

This is great. How did you detect it?

In the best-case scenario when an entire dictionary fits in a buffer - it should completely remove unnecessary heap allocations for keys.

Can we run some benchmarks to get a sense of how much heap allocations we save? It might help modifying and running this benchmark on a dictionary input that you expect to get performance wins on (with value type keys). Please share the benchmarkdotnet results when you have them:
https://github.com/dotnet/performance/blob/0b094042e7dc114c8cd038b1ae90dad0e5696fa2/src/benchmarks/micro/libraries/System.Text.Json/Serializer/ReadJson.cs#L57-L64

@lezzi
Copy link
Contributor Author

lezzi commented Dec 31, 2020

This is great. How did you detect it?

In the best-case scenario when an entire dictionary fits in a buffer - it should completely remove unnecessary heap allocations for keys.

Can we run some benchmarks to get a sense of how much heap allocations we save? It might help modifying and running this benchmark on a dictionary input that you expect to get performance wins on (with value type keys). Please share the benchmarkdotnet results when you have them:
https://github.com/dotnet/performance/blob/0b094042e7dc114c8cd038b1ae90dad0e5696fa2/src/benchmarks/micro/libraries/System.Text.Json/Serializer/ReadJson.cs#L57-L64

I am currently working on the high-performance dynamodb library which involves some low-level optimized JSON deserialization. I was able to detect several inefficiencies during some internal benchmark comparisons.

Here is the benchmark result for Dictionary<int, string> with 100 items. I think the most important column for this case is Allocated, where we can see that the 2KB difference disappears after the PR. It also becomes slightly faster.

BenchmarkDotNet=v0.12.1.1466-nightly, OS=Windows 10.0.19041.572 (2004/May2020Update/20H1)
Intel Core i7-7820X CPU 3.60GHz (Kaby Lake), 1 CPU, 6 logical and 3 physical cores
.NET SDK=6.0.100-alpha.1.20630.6
  [Host]     : .NET 6.0.0 (6.0.20.62903), X64 RyuJIT
  Job-NDNXAQ : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
  Job-EVXCXL : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
Method Job Toolchain Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
DeserializeFromString Job-NDNXAQ master 19.96 μs 0.156 μs 0.138 μs 19.93 μs 19.77 μs 20.20 μs 1.00 0.00 3.2861 0.2347 - 19 KB
DeserializeFromString Job-EVXCXL PR 20.05 μs 0.304 μs 0.312 μs 19.94 μs 19.77 μs 20.84 μs 1.01 0.02 3.2727 0.1596 - 19 KB
DeserializeFromUtf8Bytes Job-NDNXAQ master 19.56 μs 0.232 μs 0.194 μs 19.54 μs 19.30 μs 19.97 μs 1.00 0.00 3.2396 0.1580 - 19 KB
DeserializeFromUtf8Bytes Job-EVXCXL PR 19.52 μs 0.138 μs 0.122 μs 19.52 μs 19.34 μs 19.78 μs 1.00 0.01 3.2233 0.1572 - 19 KB
DeserializeFromStream Job-NDNXAQ master 21.66 μs 0.187 μs 0.175 μs 21.58 μs 21.49 μs 22.17 μs 1.00 0.00 3.6509 0.2608 - 21 KB
DeserializeFromStream Job-EVXCXL PR 21.25 μs 0.201 μs 0.178 μs 21.27 μs 20.90 μs 21.57 μs 0.98 0.01 3.2669 0.1675 - 19 KB

It is also worth noting another potential optimization that I am considering to create a PR for. On the benchmark, you can see that deserialization from Stream is visibly slower than from bytes array even when entire JSON fits in a single buffer. I suspect it happens because we don't use UseFastPath for a single buffer case. But it is unrelated and probably not worth discussing in this thread.

During slow path deserialization fallback to storing dictionary key
as object only when Read returns false.
@lezzi lezzi force-pushed the json-dict-key-boxing branch from 43ef1d7 to f2fb38f Compare January 5, 2021 19:58
@stephentoub stephentoub added this to the 6.0.0 milestone Jan 15, 2021
@steveharter steveharter added the tenet-performance Performance related issue label Jan 15, 2021
Copy link
Contributor

@steveharter steveharter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ericstj
Copy link
Member

ericstj commented Feb 17, 2021

Looks like we have a couple approvals: Is this ready to merge? Might want to get fresh test results (current are from Jan 5)

Base automatically changed from master to main March 1, 2021 09:07
Copy link
Contributor

@layomia layomia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR; this good to merge. The perf results shouldn't be stale given the changes.

It is also worth noting another potential optimization that I am considering to create a PR for. On the benchmark, you can see that deserialization from Stream is visibly slower than from bytes array even when entire JSON fits in a single buffer. I suspect it happens because we don't use UseFastPath for a single buffer case. But it is unrelated and probably not worth discussing in this thread.

We'd be happy to take a PR exploring this if we can detect a single buffer case.

@layomia layomia merged commit 6919fc2 into dotnet:main Mar 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants