@adamsitnik
Member

For both x64 and ARM64 I am observing a 10-15% regression.

x64

Details
BenchmarkDotNet=v0.13.1.1828-nightly, OS=Windows 11 (10.0.22000.795/21H2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=7.0.100-preview.6.22352.1
  [Host]     : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT AVX2
  
Method=GetBytes
| Type | Job | size | encName | Input | Mean | Ratio |
|---|---|---|---|---|---|---|
| Perf_Encoding | PR | 16 | ascii | ? | 18.22 ns | 1.00 |
| Perf_Encoding | main | 16 | ascii | ? | 18.20 ns | 1.00 |
| Perf_Encoding | PR | 16 | utf-8 | ? | 17.81 ns | 1.09 |
| Perf_Encoding | main | 16 | utf-8 | ? | 16.63 ns | 1.00 |
| Perf_Encoding | PR | 512 | ascii | ? | 50.45 ns | 1.12 |
| Perf_Encoding | main | 512 | ascii | ? | 44.88 ns | 1.00 |
| Perf_Encoding | PR | 512 | utf-8 | ? | 74.30 ns | 1.16 |
| Perf_Encoding | main | 512 | utf-8 | ? | 64.30 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | EnglishAllAscii | 17,846.09 ns | 1.08 |
| Perf_Utf8Encoding | main | ? | ? | EnglishAllAscii | 16,454.56 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | EnglishMostlyAscii | 79,019.77 ns | 0.97 |
| Perf_Utf8Encoding | main | ? | ? | EnglishMostlyAscii | 81,128.97 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | Chinese | 82,925.92 ns | 0.99 |
| Perf_Utf8Encoding | main | ? | ? | Chinese | 83,539.51 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | Cyrillic | 97,716.52 ns | 0.98 |
| Perf_Utf8Encoding | main | ? | ? | Cyrillic | 99,311.72 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | Greek | 151,446.91 ns | 0.96 |
| Perf_Utf8Encoding | main | ? | ? | Greek | 157,171.66 ns | 1.00 |

Here I was able to use VTune (I also tried uProf, but it leaves a lot to be desired):

vectors_vtune

From what I can see, the regression comes from three additional vpand instructions:

image

It seems that the first one comes from the new VectorContainsNonAsciiChar implementation (this is expected). But do the other two come from Vector128.Narrow? @tannergooding is that expected?

ARM64

Details
BenchmarkDotNet=v0.13.1.1828-nightly, OS=ubuntu 20.04
Unknown processor
.NET SDK=7.0.100-rc.1.22379.1
  [Host]     : .NET 7.0.0 (7.0.22.37802), Arm64 RyuJIT AdvSIMD

Method=GetBytes
| Type | Job | size | encName | Input | Mean | Ratio |
|---|---|---|---|---|---|---|
| Perf_Encoding | PR | 16 | ascii | ? | 67.68 ns | 0.98 |
| Perf_Encoding | main | 16 | ascii | ? | 68.75 ns | 1.00 |
| Perf_Encoding | PR | 16 | utf-8 | ? | 75.34 ns | 1.00 |
| Perf_Encoding | main | 16 | utf-8 | ? | 75.26 ns | 1.00 |
| Perf_Encoding | PR | 512 | ascii | ? | 202.77 ns | 1.12 |
| Perf_Encoding | main | 512 | ascii | ? | 180.61 ns | 1.00 |
| Perf_Encoding | PR | 512 | utf-8 | ? | 269.92 ns | 1.13 |
| Perf_Encoding | main | 512 | utf-8 | ? | 239.16 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | EnglishAllAscii | 79,699.39 ns | 1.15 |
| Perf_Utf8Encoding | main | ? | ? | EnglishAllAscii | 69,283.09 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | EnglishMostlyAscii | 285,553.35 ns | 1.00 |
| Perf_Utf8Encoding | main | ? | ? | EnglishMostlyAscii | 285,896.31 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | Chinese | 308,349.96 ns | 1.00 |
| Perf_Utf8Encoding | main | ? | ? | Chinese | 307,750.04 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | Cyrillic | 217,887.57 ns | 1.00 |
| Perf_Utf8Encoding | main | ? | ? | Cyrillic | 217,912.29 ns | 1.00 |
| Perf_Utf8Encoding | PR | ? | ? | Greek | 312,137.16 ns | 1.00 |
| Perf_Utf8Encoding | main | ? | ? | Greek | 311,067.67 ns | 1.00 |

@ghost

ghost commented Jul 29, 2022

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Author: adamsitnik
Assignees: -
Labels:

area-System.Text.Encoding

Milestone: -


  ref byte asciiBuffer = ref *pAsciiBuffer;
- Vector128<byte> asciiVector = ExtractAsciiVector(utf16VectorFirst, utf16VectorFirst);
+ Vector128<byte> asciiVector = Vector128.Narrow(utf16VectorFirst, utf16VectorFirst);
Member

@EgorBo EgorBo Jul 29, 2022

@adamsitnik Narrow doesn't know that utf16VectorFirst is already ASCII at this point (from my understanding), so it applies a mask via AND to cut off anything above 0xFF (not needed in this case). Consider this:
image

So, for better perf in this case, you probably want to keep ExtractAsciiVector.
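
The point about the extra AND can be illustrated with a small sketch. `Vector128.Narrow` truncates each 16-bit lane to its low byte, whereas the x86 `packuswb` instruction (exposed as `Sse2.PackUnsignedSaturate`) saturates, so a JIT that lowers `Narrow` to `packuswb` has to mask with 0x00FF first to preserve truncation. The class and method names below are hypothetical and exist only for this illustration; this is not the runtime's code.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class NarrowDemo
{
    // Truncating narrow: keeps only the low 8 bits of every 16-bit lane.
    public static Vector128<byte> NarrowTruncating(Vector128<ushort> lower, Vector128<ushort> upper)
        => Vector128.Narrow(lower, upper);

    // Saturating narrow via packuswb (x86/x64 with SSE2 only).
    // Lanes are read as signed 16-bit values: anything above 255 clamps to 0xFF,
    // and negative values (0x8000 and up as ushort) clamp to 0.
    public static Vector128<byte> NarrowSaturating(Vector128<ushort> lower, Vector128<ushort> upper)
    {
        if (!Sse2.IsSupported)
        {
            throw new PlatformNotSupportedException("SSE2 is required for this path.");
        }

        return Sse2.PackUnsignedSaturate(lower.AsInt16(), upper.AsInt16());
    }
}
```

For an ASCII lane such as 0x0041 ('A') both methods produce 0x41, which is why the mask is redundant once the input is known to be ASCII. For a lane such as 0x0141 the truncating version yields 0x41 while the saturating one yields 0xFF.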

Member

Can we log an issue here for Narrow? There are possibly some codegen improvements we can make in .NET 8.

const ushort asciiMask = ushort.MaxValue - 127; // 0xFF80
Vector128<ushort> zeroIsAscii = utf16Vector & Vector128.Create(asciiMask);
// If a non-ASCII bit is set in any WORD of the vector, we have seen non-ASCII data.
return !Vector128.EqualsAll(zeroIsAscii, Vector128<ushort>.Zero);
Member

You said that it's expected to have a redundant AND here - why? 🙂
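
For reference, the portable check quoted above can be exercised in isolation. The sketch below wraps it in a hypothetical `AsciiCheckDemo` class, with a scalar reference implementation and a small `Load8` helper; those names are assumptions made for illustration and do not appear in the PR. The mask 0xFF80 (that is, `ushort.MaxValue - 127`) selects every bit above the low seven, so any lane that survives the AND holds a non-ASCII UTF-16 code unit.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class AsciiCheckDemo
{
    // Every bit above the low 7 bits; set in a lane only if that code unit is non-ASCII.
    private const ushort NonAsciiBits = 0xFF80; // == ushort.MaxValue - 127

    // Portable Vector128 check: one AND plus a comparison against zero.
    public static bool ContainsNonAsciiChar(Vector128<ushort> utf16Vector)
    {
        Vector128<ushort> zeroIsAscii = utf16Vector & Vector128.Create(NonAsciiBits);
        return !Vector128.EqualsAll(zeroIsAscii, Vector128<ushort>.Zero);
    }

    // Scalar reference used to cross-check the vectorized version.
    public static bool ContainsNonAsciiCharScalar(ReadOnlySpan<char> chars)
    {
        foreach (char c in chars)
        {
            if (c > 0x7F)
            {
                return true;
            }
        }

        return false;
    }

    // Hypothetical helper: loads the first eight UTF-16 code units of a string.
    public static Vector128<ushort> Load8(string s)
    {
        if (s.Length < Vector128<ushort>.Count)
        {
            throw new ArgumentException("At least eight characters are required.", nameof(s));
        }

        ReadOnlySpan<ushort> units = MemoryMarshal.Cast<char, ushort>(s.AsSpan());
        return Vector128.Create<ushort>(units);
    }
}
```

Calling `AsciiCheckDemo.ContainsNonAsciiChar(AsciiCheckDemo.Load8("ABCDEFGH"))` returns false, while the same call on "ABCD\u00E9FGH" returns true, matching the scalar reference in both cases.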

@adamsitnik adamsitnik marked this pull request as ready for review August 10, 2022 12:54
@adamsitnik
Member Author

@tannergooding Since I was not able to get the same perf with Vector128, I've added Vector128 as a fallback for when SSE2 and AdvSIMD are not supported, which some configurations may, in theory, benefit from.
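
A minimal sketch of that kind of dispatch, assuming both inputs are already known to contain only ASCII code units: the hardware-specific paths are tried first, and the cross-platform `Vector128.Narrow` is used only when neither is available. The method name is borrowed from the diff above, but the wrapper class and the choice of intrinsic in each branch are assumptions made for illustration, not a confirmed copy of the merged code.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

public static class ExtractAsciiSketch
{
    // Narrows two vectors of ASCII-only UTF-16 code units into one vector of bytes.
    // Precondition (not checked here): every lane is <= 0x7F.
    public static Vector128<byte> ExtractAsciiVector(Vector128<ushort> first, Vector128<ushort> second)
    {
        if (Sse2.IsSupported)
        {
            // packuswb: saturation never triggers for ASCII input, so no mask is needed.
            return Sse2.PackUnsignedSaturate(first.AsInt16(), second.AsInt16());
        }
        else if (AdvSimd.Arm64.IsSupported)
        {
            // uzp1: picks the even-indexed bytes, i.e. the low byte of each little-endian lane.
            return AdvSimd.Arm64.UnzipEven(first.AsByte(), second.AsByte());
        }
        else
        {
            // Portable fallback for targets without SSE2 or AdvSIMD.
            return Vector128.Narrow(first, second);
        }
    }
}
```

Because the `IsSupported` properties are treated as constants by the JIT, only one branch is expected to survive in the generated code on a given machine.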

@adamsitnik adamsitnik merged commit aae6e9b into dotnet:main Aug 10, 2022
@kunalspathak
Contributor

Improvements - dotnet/perf-autofiling-issues#7322

@ghost ghost locked as resolved and limited conversation to collaborators Sep 15, 2022