Skip to content

Conversation

@carlossanlop
Copy link
Contributor

The internal TarHeader struct is used as a member of the public TarEntry class. It describes all the fields of a tar header and hosts all the functionality that reads and writes the header into the archive stream buffer.

The fact that it is a struct caused some issues when adding the async functionality. Some examples:

  • It is not possible to pass it as an async method argument because the original instance does not get modified.
  • The same problem exists when calling one of the async methods from within an instance of a struct: the original instance's fields do not get modified.
  • This forced me to write some of the async methods differently to their sync version, preventing me from reducing code duplication.
  • Nullability handling is not as good when dealing with a struct, there's no enforced need to specify an initial value for, say, a non-nullable string member.

Having it defined as a struct does not help with perf since it is a large type. So changing it to a class shouldn't affect performance much. The main advantage is that I will be able to submit another PR removing all the code duplication (didn't put it all here to avoid creating a huge PR).

Additional changes:

  • I moved to TarHelpers the method that decides if a regular file entry type should be converted to the appropriate target format, and reused it in another location that had the same need.
  • Added constructors for the now TarHeader class. Initialized the mandatory fields, and converted the non-mandatory string fields to nullable.

Simplify return value of async methods (one value instead of tuple).
Use nullability where TarHeader is returned or out'd.
Move method to get correct entry type to TarHelpers. Reuse it in TarEntry conversion constructor.
@carlossanlop carlossanlop added this to the 7.0.0 milestone Jul 19, 2022
@carlossanlop carlossanlop self-assigned this Jul 19, 2022
@ghost
Copy link

ghost commented Jul 19, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

The internal TarHeader struct is used as a member of the public TarEntry class. It describes all the fields of a tar header and hosts all the functionality that reads and writes the header into the archive stream buffer.

The fact that it is a struct caused some issues when adding the async functionality. Some examples:

  • It is not possible to pass it as an async method argument because the original instance does not get modified.
  • The same problem exists when calling one of the async methods from within an instance of a struct: the original instance's fields do not get modified.
  • This forced me to write some of the async methods differently to their sync version, preventing me from reducing code duplication.
  • Nullability handling is not as good when dealing with a struct, there's no enforced need to specify an initial value for, say, a non-nullable string member.

Having it defined as a struct does not help with perf since it is a large type. So changing it to a class shouldn't affect performance much. The main advantage is that I will be able to submit another PR removing all the code duplication (didn't put it all here to avoid creating a huge PR).

Additional changes:

  • I moved to TarHelpers the method that decides if a regular file entry type should be converted to the appropriate target format, and reused it in another location that had the same need.
  • Added constructors for the now TarHeader class. Initialized the mandatory fields, and converted the non-mandatory string fields to nullable.
Author: carlossanlop
Assignees: carlossanlop
Labels:

area-System.IO

Milestone: 7.0.0

@eerhardt
Copy link
Member

So changing it to a class shouldn't affect performance much.

Do you have any perf tests to back this statement?

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits. LGTM otherwise.

@carlossanlop
Copy link
Contributor Author

carlossanlop commented Jul 19, 2022

Do you have any perf tests to back this statement?

Working on it...

Edit: @eerhardt Here are the results - Took some time because I had to build runtime in -c Release.

Before my changes (base commit before this PR changes, b6932fd):

PS D:\performance\src\benchmarks\micro> dotnet run -c release -f net7.0 --artifacts "D:\tmp\perf_before\" --coreRun "D:\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe" --filter "System.Formats.Tar*"

|                     Method |     Mean |    Error |   StdDev |   Median |      Min |        Max |  Gen 0 | Allocated |
|--------------------------- |---------:|---------:|---------:|---------:|---------:|-----------:|-------:|----------:|
|   CreateFromDirectory_Path | 937.9 μs | 42.76 μs | 49.25 μs | 936.3 μs | 821.5 μs | 1,039.7 μs |      - |  15.06 KB |
|    ExtractToDirectory_Path | 756.3 μs | 22.50 μs | 25.91 μs | 748.9 μs | 729.6 μs |   822.5 μs |      - |  12.53 KB |
| CreateFromDirectory_Stream | 265.7 μs |  4.66 μs |  4.78 μs | 265.1 μs | 257.5 μs |   274.7 μs | 2.0833 |  18.46 KB |
|  ExtractToDirectory_Stream | 728.1 μs | 16.69 μs | 19.22 μs | 725.6 μs | 703.9 μs |   762.5 μs |      - |  12.53 KB |

After my changes (last commit in this PR, 5bd06d0):

PS D:\performance\src\benchmarks\micro> dotnet run -c release -f net7.0 --artifacts "D:\tmp\perf_after\" --coreRun "D:\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe" --filter "System.Formats.Tar*"

|                     Method |     Mean |    Error |   StdDev |   Median |      Min |        Max |  Gen 0 | Allocated |
|--------------------------- |---------:|---------:|---------:|---------:|---------:|-----------:|-------:|----------:|
|   CreateFromDirectory_Path | 934.5 μs | 43.35 μs | 46.39 μs | 945.6 μs | 850.0 μs | 1,006.4 μs |      - |  14.63 KB |
|    ExtractToDirectory_Path | 749.7 μs | 19.96 μs | 22.19 μs | 748.9 μs | 719.6 μs |   801.8 μs |      - |  11.89 KB |
| CreateFromDirectory_Stream | 266.5 μs |  5.76 μs |  6.63 μs | 267.1 μs | 255.1 μs |   278.3 μs | 2.0492 |  18.04 KB |
|  ExtractToDirectory_Stream | 717.3 μs | 14.01 μs | 14.99 μs | 719.7 μs | 694.8 μs |   741.6 μs |      - |  11.91 KB |

ResultsComparer output:

PS D:\performance\src\tools\ResultsComparer> dotnet run --base D:\tmp\perf_before\ --diff D:\tmp\perf_after\ --threshold 0.0001%
No differences found between the benchmark results with threshold 9.999999999999999E-05%.

@carlossanlop
Copy link
Contributor Author

From my chat conversation with @eerhardt: we are seeing fewer allocations, and this might be because the huge structs were being copied to the async methods, but now with a class this doesn't happen. Also, it's possible that converting the non-default strings to nullable, and not assigning an initial value, helped too.

@carlossanlop carlossanlop merged commit 10f4f43 into dotnet:main Jul 19, 2022
@carlossanlop carlossanlop deleted the ConvertTarHeaderToClass branch July 19, 2022 22:33
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks much better now, thank you @carlossanlop !

@ghost ghost locked as resolved and limited conversation to collaborators Aug 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants