-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Optimize compression of embedded PDB file. #113803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Instead of buffering the compressed bytes to a `MemoryStream`, we directly write them to the `BlobBuilder`. This is achieved by defining a write-only stream type that wraps a `BlobBuilder`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR streamlines the compression of embedded PDB data by writing compressed bytes directly to a BlobBuilder instead of buffering them in a MemoryStream.
- Removed the MemoryStream buffering and its associated WriteBytes call.
- Introduced a custom write-only stream (BlobBuilderStream) to directly pipe data into the BlobBuilder.
Comments suppressed due to low confidence (1)
src/libraries/System.Reflection.Metadata/src/System/Reflection/PortableExecutable/DebugDirectory/DebugDirectoryBuilder.EmbeddedPortablePdb.cs:50
- Consider specifying 'leaveOpen: true' when constructing the DeflateStream to ensure that disposing it does not inadvertently affect the BlobBuilderStream if the builder is used after compression. Alternatively, explicitly override Dispose in BlobBuilderStream to prevent any undesired disposal behavior.
using (var deflate = new DeflateStream(new BlobBuilderStream(builder), CompressionLevel.Optimal))
|
Tagging subscribers to this area: @dotnet/area-system-reflection-metadata |
|
Could you please run a small micro-benchmark that validates the change has the expected effect? |
|
I got these results when compressing
Benchmark programusing BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.IO.Compression;
using System.Reflection.Metadata;
BenchmarkRunner.Run<PdbWriterBenchmark>();
[MemoryDiagnoser]
public class PdbWriterBenchmark
{
public const int BlockSize = 16 << 10;
private readonly BlobBuilder _data = new();
[GlobalSetup]
public void GlobalSetup()
{
using var f = File.OpenRead(@"C:\Users\teo\code\ephemeral\PdbWriterBenchmark\System.Private.CoreLib.pdb");
_data.TryWriteBytes(f, (int)f.Length);
}
[Benchmark]
public int Buffered()
{
var compressed = new BlobBuilder();
var buffer = new MemoryStream();
using (var compressor = new DeflateStream(buffer, CompressionLevel.Optimal, leaveOpen: true))
{
_data.WriteContentTo(compressor);
}
compressed.WriteBytes(buffer.GetBuffer(), 0, (int)buffer.Length);
return compressed.Count;
}
[Benchmark]
public int Unbuffered()
{
var compressed = new BlobBuilder();
using (var compressor = new DeflateStream(new BlobBuilderStream(compressed), CompressionLevel.Optimal, leaveOpen: true))
{
_data.WriteContentTo(compressor);
}
return compressed.Count;
}
}
internal sealed class BlobBuilderStream(BlobBuilder builder) : Stream
{
public override bool CanRead => false;
public override bool CanSeek => false;
public override bool CanWrite => true;
public override long Length => throw new NotSupportedException();
public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }
public override void Flush() { }
public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
public override void SetLength(long value) => throw new NotSupportedException();
public override void Write(byte[] buffer, int offset, int count) => builder.WriteBytes(buffer, offset, count);
public override unsafe void Write(ReadOnlySpan<byte> buffer)
{
fixed (byte* ptr = buffer)
{
builder.WriteBytes(ptr, buffer.Length);
}
}
}It looks like we save LOH allocations. Once #100418 lands, user code can customize chunking and use pooled memory, achieving greater savings. |
|
/ba-g wasm timeouts |
jkotas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Instead of buffering the compressed bytes to a
MemoryStream, we directly write them to theBlobBuilder. This is achieved by defining a write-only stream type that wraps aBlobBuilder.