Description
We have a use-case for low-overhead heap profiling (production ready, continuously on) that is currently hard to achieve in .NET. I would like to start a conversation about what would be needed in .NET to achieve this, and whether there is a way forward to get such support in future .NET versions.
Use-Case
We have a .NET Profiler (for APM and CPU-profiling use-cases) and want to extend it with memory/allocation profiling capabilities. We want to be able to tell our users which code leads to expensive allocations, in production environments, continuously on, and with low overhead.
What we would like to capture:
- Allocated type
- Allocation size (object size, array size+base object size)
- Callsite of an allocation (full callstack)
- Identify allocations that lead to "surviving" objects (e.g. by tagging objects and getting a "free" callback for them)
It shall have the following properties:
- Can be enabled/disabled at runtime
- Has a controllable sampling rate (so that the desired overhead/accuracy can be dynamically adapted, based on user-config or overhead estimation heuristics)
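To make the two properties above concrete, here is a minimal sketch of a byte-interval sampler that can be toggled and retuned at runtime. The class `AllocationSampler` and all of its members are hypothetical names used for illustration; this is not an existing .NET or JVMTI API. It also assumes a fixed byte threshold and a single-threaded counter, both of which a real profiler would likely replace (for example with per-thread counters).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: samples roughly one allocation per `intervalBytes`
// of allocated memory. Not an existing runtime API.
class AllocationSampler {
public:
    explicit AllocationSampler(std::size_t intervalBytes)
        : interval_(intervalBytes) {
        assert(intervalBytes > 0);
    }

    // Enable or disable sampling at runtime.
    void SetEnabled(bool on) { enabled_.store(on, std::memory_order_relaxed); }

    // Adjust the sampling interval at runtime, e.g. from user config
    // or an overhead estimation heuristic.
    void SetIntervalBytes(std::size_t bytes) {
        assert(bytes > 0);
        interval_.store(bytes, std::memory_order_relaxed);
    }

    // Called for every allocation; returns true when this allocation
    // should be reported (type, size, call stack) to the profiler.
    bool OnAllocation(std::size_t sizeBytes) {
        if (!enabled_.load(std::memory_order_relaxed)) {
            return false;  // disabled: allocation is not counted at all
        }
        bytesSinceSample_ += sizeBytes;
        if (bytesSinceSample_ < interval_.load(std::memory_order_relaxed)) {
            return false;
        }
        bytesSinceSample_ = 0;  // start counting toward the next sample
        return true;
    }

private:
    std::atomic<bool> enabled_{true};
    std::atomic<std::size_t> interval_;
    // Assumption: a single counter for simplicity; a real profiler would
    // likely keep this per thread to avoid contention.
    std::size_t bytesSinceSample_ = 0;
};
```

A larger interval lowers overhead at the cost of accuracy, which is the trade-off we would like to be able to control dynamically.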
Here is an example of how such data could be visualized: https://www.dynatrace.com/support/help/how-to-use-dynatrace/transactions-and-services/analysis/memory-profiling/
Status Quo
We have researched multiple approaches, but none of them fully satisfied our requirements.
One approach is to use the ObjectAllocated, MovedReferences, SurvivedReferences and GarbageCollectionFinished profiler callbacks. However, this is not viable for production scenarios, since the performance overhead for just enabling these callbacks is extremely high (more than 100%).
Since .NET 5 we can also use the EventPipeEventDelivered profiler callback. The AllocationTick_V3, GCBulkMovedObjectRanges and GCBulkSurvivedObjectRanges event pipe events provide data similar to the profiler callbacks mentioned above. The measured overhead for this was significantly lower: between ~1% and ~20%, depending on the number of allocations and GC runs. The 20% overhead was measured with a sample that allocates large arrays in a loop. For more realistic applications the overhead is closer to ~2%.
Problems of the event pipe approach:
- Obtaining size of allocated arrays not possible
- AllocationTick_V3 sampling rate fixed at ~100KB (problematic for applications that allocate very low/high amounts of memory)
- Overhead still higher than in Java
Array Size
Array size is critical for our use-case, as arrays can make up a significant portion of overall allocations. As mentioned in #43345, it is not possible to obtain the size of allocated array objects in the callback of the AllocationTick_V3 event.
We can obtain the size in the GarbageCollectionStarted profiler callback with the ICorProfilerInfo::GetObjectSize method if we track the ObjectId. However, enabling this profiler callback increases the overhead significantly.
The GCStart event pipe event would have less overhead, however it is not possible to reliably obtain the object size in that callback, since the ICorProfilerInfo::GetObjectSize method sometimes fails with a read access violation at:
coreclr.dll!Object::GetSize() Line 44
coreclr.dll!ProfToEEInterfaceImpl::GetObjectSize(unsigned __int64 objectId, unsigned long * pcSize) Line 1586
Comparable solutions
Since JDK 11, there are callbacks that provide the necessary information with minimal overhead. It matches our use-case really well.
It is possible to monitor allocated objects with the SampledObjectAlloc callback (https://docs.oracle.com/en/java/javase/11/docs/specs/jvmti.html#SampledObjectAlloc). The sampling rate for this callback can be configured with the SetHeapSamplingInterval method.
Additionally, there is the ObjectFree callback that is sent when a tagged object is freed by the garbage collector (https://docs.oracle.com/en/java/javase/11/docs/specs/jvmti.html#ObjectFree).
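As an illustration of how a profiler could combine a sampled-allocation callback with a free callback to identify surviving objects, here is a minimal sketch. `SurvivorTracker`, `SampledAllocation` and the method names are hypothetical and are not part of JVMTI or the .NET profiling API; the sketch only assumes that the runtime lets us attach a non-zero tag to a sampled object and reports that tag when the object is freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical record kept for each sampled allocation.
struct SampledAllocation {
    std::string typeName;
    std::size_t sizeBytes;
};

// Hypothetical sketch: tracks which sampled objects are still alive.
class SurvivorTracker {
public:
    // Called from the sampled-allocation callback after tagging the object.
    void OnSampledAllocation(std::uint64_t tag, SampledAllocation info) {
        assert(tag != 0);  // in this sketch, 0 is reserved for "untagged"
        live_[tag] = std::move(info);
    }

    // Called from the free callback; unknown tags are ignored.
    void OnObjectFree(std::uint64_t tag) { live_.erase(tag); }

    // Number of sampled objects that have not been freed yet.
    std::size_t LiveCount() const { return live_.size(); }

    // Total size of sampled objects that have not been freed yet.
    std::size_t LiveBytes() const {
        std::size_t total = 0;
        for (const auto& entry : live_) {
            total += entry.second.sizeBytes;
        }
        return total;
    }

private:
    std::unordered_map<std::uint64_t, SampledAllocation> live_;
};
```

Whatever remains in the tracker after a number of garbage collections corresponds to sampled allocations that survived, which is the information we cannot currently obtain cheaply in .NET.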
A detailed description of this can be found at https://openjdk.java.net/jeps/331
Summary
Currently it looks like our use-case cannot be fulfilled in .NET. With this ticket, we're hoping to discuss whether such a capability makes sense in a future .NET version. If this isn't the right place/form to have such a discussion, please let us know :).