Implement 64-bit trace ID system with double-buffered storage and liveness tracking #262

Merged: jbachorik merged 8 commits into main from jb/liveness_1 on Sep 8, 2025

@jbachorik jbachorik commented Aug 18, 2025

What does this PR do?:

This PR implements a liveness-aware double-buffered call trace storage system with several key improvements:

  1. Liveness-aware trace management with selective preservation across JFR dumps
  2. Contention handling with dropped trace visibility in JFR output
  3. Double-buffered storage with active/standby hash table instances
  4. 64-bit trace ID system with instance-based collision avoidance
  5. Modular hash table architecture with dedicated CallTraceHashTable class

Motivation:

The changes address critical issues in call trace management:

  • Intermittent CI test failures caused by lock contention that silently dropped samples
  • Liveness tracking requirements for preserving traces of live objects across garbage collection
  • Trace ID stability needed for consistent liveness tracking across storage swaps
  • Performance and modularity improvements through specialized hash table implementation

Additional Notes:

Key Features:

Liveness-Aware Storage:

  • Callback-based liveness checker registration system
  • Selective trace preservation during storage transitions
  • Coordinated trace collection through processTraces() method
  • Support for multiple concurrent liveness checkers
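
The checker registration and selective preservation described above can be sketched roughly as follows. This is a minimal single-threaded illustration, not the PR's code: `LivenessAwareStoreSketch`, `registerLivenessChecker`, `rotate`, and the `int` payload are all hypothetical names chosen for the example, and the real `processTraces()` signature is not shown in this description.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
using TraceId = std::uint64_t;
using LivenessChecker = std::function<void(std::unordered_set<TraceId>&)>;

class LivenessAwareStoreSketch {
public:
    // Each checker adds the trace IDs it still needs to the provided set.
    void registerLivenessChecker(LivenessChecker checker) {
        checkers_.push_back(std::move(checker));
    }

    void put(TraceId id, int payload) { active_[id] = payload; }

    // Ask every checker which traces are still live, carry only those into
    // the next table, and return everything collected for the current dump.
    std::unordered_map<TraceId, int> rotate() {
        std::unordered_set<TraceId> live;
        for (const auto& checker : checkers_) checker(live);

        std::unordered_map<TraceId, int> next;
        for (const auto& entry : active_) {
            if (live.count(entry.first) != 0) next.insert(entry);
        }
        std::unordered_map<TraceId, int> dumped = std::move(active_);
        active_ = std::move(next);
        return dumped;
    }

    bool contains(TraceId id) const { return active_.count(id) != 0; }

private:
    std::vector<LivenessChecker> checkers_;
    std::unordered_map<TraceId, int> active_;
};
```

With two checkers registered, a trace survives the rotation if either checker reports it, which is one plausible reading of "multiple concurrent liveness checkers".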

Contention Handling:

  • Special dropped trace with reserved ID (1ULL) for contention visibility
  • <dropped due to contention> shown in JFR stack traces instead of null entries
  • Platform-specific ASGCT_CallFrame alignment using LP64_ONLY macro
  • BCI_ERROR routing for proper native method resolution
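
To illustrate the reserved dropped-trace ID, here is a small sketch of a non-blocking `put` that reports contention instead of failing silently. Only the value `1ULL` and the `<dropped due to contention>` label come from this description; the class, the `std::atomic_flag` try-lock, and the map-based storage are assumptions made for the example (a real signal-handler path would not allocate).

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <unordered_map>

// Reserved ID taken from the PR description; everything else is hypothetical.
constexpr std::uint64_t kDroppedTraceId = 1ULL;

class ContendedStoreSketch {
public:
    // Non-blocking lock attempt: returns false if someone else holds it.
    bool tryLock() { return !busy_.test_and_set(std::memory_order_acquire); }
    void unlock() { busy_.clear(std::memory_order_release); }

    // Returns the ID for the stack, or kDroppedTraceId when the store is busy.
    std::uint64_t put(const std::string& stack) {
        if (!tryLock()) return kDroppedTraceId;
        std::uint64_t id;
        auto it = ids_.find(stack);
        if (it != ids_.end()) {
            id = it->second;
        } else {
            id = next_id_++;
            ids_.emplace(stack, id);
        }
        unlock();
        return id;
    }

    // What a reader of the recording would see for a given ID.
    static std::string describe(std::uint64_t id) {
        if (id == kDroppedTraceId) return "<dropped due to contention>";
        return "trace#" + std::to_string(id);
    }

private:
    std::atomic_flag busy_ = ATOMIC_FLAG_INIT;
    std::unordered_map<std::string, std::uint64_t> ids_;
    std::uint64_t next_id_ = 2;  // assumption: regular IDs start above the reserved value
};
```

Returning a well-known ID keeps the sample in the output, so the number of dropped traces can be counted from the recording itself.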

Double-Buffered Architecture:

  • Active/standby hash table pattern for lock-free JFR operations
  • Instance-based trace IDs: (instance_id << 32) | slot, so IDs issued by different storage instances cannot collide
  • Atomic storage swapping with minimal profiling overhead
  • Thread-safe instance ID generation across storage transitions
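
The ID layout `(instance_id << 32) | slot` can be sketched as below. The helper names are hypothetical, and starting the instance counter at 1 is an assumption made here so that no generated ID equals the reserved dropped-trace value `1ULL`; the PR may handle that differently.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helpers illustrating the (instance_id << 32) | slot layout.
inline std::uint64_t makeTraceId(std::uint32_t instance_id, std::uint32_t slot) {
    // Widen before shifting so the instance ID lands in the upper 32 bits.
    return (static_cast<std::uint64_t>(instance_id) << 32) | slot;
}

inline std::uint32_t instanceOf(std::uint64_t trace_id) {
    return static_cast<std::uint32_t>(trace_id >> 32);
}

inline std::uint32_t slotOf(std::uint64_t trace_id) {
    return static_cast<std::uint32_t>(trace_id & 0xFFFFFFFFu);
}

// Thread-safe instance ID source. Assumption: counting starts at 1, so the
// upper half of every generated trace ID is nonzero.
inline std::uint32_t nextInstanceId() {
    static std::atomic<std::uint32_t> counter{1};
    return counter.fetch_add(1, std::memory_order_relaxed);
}
```

Two tables that happen to use the same slot still produce different 64-bit IDs, because the upper halves differ.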

Hash Table Improvements:

  • Extracted dedicated CallTraceHashTable class (441 lines)
  • Concurrent table expansion with proper synchronization
  • Overflow trace handling for hash table limits
  • Lock-free put operations with retry-based contention handling
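
A lock-free `put` with retry on contention is commonly built from a compare-and-swap on the slot key. The sketch below shows that general technique with linear probing and an overflow sentinel; it is not the `CallTraceHashTable` implementation, and the capacity, sentinel value, and names are assumptions. Table expansion is left out.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical constants for the sketch.
constexpr std::size_t kSketchCapacity = 8;            // power of two
constexpr std::uint32_t kSketchOverflow = 0xFFFFFFFFu;

class LockFreeTableSketch {
public:
    LockFreeTableSketch() {
        for (auto& key : keys_) key.store(0, std::memory_order_relaxed);
    }

    // Returns the slot holding `hash`, claiming an empty one if needed,
    // or kSketchOverflow when every slot is taken by other keys.
    std::uint32_t put(std::uint64_t hash) {
        if (hash == 0) hash = 1;  // 0 is reserved to mean "empty slot"
        std::size_t slot = static_cast<std::size_t>(hash & (kSketchCapacity - 1));
        for (std::size_t probes = 0; probes < kSketchCapacity; ++probes) {
            std::uint64_t seen = keys_[slot].load(std::memory_order_acquire);
            if (seen == 0 &&
                keys_[slot].compare_exchange_strong(seen, hash,
                                                    std::memory_order_acq_rel,
                                                    std::memory_order_acquire)) {
                return static_cast<std::uint32_t>(slot);  // claimed the slot
            }
            // On a lost race, `seen` now holds the winner's key: re-check it.
            if (seen == hash) return static_cast<std::uint32_t>(slot);
            slot = (slot + 1) & (kSketchCapacity - 1);
        }
        return kSketchOverflow;
    }

private:
    std::atomic<std::uint64_t> keys_[kSketchCapacity];
};
```

A thread that loses the compare-and-swap never blocks; it inspects the winning key and either reuses that slot or moves to the next one.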

Implementation Details:

Core Storage Refactoring:

  • CallTraceStorage reduced from 265→142 lines through hash table extraction
  • Dual active/standby storage instances with atomic swapping
  • Liveness preservation system integrated with JFR dump cycles

64-bit Trace ID Migration:

  • Updated all profiling interfaces: recordJVMTISample(), recordSample(), recordDeferredSample()
  • Modified LivenessTracker for 64-bit trace ID handling
  • JFR integration updated for 64-bit trace ID constant pool support
  • Instance-based ID generation preventing cross-storage collisions

Platform Compatibility:

  • COMMA macro factored to arch_dd.h for consistent designated initializer syntax
  • LP64_ONLY macro usage for proper 64-bit platform struct alignment
  • Cross-platform ASGCT_CallFrame structure handling
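
For readers unfamiliar with the two macros, the following sketch shows how an `LP64_ONLY`-style macro can add an explicit padding field on 64-bit builds and why a `COMMA` macro is needed inside its argument. The macro definitions follow the usual HotSpot-style pattern and the struct is a stand-in with hypothetical field types, not the actual `ASGCT_CallFrame` or `arch_dd.h` contents. Positional initialization is used here, rather than designated initializers, to stay valid before C++20.

```cpp
#include <cstdint>

// Assumed definitions, modeled on the common HotSpot-style pattern.
#ifdef _LP64
#define LP64_ONLY(code) code
#else
#define LP64_ONLY(code)
#endif

// A bare comma inside a macro argument would split it into two arguments,
// so the comma is spelled through a macro instead.
#define COMMA ,

// Stand-in for a call frame: a 32-bit index followed by a pointer-sized ID.
struct CallFrameSketch {
    std::int32_t bci;
    LP64_ONLY(std::int32_t padding;)  // makes the 64-bit layout explicit
    void* method_id;
};

inline CallFrameSketch makeFrame(std::int32_t bci, void* method) {
    // Expands to { bci, 0 , method } on LP64 and { bci, method } elsewhere.
    CallFrameSketch frame = { bci, LP64_ONLY(0 COMMA) method };
    return frame;
}
```

On both 32-bit and 64-bit targets the struct ends up exactly two pointers wide, with no compiler-inserted hole on 64-bit.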

New Files:

  • callTraceHashTable.{h,cpp} - Dedicated hash table implementation (441 lines)
  • test_callTraceStorage.cpp - Comprehensive unit tests with liveness scenarios (387 lines)
  • LivenessTrackingTest.java - Java integration test for end-to-end validation (246 lines)
  • ContendedCallTraceStorageTest.java - Contention measurement and validation test (249 lines)

Modified Files:

  • Core profiler: profiler.{h,cpp}, objectSampler.cpp, wallClock.{h,cpp} - 64-bit trace ID adoption
  • Storage: callTraceStorage.{h,cpp} - major refactoring with liveness integration
  • JFR: flightRecorder.{h,cpp} - 64-bit trace ID support and dropped trace handling
  • Liveness: livenessTracker.{h,cpp} - 64-bit trace ID migration
  • Architecture: arch_dd.h - COMMA macro consolidation

How to test the change?:

# Run comprehensive test suite
./gradlew testDebug

# C++ unit tests for storage and liveness
./gradlew gtestDebug  

# Build verification across configurations
./gradlew buildDebug buildRelease

# Code formatting
./gradlew spotlessApply

The implementation includes extensive test coverage:

  • 9 C++ unit tests for CallTraceStorage liveness scenarios
  • Java integration tests for end-to-end liveness tracking
  • Contention measurement and validation tests
  • Platform-specific compatibility tests

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-12316

Summary: +1689 lines, -447 lines (net +1242 lines)

This implementation provides a robust foundation for liveness-aware profiling with clear visibility into contention issues while maintaining high performance through lock-free operations and efficient storage management.


dd-octo-sts Bot commented Aug 18, 2025

CppCheck Report

Errors (2)

Warnings (8)

Style Violations (299)


dd-octo-sts Bot commented Aug 19, 2025

CppCheck Report

Errors (2)

Warnings (8)

Style Violations (299)
