Support aot_training mode to be used when creating AOT caches
#10166
Bike shedding: |
PerfectSlayer
left a comment
Looking good. Proposed an alternative name for the argument.
Make sure to link the test follow up PR 🙏
LGTM
Does it make sense to try to detect -XX:AOTCacheOutput on JDK 9+ using ProcessHandle to enable the mode automatically? (maybe this is too heavy during startup)
That was my first thought, but I was worried it would be too heavy. I need to do some benchmarking to see what it adds (and some build-wrangling to avoid any reflection overhead).
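For reference, the `ProcessHandle` idea discussed above could look roughly like the sketch below. It is an illustration only, not code from this PR: the class and method names are hypothetical, and `ProcessHandle.Info.arguments()` may return an empty `Optional` on some platforms, in which case the probe simply reports `false`.

```java
import java.util.Optional;

final class AotFlagProbe {
  /** Returns true if any argument starts with the given flag prefix. */
  static boolean hasFlag(String[] jvmArgs, String flagPrefix) {
    for (String arg : jvmArgs) {
      if (arg.startsWith(flagPrefix)) {
        return true;
      }
    }
    return false;
  }

  /** Scans the current process command line (Java 9+); false when it is unavailable. */
  static boolean currentProcessHasFlag(String flagPrefix) {
    Optional<String[]> args = ProcessHandle.current().info().arguments();
    return args.isPresent() && hasFlag(args.get(), flagPrefix);
  }

  public static void main(String[] args) {
    System.out.println(currentProcessHasFlag("-XX:AOTCacheOutput"));
  }
}
```

A command-line scan like this would not see flags supplied through `JAVA_TOOL_OPTIONS`, and its startup cost is the open question raised in the comment above.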
Maybe |
IIRC that can still end up calling |
Yes, sadly it might. About relying on |
f07d03d to e17e8ad
aot_training mode to be used when creating AOT caches
Benchmarks
Startup
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 6 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.57.0-SNAPSHOT~b79d752021, baseline=1.57.0-SNAPSHOT~fef9d162d84
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.086 s) : 0, 1085818
Total [baseline] (10.876 s) : 0, 10876217
Agent [candidate] (1.082 s) : 0, 1082443
Total [candidate] (10.811 s) : 0, 10811144
section appsec
Agent [baseline] (1.269 s) : 0, 1268564
Total [baseline] (11.003 s) : 0, 11002885
Agent [candidate] (1.271 s) : 0, 1270940
Total [candidate] (10.978 s) : 0, 10977695
section iast
Agent [baseline] (1.223 s) : 0, 1222689
Total [baseline] (11.191 s) : 0, 11191170
Agent [candidate] (1.221 s) : 0, 1221430
Total [candidate] (11.208 s) : 0, 11207779
section profiling
Agent [baseline] (1.212 s) : 0, 1211600
Total [baseline] (10.909 s) : 0, 10908956
Agent [candidate] (1.21 s) : 0, 1210271
Total [candidate] (11.047 s) : 0, 11046802
gantt
title petclinic - break down per module: candidate=1.57.0-SNAPSHOT~b79d752021, baseline=1.57.0-SNAPSHOT~fef9d162d84
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.182 ms) : 0, 1182
crashtracking [candidate] (1.173 ms) : 0, 1173
BytebuddyAgent [baseline] (652.06 ms) : 0, 652060
BytebuddyAgent [candidate] (649.119 ms) : 0, 649119
GlobalTracer [baseline] (282.752 ms) : 0, 282752
GlobalTracer [candidate] (282.631 ms) : 0, 282631
AppSec [baseline] (32.642 ms) : 0, 32642
AppSec [candidate] (32.413 ms) : 0, 32413
Debugger [baseline] (68.156 ms) : 0, 68156
Debugger [candidate] (68.258 ms) : 0, 68258
Remote Config [baseline] (616.938 µs) : 0, 617
Remote Config [candidate] (641.778 µs) : 0, 642
Telemetry [baseline] (8.969 ms) : 0, 8969
Telemetry [candidate] (8.965 ms) : 0, 8965
Flare Poller [baseline] (3.725 ms) : 0, 3725
Flare Poller [candidate] (3.782 ms) : 0, 3782
section appsec
crashtracking [baseline] (1.185 ms) : 0, 1185
crashtracking [candidate] (1.192 ms) : 0, 1192
BytebuddyAgent [baseline] (692.343 ms) : 0, 692343
BytebuddyAgent [candidate] (692.661 ms) : 0, 692661
GlobalTracer [baseline] (258.753 ms) : 0, 258753
GlobalTracer [candidate] (260.762 ms) : 0, 260762
AppSec [baseline] (175.181 ms) : 0, 175181
AppSec [candidate] (175.598 ms) : 0, 175598
Debugger [baseline] (67.015 ms) : 0, 67015
Debugger [candidate] (66.716 ms) : 0, 66716
Remote Config [baseline] (740.455 µs) : 0, 740
Remote Config [candidate] (722.426 µs) : 0, 722
Telemetry [baseline] (9.182 ms) : 0, 9182
Telemetry [candidate] (9.143 ms) : 0, 9143
Flare Poller [baseline] (3.884 ms) : 0, 3884
Flare Poller [candidate] (4.028 ms) : 0, 4028
IAST [baseline] (24.64 ms) : 0, 24640
IAST [candidate] (24.69 ms) : 0, 24690
section iast
crashtracking [baseline] (1.185 ms) : 0, 1185
crashtracking [candidate] (1.181 ms) : 0, 1181
BytebuddyAgent [baseline] (790.793 ms) : 0, 790793
BytebuddyAgent [candidate] (789.273 ms) : 0, 789273
GlobalTracer [baseline] (255.7 ms) : 0, 255700
GlobalTracer [candidate] (255.905 ms) : 0, 255905
AppSec [baseline] (34.996 ms) : 0, 34996
AppSec [candidate] (34.173 ms) : 0, 34173
Debugger [baseline] (65.096 ms) : 0, 65096
Debugger [candidate] (65.884 ms) : 0, 65884
Remote Config [baseline] (531.416 µs) : 0, 531
Remote Config [candidate] (578.868 µs) : 0, 579
Telemetry [baseline] (8.408 ms) : 0, 8408
Telemetry [candidate] (8.495 ms) : 0, 8495
Flare Poller [baseline] (3.473 ms) : 0, 3473
Flare Poller [candidate] (3.545 ms) : 0, 3545
IAST [baseline] (27.02 ms) : 0, 27020
IAST [candidate] (27.116 ms) : 0, 27116
section profiling
ProfilingAgent [baseline] (96.965 ms) : 0, 96965
ProfilingAgent [candidate] (97.719 ms) : 0, 97719
crashtracking [baseline] (1.215 ms) : 0, 1215
crashtracking [candidate] (1.214 ms) : 0, 1214
BytebuddyAgent [baseline] (706.712 ms) : 0, 706712
BytebuddyAgent [candidate] (704.669 ms) : 0, 704669
GlobalTracer [baseline] (221.932 ms) : 0, 221932
GlobalTracer [candidate] (222.472 ms) : 0, 222472
AppSec [baseline] (32.246 ms) : 0, 32246
AppSec [candidate] (32.44 ms) : 0, 32440
Debugger [baseline] (68.844 ms) : 0, 68844
Debugger [candidate] (68.242 ms) : 0, 68242
Remote Config [baseline] (652.165 µs) : 0, 652
Remote Config [candidate] (628.732 µs) : 0, 629
Telemetry [baseline] (8.99 ms) : 0, 8990
Telemetry [candidate] (9.002 ms) : 0, 9002
Flare Poller [baseline] (3.74 ms) : 0, 3740
Flare Poller [candidate] (3.803 ms) : 0, 3803
Profiling [baseline] (97.569 ms) : 0, 97569
Profiling [candidate] (98.299 ms) : 0, 98299
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.57.0-SNAPSHOT~b79d752021, baseline=1.57.0-SNAPSHOT~fef9d162d84
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.078 s) : 0, 1078397
Total [baseline] (8.747 s) : 0, 8747293
Agent [candidate] (1.091 s) : 0, 1090515
Total [candidate] (8.787 s) : 0, 8787348
section iast
Agent [baseline] (1.231 s) : 0, 1230871
Total [baseline] (9.376 s) : 0, 9375933
Agent [candidate] (1.22 s) : 0, 1219526
Total [candidate] (9.332 s) : 0, 9332364
gantt
title insecure-bank - break down per module: candidate=1.57.0-SNAPSHOT~b79d752021, baseline=1.57.0-SNAPSHOT~fef9d162d84
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.175 ms) : 0, 1175
crashtracking [candidate] (1.203 ms) : 0, 1203
BytebuddyAgent [baseline] (647.596 ms) : 0, 647596
BytebuddyAgent [candidate] (655.958 ms) : 0, 655958
GlobalTracer [baseline] (280.838 ms) : 0, 280838
GlobalTracer [candidate] (284.213 ms) : 0, 284213
AppSec [baseline] (32.191 ms) : 0, 32191
AppSec [candidate] (32.602 ms) : 0, 32602
Debugger [baseline] (67.69 ms) : 0, 67690
Debugger [candidate] (67.339 ms) : 0, 67339
Remote Config [baseline] (653.634 µs) : 0, 654
Remote Config [candidate] (608.059 µs) : 0, 608
Telemetry [baseline] (9.036 ms) : 0, 9036
Telemetry [candidate] (9.103 ms) : 0, 9103
Flare Poller [baseline] (3.751 ms) : 0, 3751
Flare Poller [candidate] (3.814 ms) : 0, 3814
section iast
crashtracking [baseline] (1.191 ms) : 0, 1191
crashtracking [candidate] (1.18 ms) : 0, 1180
BytebuddyAgent [baseline] (796.902 ms) : 0, 796902
BytebuddyAgent [candidate] (789.343 ms) : 0, 789343
GlobalTracer [baseline] (257.585 ms) : 0, 257585
GlobalTracer [candidate] (255.479 ms) : 0, 255479
IAST [baseline] (27.092 ms) : 0, 27092
IAST [candidate] (26.899 ms) : 0, 26899
AppSec [baseline] (35.594 ms) : 0, 35594
AppSec [candidate] (35.363 ms) : 0, 35363
Debugger [baseline] (64.517 ms) : 0, 64517
Debugger [candidate] (63.46 ms) : 0, 63460
Remote Config [baseline] (553.149 µs) : 0, 553
Remote Config [candidate] (626.818 µs) : 0, 627
Telemetry [baseline] (8.359 ms) : 0, 8359
Telemetry [candidate] (8.407 ms) : 0, 8407
Flare Poller [baseline] (3.487 ms) : 0, 3487
Flare Poller [candidate] (3.476 ms) : 0, 3476
Load
Summary
Found 4 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 17 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.57.0-SNAPSHOT~b79d752021, baseline=1.57.0-SNAPSHOT~fef9d162d84
dateFormat X
axisFormat %s
section baseline
no_agent (19.5 ms) : 19297, 19703
. : milestone, 19500,
appsec (19.465 ms) : 19264, 19666
. : milestone, 19465,
code_origins (17.725 ms) : 17549, 17900
. : milestone, 17725,
iast (17.942 ms) : 17765, 18119
. : milestone, 17942,
profiling (19.729 ms) : 19528, 19929
. : milestone, 19729,
tracing (17.927 ms) : 17750, 18105
. : milestone, 17927,
section candidate
no_agent (18.152 ms) : 17967, 18338
. : milestone, 18152,
appsec (18.781 ms) : 18593, 18969
. : milestone, 18781,
code_origins (17.68 ms) : 17505, 17854
. : milestone, 17680,
iast (17.864 ms) : 17684, 18044
. : milestone, 17864,
profiling (19.875 ms) : 19669, 20080
. : milestone, 19875,
tracing (18.096 ms) : 17917, 18276
. : milestone, 18096,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.57.0-SNAPSHOT~b79d752021, baseline=1.57.0-SNAPSHOT~fef9d162d84
dateFormat X
axisFormat %s
section baseline
no_agent (1.182 ms) : 1171, 1194
. : milestone, 1182,
iast (3.382 ms) : 3332, 3433
. : milestone, 3382,
iast_FULL (5.735 ms) : 5677, 5793
. : milestone, 5735,
iast_GLOBAL (3.664 ms) : 3597, 3731
. : milestone, 3664,
profiling (2.006 ms) : 1989, 2024
. : milestone, 2006,
tracing (1.852 ms) : 1837, 1868
. : milestone, 1852,
section candidate
no_agent (1.209 ms) : 1198, 1221
. : milestone, 1209,
iast (3.202 ms) : 3162, 3242
. : milestone, 3202,
iast_FULL (5.569 ms) : 5514, 5624
. : milestone, 5569,
iast_GLOBAL (3.461 ms) : 3411, 3511
. : milestone, 3461,
profiling (2.144 ms) : 2123, 2165
. : milestone, 2144,
tracing (1.882 ms) : 1865, 1899
. : milestone, 1882,
Dacapo
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.57.0-SNAPSHOT~b79d752021, baseline=1.57.0-SNAPSHOT~fef9d162d84
dateFormat X
axisFormat %s
section baseline
no_agent (15.442 s) : 15442000, 15442000
. : milestone, 15442000,
appsec (14.521 s) : 14521000, 14521000
. : milestone, 14521000,
iast (18.023 s) : 18023000, 18023000
. : milestone, 18023000,
iast_GLOBAL (17.815 s) : 17815000, 17815000
. : milestone, 17815000,
profiling (15.322 s) : 15322000, 15322000
. : milestone, 15322000,
tracing (14.987 s) : 14987000, 14987000
. : milestone, 14987000,
section candidate
no_agent (15.248 s) : 15248000, 15248000
. : milestone, 15248000,
appsec (14.791 s) : 14791000, 14791000
. : milestone, 14791000,
iast (18.509 s) : 18509000, 18509000
. : milestone, 18509000,
iast_GLOBAL (17.84 s) : 17840000, 17840000
. : milestone, 17840000,
profiling (14.698 s) : 14698000, 14698000
. : milestone, 14698000,
tracing (14.821 s) : 14821000, 14821000
. : milestone, 14821000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.57.0-SNAPSHOT~b79d752021, baseline=1.57.0-SNAPSHOT~fef9d162d84
dateFormat X
axisFormat %s
section baseline
no_agent (1.483 ms) : 1471, 1494
. : milestone, 1483,
appsec (3.679 ms) : 3464, 3893
. : milestone, 3679,
iast (2.217 ms) : 2153, 2281
. : milestone, 2217,
iast_GLOBAL (2.258 ms) : 2193, 2323
. : milestone, 2258,
profiling (2.093 ms) : 2039, 2146
. : milestone, 2093,
tracing (2.05 ms) : 1999, 2100
. : milestone, 2050,
section candidate
no_agent (1.479 ms) : 1468, 1491
. : milestone, 1479,
appsec (3.737 ms) : 3516, 3958
. : milestone, 3737,
iast (2.217 ms) : 2152, 2281
. : milestone, 2217,
iast_GLOBAL (2.255 ms) : 2191, 2320
. : milestone, 2255,
profiling (2.072 ms) : 2020, 2125
. : milestone, 2072,
tracing (2.053 ms) : 2002, 2104
. : milestone, 2053,
e17e8ad to 9925a51
In this mode only the agent jar is added to the boot classpath and no services are started.

Also work around a potential AOT bug where TraceInterceptor is mistakenly restored from the system class-loader in production, even though it was visible from the boot class-loader during training, resulting in LinkageErrors. Any call to Tracer.addTraceInterceptor from application code in the system class-loader appears to trigger this bug.

The workaround is to replace these calls during training with opcodes that pop the tracer and argument, and push the expected return value. This transformation is not persisted, so in production the original method is invoked.
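To illustrate why that replacement is stack-neutral, here is a minimal simulation of the operand stack. It is a sketch under assumptions, not the transformation used in this PR: it assumes `addTraceInterceptor` returns a `boolean` (an `int` on the JVM stack), and the class name, method name, and choice of pushed value are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class AddInterceptorStackSketch {
  /**
   * Mimics the replacement opcodes: two pops for the argument and the receiver,
   * then a constant push for the (assumed) boolean return value.
   */
  static void replaceCall(Deque<Object> operandStack, boolean expectedResult) {
    operandStack.pop(); // POP: discard the TraceInterceptor argument
    operandStack.pop(); // POP: discard the Tracer receiver
    operandStack.push(expectedResult ? 1 : 0); // ICONST_1 or ICONST_0
  }

  public static void main(String[] args) {
    Deque<Object> stack = new ArrayDeque<>();
    stack.push("value below the call"); // must be left untouched
    stack.push("tracer");
    stack.push("interceptor");
    replaceCall(stack, true);
    System.out.println(stack); // the result sits on top of the untouched value
  }
}
```

The original call also consumes two references and produces one value, so the surrounding bytecode sees the same stack shape either way.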
2d191f2 to 7e95bec
7e95bec to b79d752
wrt. dd-gitlab/validate_supported_configurations_v2_local_file - I'll add the new |
PerfectSlayer
left a comment
🎯 suggestion: It would be interesting to include this mode in the benchmarks too 😉
dd-java-agent/src/main/java/datadog/trace/bootstrap/AgentBootstrap.java
f9564a7 to 436305b
What Does This Do
This PR adds support for a special "AOT training" mode where only the agent jar is added to the boot classpath and no services are started.
This mode is automatically applied on Java 25 and above when we detect the JVM itself is in AOT training mode.
We do this by reflectively checking whether the JVM's `CDS.isDumpingArchive()` helper method returns `true`.
It can also be explicitly enabled by adding `=aot_training` to the end of the `-javaagent` option:
If necessary the automatic detection can be explicitly turned off using this environment variable:
This PR includes a workaround for a potential AOT bug with Java 25 where `TraceInterceptor` is mistakenly restored from the system class-loader in production, even though it was visible from the boot class-loader during training, resulting in `LinkageError`s. Any call to `Tracer.addTraceInterceptor` from application code in the system class-loader appears to trigger this bug. The workaround is to replace these calls during training with opcodes that pop the tracer and argument, and push the expected return value. This transformation is not persisted, so in production the original method is invoked.
Note: early access builds of Java 26 do not appear to suffer from this bug.
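The mode selection described in this section (explicit `aot_training` argument, reflective detection, and an opt-out) could be sketched as follows. This is an illustration, not the code in this PR: the class and method names are hypothetical, `jdk.internal.misc.CDS` is an assumed fully qualified name for the helper, and the opt-out is modelled as a plain boolean because the environment variable is not named here.

```java
import java.lang.reflect.Method;
import java.util.function.BooleanSupplier;

final class AotTrainingModeSketch {
  // Agent argument named in the PR description.
  static final String EXPLICIT_ARG = "aot_training";
  // Assumed fully qualified name of the JDK-internal helper class.
  static final String CDS_CLASS = "jdk.internal.misc.CDS";

  /** Calls a public static no-arg boolean method by name; any failure counts as false. */
  static boolean invokeStaticBoolean(String className, String methodName) {
    try {
      Method method = Class.forName(className).getMethod(methodName);
      return Boolean.TRUE.equals(method.invoke(null));
    } catch (ReflectiveOperationException | RuntimeException | LinkageError e) {
      // Covers a missing class or method, and access refused by the module system.
      return false;
    }
  }

  /** Hypothetical decision: the explicit argument wins, otherwise detect unless opted out. */
  static boolean useAotTrainingMode(
      String agentArgs, boolean autoDetectDisabled, BooleanSupplier jvmIsTraining) {
    if (EXPLICIT_ARG.equals(agentArgs)) {
      return true;
    }
    return !autoDetectDisabled && jvmIsTraining.getAsBoolean();
  }

  public static void main(String[] args) {
    BooleanSupplier detector = () -> invokeStaticBoolean(CDS_CLASS, "isDumpingArchive");
    System.out.println("explicit: " + useAotTrainingMode("aot_training", true, detector));
    System.out.println("detected: " + useAotTrainingMode(null, false, detector));
  }
}
```

Run from an ordinary application, the reflective call is typically refused because the internal package is not exported, so the detector falls back to `false`; an agent would need its own way to gain access.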
Motivation
Improves the experience when using AOT caching on Java 25:
You can still get a large performance boost using this approach, even though the classes aren't transformed during AOT cache creation. With Spring Petclinic the startup time can drop by 30% - from 13s to 8.5s
Additional Notes
The new mode avoids crashes in the Java 25 JDK when adding the Java agent during creation of the AOT cache, such as:
It especially helps when the application consumes `dd-trace-api`, because if you don't add the Java agent during AOT cache creation then you can get a linkage error when applying the cache in production because the location of the API classes has changed:
I'll add a smoke test to cover this use-case in a separate PR, but would like to get this change merged first.
Contributor Checklist
`type:` and (`comp:` or `inst:`) labels in addition to any useful labels
`close`, `fix` or any linking keywords when referencing an issue. Use `solves` instead, and assign the PR milestone to the issue
Jira ticket: APMS-18027