fix(ci): switch Kafka from native to JVM image to prevent SIGILL crashes #7708

Merged
BridgeAR merged 2 commits into master from brian.marks/fix-kafka-native-image on Mar 17, 2026

@bm1549 bm1549 commented Mar 6, 2026

What does this PR do?

Switches CI and local dev Kafka images from apache/kafka-native:3.9.1 (GraalVM native binary) to apache/kafka:3.9.1 (standard JVM) to prevent SIGILL crashes on GitHub-hosted runners.

Files changed:

  • docker-compose.yml: local dev Kafka service
  • .github/workflows/apm-integrations.yml: confluentinc-kafka-javascript and kafkajs CI jobs
  • .github/workflows/appsec.yml: AppSec Kafka CI job
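
A quick way to double-check a switch like this is to scan the touched files for leftover references to the native image. The sketch below is illustrative only: the helper name `find_native_image_refs` and the sample compose fragment are assumptions, not taken from the PR diff.

```python
import re

# Matches the native image name with any tag; assumes the form "apache/kafka-native:<tag>".
NATIVE_IMAGE = re.compile(r"apache/kafka-native:[\w.\-]+")


def find_native_image_refs(text):
    """Return (line_number, match) pairs for every native-image reference in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in NATIVE_IMAGE.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Hypothetical compose fragment, not copied from the repository.
sample = """services:
  kafka:
    image: apache/kafka-native:3.9.1
"""

print(find_native_image_refs(sample))
print(find_native_image_refs(sample.replace("kafka-native", "kafka")))
```

Running the same function over each of the three files after the change should return an empty list.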

Motivation

apache/kafka-native:3.9.1 is a GraalVM native binary compiled for specific CPU instruction set extensions (AVX2/AVX-512). On GitHub-hosted runners with heterogeneous CPU generations that lack those extensions, the container crashes with SIGILL before any tests run — causing an intermittent ~2-3% failure rate across Kafka-dependent CI jobs.
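
For anyone triaging a similar failure: a process killed by a signal is conventionally reported with exit code 128 plus the signal number, so a SIGILL crash usually surfaces as exit code 132 on Linux. The PR does not quote the exit code that was observed, so treat the helper below (the name `describe_exit_code` is made up for illustration) as a rough decoding sketch rather than a record of what the CI logs showed.

```python
import signal


def describe_exit_code(code):
    """Translate a process or container exit code into a readable cause."""
    if code == 0:
        return "clean exit"
    if code > 128:
        # Shell convention: 128 + signal number means the process died from that signal.
        try:
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:
            return "killed by unknown signal %d" % (code - 128)
    return "exited with error code %d" % code


print(describe_exit_code(128 + signal.SIGILL))
```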

Switching to the JVM-based image avoids this because the JVM detects CPU capabilities at runtime.
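
As a rough userspace analogue of that runtime check, the sketch below looks for instruction set flags in text shaped like `/proc/cpuinfo` on x86 Linux. It is not how the JVM does it internally. The flag names `avx2` and `avx512f` are how Linux reports those extensions, but which exact flags the native image needs is an assumption here, and the sample CPU lines are invented.

```python
def missing_cpu_flags(cpuinfo_text, required):
    """Return the required flags that the first 'flags' line of cpuinfo_text lacks."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(required) - set(value.split())
    # No flags line found: treat every required flag as missing.
    return set(required)


# Hypothetical excerpts in /proc/cpuinfo format; real flag lines are much longer.
older_cpu = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 avx\n"
newer_cpu = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 avx avx2 avx512f\n"

print(missing_cpu_flags(older_cpu, ["avx2", "avx512f"]))
print(missing_cpu_flags(newer_cpu, ["avx2", "avx512f"]))
```

On a real runner the input could come from reading `/proc/cpuinfo`; a non-empty result would mark a host where a binary built for those extensions is at risk of SIGILL.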

Additional Notes

Performance impact: The JVM image is slower to start (~5-10s vs ~1-2s for the native image), but this startup happens in parallel with job setup steps (dependency installation, etc.), so it adds effectively zero observable delay to test wall-clock time. The actual Kafka test execution time is unchanged — only the container startup differs.
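
The overlap argument can be written as simple arithmetic: tests only wait on Kafka for whatever part of its startup outlasts the parallel setup work. In this sketch the startup figures are the upper bounds quoted above, while the 30 s setup duration is a placeholder assumption, not a measured value.

```python
def added_delay(service_startup_s, parallel_setup_s):
    """Seconds the tests wait on the service after setup finishes (never negative)."""
    return max(0.0, service_startup_s - parallel_setup_s)


# Hypothetical setup duration; substitute the real dependency-install time for a given job.
SETUP_S = 30.0

print(added_delay(10.0, SETUP_S))  # JVM image, upper bound of ~5-10s
print(added_delay(2.0, SETUP_S))   # native image, upper bound of ~1-2s
```

The delay only becomes visible if setup takes less time than Kafka does to start.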

Memory impact (measured):

| Image | Idle Memory |
|-------|-------------|
| apache/kafka:3.9.1 (JVM) | ~276 MiB |
| apache/kafka-native:3.9.1 (native) | ~176 MiB |

The JVM image uses ~100 MiB more (~1.4% of the 7 GB available on GitHub-hosted runners).
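
The arithmetic behind that comparison, as a small sketch using the two measured idle figures. "7 GB" can be read as 7 GiB or as 7 decimal GB, so both are computed; the extra memory comes to roughly 1.4% to 1.5% of the runner either way.

```python
JVM_IDLE_MIB = 276
NATIVE_IDLE_MIB = 176

extra_mib = JVM_IDLE_MIB - NATIVE_IDLE_MIB


def share_of_runner(extra_mib, runner_mib):
    """Extra memory as a percentage of total runner memory."""
    return 100.0 * extra_mib / runner_mib


# "7 GB" is ambiguous: 7 GiB (7 * 1024 MiB) or 7 decimal GB converted to MiB.
runner_gib_mib = 7 * 1024
runner_gb_mib = 7 * 10**9 / 2**20

print(extra_mib)
print(round(share_of_runner(extra_mib, runner_gib_mib), 1))
print(round(share_of_runner(extra_mib, runner_gb_mib), 1))
```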

🤖 Generated with Claude Code

apache/kafka-native:3.9.1 is a GraalVM native binary compiled for specific
CPU instruction set extensions (AVX2/AVX-512). On GitHub-hosted runners with
heterogeneous CPUs that lack those extensions, the container crashes with
SIGILL before any tests run. The JVM-based apache/kafka:3.9.1 detects CPU
capabilities at runtime and avoids the crash.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@bm1549 added the "AI Generated" label (largely based on code generated by an AI or LLM; this label is the same across all dd-trace-* repos) on Mar 6, 2026

github-actions bot commented Mar 6, 2026

Overall package size

Self size: 4.96 MB
Deduped: 5.81 MB
No deduping: 5.81 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.0 | 81.15 kB | 815.98 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

codecov bot commented Mar 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.42%. Comparing base (dd965cf) to head (3553c90).
⚠️ Report is 26 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7708   +/-   ##
=======================================
  Coverage   80.42%   80.42%           
=======================================
  Files         741      741           
  Lines       32177    32177           
=======================================
+ Hits        25878    25879    +1     
+ Misses       6299     6298    -1     
Flag Coverage Δ
aiguard-macos 38.84% <ø> (-0.10%) ⬇️
aiguard-ubuntu 38.95% <ø> (-0.10%) ⬇️
aiguard-windows 38.68% <ø> (-0.10%) ⬇️
apm-capabilities-tracing-macos 48.90% <ø> (+0.04%) ⬆️
apm-capabilities-tracing-ubuntu 48.93% <ø> (ø)
apm-capabilities-tracing-windows 48.67% <ø> (ø)
apm-integrations-child-process 38.39% <ø> (-0.10%) ⬇️
apm-integrations-couchbase-18 37.31% <ø> (-0.10%) ⬇️
apm-integrations-couchbase-eol 37.78% <ø> (-0.10%) ⬇️
apm-integrations-oracledb 37.62% <ø> (-0.10%) ⬇️
appsec-express 55.20% <ø> (-0.07%) ⬇️
appsec-fastify 51.54% <ø> (-0.07%) ⬇️
appsec-graphql 51.74% <ø> (-0.06%) ⬇️
appsec-kafka 44.28% <ø> (-0.08%) ⬇️
appsec-ldapjs 43.93% <ø> (-0.08%) ⬇️
appsec-lodash 43.59% <ø> (-0.08%) ⬇️
appsec-macos 58.21% <ø> (-0.07%) ⬇️
appsec-mongodb-core 48.70% <ø> (-0.09%) ⬇️
appsec-mongoose 49.37% <ø> (-0.08%) ⬇️
appsec-mysql 50.80% <ø> (-0.07%) ⬇️
appsec-node-serialize 43.11% <ø> (-0.08%) ⬇️
appsec-passport 47.55% <ø> (-0.09%) ⬇️
appsec-postgres 50.54% <ø> (-0.05%) ⬇️
appsec-sourcing 42.51% <ø> (-0.08%) ⬇️
appsec-template 43.27% <ø> (-0.08%) ⬇️
appsec-ubuntu 58.29% <ø> (-0.07%) ⬇️
appsec-windows 58.07% <ø> (-0.07%) ⬇️
instrumentations-instrumentation-bluebird 32.26% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-body-parser 40.41% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-child_process 37.71% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-cookie-parser 34.24% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-express 34.56% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-express-mongo-sanitize 34.37% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-express-session 40.05% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-fs 31.88% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-generic-pool 29.69% <ø> (ø)
instrumentations-instrumentation-http 39.69% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-knex 32.27% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-mongoose 33.39% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-multer 40.16% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-mysql2 38.17% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-passport 43.97% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-passport-http 43.64% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-passport-local 44.17% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-pg 37.60% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-promise 32.19% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-promise-js 32.20% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-q 32.24% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-url 32.17% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-when 32.21% <ø> (-0.10%) ⬇️
llmobs-ai 42.15% <ø> (-0.09%) ⬇️
llmobs-anthropic 40.14% <ø> (-0.09%) ⬇️
llmobs-bedrock 39.13% <ø> (-0.08%) ⬇️
llmobs-google-genai 39.69% <ø> (-0.09%) ⬇️
llmobs-langchain 39.93% <ø> (-0.08%) ⬇️
llmobs-openai 43.87% <ø> (-0.09%) ⬇️
llmobs-vertex-ai 39.94% <ø> (-0.09%) ⬇️
platform-core 31.53% <ø> (ø)
platform-esbuild 34.48% <ø> (ø)
platform-instrumentations-misc 48.40% <ø> (ø)
platform-shimmer 37.63% <ø> (ø)
platform-unit-guardrails 32.95% <ø> (ø)
plugins-azure-event-hubs 25.83% <ø> (ø)
plugins-azure-service-bus 25.19% <ø> (ø)
plugins-bullmq 44.18% <ø> (+<0.01%) ⬆️
plugins-cassandra 37.66% <ø> (-0.10%) ⬇️
plugins-cookie 26.89% <ø> (ø)
plugins-cookie-parser 26.67% <ø> (ø)
plugins-crypto 26.79% <ø> (ø)
plugins-dd-trace-api 38.22% <ø> (-0.10%) ⬇️
plugins-express-mongo-sanitize 26.82% <ø> (ø)
plugins-express-session 26.63% <ø> (ø)
plugins-fastify 42.13% <ø> (-0.09%) ⬇️
plugins-fetch 38.22% <ø> (-0.09%) ⬇️
plugins-fs 38.49% <ø> (-0.10%) ⬇️
plugins-generic-pool 25.87% <ø> (ø)
plugins-google-cloud-pubsub 45.34% <ø> (-0.09%) ⬇️
plugins-grpc 40.81% <ø> (-0.09%) ⬇️
plugins-handlebars 26.86% <ø> (ø)
plugins-hapi 40.04% <ø> (-0.10%) ⬇️
plugins-hono 40.30% <ø> (-0.10%) ⬇️
plugins-ioredis 38.30% <ø> (-0.10%) ⬇️
plugins-knex 26.50% <ø> (ø)
plugins-ldapjs 24.36% <ø> (ø)
plugins-light-my-request 26.23% <ø> (ø)
plugins-limitd-client 32.54% <ø> (-0.10%) ⬇️
plugins-lodash 25.96% <ø> (ø)
plugins-mariadb 39.40% <ø> (-0.05%) ⬇️
plugins-memcached 38.03% <ø> (-0.10%) ⬇️
plugins-microgateway-core 39.11% <ø> (-0.10%) ⬇️
plugins-moleculer 40.40% <ø> (-0.09%) ⬇️
plugins-mongodb 39.05% <ø> (-0.10%) ⬇️
plugins-mongodb-core 38.88% <ø> (-0.10%) ⬇️
plugins-mongoose 38.74% <ø> (-0.10%) ⬇️
plugins-multer 26.63% <ø> (ø)
plugins-mysql 39.04% <ø> (-0.10%) ⬇️
plugins-mysql2 39.14% <ø> (-0.10%) ⬇️
plugins-node-serialize 26.93% <ø> (ø)
plugins-opensearch 37.49% <ø> (-0.10%) ⬇️
plugins-passport-http 26.68% <ø> (ø)
plugins-postgres 35.53% <ø> (-0.09%) ⬇️
plugins-process 26.79% <ø> (ø)
plugins-pug 26.89% <ø> (ø)
plugins-redis 38.77% <ø> (-0.10%) ⬇️
plugins-router 42.85% <ø> (-0.10%) ⬇️
plugins-sequelize 25.47% <ø> (ø)
plugins-test-and-upstream-amqp10 38.38% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-amqplib 43.76% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-apollo 39.01% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-avsc 38.55% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-bunyan 33.80% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-connect 40.70% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-graphql 40.00% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-koa 40.29% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-protobufjs 38.78% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-rhea 43.97% <ø> (-0.07%) ⬇️
plugins-undici 38.99% <ø> (-0.09%) ⬇️
plugins-url 26.79% <ø> (ø)
plugins-valkey 37.97% <ø> (-0.10%) ⬇️
plugins-vm 26.79% <ø> (ø)
plugins-winston 33.99% <ø> (-0.10%) ⬇️
plugins-ws 41.76% <ø> (-0.10%) ⬇️
profiling-macos 39.80% <ø> (-0.10%) ⬇️
profiling-ubuntu 39.92% <ø> (-0.10%) ⬇️
profiling-windows 41.51% <ø> (-0.10%) ⬇️
serverless-azure-functions-client 25.54% <ø> (ø)
serverless-azure-functions-eventhubs 25.54% <ø> (ø)
serverless-azure-functions-servicebus 25.54% <ø> (ø)

pr-commenter bot commented Mar 6, 2026

Benchmarks

Benchmark execution time: 2026-03-13 14:40:00

Comparing candidate commit 3553c90 in PR branch brian.marks/fix-kafka-native-image with baseline commit dd965cf in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 230 metrics, 30 unstable metrics.

@bm1549 bm1549 marked this pull request as ready for review March 17, 2026 00:46
@bm1549 bm1549 requested review from a team as code owners March 17, 2026 00:46

@BridgeAR BridgeAR left a comment

LGTM, although it is difficult to confirm that this actually fixes the issue. Replacing the experimental image with the stable one is good on its own, though :)

@BridgeAR BridgeAR merged commit 6dac2bd into master Mar 17, 2026
789 checks passed
@BridgeAR BridgeAR deleted the brian.marks/fix-kafka-native-image branch March 17, 2026 13:41
dd-octo-sts bot pushed a commit that referenced this pull request Mar 17, 2026
@dd-octo-sts dd-octo-sts bot mentioned this pull request Mar 17, 2026
juan-fernandez pushed a commit that referenced this pull request Mar 18, 2026

Labels

  • AI Generated (largely based on code generated by an AI or LLM; this label is the same across all dd-trace-* repos)
  • semver-patch

2 participants