### Is your feature request related to a problem? Please describe.
#### 📋 Issue Summary
Instrumentation currently does not work for Apache Spark Structured Streaming consumers, particularly when processing Kafka messages that carry trace context in their headers. This creates a significant observability gap for distributed tracing in Spark-based streaming applications.
https://spark.apache.org/docs/latest/streaming/index.html#:~:text=Structured%20Streaming%20is%20a%20scalable%20and%20fault-tolerant%20stream,would%20express%20a%20batch%20computation%20on%20static%20data.
#### 🔍 Problem Description
**Current Behavior**
- Spark Structured Streaming consumers cannot properly propagate OpenTelemetry trace context
- Trace context from Kafka message headers is not automatically extracted and propagated
- No built-in OpenTelemetry instrumentation exists for Apache Spark in streaming contexts
- Manual trace context extraction and span creation are required but not straightforward
**Impact**
- **Observability Gap**: Loss of distributed tracing across Spark streaming pipelines
- **Debugging Difficulty**: Inability to trace message flow through Spark transformations
- **Performance Monitoring**: Missing insights into processing latency and bottlenecks
- **Compliance Issues**: Difficulty meeting observability requirements in production environments
#### 🛠️ Root Cause Analysis
**Technical Challenges**
- **Header Access Complexity**: Kafka headers containing trace context require manual parsing and extraction
- **Span Lifecycle Management**: Manual span creation, linking, and cleanup in streaming contexts
- **Resource Management**: Proper OpenTelemetry SDK initialization and cleanup in Spark executor environments
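To illustrate the first point, here is a minimal sketch of the manual work a consumer has to do today: decoding a W3C `traceparent` header (`version-traceId-spanId-flags`) from the raw bytes of a Kafka record header. Class and method names are illustrative; a real pipeline would hand the extracted context to the OpenTelemetry propagator API rather than a custom record.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the manual extraction step: decoding a W3C
// `traceparent` header from raw Kafka header bytes. A real pipeline
// would feed the result to OpenTelemetry's TextMapPropagator instead.
final class TraceparentParser {
    record TraceContext(String traceId, String spanId, boolean sampled) {}

    // Returns null when the header is absent or malformed.
    static TraceContext parse(byte[] headerValue) {
        if (headerValue == null) return null;
        String raw = new String(headerValue, StandardCharsets.UTF_8).trim();
        String[] parts = raw.split("-");
        // Expect: version (2 hex), trace-id (32 hex), span-id (16 hex), flags (2 hex)
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            return null;
        }
        boolean sampled = parts[3].equals("01");
        return new TraceContext(parts[1], parts[2], sampled);
    }
}
```

Even this small piece of plumbing has to be repeated (and kept correct) in every streaming job, which is exactly the burden an agent-based instrumentation would remove.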
**Current Limitations**
- No native OpenTelemetry support in Apache Spark Structured Streaming
- Manual trace context extraction from Kafka headers required
- Complex span linking and parent-child relationship management
- No automatic instrumentation for Spark transformations
### Describe the solution you'd like
Ideally, no manual propagation would be needed: users should be able to just run `./spark-submit [...] --javaagent` (i.e., attach the OpenTelemetry Java agent at submit time) and have trace context propagation work out of the box. (And this is achievable!)
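For reference, the agent can already be attached to Spark today via JVM options (jar path and job names below are illustrative); the missing piece this issue requests is that, once attached, Kafka header context actually propagates inside Structured Streaming tasks:

```shell
# Attaching the OpenTelemetry Java agent to driver and executors works today;
# trace-context propagation inside Structured Streaming tasks does not.
# (Agent jar path, class name, and jar name are illustrative.)
./spark-submit \
  --conf "spark.driver.extraJavaOptions=-javaagent:/path/to/opentelemetry-javaagent.jar" \
  --conf "spark.executor.extraJavaOptions=-javaagent:/path/to/opentelemetry-javaagent.jar" \
  --class com.example.StreamingJob \
  my-streaming-job.jar
```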
### Describe alternatives you've considered
No response
### Additional context
No response