Periodically tracing stops working in a weird way that results in every message being traced, disregarding the sampling rate and the top level root span.
What I expect to happen
Tracing is configured as so:
const TRACE_HEADER_BEHAVIOR = process.env.TRACE_HEADER_BEHAVIOR;
let contextHeaderBehavior: 'default' | 'require' = 'default';
let samplingRate: number = 1/60;
if (TRACE_HEADER_BEHAVIOR === 'require') {
contextHeaderBehavior = TRACE_HEADER_BEHAVIOR;
samplingRate = 0;
}
export const tracer = traceAgent.start({
contextHeaderBehavior,
enabled: process.env.TRACING === 'enabled',
keyFilename: process.env.GOOGLE_APPLICATION_CREDENTIALS,
logLevel: 0,
projectId: process.env.TRACING_PROJECT,
samplingRate,
});
We should expect that we will see one trace per minute, and we do when it is functioning normally. A service is configured to send a heartbeat once per second to a queue service, it runs in a root span called heartbeat.
export async function runHeartbeat(
parentLogger: Logger,
dispatcherUri: string,
rateHz: number,
) {
const chance = new Chance();
const durationMs = 1000 / rateHz;
return pForever(
async i =>
tracer.runInRootSpan({ name: 'heartbeat' }, async rootSpan => {
const logger = traceLogger(parentLogger);
const message: Payload = {
payload: JSON.stringify({
hops: 0,
i,
origin: `heartbeat-${os.hostname}-${i}`,
[chance.string()]: chance.string(),
}),
};
const uri = `${dispatcherUri}`;
try {
await postHeartbeat(logger, uri, message);
} catch (err) {
logger.error(
{ message, err, dispatcherUri },
'Error posting message to Dispatcher',
);
} finally {
rootSpan.endSpan();
}
await sleep(durationMs);
return i + 1;
}),
1,
);
}
async function postHeartbeat(
parentLogger: Logger,
uri: string,
message: Payload,
) {
const logger = traceLogger(parentLogger);
logger.trace({ message }, 'Posting heartbeat');
const response = await got.post(uri, {
body: JSON.stringify(message),
headers: {
'Content-Type': 'application/json',
},
timeout: 30_000,
});
// SAFETY: ! allowed because we are using a POST and not a st1ream
const statusCode = response.statusCode!;
const body = response.body;
if (statusCode < 200 || statusCode >= 300) {
logger.error({ body, statusCode }, 'Invalid status code');
}
return;
}
This is what a normal set of traces look like in Stackdriver Trace:

We see the root span, we see traces about once per minute (obeying the sampling rate of 1/60), and I can dig into the contents of the child spans, and there are log messages associated with the spans.
What happens when it breaks
Instead of observing the root span and all child spans across multiple services, I instead see the first HTTP call, a POST to /dispatcher/ping-pong. And instead of being traced once a minute, I see traces for every POST once a second. This is extremely strange! It's like the trace service is disregarding my root span entirely, and the sampling rate is being set to 0!

And do you see that readiness trace in the screenshot? That's supposed to be filtered out! It's a Kubernetes readiness probe. The server that emitted that has TRACE_HEADER_BEHAVIOR=require, so it shouldn't be possible for it to appear! Here's a search for just that path filtered over the last four hours:

Periodically tracing stops working in a weird way that results in every message being traced, disregarding the sampling rate and the top level root span.
What I expect to happen
Tracing is configured as so:
We should expect that we will see one trace per minute, and we do when it is functioning normally. A service is configured to send a heartbeat once per second to a queue service, it runs in a root span called
heartbeat.This is what a normal set of traces look like in Stackdriver Trace:
We see the root span, we see traces about once per minute (obeying the sampling rate of 1/60), and I can dig into the contents of the child spans, and there are log messages associated with the spans.
What happens when it breaks
Instead of observing the root span and all child spans across multiple services, I instead see the first HTTP call, a
POSTto/dispatcher/ping-pong. And instead of being traced once a minute, I see traces for everyPOSTonce a second. This is extremely strange! It's like the trace service is disregarding my root span entirely, and the sampling rate is being set to 0!And do you see that
readinesstrace in the screenshot? That's supposed to be filtered out! It's a Kubernetes readiness probe. The server that emitted that hasTRACE_HEADER_BEHAVIOR=require, so it shouldn't be possible for it to appear! Here's a search for just that path filtered over the last four hours: