Add Tempo tracing backend to collect span data #57
michaelsproul
left a comment
LGTM, but haven't had a chance to test yet
tempo.yaml
Outdated
max_bytes_per_trace: 1000000000 # ~100 MB per trace limit
ingestion_rate_limit_bytes: 60000000 # 60MB/sec
ingestion_burst_size_bytes: 120000000 # 120MB burst
Oh, I should revisit these overrides; they shouldn't be needed now that we've removed all the old spans.
I've commented them out, and it still accepts the new spans I added.
Looks like the old spans were too big and were causing issues here. I'll leave the overrides in the file, commented out, in case we run into the issue again and need them for debugging.
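For reference, a quick sanity check of the three override values quoted in the tempo.yaml diff above, converted to decimal megabytes. This is only a sketch; `to_megabytes` is a hypothetical helper, and the values are copied from the diff. Note that 1000000000 bytes works out to 1000 MB (about 1 GB), not the ~100 MB stated in the inline comment.

```python
# Override values as quoted in the tempo.yaml diff.
overrides = {
    "max_bytes_per_trace": 1_000_000_000,
    "ingestion_rate_limit_bytes": 60_000_000,
    "ingestion_burst_size_bytes": 120_000_000,
}


def to_megabytes(num_bytes: int) -> float:
    """Convert bytes to decimal megabytes (1 MB = 10**6 bytes)."""
    return num_bytes / 1_000_000


for name, value in overrides.items():
    # Prints e.g. "max_bytes_per_trace: 1000 MB"
    print(f"{name}: {to_megabytes(value):g} MB")
```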
- client
- result
- non_custody_indices
- imported_blocks
- missing_column_indexes
These are the fields that get added as labels in the generated metrics, for querying purposes.
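As a rough illustration of what "fields added as labels" means, here is a toy model in Python. It is not Tempo's actual implementation: the span data, the `labels_for` and `count_calls` helpers, and the attribute values are all hypothetical. Only the dimension names come from the diff above. Span attributes that match a configured dimension become part of the metric's label set, and every other attribute is dropped.

```python
from collections import Counter

# Dimension names taken from the diff in this thread.
DIMENSIONS = [
    "client",
    "result",
    "non_custody_indices",
    "imported_blocks",
    "missing_column_indexes",
]


def labels_for(span_name, attributes):
    """Build a hashable label set: the span name plus any configured dimension present."""
    labels = {"span_name": span_name}
    for key in DIMENSIONS:
        if key in attributes:
            labels[key] = str(attributes[key])
    return tuple(sorted(labels.items()))


def count_calls(spans):
    """Count spans per label set, similar in spirit to a call-count metric."""
    counts = Counter()
    for name, attributes in spans:
        counts[labels_for(name, attributes)] += 1
    return counts


# Hypothetical spans: "peer" is not a configured dimension, so it is ignored.
spans = [
    ("process_rpc_block", {"result": "ok", "peer": "abc"}),
    ("process_rpc_block", {"result": "ok", "peer": "def"}),
    ("process_rpc_block", {"result": "err"}),
]
counts = count_calls(spans)
```

In this model the two `result=ok` spans collapse into one series, which is what makes the resulting metric queryable by `result`.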
#7815
- removes all existing spans, so some span fields that appear in logs like `service_name` may be lost.
- instruments a few key code paths in the beacon node, starting from **root spans** named below:
  * Gossip block and blobs
    * `process_gossip_data_column_sidecar`
    * `process_gossip_blob`
    * `process_gossip_block`
  * Rpc block and blobs
    * `process_rpc_block`
    * `process_rpc_blobs`
    * `process_rpc_custody_columns`
  * Rpc blocks (range and backfill)
    * `process_chain_segment`
  * `PendingComponents` lifecycle
    * `pending_components`

To test locally:
* Run Grafana and Tempo with sigp/lighthouse-metrics#57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`

Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg

Removing the old spans seems to have reduced the memory usage quite a lot. I think we were using them excessively, and on long-running tasks:

<img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />
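Before starting the beacon node for a local test, it can help to confirm that something is actually listening on the collector port. A minimal sketch, assuming the collector is reachable at localhost:4317 as in the flag above; `is_port_open` is a hypothetical helper and not part of Lighthouse or Tempo. It only checks that a TCP connection can be opened, not that the listener speaks OTLP.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the --telemetry-collector-url value above.
    if is_port_open("localhost", 4317):
        print("collector port is reachable")
    else:
        print("nothing is listening on localhost:4317")
```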
Shall we merge this?
Yes, let's merge this. I've been using this setup for a while with no issues. (I don't have merge rights to this repo.)