@jimmygchen
Member

No description provided.

Member

@michaelsproul michaelsproul left a comment

LGTM, but haven't had a chance to test yet

tempo.yaml Outdated
Comment on lines 33 to 35
max_bytes_per_trace: 1000000000 # ~1 GB per trace limit
ingestion_rate_limit_bytes: 60000000 # 60MB/sec
ingestion_burst_size_bytes: 120000000 # 120MB burst
Member Author

Oh, I should revisit these overrides; they shouldn't be needed now that we've removed all the old spans.

Member Author

I've commented them out, and it still accepts the new spans I added.
Looks like the old spans were too big and were causing issues here. I'll keep them in the file, commented out, in case we run into the issue again and need them to debug.
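
For reference, keeping the overrides in the file but disabled might look like the fragment below. This is a sketch only: the `overrides:` parent key is an assumption, since the review excerpt shows just lines 33 to 35 of `tempo.yaml`, and the values are copied from that excerpt.

```yaml
# Sketch: the `overrides:` parent key is assumed, not taken from the PR diff.
overrides:
  # Disabled now that the oversized old spans are gone.
  # Uncomment when debugging rejected or truncated traces.
  # max_bytes_per_trace: 1000000000
  # ingestion_rate_limit_bytes: 60000000
  # ingestion_burst_size_bytes: 120000000
```

With all three keys commented out, Tempo falls back to its built-in defaults for these limits.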

Comment on lines +26 to +30
- client
- result
- non_custody_indices
- imported_blocks
- missing_column_indexes
Member Author

These are the span fields that get added as labels on the generated metrics, so they can be used in queries.
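
To illustrate where such a list usually lives, here is a minimal sketch assuming the fields are configured as `dimensions` of Tempo's span-metrics processor. The parent keys are an assumption based on Tempo's metrics-generator settings; the diff excerpt shows only the list items, so the actual nesting in this repository may differ.

```yaml
# Sketch: parent keys are assumed; only the five list items come from the PR.
metrics_generator:
  processor:
    span_metrics:
      # Each span attribute listed here becomes a label on the
      # metrics generated from spans.
      dimensions:
        - client
        - result
        - non_custody_indices
        - imported_blocks
        - missing_column_indexes
```

Under that assumption, and assuming Tempo's default metric names, a query such as `sum by (client) (rate(traces_spanmetrics_calls_total[5m]))` would break the span call rate down by the `client` label.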

mergify bot pushed a commit to sigp/lighthouse that referenced this pull request Aug 8, 2025
#7815

- removes all existing spans, so some span fields that appear in logs like `service_name` may be lost.
- instruments a few key code paths in the beacon node, starting from **root spans** named below:

* Gossip block and blobs
  * `process_gossip_data_column_sidecar`
  * `process_gossip_blob`
  * `process_gossip_block`
* Rpc block and blobs
  * `process_rpc_block`
  * `process_rpc_blobs`
  * `process_rpc_custody_columns`
* Rpc blocks (range and backfill)
  * `process_chain_segment`
* `PendingComponents` lifecycle
  * `pending_components`

To test locally:
* Run Grafana and Tempo with sigp/lighthouse-metrics#57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`

Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg

Removing the old spans seems to have reduced memory usage quite a lot. I think we were using them on long-running tasks, and too liberally:
<img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />
@michaelsproul
Member

Shall we merge this?

@jimmygchen
Member Author

jimmygchen commented Aug 11, 2025

> Shall we merge this?

Yes, let's merge this. I've been using this setup for a while with no issues.

(I don't have merge rights to this repo.)

@michaelsproul michaelsproul merged commit fb5bde3 into sigp:master Aug 11, 2025