@eserilev
Member

Issue Addressed

#7719

A POC that allocates 4 threads to column reconstruction tasks. This should also ensure that new tasks don't use the additional threads while reconstruction is running.
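
As a rough illustration of the idea (not the code in this PR), reserving several worker slots for one task can be modelled as a counted budget. `WorkerBudget`, `try_acquire` and `release` are hypothetical names invented for this sketch. It assumes the processor tracks a single `max_workers` limit.

```rust
/// Hypothetical worker budget. The names are illustrative, not Lighthouse's API.
#[derive(Debug)]
struct WorkerBudget {
    /// Total worker slots available to the processor (assumed to be `max_workers`).
    max_workers: usize,
    /// Slots currently held by running tasks.
    in_use: usize,
}

impl WorkerBudget {
    fn new(max_workers: usize) -> Self {
        Self { max_workers, in_use: 0 }
    }

    /// Try to reserve `n` slots at once. Reserves nothing and returns `false`
    /// if the request does not fit in the remaining budget.
    fn try_acquire(&mut self, n: usize) -> bool {
        if self.in_use + n <= self.max_workers {
            self.in_use += n;
            true
        } else {
            false
        }
    }

    /// Return `n` slots to the budget when a task finishes.
    fn release(&mut self, n: usize) {
        self.in_use = self.in_use.saturating_sub(n);
    }
}
```

With a budget of 8, a reconstruction task that acquires 4 slots leaves 4 for everything else. Once those are taken, new tasks are refused until some slots are released.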

@eserilev eserilev added do-not-merge optimization Something to make Lighthouse run more efficiently. das Data Availability Sampling hardening labels Jul 24, 2025
@eserilev
Member Author

eserilev commented Jul 24, 2025

@jimmygchen I know you mentioned in the issue that oversubscription maybe isn't the best idea, but I wanted to throw out this POC in case you found the direction useful. I haven't had a chance to test it yet, but if you think this is useful I can clean it up a bit and start testing.

EDIT: I just read some of the comments in the other PR; I probably should have done that before opening this, lol. But I think this POC is maybe in the spirit of this:

For heavy tasks that require rayon, allow WorkTypes to acquire more than 1 worker (N)

@jimmygchen
Member

Btw, the choice of 4 threads was a bit arbitrary (though it could be a good starting point).
For reconstruction, each blob takes roughly 150ms, so:

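| Blobs | CPU time (ms) | 4 threads (s) | 8 threads (s) | 16 threads (s) |
|-------|---------------|---------------|---------------|----------------|
| 48    | 7200          | 1.80          | 0.90          | 0.45           |
| 72    | 10800         | 2.70          | 1.35          | 0.68           |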

There's probably no harm in using more threads if they are available to the beacon processor (max_workers). Maybe we could consider something like max(4, max_workers / 2)?
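
The figures and the suggested heuristic can be reproduced with a small sketch. The function names here are made up for illustration. The wall-time estimate assumes perfect parallel scaling at roughly 150 ms per blob, as the table does.

```rust
/// Approximate per-blob reconstruction cost quoted in the comment (milliseconds).
const MS_PER_BLOB: u64 = 150;

/// Suggested heuristic: max(4, max_workers / 2).
/// Hypothetical helper, not an existing Lighthouse function.
fn reconstruction_threads(max_workers: usize) -> usize {
    std::cmp::max(4, max_workers / 2)
}

/// Idealized wall time in seconds, assuming the work splits evenly across threads.
fn estimated_wall_time_secs(blobs: u64, threads: usize) -> f64 {
    let cpu_ms = blobs * MS_PER_BLOB;
    cpu_ms as f64 / threads as f64 / 1000.0
}
```

Note that with fewer than 4 workers the heuristic as written still returns 4, so a real implementation would presumably also need to cap the result at `max_workers`.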

@eserilev
Member Author

eserilev commented Aug 13, 2025

I hit this case a few times while running a node on devnet-3

None => {
    warn!(
        msg = "no new work and cannot spawn worker",
        "Unexpected gossip processor condition"
    );
    None
}

It's interesting because the metrics don't show that we ever max out our worker threads, but this case can only be reached if there are no new work events and we are unable to spawn a new worker. This must mean we hit a situation where we have over-allocated threads to the reconstruction task. So I think this PR is doing what it's supposed to do, though I'm unsure whether we are achieving any real performance gains.
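
One way to picture how this arm can be reached is the simplified decision function below. It is a hypothetical model, not the actual beacon processor loop. `Decision`, `decide` and the split between `busy_workers` and `reserved_threads` are all assumptions made for the sketch.

```rust
/// Possible outcomes of one scheduling step in this simplified model.
#[derive(Debug, PartialEq)]
enum Decision {
    SpawnForNewWork,
    QueueNewWork,
    DrainQueue,
    /// Corresponds to the "no new work and cannot spawn worker" warning.
    Unexpected,
}

/// Hypothetical scheduling step. `reserved_threads` models the extra threads
/// held by a reconstruction task on top of the ordinary busy workers.
fn decide(
    has_new_work: bool,
    busy_workers: usize,
    reserved_threads: usize,
    max_workers: usize,
) -> Decision {
    let can_spawn = busy_workers + reserved_threads < max_workers;
    match (has_new_work, can_spawn) {
        (true, true) => Decision::SpawnForNewWork,
        (true, false) => Decision::QueueNewWork,
        (false, true) => Decision::DrainQueue,
        (false, false) => Decision::Unexpected,
    }
}
```

If a worker-count metric only tracked something like `busy_workers`, it could stay below `max_workers` while the reserved threads still block spawning. That would be consistent with the observation above, though this is a guess about the cause rather than a confirmed diagnosis.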

This is a snapshot of some of the metrics on that same node:
https://snapshots.raintank.io/dashboard/snapshot/CsDrVt7tVj74LLNO6J5uqqGK5gI5UBQi

Note that I was running this node with prepare-all-payloads, subscribe-all-subnets, slasher, subscribe-all-data-column-subnets, and import-all-attestations in an attempt to overextend my node.

Not planning on spending any more time on this in the near-term, just wanted to share this info in case someone finds it helpful

@michaelsproul michaelsproul added the beacon-processor Glorious beacon processor, guardian against chaos yet chaotic itself label Sep 18, 2025
mergify bot pushed a commit that referenced this pull request Sep 18, 2025
Part of #7866

- Continuation of #7921

In the above PR, we enabled rayon for batch KZG verification in chain segment processing. However, using the global rayon thread pool for backfill is likely to create resource contention with higher-priority beacon processor work.


This PR introduces a dedicated low-priority rayon thread pool `LOW_PRIORITY_RAYON_POOL` and uses it for processing backfill chain segments.

This prevents backfill KZG verification from using the global rayon thread pool and competing with high-priority beacon processor tasks for CPU resources.
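
The isolation idea can be sketched without rayon. The block below is a std-only stand-in for a dedicated pool, not the implementation from the referenced commit. `DedicatedPool` and its methods are hypothetical names. With rayon itself, one would presumably build the pool with `rayon::ThreadPoolBuilder` and run work through `ThreadPool::install` instead.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool: a std-only stand-in for a dedicated rayon pool.
struct DedicatedPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl DedicatedPool {
    /// Start `num_threads` workers that only ever run jobs sent to this pool.
    fn new(num_threads: usize) -> Self {
        let (sender, receiver) = channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..num_threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is released at the end of this statement,
                    // before the job itself runs.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // sender dropped: shut down
                    }
                })
            })
            .collect();
        Self { sender: Some(sender), workers }
    }

    /// Queue a job on this pool's own threads.
    fn spawn<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.sender
            .as_ref()
            .expect("pool is running")
            .send(Box::new(f))
            .expect("workers are alive");
    }

    /// Close the queue and wait for all queued jobs to finish.
    fn join(mut self) {
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}
```

Jobs sent to such a pool never occupy threads belonging to another pool. As with a dedicated rayon pool, the total number of threads in the process still grows.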

However, this PR by itself doesn't prevent CPU oversubscription because other tasks could still fill up the global rayon thread pool, and having an extra thread pool could make things worse. To address this we need the beacon processor to coordinate total CPU allocation across all tasks, which is covered in:
- #7789


Co-Authored-By: Jimmy Chen <[email protected]>

Co-Authored-By: Eitan Seri-Levi <[email protected]>
@eserilev eserilev closed this Oct 14, 2025