Rework batcher concurrency #2017

MauroToscano · 2025-07-11T10:30:40Z

Rework batcher concurrency

This PR overhauls the batcher's concurrency model to optimize proof processing and avoid race conditions.

Description

Two locks are added: one for the batcher (the batch lock) and one for the user states. The key points are:

When both locks must be held at the same time, we (almost*) always take the user lock first and the batch lock afterward, so as to avoid deadlocks.

The biggest change is in the batch creation:

Batch creation for Ethereum is split into 3 phases:
- Phase 1: Extract from the queue the proofs that make up a potential batch (holds the batch lock)
- Phase 2: Release the lock and post the extracted batch to Ethereum
- Phase 3: Handle the result of phase 2
  - If phase 2 succeeds: update the user states (only proofs in batch and fees used are relevant here)
  - If phase 2 fails: re-add the proofs to the main queue, following standard eviction rules
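The phases above can be sketched roughly as follows. This is a minimal illustration using `std::sync::Mutex`, not the batcher's actual code: `Batcher`, `UserState`, `create_batch`, the field names, and the `post` callback are all hypothetical, and the eviction rules applied when re-adding proofs are left out.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Mutex;

#[derive(Clone, Debug, PartialEq)]
struct Proof {
    user: String,
    nonce: u64,
    max_fee: u64,
}

// Hypothetical per-user bookkeeping; the field names are illustrative.
#[derive(Debug, Default)]
struct UserState {
    proofs_in_queue: usize,
    fees_in_queue: u64,
}

struct Batcher {
    queue: Mutex<VecDeque<Proof>>,            // guarded by the batch lock
    users: HashMap<String, Mutex<UserState>>, // one lock per user state
}

impl Batcher {
    /// Returns the number of proofs posted, or the error reported by `post`.
    fn create_batch<F>(&self, max_proofs: usize, post: F) -> Result<usize, String>
    where
        F: FnOnce(&[Proof]) -> Result<(), String>,
    {
        // Phase 1: extract a candidate batch while holding the batch lock.
        let batch: Vec<Proof> = {
            let mut queue = self.queue.lock().unwrap();
            let n = max_proofs.min(queue.len());
            let extracted: Vec<Proof> = queue.drain(..n).collect();
            extracted
        }; // the batch lock is released here

        // Phase 2: post the batch without holding any lock.
        if let Err(err) = post(&batch[..]) {
            // Phase 3 (failure): put the proofs back at the front, in order.
            // Assumption: the real eviction rules are omitted in this sketch.
            let mut queue = self.queue.lock().unwrap();
            for proof in batch.into_iter().rev() {
                queue.push_front(proof);
            }
            return Err(err);
        }

        // Phase 3 (success): update each affected user state, one lock at a time.
        for proof in &batch {
            if let Some(lock) = self.users.get(&proof.user) {
                let mut state = lock.lock().unwrap();
                state.proofs_in_queue = state.proofs_in_queue.saturating_sub(1);
                state.fees_in_queue = state.fees_in_queue.saturating_sub(proof.max_fee);
            }
        }
        Ok(batch.len())
    }
}
```

The point of the sketch is that the slow step (posting) runs with no lock held, so submissions can keep flowing while a batch is in flight.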

Notice that we tolerate the user state being temporarily inconsistent with the queue until we confirm the block. This means that, until the posting is confirmed, we are a bit stricter than we need to be with proofs the user sends in parallel. We assume the proofs are still in the queue and not yet paid for, so the user may need a bit more spare balance and cannot use arbitrary fees, since we require a proof with a bigger nonce to have the same or a lower fee than the proofs in the batch.
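As a rough illustration of the two constraints mentioned above, the following sketch checks a new proof against what is still assumed to be pending for the user. The `PendingFees` struct, its fields, and `can_accept` are hypothetical names introduced for this example; the real checks in the batcher may differ.

```rust
// Hypothetical view of what is still assumed pending for a user
// while a batch is being posted.
struct PendingFees {
    // Max fee of the user's most recent pending proof, if any.
    last_max_fee: Option<u64>,
    // Sum of the fees of proofs assumed to be still unpaid.
    reserved_fees: u64,
}

/// Returns true if a new proof (with a bigger nonce) can be accepted.
fn can_accept(pending: &PendingFees, balance: u64, new_max_fee: u64) -> bool {
    // A proof with a bigger nonce must have the same or a lower fee.
    let fee_ok = match pending.last_max_fee {
        Some(last) => new_max_fee <= last,
        None => true,
    };
    // The balance must cover the fees still assumed unpaid plus the new one.
    let balance_ok = pending
        .reserved_fees
        .checked_add(new_max_fee)
        .map_or(false, |total| total <= balance);
    fee_ok && balance_ok
}
```

Because `reserved_fees` still includes the proofs of the in-flight batch, a user with just enough balance for one more proof would be rejected until the posting is confirmed, which is the extra strictness the description refers to.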

For proof submission, the general idea is:

Take the user lock
Analyze if the message is valid
Take the batch lock and add it to the queue
Update the user state and release the locks
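The four steps above, with the lock order (user lock first, batch lock second), might look like the sketch below. It uses `std::sync::Mutex` and a nonce check as a stand-in for message validation; `Batcher`, `UserState`, `SubmitError`, and `submit` are hypothetical names, not the batcher's API.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum SubmitError {
    UnknownUser,
    InvalidNonce,
}

struct UserState {
    next_nonce: u64,
}

struct Batcher {
    queue: Mutex<Vec<(String, u64)>>,         // (user, nonce), guarded by the batch lock
    users: HashMap<String, Mutex<UserState>>, // one lock per user state
}

impl Batcher {
    fn submit(&self, user: &str, nonce: u64) -> Result<(), SubmitError> {
        // 1. Take the user lock.
        let user_lock = self.users.get(user).ok_or(SubmitError::UnknownUser)?;
        let mut state = user_lock.lock().unwrap();

        // 2. Validate the message (here: only the nonce, as a placeholder).
        if nonce != state.next_nonce {
            return Err(SubmitError::InvalidNonce);
        }

        // 3. Take the batch lock, always after the user lock, and enqueue.
        let mut queue = self.queue.lock().unwrap();
        queue.push((user.to_string(), nonce));

        // 4. Update the user state; both guards are dropped on return,
        //    the batch lock first and the user lock second.
        state.next_nonce += 1;
        Ok(())
    }
}
```

Since every submission path acquires the locks in the same order, two concurrent submissions cannot end up each holding the lock the other one needs.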

The added complexity is handling the case of a full queue. Since we cannot be sure that we can take a user lock after the batch lock, what we do is:

Take the batch lock after the user lock and check if the queue is full
Going from the proofs with the lowest fees to the highest, take the first proof whose user state is not locked (a for loop with a try_lock)
Compare fees against it

This mechanism avoids deadlocks: the "candidate" for eviction may have its state locked, and try_lock skips it instead of waiting for it.

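As a downside, it may be imprecise: when the lowest-fee proofs belong to locked users, the new proof is compared against a higher-fee candidate, so the user may need to outbid N proofs instead of only the cheapest one. This is not critical.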
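A minimal sketch of the candidate search, assuming the queue is available as a slice sorted by ascending fee and that user states live behind `std::sync::Mutex`. `Entry`, `UserState`, `find_eviction_candidate`, and `outbids` are hypothetical names, and the strict "greater than" comparison is an assumption.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};

struct Entry {
    user: String,
    max_fee: u64,
}

struct UserState {
    proofs_in_queue: usize,
}

/// Walks the queue from the lowest fee to the highest and returns the index
/// of the first entry whose user state can be locked without blocking,
/// together with the guard for that state.
fn find_eviction_candidate<'a>(
    queue_by_fee_asc: &[Entry],
    users: &'a HashMap<String, Mutex<UserState>>,
) -> Option<(usize, MutexGuard<'a, UserState>)> {
    for (index, entry) in queue_by_fee_asc.iter().enumerate() {
        if let Some(lock) = users.get(&entry.user) {
            // try_lock never waits: a locked (or poisoned) state is skipped.
            if let Ok(guard) = lock.try_lock() {
                return Some((index, guard));
            }
        }
    }
    None
}

/// Assumption: the new proof replaces the candidate only with a strictly higher fee.
fn outbids(new_max_fee: u64, candidate: &Entry) -> bool {
    new_max_fee > candidate.max_fee
}
```

If the cheapest entry belongs to a user whose state is currently locked, the function returns the next one, which is exactly the imprecision described above.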

Another approach would be to briefly take the batch lock, peek to see if the queue is full, drop it and try to get the lock of the user with the least amount of fees. But this may lead to some edge cases that are harder to handle.

*The only exception to this rule is when sending a batch fails. To recover in this scenario, we need to lock all the user states, since the queue is finite and we may need to evict proofs and update their users' nonces. We don't actually take all the locks, and we may only need a couple. But to avoid a deadlock, we added a flag that stops more users from being processed while a recovery is in progress, which works in a similar manner. In the rare event that the lock we need is taken, the user task will time out after 15s and release it for the restoration task.

Type of change

Please delete options that are not relevant.

Bug fix
Refactor

MarcosNicolau · 2025-08-20T18:16:32Z

crates/batcher/Cargo.toml

 ciborium = "=0.2.2"
 priority-queue = "2.1.0"
 reqwest = { version = "0.12", features = ["json"] }
+dashmap = "6.0.1"


Remove dependency

crates/batcher/Cargo.toml

crates/batcher/src/lib.rs

crates/Cargo.lock

MarcosNicolau

Looks good to me, the general code is much easier to follow through.

MauroToscano and others added 6 commits July 9, 2025 18:04

Rework

fc6826d

Remove all functions that takes and drops lock

173d4c8

Add lock per user state

db61877

Merge branch 'testnet' into rework_batcher_concurrency

7cb5982

Fmt

ade5648

Update cargo lock

4afe0f0

MauroToscano marked this pull request as ready for review July 16, 2025 20:37

MauroToscano and others added 15 commits July 16, 2025 17:51

Handle everything with the state locked

2cfa935

Rename function

dc40c0a

Rename variables

60b79b9

fmt

fc18d6c

Fix lib

6016431

Re organize data

fb38103

Checkpoint: Happy path

1c4a59e

Restore proofs

e60ad68

Split locks

9493a04

Add replacement and parallel eviction logic

859590a

Move proof verification before batch verifications

fcac386

Add claude code to gitignore

9dfd4a7

Remove claude code files

f90c394

Merge branch 'staging' into rework_batcher_concurrency

529ea95

Merge branch 'testnet' into rework_batcher_concurrency

4409143

MauroToscano changed the base branch from testnet to staging July 22, 2025 17:14

MauroToscano added 7 commits July 22, 2025 14:20

Reset submodules to match staging

28b95d3

Merge branch 'rework_batcher_concurrency' of github.com:yetanotherco/…
…aligned_layer into rework_batcher_concurrency

3569c2d

Simplify extract_batch

1545dcb

Simplify extract_batch

9f0fd15

Simplify batch queue creation algorithm

26328ec

Add test for only one proof in queue

9b9e819

Remove unused lock

0c6159a

MauroToscano and others added 15 commits August 7, 2025 13:16

Merge

87a425e

Add grafana

6487615

Update batch metrics after posting

e18554b

Improve comment

7805ae2

Re add missing grafana error items

0238f72

Fix fmt

558fea8

Fix path in examples l2

92c03f0

Merge branch 'staging' into rework_batcher_concurrency

d489ad2

Merge branch 'staging' into rework_batcher_concurrency

a785129

fix: show locks metrics correctly

95e99c2

refactor: use Hashmap instead of Dashmap (#2051)

f846b69

fix(batcher): initialize dummy state with correct nonce (#2057)

cb5c84e

docs(batcher): explain locking logic (#2058)

662bdd7

Merge

8adf57f

fix(batcher): remove is_recovering_from_submission_failure (#2056)

3492e7e

Co-authored-by: MauroFab <[email protected]>

JuArce approved these changes Aug 20, 2025

MarcosNicolau reviewed Aug 20, 2025

MauroToscano commented Aug 20, 2025

crates/batcher/Cargo.toml (outdated, resolved)

Remove dashmap as dependency

370e440

MarcosNicolau reviewed Aug 20, 2025

crates/batcher/src/lib.rs (outdated, resolved)

MarcosNicolau reviewed Aug 20, 2025

crates/Cargo.lock (resolved)

MarcosNicolau approved these changes Aug 20, 2025

MauroToscano and others added 2 commits August 22, 2025 17:20

Merge branch 'staging' into rework_batcher_concurrency

1be2e0f

remove comment

447efd4

MauroToscano enabled auto-merge August 22, 2025 20:27

MauroToscano added this pull request to the merge queue Aug 22, 2025

Merged via the queue into staging with commit 15c8d13 Aug 22, 2025
3 checks passed

MauroToscano deleted the rework_batcher_concurrency branch August 22, 2025 20:54

JuArce mentioned this pull request Aug 26, 2025

bug: pre-verification of proofs can DoS the batcher #1744

Closed

4 participants
