@JuArce
Collaborator

@JuArce JuArce commented Aug 5, 2025

feat(aggregation mode): add retry logic to batches download

Description

This PR adds retry logic to batch downloads in aggregation mode.

The retry parameters are:

// Retry parameters for S3 requests
/// Initial delay before first retry attempt (in milliseconds)
const RETRY_MIN_DELAY_MILLIS: u64 = 500;
/// Exponential backoff multiplier for retry delays
const RETRY_FACTOR: f32 = 2.0;
/// Maximum number of retry attempts
const RETRY_MAX_TIMES: usize = 5;
/// Maximum delay between retry attempts (in seconds)
const RETRY_MAX_DELAY_SECONDS: u64 = 10;
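
To illustrate how these parameters interact, here is a minimal sketch of the resulting delay schedule. It assumes the delay starts at `RETRY_MIN_DELAY_MILLIS`, is multiplied by `RETRY_FACTOR` after each attempt, is capped at `RETRY_MAX_DELAY_SECONDS`, and has no jitter. `backoff_schedule` is a hypothetical helper written for this example; it is not taken from the PR, and the actual retry implementation may differ.

```rust
use std::time::Duration;

const RETRY_MIN_DELAY_MILLIS: u64 = 500;
const RETRY_FACTOR: f32 = 2.0;
const RETRY_MAX_TIMES: usize = 5;
const RETRY_MAX_DELAY_SECONDS: u64 = 10;

/// Hypothetical helper (not from the PR): the wait before each retry,
/// assuming plain exponential backoff without jitter.
fn backoff_schedule() -> Vec<Duration> {
    let max_delay = Duration::from_secs(RETRY_MAX_DELAY_SECONDS);
    let mut delay = Duration::from_millis(RETRY_MIN_DELAY_MILLIS);
    let mut schedule = Vec::with_capacity(RETRY_MAX_TIMES);
    for _ in 0..RETRY_MAX_TIMES {
        // Never wait longer than the configured cap.
        schedule.push(delay.min(max_delay));
        // Grow the delay for the next retry.
        delay = delay.mul_f32(RETRY_FACTOR).min(max_delay);
    }
    schedule
}
```

Under these assumptions, the five retries wait 500 ms, 1 s, 2 s, 4 s and 8 s, about 15.5 s in total, so the 10 s cap is never reached with the current values.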

How to Test

  1. Run Ethereum Package
make ethereum_package_start
  2. Run batcher
make batcher_start_ethereum_package
  3. Send tasks
make batcher_send_proof_with_random_address
  4. Run aggregation mode
make proof_aggregator_start_ethereum_package AGGREGATOR=sp1

In this case, aggregation mode should fetch the batch from Localstack successfully.

  1. Stop Localstack

  2. Run aggregation mode

make proof_aggregator_start_ethereum_package AGGREGATOR=sp1

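In this case, you should see the process try to fetch the batch and wait before each retry, using exponential backoff. In the logs below, the second attempt starts about 500 ms after the first failure, matching RETRY_MIN_DELAY_MILLIS: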

2025-08-05T18:22:09.684662Z  INFO proof_aggregator::backend::s3: Fetching batch from S3 URL: http://localhost:4566/aligned.storage/628809ec4e40fd964766a0fe33e47d6779816b411a8310e1c6202fe00f2d7675.json
2025-08-05T18:22:09.685407Z  WARN proof_aggregator::backend::s3: Failed to send request to http://localhost:4566/aligned.storage/628809ec4e40fd964766a0fe33e47d6779816b411a8310e1c6202fe00f2d7675.json: error sending request for url (http://localhost:4566/aligned.storage/628809ec4e40fd964766a0fe33e47d6779816b411a8310e1c6202fe00f2d7675.json)
2025-08-05T18:22:10.187615Z  INFO proof_aggregator::backend::s3: Fetching batch from S3 URL: http://localhost:4566/aligned.storage/628809ec4e40fd964766a0fe33e47d6779816b411a8310e1c6202fe00f2d7675.json
2025-08-05T18:22:10.189572Z  WARN proof_aggregator::backend::s3: Failed to send request to http://localhost:4566/aligned.storage/628809ec4e40fd964766a0fe33e47d6779816b411a8310e1c6202fe00f2d7675.json: error sending request for url (http://localhost:4566/aligned.storage/628809ec4e40fd964766a0fe33e47d6779816b411a8310e1c6202fe00f2d7675.json)
  1. Start Localstack again. In this case, the storage will be available, but the batch will not exist. This is a permanent error.

  2. Run aggregation mode

make proof_aggregator_start_ethereum_package AGGREGATOR=sp1

Because the error is permanent, the process does not retry. You should see the following log:

2025-08-05T18:27:32.397180Z ERROR proof_aggregator::backend::fetcher: Error while downloading proofs from s3. Err StatusFailed((404, "Not Found"))
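
The two scenarios above (connection failure is retried, 404 is not) can be sketched as a retry loop that only backs off on transient errors. This is an illustration under stated assumptions, not the PR's code: `FetchError`, `is_transient` and `fetch_with_retry` are hypothetical names, the `StatusFailed((u16, String))` shape is inferred from the log line, and treating 5xx responses as transient is an assumption of this example. The sleep function is passed in so the loop can be exercised without real waiting.

```rust
use std::time::Duration;

const RETRY_MIN_DELAY_MILLIS: u64 = 500;
const RETRY_FACTOR: f32 = 2.0;
const RETRY_MAX_TIMES: usize = 5;
const RETRY_MAX_DELAY_SECONDS: u64 = 10;

/// Hypothetical error type; only the `StatusFailed` shape is inferred from the logs.
#[derive(Debug, Clone, PartialEq)]
enum FetchError {
    /// The request never reached the server (e.g. Localstack is stopped).
    SendFailed(String),
    /// The server answered with a non-success status.
    StatusFailed((u16, String)),
}

impl FetchError {
    /// Assumption: connection failures and 5xx may recover; 4xx (e.g. 404) will not.
    fn is_transient(&self) -> bool {
        match self {
            FetchError::SendFailed(_) => true,
            FetchError::StatusFailed((code, _)) => *code >= 500,
        }
    }
}

/// Calls `attempt` until it succeeds, fails permanently, or retries run out.
fn fetch_with_retry<T>(
    mut attempt: impl FnMut() -> Result<T, FetchError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, FetchError> {
    let max_delay = Duration::from_secs(RETRY_MAX_DELAY_SECONDS);
    let mut delay = Duration::from_millis(RETRY_MIN_DELAY_MILLIS);
    let mut retries_left = RETRY_MAX_TIMES;
    loop {
        match attempt() {
            Ok(value) => return Ok(value),
            // Transient error with retries remaining: wait, then try again.
            Err(e) if e.is_transient() && retries_left > 0 => {
                retries_left -= 1;
                sleep(delay);
                delay = delay.mul_f32(RETRY_FACTOR).min(max_delay);
            }
            // Permanent error, or retries exhausted: give up immediately.
            Err(e) => return Err(e),
        }
    }
}
```

In real use, `sleep` could be `std::thread::sleep` (or an async equivalent), and `attempt` would wrap the HTTP request to the batch URL.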

Type of change

  • New feature

Checklist

  • “Hotfix” to testnet, everything else to staging
  • Linked to GitHub Issue
  • This change depends on code or research by an external entity
    • Acknowledgements were updated to give credit
  • Unit tests added
  • This change requires new documentation.
    • Documentation has been added/updated.
  • This change is an Optimization
    • Benchmarks added/run
  • Has a known issue
  • If your PR changes the Operator compatibility (Ex: Upgrade prover versions)
    • This PR adds operator compatibility for both versions and does not change crates/docs/examples
    • This PR updates batcher and docs/examples to the newer version. This requires that operators are already updated to be compatible

@JuArce JuArce self-assigned this Aug 6, 2025
…retry-logic-to-batches-download

# Conflicts:
#	aggregation_mode/src/backend/s3.rs
@JuArce JuArce enabled auto-merge August 7, 2025 18:25
@JuArce JuArce added this pull request to the merge queue Aug 7, 2025
Merged via the queue into staging with commit 1f4accb Aug 7, 2025
2 of 3 checks passed
@JuArce JuArce deleted the 2043-feataggregation-mode-add-retry-logic-to-batches-download branch August 7, 2025 18:53
@JuArce JuArce linked an issue Aug 11, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

feat(aggregation mode): add retry logic to batches download

4 participants