Skip to content

Conversation

@pawanjay176
Copy link
Member

@pawanjay176 pawanjay176 commented Oct 14, 2025

Issue Addressed

N/A

Proposed Changes

Found this issue in sepolia. Note: the custody requirement for this node is 100.

Oct 14 11:25:40.053 DEBUG Reconstructed columns                         count: 28, block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, slot: 8725628
Oct 14 11:25:45.568 WARN  Internal availability check failure           block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, error: Unexpected("too many columns got 128 expected 100")

So if any of the block components arrives late, then we reconstruct all 128 columns and try to add it to da cache and have more columns than needed for availability in the cache.

There are 2 ways I can think of fixing this:

  1. pass only the required columns to the da cache after reconstruction here
    self.availability_cache
    .put_kzg_verified_data_columns(*block_root, data_columns_to_publish.clone())
  2. Ensure that we add only columns that we need to sample in the da cache. I think this is safer since we can add columns to the cache from multiple code paths and this fixes it at the source.

This PR implements (2). Thought more about it, I think (1) is cleaner since we filter gossip and rpc columns also before calling put_kzg_verified_data_columns/

@pawanjay176
Copy link
Member Author

Thinking about this a bit more, seems like (2) is swallowing the error here.

@michaelsproul michaelsproul added v8.0.0 Q4 2025 Fusaka Mainnet Release bug Something isn't working labels Oct 15, 2025
panic!("Expected successful reconstruction {:?}", e);
}
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wanted to assert matches!(availability, Availability::MissingComponents(_)))
But that doesn't work because we are adding a pre execution block. I figured its okay because reconstruction_result wouldn't have been success otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why wouldn't this work? A pre execution block should still result in MissingComponents

let Some(CachedBlock::Executed(block)) = &self.block else {
// Block not available yet
return Ok(None);
};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah Availability is probably not important for this test, it could be either MissingComponents or Imported depending on the execution status

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant it was returning MissingComponents when I wanted it to change after the reconstruction. But you're right we don't need it

@pawanjay176 pawanjay176 added the ready-for-review The code is ready for review label Oct 15, 2025
@jimmygchen
Copy link
Member

Thinking about this a bit more, seems like (2) is swallowing the error here.

Perhaps we can be a bit defensive here and perform a check, given the DA checker now knows about the sampling columns?

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I've added some comments, let me know what you think.

@jimmygchen jimmygchen added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Oct 15, 2025
Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!
Doesn't seem to impactful to do the extra check right now, as we already have checks in all the code paths calling DA checker. We're reworking the DA checker post fulu anyways, so it could be done in a separate PR or as part of the rework.

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Oct 16, 2025
@mergify mergify bot added the queued label Oct 16, 2025
mergify bot added a commit that referenced this pull request Oct 16, 2025
mergify bot added a commit that referenced this pull request Oct 16, 2025
@mergify mergify bot merged commit 73e75e3 into sigp:unstable Oct 16, 2025
59 of 60 checks passed
@mergify mergify bot removed the queued label Oct 16, 2025
jchavarri pushed a commit to jchavarri/lighthouse that referenced this pull request Oct 21, 2025
N/A


  Found this issue in sepolia. Note: the custody requirement for this node is 100.
```
Oct 14 11:25:40.053 DEBUG Reconstructed columns                         count: 28, block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, slot: 8725628
Oct 14 11:25:45.568 WARN  Internal availability check failure           block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, error: Unexpected("too many columns got 128 expected 100")
```

So if any of the block components arrives late, then we reconstruct all 128 columns and try to add it to da cache and have more columns than needed for availability in the cache.

There are 2 ways I can think of fixing this:
1. pass only the required columns to the da cache after reconstruction here https://github.com/sigp/lighthouse/blob/60df5f4ab609362711f4f518eb8f03df447bfedb/beacon_node/beacon_chain/src/data_availability_checker.rs#L647-L648
2. Ensure that we add only columns that we need to sample in the da cache. I think this is safer since we can add columns to the cache from multiple code paths and this fixes it at the source.

~~This PR implements (2).~~ Thought more about it, I think (1) is cleaner since we filter gossip and rpc columns also before calling `put_kzg_verified_data_columns`/


Co-Authored-By: Pawan Dhananjay <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready-for-merge This PR is ready to merge. v8.0.0 Q4 2025 Fusaka Mainnet Release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants