-
Notifications
You must be signed in to change notification settings - Fork 957
Ignore extra columns in da cache #8201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thinking about this a bit more, seems like (2) is swallowing the error here. |
| panic!("Expected successful reconstruction {:?}", e); | ||
| } | ||
| }; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wanted to assert matches!(availability, Availability::MissingComponents(_)))
But that doesn't work because we are adding a pre execution block. I figured its okay because reconstruction_result wouldn't have been success otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why wouldn't this work? A pre execution block should still result in MissingComponents
lighthouse/beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
Lines 228 to 231 in 9d0e44b
| let Some(CachedBlock::Executed(block)) = &self.block else { | |
| // Block not available yet | |
| return Ok(None); | |
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah Availability is probably not important for this test, it could be either MissingComponents or Imported depending on the execution status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant it was returning MissingComponents when I wanted it to change after the reconstruction. But you're right we don't need it
Perhaps we can be a bit defensive here and perform a check, given the DA checker now knows about the sampling columns? |
jimmygchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I've added some comments, let me know what you think.
jimmygchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
Doesn't seem to impactful to do the extra check right now, as we already have checks in all the code paths calling DA checker. We're reworking the DA checker post fulu anyways, so it could be done in a separate PR or as part of the rework.
N/A
Found this issue in sepolia. Note: the custody requirement for this node is 100.
```
Oct 14 11:25:40.053 DEBUG Reconstructed columns count: 28, block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, slot: 8725628
Oct 14 11:25:45.568 WARN Internal availability check failure block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, error: Unexpected("too many columns got 128 expected 100")
```
So if any of the block components arrives late, then we reconstruct all 128 columns and try to add it to da cache and have more columns than needed for availability in the cache.
There are 2 ways I can think of fixing this:
1. pass only the required columns to the da cache after reconstruction here https://github.com/sigp/lighthouse/blob/60df5f4ab609362711f4f518eb8f03df447bfedb/beacon_node/beacon_chain/src/data_availability_checker.rs#L647-L648
2. Ensure that we add only columns that we need to sample in the da cache. I think this is safer since we can add columns to the cache from multiple code paths and this fixes it at the source.
~~This PR implements (2).~~ Thought more about it, I think (1) is cleaner since we filter gossip and rpc columns also before calling `put_kzg_verified_data_columns`/
Co-Authored-By: Pawan Dhananjay <[email protected]>
Issue Addressed
N/A
Proposed Changes
Found this issue in sepolia. Note: the custody requirement for this node is 100.
So if any of the block components arrives late, then we reconstruct all 128 columns and try to add it to da cache and have more columns than needed for availability in the cache.
There are 2 ways I can think of fixing this:
lighthouse/beacon_node/beacon_chain/src/data_availability_checker.rs
Lines 647 to 648 in 60df5f4
This PR implements (2).Thought more about it, I think (1) is cleaner since we filter gossip and rpc columns also before callingput_kzg_verified_data_columns/