-
Notifications
You must be signed in to change notification settings - Fork 957
Improve get_custody_columns validation, caching and error handling
#6308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve get_custody_columns validation, caching and error handling
#6308
Conversation
beacon_node/lighthouse_network/src/discovery/subnet_predicate.rs
Outdated
Show resolved
Hide resolved
|
I don't really like the complexity in error handling here, will work on simplifying it tomorrow. |
b9a2cfd to
b455de6
Compare
get_custody_columns and improve error handlingget_custody_columns validation, caching and error handling
|
@pawanjay176 I've re-written the PR and I think it's in a better place now and ready for review. |
b4c6f0d to
3e040e5
Compare
f3e99e9 to
73a340c
Compare
73a340c to
82a578f
Compare
beacon_node/lighthouse_network/src/discovery/subnet_predicate.rs
Outdated
Show resolved
Hide resolved
beacon_node/lighthouse_network/src/peer_manager/peerdb/peer_info.rs
Outdated
Show resolved
Hide resolved
beacon_node/lighthouse_network/src/peer_manager/peerdb/peer_info.rs
Outdated
Show resolved
Hide resolved
# Conflicts: # beacon_node/lighthouse_network/src/discovery/subnet_predicate.rs # beacon_node/lighthouse_network/src/peer_manager/peerdb.rs # beacon_node/lighthouse_network/src/peer_manager/peerdb/peer_info.rs # beacon_node/lighthouse_network/src/types/globals.rs # beacon_node/network/src/service.rs # consensus/types/src/data_column_subnet_id.rs
…ute_custody_requirement_subnets` function.
…onnection since we get them on metadata anyway.
5def040 to
0a717ff
Compare
beacon_node/lighthouse_network/src/peer_manager/peerdb/peer_info.rs
Outdated
Show resolved
Hide resolved
| try_create_int_gauge_vec( | ||
| "peers_per_custody_subnet_count", | ||
| "The current count of peers by custody subnet count", | ||
| &["custody_subnet_count"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider doing a histogram here as we can have buckets instead of exact values. Say we have a peer with csc 128 and another with 127, can grafana bucket these two labels into the same range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think buckets are useful, but historgrams accumulate observersations over time and it doesn't have the ability to track the current count for each bucket at a specific moment. I feel like gauge is more suitable but if you have a way to represent the metric in histogram I'm interested to hear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Lodestar we reset the histogram buckets before every scrape. It feels weird but renders nicely on Grafana.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea. I've tried it and unfortunately couldn't find a way to set it with the prometheus crate though.
Closest thing that I could find was histogram.metric().clear_histogram(); but it doesn't do anything to the actual metric.
I'm going to merge this now, if we find any better way to do it, happy to do it in a follow up PR.
dapplion
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
@mergify queue |
✅ The pull request has been merged automaticallyDetailsThe pull request has been merged automatically at c0b4f01 |
Add retropgf funding (sigp#6324) Migrate from `ethereum-types` to `alloy-primitives` (sigp#6078) * Remove use of ethers_core::RlpStream * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Remove old code * Simplify keccak call * Remove unused package * Merge branch 'unstable' of https://github.com/ethDreamer/lighthouse into remove_use_of_ethers_core * Merge branch 'unstable' into remove_use_of_ethers_core * Run clippy * Merge branch 'remove_use_of_ethers_core' of https://github.com/dospore/lighthouse into remove_use_of_ethers_core * Check all cargo fmt * migrate to alloy primitives init * fix deps * integrate alloy-primitives * resolve dep issues * more changes based on dep changes * add TODOs * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Revert lock * Add BeaconBlocksByRange v3 * continue migration * Revert "Add BeaconBlocksByRange v3" This reverts commit e3ce7fc. * impl hash256 extended trait * revert some uneeded diffs * merge conflict resolved * fix subnet id rshift calc * rename to FixedBytesExtended * debugging * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix failed test * fixing more tests * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * introduce a shim to convert between the two u256 types * move alloy to wrokspace * align alloy versions * update * update web3signer test certs * refactor * resolve failing tests * linting * fix graffiti string test * fmt * fix ef test * resolve merge conflicts * remove udep and revert cert * cargo patch * cyclic dep * fix build error * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve conflicts, update deps * merge unstable * fmt * fix deps * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve merge conflicts * resolve conflicts, make necessary changes * Remove patch * fmt * remove file * merge conflicts * sneaking in a smol change * bump versions * Merge remote-tracking branch 'origin/unstable' into migrate-to-alloy-primitives * Updates for peerDAS * Update ethereum_hashing to prevent dupe * updated alloy-consensus, removed TODOs * cargo update * endianess fix * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fmt * fix merge * fix test * fixed_bytes crate * minor fixes * convert u256 to i64 * panic free mixin to_low_u64_le * from_str_radix * computbe_subnet api and ensuring we use big-endian * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix test * Simplify subnet_id test * Simplify some more tests * Add tests to fixed_bytes crate * Merge branch 'unstable' into migrate-to-alloy-primitives update libp2p to version 0.54 (sigp#6249) * update libp2p to version 0.54.0 * address review * Merge branch 'unstable' of github.com:sigp/lighthouse into update-libp2p * Merge branch 'update-libp2p' of github.com:sigp/lighthouse into update-libp2p Drop block data from BlockError and BlobError (sigp#5735) * Drop block data from BlockError and BlobError * Debug release tests * Fix release tests * Revert unnecessary changes to lighthouse_metrics Update manual checkpoint sync content in Lighthouse book (sigp#6338) * Update manual checkpiont sync * Update faq * Minor revision * Minor revision Light Client Bug Fix (sigp#6299) * Light Client Bug Fix Ignore Rust 1.82 warnings about void patterns (sigp#6357) * Ignore Rust 1.82 warnings about void patterns Enable `large_stack_frames` lint (sigp#6343) * Enable `large_stack_frames` lint Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric (sigp#6340) * Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric, and move metrics into the compute function, recording only successful computation. * Move `discard_timer_on_break` usage to caller site. * Merge branch 'unstable' into compute-data-column-metric * Merge branch 'unstable' into compute-data-column-metric Metadata request ordering (sigp#6336) * Send metadata request ordering * Merge branch 'unstable' into metadata-order Clarify validator monitor block log (sigp#6342) * Clarify validator monitor block log * Merge branch 'unstable' into clarify-block-log Check known parent on rpc blob process (sigp#5893) * Check known parent on rpc blob process * fix test * Merge branch 'unstable' of https://github.com/sigp/lighthouse into blob-unknown-parent Remove beta tag from gossipsub 1.2 (sigp#6344) * Remove the beta tag from gossipsub v1.2 * fix clippy * Merge branch 'unstable' into remove-beta-tag Fix lints for Rust 1.81 (sigp#6363) * Fix lints for Rust 1.81 Use tikv-jemallocator instead of jemallocator (sigp#6354) * Use tikv-jemallocator instead of jemallocator * Merge branch 'unstable' into tikv-jemallocator * Bump tikv-jemallocator and tikv-jemalloc-ctl Improve `get_custody_columns` validation, caching and error handling (sigp#6308) * Improve `get_custody_columns` validation, caching and error handling. * Merge branch 'unstable' into get-custody-columns-error-handing * Fix failing test and add more test. * Fix failing test and add more test. * Merge branch 'unstable' into get-custody-columns-error-handing * Add unit test to make sure the default specs won't panic on the `compute_custody_requirement_subnets` function. * Add condition when calling `compute_custody_subnets_from_metadata` and update logs. * Validate `csc` when returning from enr. Remove `csc` computation on connection since we get them on metadata anyway. * Add `peers_per_custody_subnet_count` to track peer csc and supernodes. * Disconnect peers with invalid metadata and find other peers instead. * Fix sampling tests. * Merge branch 'unstable' into get-custody-columns-error-handing * Merge branch 'unstable' into get-custody-columns-error-handing Delete legacy payload reconstruction (sigp#6213) * Delete legacy payload reconstruction * Delete unneeded failing test * Merge remote-tracking branch 'origin/unstable' into remove-more-ethers * Merge remote-tracking branch 'origin/unstable' into remove-more-ethers * Cleanups simplify rpc codec logic (sigp#6304) * simplify rpc codec logic * Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec * Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec * Merge branch 'unstable' of github.com:sigp/lighthouse into simply-rpc-codec * Merge branch 'unstable' into simplify-rpc-codec * Merge branch 'unstable' into simplify-rpc-codec
Add retropgf funding (sigp#6324) Migrate from `ethereum-types` to `alloy-primitives` (sigp#6078) * Remove use of ethers_core::RlpStream * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Remove old code * Simplify keccak call * Remove unused package * Merge branch 'unstable' of https://github.com/ethDreamer/lighthouse into remove_use_of_ethers_core * Merge branch 'unstable' into remove_use_of_ethers_core * Run clippy * Merge branch 'remove_use_of_ethers_core' of https://github.com/dospore/lighthouse into remove_use_of_ethers_core * Check all cargo fmt * migrate to alloy primitives init * fix deps * integrate alloy-primitives * resolve dep issues * more changes based on dep changes * add TODOs * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Revert lock * Add BeaconBlocksByRange v3 * continue migration * Revert "Add BeaconBlocksByRange v3" This reverts commit e3ce7fc. * impl hash256 extended trait * revert some uneeded diffs * merge conflict resolved * fix subnet id rshift calc * rename to FixedBytesExtended * debugging * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix failed test * fixing more tests * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * introduce a shim to convert between the two u256 types * move alloy to wrokspace * align alloy versions * update * update web3signer test certs * refactor * resolve failing tests * linting * fix graffiti string test * fmt * fix ef test * resolve merge conflicts * remove udep and revert cert * cargo patch * cyclic dep * fix build error * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve conflicts, update deps * merge unstable * fmt * fix deps * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve merge conflicts * resolve conflicts, make necessary changes * Remove patch * fmt * remove file * merge conflicts * sneaking in a smol change * bump versions * Merge remote-tracking branch 'origin/unstable' into migrate-to-alloy-primitives * Updates for peerDAS * Update ethereum_hashing to prevent dupe * updated alloy-consensus, removed TODOs * cargo update * endianess fix * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fmt * fix merge * fix test * fixed_bytes crate * minor fixes * convert u256 to i64 * panic free mixin to_low_u64_le * from_str_radix * computbe_subnet api and ensuring we use big-endian * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix test * Simplify subnet_id test * Simplify some more tests * Add tests to fixed_bytes crate * Merge branch 'unstable' into migrate-to-alloy-primitives update libp2p to version 0.54 (sigp#6249) * update libp2p to version 0.54.0 * address review * Merge branch 'unstable' of github.com:sigp/lighthouse into update-libp2p * Merge branch 'update-libp2p' of github.com:sigp/lighthouse into update-libp2p Drop block data from BlockError and BlobError (sigp#5735) * Drop block data from BlockError and BlobError * Debug release tests * Fix release tests * Revert unnecessary changes to lighthouse_metrics Update manual checkpoint sync content in Lighthouse book (sigp#6338) * Update manual checkpiont sync * Update faq * Minor revision * Minor revision Light Client Bug Fix (sigp#6299) * Light Client Bug Fix Ignore Rust 1.82 warnings about void patterns (sigp#6357) * Ignore Rust 1.82 warnings about void patterns Enable `large_stack_frames` lint (sigp#6343) * Enable `large_stack_frames` lint Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric (sigp#6340) * Add blob count label to `DATA_COLUMN_SIDECAR_COMPUTATION` metric, and move metrics into the compute function, recording only successful computation. * Move `discard_timer_on_break` usage to caller site. * Merge branch 'unstable' into compute-data-column-metric * Merge branch 'unstable' into compute-data-column-metric Metadata request ordering (sigp#6336) * Send metadata request ordering * Merge branch 'unstable' into metadata-order Clarify validator monitor block log (sigp#6342) * Clarify validator monitor block log * Merge branch 'unstable' into clarify-block-log Check known parent on rpc blob process (sigp#5893) * Check known parent on rpc blob process * fix test * Merge branch 'unstable' of https://github.com/sigp/lighthouse into blob-unknown-parent Remove beta tag from gossipsub 1.2 (sigp#6344) * Remove the beta tag from gossipsub v1.2 * fix clippy * Merge branch 'unstable' into remove-beta-tag Fix lints for Rust 1.81 (sigp#6363) * Fix lints for Rust 1.81 Use tikv-jemallocator instead of jemallocator (sigp#6354) * Use tikv-jemallocator instead of jemallocator * Merge branch 'unstable' into tikv-jemallocator * Bump tikv-jemallocator and tikv-jemalloc-ctl Improve `get_custody_columns` validation, caching and error handling (sigp#6308) * Improve `get_custody_columns` validation, caching and error handling. * Merge branch 'unstable' into get-custody-columns-error-handing * Fix failing test and add more test. * Fix failing test and add more test. * Merge branch 'unstable' into get-custody-columns-error-handing * Add unit test to make sure the default specs won't panic on the `compute_custody_requirement_subnets` function. * Add condition when calling `compute_custody_subnets_from_metadata` and update logs. * Validate `csc` when returning from enr. Remove `csc` computation on connection since we get them on metadata anyway. * Add `peers_per_custody_subnet_count` to track peer csc and supernodes. * Disconnect peers with invalid metadata and find other peers instead. * Fix sampling tests. * Merge branch 'unstable' into get-custody-columns-error-handing * Merge branch 'unstable' into get-custody-columns-error-handing Delete legacy payload reconstruction (sigp#6213) * Delete legacy payload reconstruction * Delete unneeded failing test * Merge remote-tracking branch 'origin/unstable' into remove-more-ethers * Merge remote-tracking branch 'origin/unstable' into remove-more-ethers * Cleanups simplify rpc codec logic (sigp#6304) * simplify rpc codec logic * Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec * Merge branch 'unstable' of github.com:sigp/lighthouse into simplify-rpc-codec * Merge branch 'unstable' of github.com:sigp/lighthouse into simply-rpc-codec * Merge branch 'unstable' into simplify-rpc-codec * Merge branch 'unstable' into simplify-rpc-codec
…igp#6308) * Improve `get_custody_columns` validation, caching and error handling. * Merge branch 'unstable' into get-custody-columns-error-handing * Fix failing test and add more test. * Fix failing test and add more test. * Merge branch 'unstable' into get-custody-columns-error-handing # Conflicts: # beacon_node/lighthouse_network/src/discovery/subnet_predicate.rs # beacon_node/lighthouse_network/src/peer_manager/peerdb.rs # beacon_node/lighthouse_network/src/peer_manager/peerdb/peer_info.rs # beacon_node/lighthouse_network/src/types/globals.rs # beacon_node/network/src/service.rs # consensus/types/src/data_column_subnet_id.rs * Add unit test to make sure the default specs won't panic on the `compute_custody_requirement_subnets` function. * Add condition when calling `compute_custody_subnets_from_metadata` and update logs. * Validate `csc` when returning from enr. Remove `csc` computation on connection since we get them on metadata anyway. * Add `peers_per_custody_subnet_count` to track peer csc and supernodes. * Disconnect peers with invalid metadata and find other peers instead. * Fix sampling tests. * Merge branch 'unstable' into get-custody-columns-error-handing * Merge branch 'unstable' into get-custody-columns-error-handing
Issue Addressed
This PR addresses two issues:
get_custody_columnsto prevent infinite loop when supplied custody subnet count is invalid (> total number of data column subnets).network_globals.custody_columnsto the caller site (see @pawanjay176's comment here)NetworkGlobalsinitialization rather than computing it every time.Error handling
For node's
custody_subnet_countfrom local metadata:For peer's
custody_subnet_countfrom ENR & metadata:custody_subnet_countis invalid, fall back tocustody_requirementcustody_requirement, they are still useful to us