-
Notifications
You must be signed in to change notification settings - Fork 38.7k
net: create IP to ASN database from file - makeseeds.py #24824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Concept ACK, but it looks like you're (accidentally, I guess) reverting some changes made recently. |
You are correct there were accidental reversions. These have all been remedied and this PR is now up to date with master. The issues pointed out in your initial review should now be resolved. |
|
This should be downloaded and used as needed. We aren't going to commit a 7mb blob to the repo. You should try and maintain a clean commit history as you develop, rather than pushing all of the random fixups and other commits to this PR. Your commits will need to be squashed before anything is merged in any case. |
Addressing the uncleanliness of this PR. Should this be PR remade? I can squash from here on out, I will learn from my mistakes but am still newish to git. The larger issue to determine is if downloading a blob at runtime any better than parsing http://www.team-cymru.com/IP-ASN-mapping.html as makeseeds.py does right now? The other option would be to determine an acceptable database size for the ip to ASN database? I don't foresee anything smaller than 1MB with trimming and compression. Things to compress the ASN database would be to
Edit: Would a duality solution where the user could download an ASN file and if detected makeseeds would use it, If no ip->asn database file exists then the failover would be to parse http://www.team-cymru.com/IP-ASN-mapping.html as makeseeds is doing right now. The issue as presented in #17020 suggests to not hardcode an ASN and then place it in a file in the same way as suspicious hosts are. The ASN directory is quite massive and doesn't follow any specific patterns due to the fact it's a registry of real world relations. Thus a hardcoded dataset with potentially billions of entries will require quite a bit of space. |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
I have rebased the code with the requests from fanquake. Could I close this PR and file a new one to clean up some of my earlier 'spam' from too many commits? Thanks |
You can squash the commits, no need to open a new PR. See: https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits |
Loading ASN directory from a .tsv file. Added the use of a GZIP compressed ip -> ASN file. LINT Fixes - 7 words Fixed comparison to none using 'is' instead of '==' Fixed unintended changes to readme.md whitespace correction restored generate-seeds.py restored makeseeds.py Fixed a missing return type within exception Removed unused variable and if __main__ Updated to fetch and included legacy failover
f13a838 to
f170fb6
Compare
I have squashed these commits, thank you so much for the reference. The PR looks so much better.
There is now a mechanism in the class object to fetch the ip-asn directory database .gz file at runtime. The user can also load directly from a file they have created or obtained on their own if they wish without parsing an external source. A feature has been added is that if the asn db file is not loaded or used the class fails over to a 1:1 clone of the current implementation to decode the ASN information that fetches though an api request. Edit: Working on fixing this PR -> some branch elements are not in sync with the master. Note that the current implementation in this PR was able to resolve an IPv6 Address that was not resolvable via the current API service. If the ASN Database could not find a range for the current ip it would have failed over to the API call and the error in master would have been present in this branch. EDIT 2: I could use some assistance fixing this branch. I have made some mistakes and I don't know how to walk them back. |
Squashed to reduce spam - added features Loading ASN directory from a .tsv file. Added the use of a GZIP compressed ip -> ASN file. LINT Fixes - 7 words Fixed comparison to none using 'is' instead of '==' Fixed unintended changes to readme.md whitespace correction restored generate-seeds.py restored makeseeds.py Fixed a missing return type within exception Removed unused variable and if __main__ Updated to fetch and included legacy failover LINT fixes
3ed7168 to
d14742e
Compare
Squashed All Previous Commits RPC: Switch getblockfrompeer back to standard param name blockhash This commit partially reverts 923312f. Update RPC argument and field naming guideline in developer notes Co-authored-by: laanwj <[email protected]> build: fix MSVC build after subtree update Co-authored-by: Hennadii Stepanov <[email protected]> Co-authored-by: Aaron Clauson <[email protected]> build: remove --enable-experimental from libsecp256k1 configure build: remove some no-longer-needed var unexporting from configure key: use secp256k1_schnorrsig_sign32 over deprecated secp256k1_schnorrsig_sign The renaming occured in bitcoin-core/secp256k1#1089. Squased lint changes - added features Squashed to reduce spam - added features Loading ASN directory from a .tsv file. Added the use of a GZIP compressed ip -> ASN file. LINT Fixes - 7 words Fixed comparison to none using 'is' instead of '==' Fixed unintended changes to readme.md whitespace correction restored generate-seeds.py restored makeseeds.py Fixed a missing return type within exception Removed unused variable and if __main__ Updated to fetch and included legacy failover LINT fixes LINT fixes Squash - Too many commits Python include dns.resolver Failover implementation complete Removed DNS resolver Revert changes
8000957 to
623d4ff
Compare
Squashed All Previous Commits RPC: Switch getblockfrompeer back to standard param name blockhash This commit partially reverts 923312f. Update RPC argument and field naming guideline in developer notes Co-authored-by: laanwj <[email protected]> build: fix MSVC build after subtree update Co-authored-by: Hennadii Stepanov <[email protected]> Co-authored-by: Aaron Clauson <[email protected]> build: remove --enable-experimental from libsecp256k1 configure build: remove some no-longer-needed var unexporting from configure key: use secp256k1_schnorrsig_sign32 over deprecated secp256k1_schnorrsig_sign The renaming occured in bitcoin-core/secp256k1#1089. Squased lint changes - added features Squashed to reduce spam - added features Loading ASN directory from a .tsv file. Added the use of a GZIP compressed ip -> ASN file. LINT Fixes - 7 words Fixed comparison to none using 'is' instead of '==' Fixed unintended changes to readme.md whitespace correction restored generate-seeds.py restored makeseeds.py Fixed a missing return type within exception Removed unused variable and if __main__ Updated to fetch and included legacy failover LINT fixes LINT fixes Squash - Too many commits Python include dns.resolver Failover implementation complete Removed DNS resolver Revert changes
|
This is where this could get tricky to solve with the current implementation. Right now the limitation is that there can only be one input file to makeseeds.py per example instead of
There is the possibility to maintain 100% backwards compatibility with the existing commands if that is desired. |
|
Yes, adding a command line argument for the ASN database is fine! There's no pressing reason for backwards compatibility, as far as I know no one uses this as part of automatic scripting, just make sure the documentation (the instructions in the README.md) is up to date. |
|
Bitcoin Core itself can also use an ASN database, with Would it be reasonable to try to use the same format? There is Python lookup code for it in https://github.com/sipa/asmap/blob/master/testmap.py for example. |
|
@sipa Sounds like a great idea to me. I had asmap in the back of my head, but somehow thought it was solving a different issue. But it makes sense. |
author MarcoFalke <[email protected]> 1649237525 +0200 committer russeree <[email protected]> 1650013843 -0700 parent 10f629e author MarcoFalke <[email protected]> 1649237525 +0200 committer russeree <[email protected]> 1650013815 -0700 parent 10f629e author MarcoFalke <[email protected]> 1649237525 +0200 committer russeree <[email protected]> 1650013794 -0700 ci: Build all optional tools in tidy task lint: remove boost::bind linter I don't think we need to maintain a linter for reintroducing boost::bind at this point. doc: Convert remaining comments to clang-tidy format [docs] package feerate [packages/policy] use package feerate in package validation This allows CPFP within a package prior to submission to mempool. [validation] try individual validation before package validation This avoids "parents pay for children" and "siblings pay for siblings" behavior, since package feerate is calculated with totals and is topology-unaware. It also ensures that package validation never causes us to reject a transaction that we would have otherwise accepted in single-tx validation. [unit test] package feerate and package cpfp [validation] don't package validate if not policy or missing inputs Package validation policy only differs from individual policy in its evaluation of feerate. Minimize DoS surface; don't validate all over again if we know the result will be the same. lint: remove qt SIGNAL/SLOT lint I think we are past the point where we need to lint for this, the CPU can probably be better utilized. refactor: Remove deduplication of data in rollingbloom bench lint: codespell 2.1.0 lint: flake8 4.0.1 lint: mypy 0.942 refactor: fixup named args in txpackage tests Regression in bitcoin#24152. Put lock logging behind DEBUG_LOCKCONTENTION preprocessor directive Add DEBUG_LOCKCONTENTION documentation to the developer notes Squash Squashed All Previous Commits RPC: Switch getblockfrompeer back to standard param name blockhash This commit partially reverts 923312f. Update RPC argument and field naming guideline in developer notes Co-authored-by: laanwj <[email protected]> build: fix MSVC build after subtree update Co-authored-by: Hennadii Stepanov <[email protected]> Co-authored-by: Aaron Clauson <[email protected]> build: remove --enable-experimental from libsecp256k1 configure build: remove some no-longer-needed var unexporting from configure key: use secp256k1_schnorrsig_sign32 over deprecated secp256k1_schnorrsig_sign The renaming occured in bitcoin-core/secp256k1#1089. Squased lint changes - added features Squashed to reduce spam - added features Loading ASN directory from a .tsv file. Added the use of a GZIP compressed ip -> ASN file. LINT Fixes - 7 words Fixed comparison to none using 'is' instead of '==' Fixed unintended changes to readme.md whitespace correction restored generate-seeds.py restored makeseeds.py Fixed a missing return type within exception Removed unused variable and if __main__ Updated to fetch and included legacy failover LINT fixes LINT fixes Squash - Too many commits Python include dns.resolver Failover implementation complete Removed DNS resolver Revert changes refactor: Remove deduplication of data in rollingbloom bench lint: mypy 0.942 refactor: fixup named args in txpackage tests Regression in bitcoin#24152. Put lock logging behind DEBUG_LOCKCONTENTION preprocessor directive Add DEBUG_LOCKCONTENTION documentation to the developer notes Squash Squashed All Previous Commits RPC: Switch getblockfrompeer back to standard param name blockhash This commit partially reverts 923312f. Update RPC argument and field naming guideline in developer notes Co-authored-by: laanwj <[email protected]> build: fix MSVC build after subtree update Co-authored-by: Hennadii Stepanov <[email protected]> Co-authored-by: Aaron Clauson <[email protected]> build: remove --enable-experimental from libsecp256k1 configure build: remove some no-longer-needed var unexporting from configure key: use secp256k1_schnorrsig_sign32 over deprecated secp256k1_schnorrsig_sign The renaming occured in bitcoin-core/secp256k1#1089. Squased lint changes - added features Squashed to reduce spam - added features Loading ASN directory from a .tsv file. Added the use of a GZIP compressed ip -> ASN file. LINT Fixes - 7 words Fixed comparison to none using 'is' instead of '==' Fixed unintended changes to readme.md whitespace correction restored generate-seeds.py restored makeseeds.py Fixed a missing return type within exception Removed unused variable and if __main__ Updated to fetch and included legacy failover LINT fixes LINT fixes Squash - Too many commits Python include dns.resolver Failover implementation complete Removed DNS resolver Revert changes Put lock logging behind DEBUG_LOCKCONTENTION preprocessor directive Add DEBUG_LOCKCONTENTION documentation to the developer notes Squash Squashed All Previous Commits RPC: Switch getblockfrompeer back to standard param name blockhash This commit partially reverts 923312f. Update RPC argument and field naming guideline in developer notes Co-authored-by: laanwj <[email protected]> build: fix MSVC build after subtree update Co-authored-by: Hennadii Stepanov <[email protected]> Co-authored-by: Aaron Clauson <[email protected]> build: remove --enable-experimental from libsecp256k1 configure build: remove some no-longer-needed var unexporting from configure key: use secp256k1_schnorrsig_sign32 over deprecated secp256k1_schnorrsig_sign The renaming occured in bitcoin-core/secp256k1#1089. Squased lint changes - added features Squashed to reduce spam - added features Loading ASN directory from a .tsv file. Added the use of a GZIP compressed ip -> ASN file. LINT Fixes - 7 words Fixed comparison to none using 'is' instead of '==' Fixed unintended changes to readme.md whitespace correction restored generate-seeds.py restored makeseeds.py Fixed a missing return type within exception Removed unused variable and if __main__ Updated to fetch and included legacy failover LINT fixes LINT fixes Squash - Too many commits Python include dns.resolver Failover implementation complete Removed DNS resolver Revert changes RPC: Switch getblockfrompeer back to standard param name blockhash This commit partially reverts 923312f. Update RPC argument and field naming guideline in developer notes Co-authored-by: laanwj <[email protected]> build: fix MSVC build after subtree update Co-authored-by: Hennadii Stepanov <[email protected]> Co-authored-by: Aaron Clauson <[email protected]> build: remove --enable-experimental from libsecp256k1 configure build: remove some no-longer-needed var unexporting from configure key: use secp256k1_schnorrsig_sign32 over deprecated secp256k1_schnorrsig_sign The renaming occured in bitcoin-core/secp256k1#1089. test: compare `/chaininfo` response with `getblockchaininfo` RPC test: use MiniWallet for feature_fee_estimation.py This test can now be run even with the Bitcoin Core wallet disabled. Converted lint-python-mutable-default-parameters.sh to python Change permission Change argument so that it's compatiable with python 3.6 Change comment to docstring Remove .split, .append, .extend calls. Remove 'output' variable assignment build: Do not define `PROVIDE_FUZZ_MAIN_FUNCTION` macro unconditionally builder-keys: Add will8clark gui: add FormatPeerAge() utility helper Co-authored-by: randymcmillan <[email protected]> gui: add Age column to peers tab Co-authored-by: Jon Atack <[email protected]> gui: peersWidget - ResizeToContents Age and IP/Netmask columns Co-authored-by: Hennadii Stepanov <[email protected]> gui: add test runner summary gui: count test failures in test runner summary gui, refactor: rename fInvalid to num_test_failures in test_main.cpp qt: Fix headers This change is preparation for Qt 6, and it fixes an experimental build with Qt 6.2.4. qt: Use `|` instead of `+` for key modifiers This change is preparation for Qt 6 where `+` has been deprecated, and it fixes an experimental build with Qt 6.2.4. qt: Update deprecated enum value This change is preparation for Qt 6, and it fixes an experimental build with Qt 6.2.4. The `Qt::ItemIsTristate` value has been deprecated since 5.6.0 (see ae8406d82f541f6d9112bdac192e5e4e114d56aa upstream commit). print `(none)` if no warnings in -getinfo build, refactor: Drop useless `call` Make function util, refactor: Add UNIQUE_NAME helper macro This change replaces repetitive code with a helper macro. Replace uint256 specific implementations of base_uint::GetHex() and base_uint::SetHex() with proper ones that don't depend on uint256 and replace template methods instantiations of base_uint with template class instantiation guix: fix GCC 10.3.0 + mingw-w64 setjmp/longjmp issues This commit backports a patch to the GCC 10.3.0 we build for Windows cross-compilation in Guix. The commit has been backported to the GCC releases/gcc-10 branch, but hasn't yet made it into a release. The patch corrects a regression from an earlier GCC commit, see: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=357c4350680bf29f0c7a115424e3da11c53b5582 and https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=074226d5aa86cd3de517014acfe34c7f69a2ccc7, related to the way newer versions of mingw-w64 implement setjmp/longjmp. Ultimately this was causing a crash for us when Windows users were viewing the network traffic tab inside the GUI. After some period, long enough that a buffer would need reallocating, a call into FreeTypes gray_record_cell() would result in a call to ft_longjmp (longjmp), which would then trigger a crash. Fixes: bitcoin-core/gui#582. See also: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e8d1ca7d2c344a411779892616c423e157f4aea8. https://bugreports.qt.io/browse/QTBUG-93476. doc: Remove fee delta TODO from txmempool.cpp net: remove non-blocking bool from interface lint: Convert lint-logs.sh to Python test: determine path to `bitcoin-util` in test framework The path is stored in `self.options.bitcoinutil`, points to `src/bitcoin-util` by default and can be overrided with the `BITCOINUTIL` environment variable. test: add `is_bitcoin_util_compiled` helper test: add test for signet miner script depends: Add file-based logging for individual packages ci: Make log verbose in error case only This change silences depends build using LOG=1. doc: Add pre-splitoff translation update to release-process.md
|
I know this isn't a tech support forum but my branch is quite messed up at this point. Would it be acceptable to close/delete this PR and start over. I feel my code has great value but this PR is not following the contributing guidelines regarding squashing to any degree. At this point there doesn't seem to be a good solution to squashing/merging to tidy this branch up. All attempts to squash end up with a massive number of conflicts. There was a pull request that was merged into makeseeds.py this evening and trying to pull master and merge with my branch didn't go well at all. |
Using
There are many different ways to achieve this cleanup. I performed the following; you could adapt it for working on your local branch: git fetch upstream pull/24824/head:24824
git checkout 24824
git reset origin/master
git add contrib/seeds/*
git commit -m "contrib: create IP to ASN database from file"
# git stash'd anything else
git push origin the-branch-for-this-pr -f |
|
I might pick this up and implement @sipa's idea. Edit: working on this, will open a new PR shortly. |
|
Closing in favor of #24864 |
👍 If you like, reviewing and testing #24864 could be a great next step. |
Add an argument `-a` to provide a asmap file to do the IP to ASN lookups. This speeds up the script greatly, and makes the output deterministic. Also removes the dependency on `dns.lookup`. I've annotated the output with ASxxxx comments to provide a way to verify the functionality. For now I've added instructions in README.md to download and use the `demo.map` from the asmap repository. When we have some other mechanism for distributing asmap files we could switch to that. This continues bitcoin#24824. I've removed all the fallbacks and extra complexity, as everyone will be using the same instructions anyway. Co-authored-by: Pieter Wuille <[email protected]> Co-authored-by: James O'Beirne <[email protected]> Co-authored-by: russeree <[email protected]>
Add an argument `-a` to provide a asmap file to do the IP to ASN lookups. This speeds up the script greatly, and makes the output deterministic. Also removes the dependency on `dns.lookup`. I've annotated the output with ASxxxx comments to provide a way to verify the functionality. For now I've added instructions in README.md to download and use the `demo.map` from the asmap repository. When we have some other mechanism for distributing asmap files we could switch to that. This continues bitcoin#24824. I've removed all the fallbacks and extra complexity, as everyone will be using the same instructions anyway. Co-authored-by: Pieter Wuille <[email protected]> Co-authored-by: James O'Beirne <[email protected]> Co-authored-by: russeree <[email protected]>
667e316 contrib: Update makeseeds to asmap-nextgen (laanwj) ae00b9e contrib: add seeds progress indicator and remove asmap one in makeseeds script (Jon Atack) b541803 contrib: Use asmap for ASN lookup in makeseeds (laanwj) Pull request description: Add an argument `-a` to provide a asmap file to do the IP to ASN lookups. This speeds up the script greatly, and makes the output deterministic. Also removes the dependency on `dns.lookup`. I've annotated the output with ASxxxx comments to provide a way to verify the functionality. For now I've added instructions in README.md to download and use the `demo.map` from the asmap repository. When we have some other mechanism for distributing asmap files we could switch to that. This continues #24824. I've removed the fallbacks and extra complexity, as everyone will be using the same instructions anyway. Co-authored-by: Pieter Wuille <[email protected]> Co-authored-by: russeree <[email protected]> ACKs for top commit: sipa: ACK 667e316 dunxen: re-ACK 667e316 Tree-SHA512: c4cedfbd1dee6be7547aa92dd9e262c46f0ff8099e647559b2a40eab0cc9874e9a813706630dd5c880390d23f432e789fb3e7e8a09f376f567071e68f5904c65


Per task number 1 in issue #17020 , "Read ASNs from a file instead of hardcoding"
The function 'lookup_asn' has been replaced in a drop in manner by a tab separated value (.tsv) file loader class and function set. This drop in bypasses the current REST based ASN implementation where a 3rd party is contacted per ip address to fetch the current ASN for that ip address. A REST approach is slower than the localized database search found in this PR.
Tech Details
This drop in replacement utilizes a class object that upon construction loads an ASN database file into memory. This class should be instantiated in 'makeseeds.py'. Once the class has been constructed the makeseeds.py can call asn_lookup() to get the ASN of an ip address from the now localized database. This approach affords speed and decentralization at the cost of a marginal repository size increase. As the old saying goes there is no such thing as a free lunch.
Edit: 4/11/2022
The class object has been reworked in such a manner that upon instantiation the class if the constructor is provided with a True value for 'fetch'. The class will acquire the compressed dataset upon runtime from the web address stored in the class object (this could be passes as parameter theoretically). This fetching process removes the need to store the ip to asn database as a blob in the repository. If fetch is set to False then the class object will attempt to load the default file. If neither a asn db file is present or a fetch is requested the failover mechanism kicks in an performs ip to asn decoding using the previous method currently in master. Failover is slower in all cases and seemly doesn't resolve ip addresses as frequently.
Datasource: https://iptoasn.com/ Combined IPv4+IPv6 to ASN map
Error messages for failed ip lookups from the previous REST are identical. Additional asserts have been added to avoid feeding the ASN database loader invalid data. These asserts should cover some of the most common issues.
Cost and Changes
2 files were added. 'ansdecode.py' which contains a class that loads a .tsv.gz file and has member functions that replace existing ASN lookups that are performed by parsing http://www.team-cymru.com/IP-ASN-mapping.html. The second file by default is ip2asn.tsv.gz. This filename is a class constructor parameter that can be changed if a user would like to use a custom file location. This ip2asn.tsv.gz is a compressed .tsv that has a format of [rangeStart rangeEnd asnNumber ...]\n. Only the first 3 columns of the TSV are utilized additional columns can be included but will only make importing less efficient by doing so.
Advantages
Disadvantages
Notes