-
Notifications
You must be signed in to change notification settings - Fork 38.7k
contrib: add tool to convert compact-serialized UTXO set to SQLite database #27432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
contrib: add tool to convert compact-serialized UTXO set to SQLite database #27432
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/27432. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
|
This also closes #21670 ;-) |
494be8c to
3ce180a
Compare
What is the rationale for encoding as text rather than bytes? SQLite can store byte values as BLOBs. |
Fair question. There was already some discussion in #24952 about whether to store txids/scriptPubKeys as TEXT or BLOB, see #24952 (review), #24952 (comment) and #24952 (comment). The two main points were:
Considering the scriptPubKey column individually, there is no good reason to use TEXT rather than BLOB, but I went for TEXT mostly for consistency reasons, to not mix TEXT and BLOB in different columns when it's both binary data. That said, I'm also very open also for using BLOB instead, it's just a matter of trade-offs. |
|
Approach ACK. Seems like a fine idea to me.
It's a python conversion script: can't you just add a command-line option for the resulting db to have hex txids or big/little endian blobs if there's user demand for it? Hex encoding seems a fine default to me, for what it's worth. If people end up wanting lots of different options (convert scriptPubKeys to addresses? some way to update the db to a new state, rather than just create a new one?) maybe it would make sense for this script to have its own repo even; but while it stays simple/small, seems fine for contrib. |
|
Concept ACK, will test soon |
|
Concept ACK |
willcl-ark
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK 3ce180a
Left two nits which don't need addressing unless being re-touched, but overall this works well in testing and seems like a useful contrib script. Converting the output to json also worked as described in the comments above.
pablomartin4btc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
73b9cab to
a1c1cf3
Compare
Sorry for the extra-late reply, missed this message and the CI fail. Rebased on master and resolved the silent merge conflict (caused by the module move
Good idea, planning to tackle this as a follow-up. |
a1c1cf3 to
9668255
Compare
|
post-merge tACK 4080b66 |
|
FTR, it's possible to import the data from SQLite into DuckDB, to use its optimized query execution engine (for faster queries): ATTACH 'utxo.sqlite' AS utxo_sqlite (TYPE SQLITE, READONLY);
ATTACH 'utxo.duckdb' AS utxo_duckdb;
COPY FROM DATABASE utxo_sqlite TO utxo_duckdb;
USE utxo_duckdb;In [1]: import duckdb
In [2]: c = duckdb.connect()
In [3]: c.sql("ATTACH 'utxo.duckdb' AS utxo_duckdb")
In [4]: c.sql("USE utxo_duckdb")
In [5]: %time c.sql("SELECT count(*) FROM utxos").show()
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 177884546 │
└──────────────┘
CPU times: user 2.16 ms, sys: 5.78 ms, total: 7.95 ms
Wall time: 7.1 ms
In [6]: %time c.sql("SELECT count(*) FROM utxos WHERE value < 1000").show()
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 85656982 │
└──────────────┘
CPU times: user 926 ms, sys: 765 ms, total: 1.69 s
Wall time: 161 ms
In [7]: %time c.sql("SELECT count(*), CAST(height/50000 AS INTEGER) as i FROM utxos WHERE value < 1000 GROUP BY i ORDER BY i").show()
┌──────────────┬───────┐
│ count_star() │ i │
│ int64 │ int32 │
├──────────────┼───────┤
│ 501 │ 2 │
│ 364135 │ 3 │
│ 371358 │ 4 │
│ 883151 │ 5 │
│ 105033 │ 6 │
│ 264542 │ 7 │
│ 77163 │ 8 │
│ 256262 │ 9 │
│ 541590 │ 10 │
│ 1578882 │ 11 │
│ 3756328 │ 12 │
│ 1441847 │ 13 │
│ 712565 │ 14 │
│ 510563 │ 15 │
│ 34044107 │ 16 │
│ 34755149 │ 17 │
│ 5993806 │ 18 │
├──────────────┴───────┤
│ 17 rows 2 columns │
└──────────────────────┘
CPU times: user 3.08 s, sys: 676 ms, total: 3.75 s
Wall time: 320 ms |
|
Great to see that this was merged! |
95722d0 doc: add release note for #27432 (utxo-to-sqlite tool) (Sebastian Falbesoner) Pull request description: This PR adds a missing release note for #27432. ACKs for top commit: kevkevinpal: ACK [95722d0](95722d0) Tree-SHA512: 47714182e814fb825dbaeede46a1b496c4b87f3c5bfa61ada00138926a6bf9eb9a3f99636eb698a7bcda6642f73d0a8c5bf531a726750d594decca8ba6a56e3a
…513955891 29513955891 kernel: Add pure kernel bitcoin-chainstate 9c40433bd4a kernel: Add functions to get the block hash from a block 942df8f287f kernel: Add block index utility functions to C header 87102db87ac kernel: Add function to read block undo data from disk to C header 12b8c9442ad kernel: Add functions to read block from disk to C header d977db3feb2 kernel: Add function for copying block data to C header 8ae33627743 kernel: Add functions for the block validation state to C header 0565a0bbc01 kernel: Add validation interface to C header 837e5a0f536 kernel: Add interrupt function to C header a80b7bfe3de kernel: Add import blocks function to C header 54d1a1231ec kernel: Add chainstate load options for in-memory dbs in C header 659efa9969c kernel: Add options for reindexing in C header 2179127c079 kernel: Add block validation to C header 26143992693 kernel: Add chainstate loading when instantiating a ChainstateManager 82d2bebbe54 kernel: Add chainstate manager option for setting worker threads e875f520851 kernel: Add chainstate manager object to C header 4e486059178 kernel: Add notifications context option to C header a5eb699b978 kernel: Add chain params context option to C header 0818b8d2c07 kernel: Add kernel library context object 71c24c95b31 kernel: Add logging to kernel library C header 0cc810386f7 kernel: Introduce initial kernel C header API 82ba9257157 Merge bitcoin/bitcoin#31366: cmake: Check `-Wno-*` compiler options for `leveldb` target f236854a5bd Merge bitcoin/bitcoin#31731: doc: update translation generation cmake example eb51963d870 Merge bitcoin/bitcoin#31884: cmake: Make implicit `libbitcoinkernel` dependencies explicit 58f15d4b215 Merge bitcoin/bitcoin#31379: cmake: Fix passing `APPEND_*FLAGS` to `secp256k1` subtree e606c577cb2 Merge bitcoin/bitcoin#31899: cmake: Exclude generated sources from translation 758a93d6215 doc: update translation generation cmake example fd14995b6a8 Merge bitcoin/bitcoin#31908: Revert merge of PR #31826 3e9b12b3e0f Revert "Merge bitcoin/bitcoin#31826: random: Check `GetRNDRRS` is supported in `InitHardwareRand` to avoid infinite loop" 785649f3977 Merge bitcoin/bitcoin#29881: guix: use GCC 13 to build releases 139640079ff Merge bitcoin/bitcoin#31826: random: Check `GetRNDRRS` is supported in `InitHardwareRand` to avoid infinite loop dc3a7146337 Merge bitcoin/bitcoin#31794: wallet: abandon orphan coinbase txs, and their descendants, during startup 06757af2da5 Merge bitcoin/bitcoin#29156: tests: add functional test for miniscript decaying multisig ff4ddd3d2e3 Revert "cmake: Ensure generated sources are up to date for `translate` target" 03b3166aac5 cmake: Exclude generated sources from translation 43e287b3ff5 Merge bitcoin/bitcoin#31892: build: remove ENABLE_HARDENING condition from check-security 63d625f7610 Merge bitcoin/bitcoin#31893: test: remove scanning check on `wallet_importdescriptors` 3b42e05aa9e cmake: Make implicit `libbitcoinkernel` dependencies explicit 3fd64efb437 cmake: Avoid using `OBJECT` libraries 28dec6c5f8b Merge bitcoin/bitcoin#31268: cmake: add optional source files to bitcoin_crypto and crc32c directly 50afaf3a389 Merge bitcoin/bitcoin#31836: contrib: Add deterministic-fuzz-coverage 405dd0e647e test: remove scanning check on `wallet_importdescriptors` 113a7a363fa build: remove ENABLE_HARDENING cond from check-security 9da0820ec55 Merge bitcoin/bitcoin#31869: cmake: Add `libbitcoinkernel` target db36a92c02b Merge bitcoin/bitcoin#31879: doc: add release note for #27432 (utxo-to-sqlite tool) 95722d048a8 doc: add release note for #27432 (utxo-to-sqlite tool) e4dd5a351bd test: wallet, abandon coinbase txs and their descendants during startup 09b150bb8a
Problem description
There is demand from users to get the UTXO set in form of a SQLite database (#24628). Bitcoin Core currently only supports dumping the UTXO set in a binary compact-serialized format, which was crafted specifically for AssumeUTXO snapshots (see PR #16899), with the primary goal of being as compact as possible. Previous PRs tried to extend the
dumptxoutsetRPC with new formats, either in human-readable form (e.g. #18689, #24202), or most recently, directly as SQLite database (#24952). Both are not optimal: due to the huge size of the ever-growing UTXO set with already more than 80 million entries on mainnet, human-readable formats are practically useless, and very likely one of the first steps would be to put them in some form of database anyway. Directly adding SQLite3 dumping support on the other hand introduces an additional dependency to the non-wallet part of bitcoind and the risk of increased maintenance burden (see e.g. #24952 (comment), #24628 (comment)).Proposed solution
This PR follows the "external tooling" route by adding a simple Python script for achieving the same goal in a two-step process (first create compact-serialized UTXO set via
dumptxoutset, then convert it to SQLite via the new script). Executive summary:utxoswith the following schema:(txid TEXT, vout INT, value INT, coinbase INT, height INT, scriptpubkey TEXT)[1] note that there are some rare cases of operating systems like FreeBSD though, where the sqlite3 module has to installed explicitly (see #26819)
A functional test is also added that creates UTXO set entries with various output script types (standard and also non-standard, for e.g. large scripts) and verifies that the UTXO sets of both formats match by comparing corresponding MuHashes. One MuHash is supplied by the bitcoind instance via
gettxoutsetinfo muhash, the other is calculated in the test by reading back the created SQLite database entries and hashing them with the test framework'sMuHash3072module.Manual test instructions
I'd suggest to do manual tests also by comparing MuHashes. For that, I've written a go tool some time ago which would calculate the MuHash of a sqlite database in the created format (I've tried to do a similar tool in Python, but it's painfully slow).
For a demonstration what can be done with the resulting database, see #24952 (review) for some example queries. Thanks go to LarryRuane who gave me to the idea of rewriting this script in Python and adding it to
contrib.