Improvement to the Test Framework in the processing of test blocks #4688

SergioDemianLerner · 2014-08-12T19:56:57Z

For several reasons the current test framework does not allow to easily incorporate new unit tests that append specially crafted blocks to the blockchain using ProcessBlock().
This was pointed out by Mike Hearn on the development list in the thread with subject "[Bitcoin-development] Question on creating test cases for block.CheckBlock()" (http://article.gmane.org/gmane.comp.bitcoin.devel/5939).

After debugging the Bitcoin core, we found that this is because of three reasons:

The miner_tests.cpp leaves the transaction pool in an invalid state and assumes the blockchain is empty on start. So other independent unit tests cannot be run before nor after (without proper cleaning) if they intend to use CreateNewBlock(), which collects transactions from the pool.
Creating test blocks requires proof-of-work. Even if the starting difficulty is low (1) the required proof-of-work is still too high to allow for the dynamic creation of test blocks. Instead, unit test have to be "pre-mined". This makes the code more opaque and increases the effort of changing the code.
Only the first unit test can start with an empty blockchain, all subsequent tests have to start with the blockchain in the state where the previous test left it. This has the following problematic consequences:
a. Each unit test makes the blockchain longer. This breaks the test sequence when a checkpoint is reached because a checkpoint requires a pre-determined hash digest. Moreover, there are exception cases for certain block heights (e.g. regarding allowing two transactions with the same hash) which could be violated.
b. Certain combinations of unit tests are inherently impossible to implement in a single block-chain if not run in an specific order.
Other combinations may cause unexpected consequences. For example, since version 1 blocks do not have the height field included in the coinbase field of the generation transaction, a unit test may create a coinbase tx with a future height in the height field and prevent a coinbase tx with the same hash to be used afterward when only blocks v2 are accepted (this happened to us while testing).
c. The standard unit testing policy that unit tests should be not depend on each other's output is violated.
This makes debugging more difficult.
d. "Pre-mining" unit tests is impossible unless all previous unit tests are known and never change. (However, we are proposing to eliminate the proof-of-work check regardless.)

We believe that testing the block acceptance rules is crucial for the safety of the application and so we wrote this patch.
By restarting the blockchain before every unit test that requires testing block acceptance we have the guarantee that all tests are independent, executed in a predefined reproducable environment, and don't unintentionally hit checkpoints or other exceptions. Nevertheless, each unit test decides whether to re-use or reset the block-chain. We haven't perceived any significant delay while performing the destruction and creation of the block-chain during the execution of the test application. This is because file space allocation functions are fast on modern filesystems. Nevertheless, UNDOFILE_CHUNK_SIZE/BLOCKFILE_CHUNK_SIZE can be reduced during test case execution if the block-chain is reset many times.

In detail, this patch solves 1.,2.,3. from above by:

Providing a method to reset the blockchain to the starting state (testingSetupManager.SetupGenesisBlockChain())
Allowing to dynamically skip the proof-of-work testing (supressCheckBlockWork = true)
Fixing the bug in miner_tests.cpp which leaves in the memory pool invalid transactions (mempool.clear() missing).

We've also found the exact procedure that can be used to programmatically destroy and re-create the blockchain correctly, which was not implemented and nor documented. Some cleanup methods existed but some other were added because they were missing.

This could be of great help to re-create completely the blockchain in case a severe damage has been detected, without restarting the application.

As a bonus, 7 unit tests have been added:

ToCheckBlockUpgradeMajority (untested before)
EnforceBlockUpgradeMajority (untested before)
RejectBlockOutdatedMajority (untested before)
"bad-cb-height"
"bad-version"
"time-too-old"
"bad-txns-nonfinal"

Last, we added a way to leave the blockchain unaltered after the test suite is over to debug the unit tests themselves (testingSetupManager.keepTestEvidence = true)

Note: The BerkeleyDB environment field was converted into a heap allocated object because BerkeleyDB handles are not meant to be re-used after close, and the block-chain environment can be closed and re-opened in the unit tests. This is explained in http://docs.oracle.com/cd/E17275_01/html/api_reference/CXX/envclose.html as
"After DbEnv::close() has been called, regardless of its return, the Berkeley DB environment handle may not be accessed again."

Sergio Demian Lerner & Timo Hanke

For several reasons the current test framework does not allow to easily incorporate new unit tests that append specially crafted blocks to the blockchain using ProcessBlock(). This was pointed out by Mike Hearn on the development list in the thread with subject "[Bitcoin-development] Question on creating test cases for block.CheckBlock()" (http://article.gmane.org/gmane.comp.bitcoin.devel/5939). After debugging the Bitcoin core, we found that this is because of three reasons: 1. The miner_tests.cpp leaves the transaction pool in an invalid state and assumes the blockchain is empty on start. So other independent unit tests cannot be run before nor after (without proper cleaning) if they intend to use CreateNewBlock(), which collects transactions from the pool. 2. Creating test blocks requires proof-of-work. Even if the starting difficulty is low (1) the required proof-of-work is still too high to allow for the dynamic creation of test blocks. Instead, unit test have to be "pre-mined". This makes the code more opaque and increases the effort of changing the code. 3. Only the first unit test can start with an empty blockchain, all subsequent tests have to start with the blockchain in the state where the previous test left it. This has the following problematic consequences: a. Each unit test makes the blockchain longer. This breaks the test sequence when a checkpoint is reached because a checkpoint requires a pre-determined hash digest. Moreover, there are exception cases for certain block heights (e.g. regarding allowing two transactions with the same hash) which could be violated. b. Certain combinations of unit tests are inherently impossible to implement in a single block-chain if not run in an specific order. Other combinations may cause unexpected consequences. For example, since version 1 blocks do not have the height field included in the coinbase field of the generation transaction, a unit test may create a coinbase tx with a future height in the height field and prevent a coinbase tx with the same hash to be used afterward when only blocks v2 are accepted (this happened to us while testing). c. The standard unit testing policy that unit tests should be not depend on each other's output is violated. This makes debugging more difficult. d. "Pre-mining" unit tests is impossible unless all previous unit tests are known and never change. (However, we are proposing to eliminate the proof-of-work check regardless.) We believe that testing the block acceptance rules is crucial for the safety of the application and so we wrote this patch. By restarting the blockchain before every unit test that requires testing block acceptance we have the guarantee that all tests are independent, executed in a predefined reproducable environment, and don't unintentionally hit checkpoints or other exceptions. Nevertheless, each unit test decides whether to re-use or reset the block-chain. We haven't perceived any significant delay while performing the destruction and creation of the block-chain during the execution of the test application. This is because file space allocation functions are fast on modern filesystems. Nevertheless, UNDOFILE_CHUNK_SIZE/BLOCKFILE_CHUNK_SIZE can be reduced during test case execution if the block-chain is reset many times. In detail, this patch solves 1.,2.,3. from above by: - Providing a method to reset the blockchain to the starting state (testingSetupManager.SetupGenesisBlockChain()) - Allowing to dynamically skip the proof-of-work testing (supressCheckBlockWork = true) - Fixing the bug in miner_tests.cpp which leaves in the memory pool invalid transactions (mempool.clear() missing). We've also found the exact procedure that can be used to programmatically destroy and re-create the blockchain correctly, which was not implemented and nor documented. Some cleanup methods existed but some other were added because they were missing. This could be of great help to re-create completely the blockchain in case a severe damage has been detected, without restarting the application. As a bonus, 7 unit tests have been added: - ToCheckBlockUpgradeMajority (untested before) - EnforceBlockUpgradeMajority (untested before) - RejectBlockOutdatedMajority (untested before) - "bad-cb-height" - "bad-version" - "time-too-old" - "bad-txns-nonfinal" Last, we added a way to leave the blockchain unaltered after the test suite is over to debug the unit tests themselves (testingSetupManager.keepTestEvidence = true) Note: The BerkeleyDB environment field was converted into a heap allocated object because BerkeleyDB handles are not meant to be re-used after close, and the block-chain environment can be closed and re-opened in the unit tests. This is explained in http://docs.oracle.com/cd/E17275_01/html/api_reference/CXX/envclose.html as "After DbEnv::close() has been called, regardless of its return, the Berkeley DB environment handle may not be accessed again." Sergio Demian Lerner & Timo Hanke

theuni · 2014-08-12T20:31:58Z

You need to include test_bitcoin.h in BITCOIN_TESTS in Makefile.test.include.

gmaxwell · 2014-08-12T20:57:16Z

Creating test blocks requires proof-of-work.

This is what regtest mode is for. (The pull tester currently uses it for dynamically creating blocks to test)

SergioDemianLerner · 2014-08-12T21:07:49Z

@gmaxwell Yes I know, but test_bitcoin is run in MainNet, so to test MainNet specific things (such as the majority rule) I need to create blocks and skip the PoW verification.

SergioDemianLerner · 2014-08-12T21:08:54Z

How can I update this pull request to add test_bitcoin.h into BITCOIN_TESTS in Makefile.test.include ?
Do I have to create a new pull request ?
In that case, all current comments will be lost...

gmaxwell · 2014-08-12T21:21:46Z

There should be no mainnet behavior which is not testable by regtest, if there is— it should be fixed. The sole reason for regtest existing is exactly this case: regular mainnet blocks are too hard to create on the fly in tests.

SergioDemianLerner · 2014-08-12T21:28:44Z

Regnet class inherits from TestNet, and not from MainNet. But that is is unimportant.
One difference is nSubsidyHalvingInterval = 150;

gmaxwell · 2014-08-12T21:30:56Z

Yes, nSubsidyHalvingInterval = 150 is for exactly the same reason: otherwise the subsidy halving interval is not testable from the tests (well, without making 210000 blocks, of course). If some different number would be better for the tests it could be changed to that (obviously the blockchain tester would have to be updated).

SergioDemianLerner · 2014-08-12T21:34:27Z

Even if there is another procedure to create block test cases using another language and another framework (Java, BitcoinJ), I found really useful to be create testcases in the same unit test framework, the same language and the same source tree. It's much easier.

In fact, in another pull request we're working on, we're proposing modifications and we wrote extensive test cases using the internal and more natural framework.

BTW, where is the documentation about how to append testcases run by TheBlueMatt in his BitcoinJ node?

SergioDemianLerner · 2014-08-12T21:36:39Z

You can test it in the MainNet with the current test framework improvement. You create 2017 blocks on-the-fly, you check. Very straightforward.

SergioDemianLerner · 2014-08-12T21:39:54Z

Creating an almost empty block on-the-fly currently takes a less than a few milliseconds. We could benchmark it.

SergioDemianLerner · 2014-08-12T21:41:31Z

Even if there were another thousands ways of testing bitcoind, improving the test framework and letting people add more test cases is a good thing.

gmaxwell · 2014-08-12T21:44:17Z

Internal testing of the blockchain behavior isn't all that fantastic generally, because most mistakes will agree with themselves. The idea of an external harness is that it can be a reference point for behavior. I fully agree that more tests and more ways of testing is good. What I am not happy with a extra POW bypass dropped into the consensus critical code, especially when the justification is redundant with existing infrastructure.

The subsidy halving interval in mainnet is 210000 blocks, you're confusing it with the retargeting interval. I suppose you could create 210000 blocks in a test but that would be needlessly slow.

SergioDemianLerner · 2014-08-12T21:45:29Z

@gmaxwell Oops! yes.

SergioDemianLerner · 2014-08-12T21:52:09Z

If an attacker is able to change the value of supressCheckBlockWork at will when there is no compiled code to do it (in bitcoind), he is also probably able to read the process memory and extract the private keys.
Nevertheless I understand your concerns.

SergioDemianLerner · 2014-08-12T22:06:47Z

Another difference is that MainNet rejects forwarding non-standard transactions, and RegNet allows it.
We can switch to RegNet for test_bitcoin. Then we can omit the POW skip, and we get a better testing framework without any possible theoretical security implication. But I suppose that there are test cases testing non-standard tx rules, so we can't easily switch.
At the end, using supressCheckBlockWork in the MainNet is the closest to the "real" thing one can test dynamically.

gmaxwell · 2014-08-12T22:11:56Z

Again, regtest was created for this purpose. To the extent that regtest has shortcomings we should fix regtest. The initial version of regtest was a set of patches that overlaid on top of bitcoin and changed mainnet rules. When it was merged in as chain params it inhereted testnet but I think there is no reason regtest should be testnet based, IIRC the pull tester does no outside of block transactions, so IsStandard shouldn't be relevant... and I think it can just be switched.

test_bitcoin.h added to BITCOIN_TESTS in Makefile.test.include

mikehearn · 2014-08-17T17:00:46Z

Great work Sergio, I didn't review it yet but this has been a big need for a while now.

The pull tester does use various non-standard scripts but mostly because it's kind of lazy. It could be fixed.

Re: adding tests to it. The current code is kind of gross and Matt talks about rewriting it quite often. I'd hope it'd get better documentation as part of that, if he does it. There is at least a documented build process now which is something. But having tests both in-tree and in the pull tester is not a bad thing.

laanwj · 2014-08-18T11:13:15Z

Why are the wallet-related changes in here? I mean dbenv/wallet/walletdb. As this tests block chain behavior, I'm not sure why those have to change.

laanwj · 2014-08-18T11:18:45Z

Agree with @gmaxwell to prefer using regtest mode for these tests instead of adding a supressCheckBlockWork to main. I wouldn't want this flag to be ever toggled accidentally on mainnet, and regtest already offers this functionailty anyway.

BitcoinPullTester · 2014-08-18T18:03:47Z

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4688_a7555d6f6430aa2ddc83404ae9b627d7d0dd1f6f/ for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

SergioDemianLerner · 2014-08-18T21:39:03Z

@laanwj The wallet code required change in the way a BerkeleyDB handle was accessed: from direct access to the handle to indirect access though a pointer to a handle. This is because one of the objectives of this patch was to be able to destroy and re-create a block-chain programmaticaly. BerkeleyDB handles do not allow to be re-opened after closing them, so was the necessity to delete the handle object and create a new one. The were other solutions: e.g. changing the bitdb global variable from an object to a pointer to an object, which is the same, and requires more updates in other parts of the codebase.

Regarding the regtest mode, if you all agree I will update this pull request to:

Remove supressCheckBlockWork flag
Switch the test framework to regtest.

laanwj · 2014-08-19T08:38:35Z

@SergioDemianLerner Completely agree about programmatically destroying and re-creating block chains, but I'm still not sure how that relates to the wallet DB environment. Does the BDB environment need to be reinitialized every time you create a new block chain? I think this is just extra overhead - especially as you're not doing anything with the wallet in your tests.

Maybe we could separate out the wallet testing so all that happens only once.

SergioDemianLerner · 2014-08-19T13:14:48Z

@laanwj There are several reasons why the wallet need to be re-created in test_bitcoin when the block-chain is re-created. First, all wallet-transactions will be invalid, since those transactions will be based on coinbases that do not exist anymore. Second, when the block-chain is destroyed in test_bitcoin, all temporary files are removed (the block-chain is stored in a temporary directory whose name depends on the current date-time). When a new block-chain is created, a new temporary directory is also created. Then, the previous BDB wallet files are deleted. The new wallet is stored on a new directory. That's the main reason.

Nevertheless, we could destroy the block-chain without destroying the wallet, and re-create the block-chain in the same temporary directory. But I don't know how the Bitcoin core will handle a wallet completely unsynchronized with the block-chain. The code should probably clear the wallet anyway. Does it have a way to clear all the keys? I must check.

The reasons given are only true for testing (test_bitcoin). For production (bitcoind) is probably wise to have methods to destruct/create the block-chain without altering the wallet and vice-versa. But still, to re-create the wallet (because of a corruption error, for example), one needs to destroy and create an new BDB handle, so the same code changes will be needed to repair a wallet programmaticaly and keep the node working.

I will be sending another pull-request with just the new testcases and the change to the regtest without the block-chain re-creation code, while we discuss this.

@gmaxwell

For several reasons the current test framework does not allow to easily incorporate new unit tests that append specially crafted blocks to the blockchain using ProcessBlock(). In the pull request bitcoin#4688 we described the reasons and proposed a solution. While we think that is the optimum solution, @gmaxwell and @laanwj suggested not implementing the changes because many files not directly related to the testing framework were modified by the patch. In this pull request we implement @gmaxwell suggestion that the test framework should run under the RegTest network parameters and not under the main network parameters. Allthough this seems to be a small change, it was not. First, the RegTest block solving probability was 1/2 which still requires "mining" during the test case validation in order to dynamically create a block, which adds unnecessary complications to simple testing code. To overcome this problem, the RegTest difficulty was changed from 0x207fffff to 0x2100FFFF (in compact notation), which gives an approximate 1/2^32 probability not to solve a block. Happily there was no lucky test case. Because themaximum possible target (0xff ... 0xff) cannot be multiplied by 4 without overflow, ComputeMinWork() had to be modified to always return ProofOfWorkLimit() for the RegTest. Also we found that several test cases make explicit use of properties of the MainNet. Such test cases are: alert_tests.cpp rpc_wallet_tests.cpp main_tests.cpp key_tests.cpp DoS_tests.cpp Checkpoints_tests.cpp base58_tests.cpp bloom_tests.cpp rpc_tests.cpp miner_tests.cpp All these test cases were added a temporary switch from the RegTest network to the Main network in order to leave them working. Re-writting all of them for the RegTest probably requires more than 40 hours of work. The alert_tests.cpp test case is special, because it requires the alert private key to generate inputs to the test cases. We suggest to switch this test case to the RegNet and replace the current RegTest alert private key with a published key-pair, so everyone can generate more test cases and there is no opaque data. As the 4688 pull request, we've: a. fixed the bug in miner_tests.cpp which leaves in the memory pool invalid transactions (mempool.clear() missing). b. Added 7 unit tests: ToCheckBlockUpgradeMajority (untested before) EnforceBlockUpgradeMajority (untested before) RejectBlockOutdatedMajority (untested before) "bad-cb-height" "bad-version" "time-too-old" "bad-txns-nonfinal" Sergio Demian Lerner & Timo Hanke

laanwj · 2014-10-22T10:58:05Z

Needs rebase.

laanwj · 2015-01-30T11:01:04Z

I'm going to replace this by out-of-tree RPC tests. I've written weird block generation code in Python, so we can just test these things externally without having unit tests with side effects or a dependent order (or having to change a lot of nontest code).

SergioDemianLerner added 3 commits August 12, 2014 16:24

Bad ident. TAB -> space conversion made

8afc3fe

"delete" missing EnvShutdown()

32bb48e

laanwj added the Tests label Aug 13, 2014

SergioDemianLerner added 2 commits August 13, 2014 11:37

Indentation corrected (TAB -> space)

1197734

test_bitcoin.h added to BITCOIN_TESTS in Makefile.test.include

test_bitcoin.h added to BITCOIN_TESTS in Makefile.test.include

a7555d6

SergioDemianLerner mentioned this pull request Aug 19, 2014

Switch of the test framework from MAIN to REGTEST #4732

Closed

SergioDemianLerner mentioned this pull request Sep 4, 2014

Switch testing framework from MAIN to new UNITTEST network #4845

Closed

laanwj mentioned this pull request Sep 10, 2014

[Qt] Do you want to rebuild the block database now? No -> crash #3136

Closed

laanwj mentioned this pull request Oct 2, 2014

Switch testing framework from MAIN to new UNITTEST network (rebase) #5030

Merged

laanwj closed this Mar 18, 2015

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

Improvement to the Test Framework in the processing of test blocks #4688

Improvement to the Test Framework in the processing of test blocks #4688

Uh oh!

Conversation

SergioDemianLerner commented Aug 12, 2014

Uh oh!

theuni commented Aug 12, 2014

Uh oh!

gmaxwell commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

gmaxwell commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

gmaxwell commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

gmaxwell commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

SergioDemianLerner commented Aug 12, 2014

Uh oh!

gmaxwell commented Aug 12, 2014

Uh oh!

mikehearn commented Aug 17, 2014

Uh oh!

laanwj commented Aug 18, 2014

Uh oh!

laanwj commented Aug 18, 2014

Uh oh!

BitcoinPullTester commented Aug 18, 2014

Uh oh!

SergioDemianLerner commented Aug 18, 2014

Uh oh!

laanwj commented Aug 19, 2014

Uh oh!

SergioDemianLerner commented Aug 19, 2014

Uh oh!

laanwj commented Oct 22, 2014

Uh oh!

laanwj commented Jan 30, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants