
Conversation

@maflcko
Member

@maflcko maflcko commented Apr 19, 2019

Most of the tests spend time waiting for the poisson tx send to time out. However, tests that don't test p2p behaviour don't need to.

Speed them up by syncing transactions over rpc.

Overall, the tests run three times as fast for me now:

  • Before:
ALL                                   | ✓ Passed  | 1395 s (accumulated) 
Runtime: 368 s
  • After:
ALL                                   | ✓ Passed  | 940 s (accumulated) 
Runtime: 120 s
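
Roughly, the approach described above boils down to the sketch below (a simplified illustration, not the actual patch; the real helper also submits ancestors before children and tolerates RBF conflicts):

# Simplified sketch: push each node's mempool to the others over RPC
# instead of waiting for the Poisson-delayed P2P relay.
def rpc_sync_mempools(rpc_connections):
    pools = [set(r.getrawmempool()) for r in rpc_connections]
    for i_src, rpc_src in enumerate(rpc_connections):
        for i_dst, rpc_dst in enumerate(rpc_connections):
            for txid in pools[i_src] - pools[i_dst]:
                # The tx is in the source node's mempool, so getrawtransaction
                # works without -txindex.
                rpc_dst.sendrawtransaction(rpc_src.getrawtransaction(txid))
                pools[i_dst].add(txid)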

@jnewbery
Contributor

Nice! This has been on my list for a long time. Thanks for doing it!

🚀 concept ACK

@DrahtBot DrahtBot added the Tests label Apr 19, 2019
@laanwj
Member

laanwj commented Apr 20, 2019

This is an impressive improvement in speed!
ACK fad26ed

A test here:

ALL                                   | ✓ Passed  | 1788 s (accumulated)
Runtime: 472 s
ALL                                   | ✓ Passed  | 1516 s (accumulated)
Runtime: 388 s

Contributor

@promag promag left a comment


Concept ACK.


# Push this pool to all targets
for i_target, rpc_target in enumerate(rpc_connections):
missing_txids = pool[i_remote].difference(pool[i_target])
Contributor


Could skip when i_remote == i_target?

Member Author

@maflcko maflcko Apr 22, 2019


I opted for code brevity. Calculating the empty set as the difference of two equal sets should take negligible time compared to an RPC call.
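
For a rough sense of scale (machine-dependent, not from the PR): the difference of two equal 10k-element sets typically takes well under a millisecond, while an RPC round trip is orders of magnitude slower.

import timeit

pool_a = {format(i, '064x') for i in range(10_000)}  # 10k fake 64-char txids
pool_b = set(pool_a)

# Time the (empty) difference of two equal sets.
per_call = timeit.timeit(lambda: pool_a.difference(pool_b), number=1_000) / 1_000
print("set difference: ~{:.0f} us per call".format(per_call * 1e6))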

Member

@jonatack jonatack left a comment


Concept ACK. Consider describing this argument's purpose and use in the sync_mempools function docstring, in example_test.py, and possibly in rpc_deprecated.py.

@maflcko
Member Author

maflcko commented Apr 22, 2019

Added a commit to extend the docstring, as requested by @promag and @jonatack

pool = [set(r.getrawmempool()) for r in rpc_connections]
if pool.count(pool[0]) == len(rpc_connections):
sync_done = pool.count(pool[0]) == len(rpc_connections)
if use_rpc_sync and not sync_done:
Contributor


It seems slightly odd to me that this code, which we only ever expect to run once, should be in a while loop. I think it'd be clearer to split it out into its own function. Something like:

def sync_mempools(rpc_connections, *, wait=1, timeout=60, use_rpc_sync=False, flush_scheduler=True):
    """
    Wait until everybody has the same transactions in their memory
    pools. If use_rpc_sync is set, sync all transactions right away.
    """
    if use_rpc_sync:
        force_sync_mempools(rpc_connections)
    else:
        stop_time = time.time() + timeout
        while time.time() <= stop_time:
            pool = [set(r.getrawmempool()) for r in rpc_connections]
            if pool.count(pool[0]) == len(rpc_connections):
                break
            time.sleep(wait)
        else:
            raise AssertionError("Mempool sync timed out:{}".format("".join("\n  {!r}".format(m) for m in pool)))

    if flush_scheduler:
        for r in rpc_connections:
            r.syncwithvalidationinterfacequeue()

def force_sync_mempools(rpc_connections):

    class TxInfo:
        def __init__(self, *, raw, ancestors):
            self.raw = raw
            self.ancestors = ancestors

    def topo_send(txs, rpc, pool_add):
        for i in txs:
            topo_send(txs[i].ancestors, rpc, pool_add)
            if i not in pool_add:
                try:
                    assert_equal(i, rpc.sendrawtransaction(txs[i].raw))
                    pool_add.add(i)
                    # Note that conflicted txs (due to RBF) are not removed
                    # from the pool
                except JSONRPCException as e:
                    # This transaction violates policy (e.g. RBF policy). The
                    # mempools should still converge when the high-fee
                    # replacement is synced in a later call
                    assert 'insufficient fee' in e.error['message']

    pool = [set(r.getrawmempool()) for r in rpc_connections]
    # Iterate over all nodes, get their raw mempool and send the
    # missing txs to all other nodes
    for i_remote, rpc_remote in enumerate(rpc_connections):
        pool_remote = {
            txid: TxInfo(raw=rpc_remote.getrawtransaction(txid), ancestors=info['depends'])
            for txid, info in rpc_remote.getrawmempool(verbose=True).items()
        }
        # Create "recursive pools" for ancestors
        for tx in pool_remote:
            pool_remote[tx].ancestors = {a: pool_remote[a] for a in pool_remote[tx].ancestors}

        # Push this pool to all targets
        for i_target, rpc_target in enumerate(rpc_connections):
            missing_txids = pool[i_remote].difference(pool[i_target])
            # Send missing txs
            topo_send(
                txs={txid: pool_remote[txid]
                     for txid in pool_remote if txid in missing_txids},
                rpc=rpc_target,
                pool_add=pool[i_target],
            )
    # If the sync fails there is a logic error in the sync or test code
    pool = [set(r.getrawmempool()) for r in rpc_connections]
    assert pool.count(pool[0]) == len(rpc_connections)

Member Author


I'd rather not use a while-else, which I have to look up every time I encounter it. I opted for a smaller diff and added an assert, so the diff should be easier to review, and the assert makes it clear that the code only runs once.
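
For context, a rough sketch (my reading of the approach, not the actual diff) of keeping the RPC push inside the existing retry loop and asserting that it only runs once; force_sync_over_rpc is a hypothetical stand-in for the push logic shown in the suggestion above:

def sync_mempools(rpc_connections, *, wait=1, timeout=60, use_rpc_sync=False, flush_scheduler=True):
    # Rough sketch; assumes util.py's existing imports (time, etc.) and a
    # hypothetical force_sync_over_rpc() helper for the RPC push.
    attempted_rpc_sync = False
    stop_time = time.time() + timeout
    while time.time() <= stop_time:
        pool = [set(r.getrawmempool()) for r in rpc_connections]
        if pool.count(pool[0]) == len(rpc_connections):
            if flush_scheduler:
                for r in rpc_connections:
                    r.syncwithvalidationinterfacequeue()
            return
        if use_rpc_sync:
            # The RPC push brings all pools in sync in one pass, so this
            # branch must never be entered twice.
            assert not attempted_rpc_sync
            attempted_rpc_sync = True
            force_sync_over_rpc(rpc_connections)
        else:
            time.sleep(wait)
    raise AssertionError("Mempool sync timed out:{}".format("".join("\n  {!r}".format(m) for m in pool)))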

Member Author


Also, rewriting the function would invalidate previous review.

Contributor


I'm surprised you think Python's while-else syntax is confusing. I don't think I've ever had a problem with it: https://stackoverflow.com/a/3295949. I certainly think it's clearer than the existing code, which has a return buried in a conditional in the while loop, but hidden under the flush_scheduler cleanup code.

I'm pretty sure you've been pushed into this weird construction (execute-once code within a while loop) because of the flush_scheduler cleanup code being inside the loop. Restructuring the function so the loop breaks and then does the cleanup allows the RPC sync to be separated from the while loop, which seems clearer to me.

Member Author


Unlike if-else, where only one branch is executed, the while-else (or for-else) generally executes both branches. To make it even more confusing, it won't execute the else branch when you break.
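
A minimal illustration of that behaviour (not from the PR):

# The else clause of a loop runs only when the loop finishes without break.
def find(needle, haystack):
    for item in haystack:
        if item == needle:
            print("found it, else clause skipped")
            break
    else:
        # Reached only if the loop ran to completion without hitting break.
        print("not found, else clause runs after the loop body")

find(2, [1, 2, 3])   # found it, else clause skipped
find(9, [1, 2, 3])   # not found, else clause runs after the loop body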

Member Author


Anyway, I am happy to review a follow-up pull that switches both sync_ helpers to the while-else syntax. But I'd rather not do it here.

@jnewbery
Contributor

Aesthetic disagreements aside, this is a great improvement. utACK faac5a8

self.nodes = []
self.network_thread = None
self.rpc_timeout = 60 # Wait for up to 60 seconds for the RPC server to respond
self.use_rpc_sync = False
Contributor


Curious: as the improvement is mostly gained by bypassing the Poisson feature, why not disable it via a parameter, e.g. -enablepoisson=0, which would be allowed only in regtest? In that case, nodes behave closer to a real scenario, we have cleaner logs as txs are exchanged via P2P only (no race condition), it would work even if sync_mempool hasn't been called, and there is no need to maintain a custom tx exchange function.
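
For illustration only (the flag is the hypothetical one proposed here, not an existing bitcoind option), a test could then opt out of the delay with something like:

# Hypothetical: -enablepoisson is the regtest-only flag proposed in this
# comment; it does not exist in bitcoind.
from test_framework.test_framework import BitcoinTestFramework

class ExampleTest(BitcoinTestFramework):
    def set_test_params(self):
        self.num_nodes = 2
        # Relay transactions immediately instead of on the Poisson schedule,
        # so sync_mempools() would return quickly without an RPC-based sync.
        self.extra_args = [["-enablepoisson=0"]] * self.num_nodes

    def run_test(self):
        self.sync_all()  # with immediate relay, this would not wait on the Poisson timer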

Member Author


@kostyantyn Good point! Done in

@maflcko maflcko closed this Apr 23, 2019
@maflcko maflcko deleted the 1904-qaSync branch April 23, 2019 21:02
@maflcko maflcko restored the 1904-qaSync branch April 24, 2019 15:19
@maflcko
Member Author

maflcko commented Apr 24, 2019

Going back to this approach, since modifying bitcoind is interpreted as too controversial

@maflcko maflcko reopened this Apr 24, 2019
@maflcko
Member Author

maflcko commented Apr 24, 2019

Unless there are objections this will be merged on Monday

@promag
Contributor

promag commented Apr 24, 2019

Could it make sense to skip this optimization in one of the Travis jobs?

@DrahtBot
Contributor

DrahtBot commented Apr 25, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #15911 (Use wallet RBF default for walletcreatefundedpsbt by Sjors)
  • #15891 (test: Require standard txs in regtest by default by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jonatack
Member

Functional tests on master at 8da1aa4 with Linux Debian 4.19.28-2 (2019-03-15) x86_64

ALL | ✓ Passed | 1824 s (accumulated)
Runtime: 466 s

ALL | ✓ Passed | 1867 s (accumulated)
Runtime: 476 s

ALL | ✓ Passed | 1856 s (accumulated)
Runtime: 473 s

ALL | ✓ Passed | 1774 s (accumulated)
Runtime: 453 s


Functional tests with this PR at faac5a8 rebased on master at 8da1aa4

ALL | ✓ Passed | 1372 s (accumulated)
Runtime: 352 s

ALL | ✓ Passed | 1388 s (accumulated)
Runtime: 357 s

ALL | ✓ Passed | 1472 s (accumulated)
Runtime: 377 s

ALL | ✓ Passed | 1449 s (accumulated)
Runtime: 373 s

@maflcko maflcko closed this Apr 30, 2019
@maflcko maflcko deleted the 1904-qaSync branch April 30, 2019 20:15
@laanwj
Member

laanwj commented Apr 30, 2019

> Going back to this approach, since modifying bitcoind is interpreted as too controversial

I prefer this approach as well. I think the only use for this, ever, is the tests, so I'd prefer not to change bitcoind with a special case for it.

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021