Faster tests with topological mempool rpc sync 🚀 #15858
Conversation
Nice! This has been on my list for a long time. Thanks for doing it! 🚀 Concept ACK
This is an impressive improvement in speed! A test here:
promag left a comment:
Concept ACK.
```python
# Push this pool to all targets
for i_target, rpc_target in enumerate(rpc_connections):
    missing_txids = pool[i_remote].difference(pool[i_target])
```
Could skip when `i_remote == i_target`?
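A sketch of the suggested guard, for illustration (the reviewer's idea as I read it, not the PR's code):

```python
# A node never needs its own transactions pushed back to it, so the
# inner loop could skip the self-pair before computing the difference.
for i_target, rpc_target in enumerate(rpc_connections):
    if i_remote == i_target:
        continue
    missing_txids = pool[i_remote].difference(pool[i_target])
```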
I opted for code brevity. Calculating the empty set as the difference of two equal sets should take negligible time compared to an RPC call.
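For illustration, the self-difference is just the empty set and stays in-process, while every RPC call pays a full round trip:

```python
# Two hypothetical txid strings; a set's difference with itself is
# always empty and costs only an in-memory set operation.
pool_i = {"aa" * 32, "bb" * 32}
assert pool_i.difference(pool_i) == set()
```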
Concept ACK. Consider describing this argument's purpose and use in the sync_mempools function docstring, in example_test.py, and possibly in rpc_deprecated.py.
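One possible wording for the suggested docstring (my sketch, not text from the PR):

```python
def sync_mempools(rpc_connections, *, wait=1, timeout=60, use_rpc_sync=False, flush_scheduler=True):
    """Wait until everybody has the same transactions in their memory pools.

    If use_rpc_sync is set, fetch each node's raw mempool over RPC and push
    the missing transactions to the other nodes in topological order,
    instead of waiting for p2p relay.
    """
```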
```diff
 pool = [set(r.getrawmempool()) for r in rpc_connections]
-if pool.count(pool[0]) == len(rpc_connections):
+sync_done = pool.count(pool[0]) == len(rpc_connections)
+if use_rpc_sync and not sync_done:
```
It seems slightly odd to me that this code, which we only ever expect to run once, should be in a while loop. I think it'd be clearer to split it out into its own function. Something like:
```python
def sync_mempools(rpc_connections, *, wait=1, timeout=60, use_rpc_sync=False, flush_scheduler=True):
    """
    Wait until everybody has the same transactions in their memory
    pools. If use_rpc_sync is set, sync all transactions right away.
    """
    if use_rpc_sync:
        force_sync_mempools(rpc_connections)
    else:
        stop_time = time.time() + timeout
        while time.time() <= stop_time:
            pool = [set(r.getrawmempool()) for r in rpc_connections]
            if pool.count(pool[0]) == len(rpc_connections):
                break
            time.sleep(wait)
        else:
            raise AssertionError("Mempool sync timed out:{}".format("".join("\n {!r}".format(m) for m in pool)))
    if flush_scheduler:
        for r in rpc_connections:
            r.syncwithvalidationinterfacequeue()


def force_sync_mempools(rpc_connections):
    class TxInfo:
        def __init__(self, *, raw, ancestors):
            self.raw = raw
            self.ancestors = ancestors

    def topo_send(txs, rpc, pool_add):
        for i in txs:
            topo_send(txs[i].ancestors, rpc, pool_add)
            if i not in pool_add:
                try:
                    assert_equal(i, rpc.sendrawtransaction(txs[i].raw))
                    pool_add.add(i)
                    # Note that conflicted txs (due to RBF) are not removed
                    # from the pool
                except JSONRPCException as e:
                    # This transaction violates policy (e.g. RBF policy). The
                    # mempools should still converge when the high-fee
                    # replacement is synced in a later call
                    assert 'insufficient fee' in e.error['message']

    pool = [set(r.getrawmempool()) for r in rpc_connections]
    # Iterate over all nodes, get their raw mempool and send the
    # missing txs to all other nodes
    for i_remote, rpc_remote in enumerate(rpc_connections):
        pool_remote = {
            txid: TxInfo(raw=rpc_remote.getrawtransaction(txid), ancestors=info['depends'])
            for txid, info in rpc_remote.getrawmempool(verbose=True).items()
        }
        # Create "recursive pools" for ancestors
        for tx in pool_remote:
            pool_remote[tx].ancestors = {a: pool_remote[a] for a in pool_remote[tx].ancestors}
        # Push this pool to all targets
        for i_target, rpc_target in enumerate(rpc_connections):
            missing_txids = pool[i_remote].difference(pool[i_target])
            # Send missing txs
            topo_send(
                txs={txid: pool_remote[txid]
                     for txid in pool_remote if txid in missing_txids},
                rpc=rpc_target,
                pool_add=pool[i_target],
            )
    # If the sync fails there is a logic error in the sync or test code
    pool = [set(r.getrawmempool()) for r in rpc_connections]
    assert pool.count(pool[0]) == len(rpc_connections)
```
I'd rather not use a while-else, whose behaviour I have to look up every time I encounter it. I opted for a smaller diff and added an assert, so the diff should be easier to review, and the assert should make it clear that this code only runs once.
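A hedged sketch of the in-loop shape described here; `attempted_rpc_sync` and `push_missing_txs` are hypothetical names standing in for the PR's actual code:

```python
import time

attempted_rpc_sync = False
stop_time = time.time() + timeout
while time.time() <= stop_time:
    pool = [set(r.getrawmempool()) for r in rpc_connections]
    sync_done = pool.count(pool[0]) == len(rpc_connections)
    if use_rpc_sync and not sync_done:
        # Runs at most once: the RPC push must converge in a single
        # pass, and the assert documents (and enforces) that.
        assert not attempted_rpc_sync
        attempted_rpc_sync = True
        push_missing_txs(rpc_connections, pool)  # hypothetical helper
    if sync_done:
        break
    time.sleep(wait)
```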
Also, rewriting the function would invalidate previous review.
I'm surprised you think Python's while-else syntax is confusing. I don't think I've ever had a problem with it: https://stackoverflow.com/a/3295949. I certainly think it's clearer than the existing code, which has a return buried in a conditional in the while loop, but hidden under the flush_scheduler cleanup code.
I'm pretty sure you've been pushed into this weird construction (execute-once code within a while loop) because of the flush_scheduler cleanup code being inside the loop. Restructuring the function so the loop breaks and then does the cleanup allows the RPC sync to be separated from the while loop, which seems clearer to me.
Unlike if-else, where only one branch is executed, while-else (or for-else) generally executes both branches. To make it even more confusing, it won't execute the else branch when you break.
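A minimal illustration of the semantics being described:

```python
# The else branch runs only when the loop exits without hitting break.
for n in [1, 3, 5]:
    if n % 2 == 0:
        break
else:
    print("no even number found")  # printed: the loop never broke

while False:
    pass
else:
    print("printed too: the loop condition simply became false")
```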
Anyway, I am happy to review a follow-up pull that switches both sync_ helpers to the while-else syntax, but I'd rather not do it here.
Aesthetic disagreements aside, this is a great improvement. utACK faac5a8
```python
self.nodes = []
self.network_thread = None
self.rpc_timeout = 60  # Wait for up to 60 seconds for the RPC server to respond
self.use_rpc_sync = False
```
Curious: since the improvement is mostly gained by bypassing the Poisson send delay, why not disable it via a parameter, e.g. -enablepoisson=0, which would only be allowed on regtest? In that case, nodes behave closer to a real scenario, we get cleaner logs as txs are exchanged via P2P only (no race condition), it would work even if sync_mempools hasn't been called, and there would be no need to maintain a custom tx exchange function.
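For context, the delay being discussed comes from drawing the next tx-relay time from an exponential distribution, so that send events form a Poisson process. A rough Python sketch of the idea, with an illustrative average interval rather than bitcoind's actual constants:

```python
import random

AVG_TX_RELAY_INTERVAL = 5.0  # seconds; illustrative, not the real constant


def poisson_next_send(now, avg_interval=AVG_TX_RELAY_INTERVAL):
    # Exponentially distributed gaps make the send times a Poisson
    # process, which hides exact tx arrival times from peers.
    return now + random.expovariate(1.0 / avg_interval)
```

On regtest this privacy property is irrelevant, hence the suggestion to disable the delay there.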
@kostyantyn Good point! Done in net: Send txs without delay on regtest (#15881).
Going back to this approach, since modifying bitcoind is seen as too controversial.
Unless there are objections, this will be merged on Monday.
Could it make sense to skip this optimization in one of the Travis jobs?
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Functional tests on master at 8da1aa4 with Linux Debian 4.19.28-2 (2019-03-15) x86/64:

ALL | ✓ Passed | 1824 s (accumulated)
ALL | ✓ Passed | 1867 s (accumulated)
ALL | ✓ Passed | 1856 s (accumulated)
ALL | ✓ Passed | 1774 s (accumulated)

Functional tests with this PR at faac5a8 rebased on master at 8da1aa4:

ALL | ✓ Passed | 1372 s (accumulated)
ALL | ✓ Passed | 1388 s (accumulated)
ALL | ✓ Passed | 1472 s (accumulated)
ALL | ✓ Passed | 1449 s (accumulated)
I prefer this approach as well. I think the only use for this, ever, is the tests, so I'd prefer not to change bitcoind.
Most of the tests spend time waiting for the Poisson tx send to time out. However, tests that don't test p2p behaviour don't need to.
Speed them up by syncing transactions over rpc.
Overall, the tests run three times as fast for me now:
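A usage sketch for a test that opts in, using the names introduced by this PR and assuming the framework forwards the flag to sync_mempools:

```python
from test_framework.test_framework import BitcoinTestFramework
from test_framework.util import sync_mempools


class ExampleTest(BitcoinTestFramework):
    def set_test_params(self):
        self.num_nodes = 2
        # This test doesn't exercise p2p tx relay, so sync over RPC.
        self.use_rpc_sync = True

    def run_test(self):
        txid = self.nodes[0].sendtoaddress(self.nodes[1].getnewaddress(), 1)
        # Pushes the missing tx to node 1 in topological order instead of
        # waiting out the Poisson relay delay.
        sync_mempools(self.nodes, use_rpc_sync=self.use_rpc_sync)
        assert txid in self.nodes[1].getrawmempool()
```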