Skip to content

Conversation

@sdaftuar
Copy link
Member

This is in preparation for re-using these validation components for a new version of AcceptToMemoryPool() that can operate on multiple transactions ("package relay").

@sdaftuar
Copy link
Member Author

sdaftuar commented Jul 16, 2019

The motivation behind this refactor is to allow implementation of a package relay system (#14895). A simple implementation that I have in mind, which requires no additional p2p protocol changes, is shown at #16401.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 16, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #16658 (validation: Rename CheckInputs to CheckInputScripts by jnewbery)
  • #13525 (policy: Report reason inputs are nonstandard from AreInputsStandard by Empact)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at a109ce2 on GitHub. I think it was attempted in the past to split up this 400 line function into smaller parts, but now there is an actual motivation for it (#16401).
I left some style questions inline. Feel free to ignore.

// Compare a package's feerate against minimum allowed.
bool CheckFeeRate(size_t package_size, CAmount package_fee, CValidationState& state)
{
CAmount mempoolRejectFee = pool.GetMinFee(gArgs.GetArg("-maxmempool", DEFAULT_MAX_MEMPOOL_SIZE) * 1000000).GetFee(package_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are some of the settings parsed in the constructor and saved as a member and others are not?

Could do the same here.

Suggested change
CAmount mempoolRejectFee = pool.GetMinFee(gArgs.GetArg("-maxmempool", DEFAULT_MAX_MEMPOOL_SIZE) * 1000000).GetFee(package_size);
CAmount mempoolRejectFee = pool.GetMinFee(m_max_pool_size).GetFee(package_size);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that I end up needing to access the package size limits directly when adding AcceptMultipleTransactions() in #16401, whereas I didn't need local copies of the reject fee. So my main motivation behind not saving these as members is to avoid cluttering the state, which was already complex to untangle.

I don't feel strongly about this though; I just wanted to make it as easy as possible for reviewers and future code maintainers to be able to understand what data is required by each function.

bool& fReplacementTransaction = ws.fReplacementTransaction;
CAmount& nModifiedFees = ws.nModifiedFees;
CAmount& nConflictingFees = ws.nConflictingFees;
size_t& nConflictingSize = ws.nConflictingSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the local name and the name of the member are identical, does it really make sense to alias it? Looks like you already changed every line in this function due to whitespace fixups, so might as well prefix what we need with ws. or args. in line. No strong opinion, though.
If you decide to keep the current approach, I think you can make review easier by not duplicating all the types (use auto&, which should keep the type and constness as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think it was going to be easier for review if I didn't end up changing all the variables out at the same time. I think my preferred approach would be to fix the variable names in MemPoolAccept, Workspace, and ATMPArgs to conform to the style guide, but then continue to use these aliases so that we're not also changing all the variables in each of these functions.

One benefit of the aliases is that it makes it easier to tell what variables we're not using as well. I could also achieve that by changing the functions so that we pass in less data, such as by directly passing in only those members of the Workspace and Args structs that each function needs. Perhaps that is a better approach?


// Check against previous transactions
// This is done last to help prevent CPU exhaustion denial-of-service attacks.
PrecomputedTransactionData txdata(tx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this is not stored in the workspace at this point? Maybe as a unique ptr or something, like the entry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a version of this code where it was a unique_ptr, but thought this was an easy way to avoid a heap allocation that we don't really need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a bit weird to have m_entry as a unique_ptr, but not have txdata in there as well just to avoid it being on the heap. I think making txdata a unique_ptr in Workspace would simplify #16401 a bit too.

Copy link
Member Author

@sdaftuar sdaftuar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoFalke Thanks for the quick review! I've pushed many updates to address feedback.

I plan to squash many of the cleanup commits down, but I wonder if the variable rename is easier to review by itself?

// Compare a package's feerate against minimum allowed.
bool CheckFeeRate(size_t package_size, CAmount package_fee, CValidationState& state)
{
CAmount mempoolRejectFee = pool.GetMinFee(gArgs.GetArg("-maxmempool", DEFAULT_MAX_MEMPOOL_SIZE) * 1000000).GetFee(package_size);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that I end up needing to access the package size limits directly when adding AcceptMultipleTransactions() in #16401, whereas I didn't need local copies of the reject fee. So my main motivation behind not saving these as members is to avoid cluttering the state, which was already complex to untangle.

I don't feel strongly about this though; I just wanted to make it as easy as possible for reviewers and future code maintainers to be able to understand what data is required by each function.

bool& fReplacementTransaction = ws.fReplacementTransaction;
CAmount& nModifiedFees = ws.nModifiedFees;
CAmount& nConflictingFees = ws.nConflictingFees;
size_t& nConflictingSize = ws.nConflictingSize;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think it was going to be easier for review if I didn't end up changing all the variables out at the same time. I think my preferred approach would be to fix the variable names in MemPoolAccept, Workspace, and ATMPArgs to conform to the style guide, but then continue to use these aliases so that we're not also changing all the variables in each of these functions.

One benefit of the aliases is that it makes it easier to tell what variables we're not using as well. I could also achieve that by changing the functions so that we pass in less data, such as by directly passing in only those members of the Workspace and Args structs that each function needs. Perhaps that is a better approach?


// Check against previous transactions
// This is done last to help prevent CPU exhaustion denial-of-service attacks.
PrecomputedTransactionData txdata(tx);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a version of this code where it was a unique_ptr, but thought this was an easy way to avoid a heap allocation that we don't really need.

@sdaftuar sdaftuar force-pushed the 2019-07-refactor-atmp branch from ca6c291 to 2bfcbc6 Compare July 17, 2019 19:07
@sdaftuar
Copy link
Member Author

I went ahead and squashed the commits up to the variable rename, unsquashed version is here: 16400.1

@sdaftuar sdaftuar force-pushed the 2019-07-refactor-atmp branch from 93e0451 to a372e03 Compare July 22, 2019 14:23
@maflcko maflcko closed this Jul 23, 2019
@maflcko maflcko reopened this Jul 23, 2019
Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK. Looks pretty good; ATMPArgs seems helpful. I've only looked at the structure, not checked the logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like m_coins_to_uncache could just be a member var of MemPoolAccept directly; with either a UncacheCoins() method invoked by the caller on failure, or having a ~MemPoolAccept() destructor that automatically uncaches, with AcceptSingleTransaction calling m_coins_to_uncache.clear() before returning true to avoid uncaching things that should remain cached.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it makes sense to mix ATMP logic at this layer with managing the utxo cache -- I think that's better handled by callers in the long run, even we're just doing something very simple now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was thinking of it more as "ATMP logic messes up the cache when it decides it can't accept, so cleaning up that mess should also be part of ATMP logic". But having it be done in ATMPWithTime is still part of ATMP in my book, so either way works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have EXCLUSIVE_LOCKS_REQUIRED(cs_main)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could make it even easier by having const ATMPArgs m_args as a member of MemPoolAccept, and not passing it around at all. That doesn't work for m_bypass_limits in AcceptMultipleTransactions, but you're basically manually checking the limits there anyway, so maybe could do false for all of them; otherwise could use Workspace for per-tx logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm going to leave this as-is for the moment. I do think that holistically, we can think about doing one of two things with this refactor:

(a) go all in for storing state in the class and not pass around data between member functions
(b) dump this class and pass state around explicitly instead

I did something weird in this PR where I started to introduce a class (mostly to function as a namespace, I guess!), but I mostly clung to the historical style of passing everything around anyway (except for the mempool/coinsview related things, and the chain limits). So this is probably a pretty weird state to leave things, and I'm happy to dive down either direction so that the design is more cohesive, but I'd like feedback on which way to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I think either way is fine, but my preference is to keep the class for namespacing (so it's clear the private methods are only called from the public ones), and to keep the number of explicit params to the functions pretty small -- pulling the stuff you actually need from the Args/Workspace structs like you do looks good to me.

Maybe it might make sense to only pass things as args if they aren't known got MemPoolAccept's constructor or can be deallocated before the end -- like txdata -- or are per-transaction -- like Workspace?

Copy link
Member Author

@sdaftuar sdaftuar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ajtowns for the review. I've addressed most comments; the main conceptual point I would like feedback on is whether wrapping the state in this MemPoolAccept() class is a good idea, or if I should go back to the style we have historically used of passing state explicitly to these functions (as I mention below).

Also, I've updated the related pull over at #16401 to be rebased on the current version of this PR, so we can track how these changes affect the goal. Seems to work fine so far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm going to leave this as-is for the moment. I do think that holistically, we can think about doing one of two things with this refactor:

(a) go all in for storing state in the class and not pass around data between member functions
(b) dump this class and pass state around explicitly instead

I did something weird in this PR where I started to introduce a class (mostly to function as a namespace, I guess!), but I mostly clung to the historical style of passing everything around anyway (except for the mempool/coinsview related things, and the chain limits). So this is probably a pretty weird state to leave things, and I'm happy to dive down either direction so that the design is more cohesive, but I'd like feedback on which way to go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it makes sense to mix ATMP logic at this layer with managing the utxo cache -- I think that's better handled by callers in the long run, even we're just doing something very simple now.

@sdaftuar sdaftuar force-pushed the 2019-07-refactor-atmp branch 2 times, most recently from ec87639 to 0840a1d Compare July 31, 2019 14:20
@sdaftuar
Copy link
Member Author

Rebased

@maflcko
Copy link
Member

maflcko commented Aug 7, 2019

ACK 0840a1d2e2270c06342710c4ebfaac1c41652dab (read diff with --ignore-all-space)

I like that this splits up the massivley large ATMP into logical chunks
(not only functional separate checks, but also structural into args and
workspace). With the added documentation, the code is probably easier to
understand now.

Show signature and timestamp

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK 0840a1d2e2270c06342710c4ebfaac1c41652dab (read diff with --ignore-all-space)

I like that this splits up the massivley large ATMP into logical chunks
(not only functional separate checks, but also structural into args and
workspace). With the added documentation, the code is probably easier to
understand now.
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUg9Qgv/euaSC1DAapYXrYcl9nrFF04qsMeVtv3lJ8hupHNjRXHpTFZoLE2uUO3T
+kf5rugC1UuImZsVOy4odQ0+rVIrnLhrI7DqECzQ+P7OX4+JN/Wf6dfciHzznGsX
h+lrDEDgs0MuCEgsoQAIgQe39IDsexNQyE5i6+dEZtbqhhiuT+hoJAH8Hj3TKQib
60Sn7yBpdOSgYnZM1TTELCYSL4lDzXIFH5lW3ojj2oYF+1AnAqjRU8EydUH0wGRX
sK1PtCdI8dXhTXChHYnku24e0cR1VCHMjPmLXXxtqp9xm0qVtDMH7coSgOdY9SRx
OO++RjA+8oag5YmCxZRVcvkZ7m/XlwRshGJDbKCSKy2vuW++MNJmTzkxUkZPUl7s
BSrBP04DYMyhEtZHfcMV3oLbb8HZ/a9RJTbBlpQubEsvBT5SbcPdFMhMv7Ok6M6C
SSQm56IF2bz/Ky6F1F1h1EVnVNpF1SXkXNHfnn5JfZoPVlbvvNk5k+iS7pVqxtOI
5TNLNS1L
=ZwOV
-----END PGP SIGNATURE-----

Timestamp of file with hash e1313ca99cc4b41a7b44fb90863a0d30e70235c91504a644337a0c8a77443411 -

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to other reviewers. This says "package", but in the current code this is a single tx (the one that is added)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc-nit: Should this say that this method fully initializes the ws? Or is this only a coincidence not worth to mention?

@TheBlueMatt
Copy link
Contributor

utACK 0840a1d2e2270c06342710c4ebfaac1c41652dab.

@maflcko maflcko added this to the 0.19.0 milestone Aug 14, 2019
Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 0840a1d2e2270c06342710c4ebfaac1c41652dab ; added some suggestions, but this is already a good improvement. re-reviewed the code, checked it compiled and tests passed.


// Check against previous transactions
// This is done last to help prevent CPU exhaustion denial-of-service attacks.
PrecomputedTransactionData txdata(tx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a bit weird to have m_entry as a unique_ptr, but not have txdata in there as well just to avoid it being on the heap. I think making txdata a unique_ptr in Workspace would simplify #16401 a bit too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would make for a slightly smaller diff to say CTxMemPoolEntry& entry = *ws.m_entry; and only declare it after m_entry has been reset (and assert(wx.m_entry); beforehand in Finalize to keep the static analysis tools happy, I suppose)

@sdaftuar
Copy link
Member Author

sdaftuar commented Sep 3, 2019

Squashed

@fanquake fanquake changed the title [refactor] Rewrite AcceptToMemoryPoolWorker() using smaller parts refactor: Rewrite AcceptToMemoryPoolWorker() using smaller parts Sep 6, 2019
@laanwj laanwj modified the milestones: 0.19.0, 0.20.0 Sep 11, 2019
@laanwj
Copy link
Member

laanwj commented Sep 11, 2019

The motivation behind this refactor is to allow implementation of a package relay system (#14895)

Bumped the milestone to 0.20: as noted by @fanquake, as this is preparation work for a feature, it doesn't make a lot of sense to merge this last-minute for 0.19, so I think we should merge this early in the 0.20 release cycle.
(on the other hand this has two ACKs so if people feel this is ready for merge already feel free to disagree)

@maflcko
Copy link
Member

maflcko commented Sep 11, 2019

I think it makes sense to merge this for 0.19 to make policy backports to 0.19 easier

@sdaftuar Are you still working on this?

@sdaftuar
Copy link
Member Author

I independently figured we’d wait until 0.20 for the same reason @laanwj gave, but I can get this rebased sooner for consideration in 0.19.

This is in preparation for re-using these validation components for a new
version of AcceptToMemoryPool() that can operate on multiple transactions
("package relay").
@sdaftuar sdaftuar force-pushed the 2019-07-refactor-atmp branch from ef79e05 to 4a87c5c Compare September 16, 2019 15:37
@sdaftuar
Copy link
Member Author

Rebased (old version is 16400.2)

@maflcko
Copy link
Member

maflcko commented Sep 16, 2019

re-ACK 4a87c5c (did the rebase myself and arrived at the same result, mod whitespace)

Show signature and timestamp

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

re-ACK 4a87c5cfdf7dd72d999ebeaf17db6695a7c6298d (did the rebase myself and arrived at the same result, mod whitespace)
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUjSRAv9Gg+QHZvblyy4FHN6sRMj+h49EyMuE4ZZfQIbdISoiIQggyDYUQpi1S8K
gtLYc71DYaMO7eNec+pmrBvEiwM3VUFMUePuO3ZE6cNkL8ch3VKdkm8/M9vLriDk
cJjqYKEe+dCUj/mLIzCRZUxs13P0DaQjowiiqbhckwgfejXSILu6Gb34Ykto07Eo
2N9LUOCT5yPX+oxCOLKi2ekKBcUPl1FfZpWTf+8BwQGzXCE7SwJYBbeDVBWdW9TG
5iSrEyoPaLyXJdkDV4SMqwDoaGeK2iTAczh8smM+fpDh6v5m0OYYEtOg/GvN+11z
TJkkrVecwoThJI/qjuWw3PPQGJR/7Rehpo7gf+RBpbp/0KzYspw7JKhS+U30EBjq
GloMYN94xQzkxxs3K166bl0SPo+AePDdFM+JBmisTwCiBUbFXF6RIZklA+vqZpM6
OsG4WgOUHjPO3q0PrNtdd0fl6HtCu19dbasfURcJ6fYprNmwPdqgpc11Mq202ajg
mJKxJZDS
=8KLl
-----END PGP SIGNATURE-----

Timestamp of file with hash 1e4042da15dcf1e1e7c77742a411deedbfd643673230755ac10c03edcad1c0d0 -

@laanwj
Copy link
Member

laanwj commented Sep 18, 2019

ACK 4a87c5c

laanwj added a commit that referenced this pull request Sep 18, 2019
…ler parts

4a87c5c [refactor] Rewrite AcceptToMemoryPoolWorker() using smaller parts (Suhas Daftuar)

Pull request description:

  This is in preparation for re-using these validation components for a new version of AcceptToMemoryPool() that can operate on multiple transactions ("package relay").

ACKs for top commit:
  MarcoFalke:
    re-ACK 4a87c5c (did the rebase myself and arrived at the same result, mod whitespace)
  laanwj:
    ACK 4a87c5c

Tree-SHA512: b0495c026ffe06146258bace3d5e0c9aaf23fa65f89f258abc4af5980812e68e63a799f1d923e78ac1ee6bcafaf1222b2c2690a527df9b65dff7b48a013f154e
@laanwj laanwj merged commit 4a87c5c into bitcoin:master Sep 18, 2019
@maflcko maflcko removed this from the 0.20.0 milestone Sep 18, 2019
domob1812 added a commit to domob1812/namecoin-core that referenced this pull request Sep 23, 2019
Merge the upstream refactoring to AcceptToMemoryPoolWorker from
bitcoin/bitcoin#16400 with the Namecoin-specific
changes made to it.
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Dec 8, 2020
…maller parts

Summary:
4a87c5cfdf7dd72d999ebeaf17db6695a7c6298d [refactor] Rewrite AcceptToMemoryPoolWorker() using smaller parts (Suhas Daftuar)

Pull request description:

  This is in preparation for re-using these validation components for a new version of AcceptToMemoryPool() that can operate on multiple transactions ("package relay").

---

Because of how our code handles SigChecks I could not remove the code commented under "Validate input
 scripts against standard script flags" from the new PreChecks function into the PR's new PolicyScriptChecks
because since D5128 anything related to that number can only be run after CheckInputs.

The other code in the above new function was related to SegWit therefore I just removed the
PolicyScriptChecks function altogether.

Other missing members of the Workspace and ATMPArgs that were unused due  to us not having RBF
were also taken out.

Backport of Core [[bitcoin/bitcoin#16400 | PR16400]]

Test Plan:
  ninja all check check-functional

Reviewers: deadalnix, #bitcoin_abc

Reviewed By: deadalnix, #bitcoin_abc

Differential Revision: https://reviews.bitcoinabc.org/D8203
glozow added a commit to glozow/bitcoin that referenced this pull request Apr 1, 2021
ATMPW stands for AcceptToMemoryPoolWorker, which was removed in bitcoin#16400.
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants