-
Notifications
You must be signed in to change notification settings - Fork 38.6k
Encapsulate coin balances within a new CMoney type. #4067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use double conversions in places that currently use integer arithmetic.
We want to avoid using doubles wherever possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code necessarily has to have some sort of inexact conversion. That's rather the point -- we shouldn't be assuming that money calculations are integer arithmetic! So assume that CMoney is implemented using some sort of exact rational arithmetic bignum library. How do we then write code that converts CMoney into an (compact!) ASCII-comparable string, in a way that is code-compatible regardless of the underlying integer type?
A reasonable compromise is what is implemented here -- reduce, if necessary, to the precision of a double and then use the newly added EncodeDouble to convert that into a sortable uint64_t. Note that this is 100% consistent with current behavior in bitcoin - double provides 2^53 bits of precision, where MoneyRange constrains maximum value to less than 2^52.
I respect the gut-instinct "eww" attached to floating point conversion. But in this case it's actually a reasonable compromise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But introducing double conversion is problematic for all coins that use more than 53 bits of precision, increasing the friction in sharing code and submitting code upstream. I think there are more coins which use integer representation with values beyond 2^53 than there are coins using alternative datatypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only metric of "more coins (existing)" that I think we should care about here is how much useful code and fixes their contributors have submitted upstream. By that metric no such other coins exist, AFAIK.
Really I don't care for this whole pull as a compatibility with other forks thing: Its interesting upstream because it increases type-safety for values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the change has value for Bitcoin, I'm simply saying that because interoperability is the stated goal of the PR it looks preferable not to break existing interoperability.
It seems more reasonable to me to use the much simpler
out.tx->vout[out.i].nValue.ToInt64()
instead of
EncodeDouble(out.tx->vout[out.i].nValue.ToDouble())
in Bitcoin and use the more complicated version only in implementations which actually implement sub-satoshi precision. That would also remove the entire EncodeDouble function from Bitcoin code, reducing the overall code size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmaxwell, it's hard for there to be meaningful contributions back when most alts are stuck on ancient versions of the code base! But even without contribution back, minimizing the code difference between alts makes it easier for maintainers to rebase against newer versions of Bitcoin Core, which increases the security of the greater crypto currency ecosystem. But as you say, the greatest contribution to bitcoin specifically is enforcing some type safety for monetary values.
@leofidus, I'm not sure you understand what this change does or the context in which the code executes, as using ToInt64() would not fix the problem and would in fact make it worse. That's alright; let's walk through it:
The original line of code above fills in a hidden column with the left-padded string representation of the output's value, in units of 1e-8 bitcoin. So for example, three outputs with the values 5btc, 0.2btc, and 1mbtc would have the following values for this field:
000000500000000
000000020000000
000000000100000
When you request a numeric sort based on the output value field, what it actually does is do a string sort over this hidden field, which is why the left-padding is necessary. (I've used '0' as the padding character just for clarity's sake on Github, whereas the code actually uses ' ' [space].)
Now there are two Bitcoin-specific assumptions here: (1) monetary values are integers, and (2) no output will ever have more than 15 digits of absolute precision. These are not universally valid assumptions, and either or both of them might be invalidated on a side chain. Side chains don't exist yet so we must look to alts for examples: Dogecoin and Freicoin invalidate one and both, respectfully.
One of the goals of this pull request is to minimize the number of lines of code which are sensitive to these assumptions, and to make the format assumptions explicit when it is necessary (e.g. ToInt64()). So if we do not assume that CMoney is implemented with an integer, and that the dynamic range of monetary values may be more than 15 decimal digits on an absolute scale, how do we still carry out this string-sorting hack for the GUI tables?
What the updated code does is use another cool little hack: it turns out that because IEEE floating point is designed in a particular way, you can sort non-negative floats or doubles as if they were integers and get the same result. There's a little bit of bit-flipping magic that makes this work for negative values as well, and that's what EncodeDouble does. So the new code does this:
- It converts from whatever the native CMoney format is to 64-bit IEEE floating point, in the process rounding to approximately 16 decimal digits of precision - this is safe to do for bitcoin values in non-consensus code, since
MoneyRangerestrictions ensure this is a 1:1 function with stable round-trip transformations, at least without FPU bugs or weird, atypical, non-default compiler optimizations. - It uses the new utility function
EncodeDoubleto map thedoubletouint64_tin another 1:1 transformation, doing some bit-flipping magic so for any non-NaN floating point values a and b, ifa < bthenEncodeDouble(a) < EncodeDouble(b). - It then does the exact same print-integer-with-left-padding trick to fill the column, except it pads to 20 bytes instead of 15 to capture the entire range of
uint64_t.
For the above values, it results in the following columns:
13960540253593796608
13939505956204314624
13904980397738950656
It's less human readable, but it's a hidden field and an implementation detail. Note that the values are still in descending order. The end result is that no matter the internal representation of CMoney, the hidden column will be filled with a twenty-character string representation of the value, which preserves sorting order for up to 16 decimal digits of relative precision. This is in every way better than the current implementation, and fully generic - it does not care what internal representation is used.
|
To be clear, there are a lot of changed lines across 64 files. But that is rather the point of this pull request - any side chain that changes the type for representing money from int64_t to something else has to change all these lines, thereby making it more difficult to share patches or rebase to keep the side chain up to date. One of the few user visible changes that result from this is the use of strings instead of integers or doubles in the JSON-RPC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, this was left over from a previous version of the patch that didn't have operator/() implemented for CMoney. There's some talk on IRC about getting rid of operator*() and operator/(), so the problem still remains however. This code will get reworked as part of that, if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm going to push back on this one. Now that I've eliminated division of monetary values, this is not so trivial to implement. You have your choice of ToInt64() and ToDouble(), and assuming no bias about the underlying format I would argue that ToDouble() is the correct choice here - keep as many bits of precision for the comparison as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have little problem with using floating-point for policy rules like this.
|
Couldn't you accomplish mostly the same (but easier to review that behaviour is unchanged) with Sure, for other chains the definition of CMoney may be more complex, but that's not our problem. |
|
@laanwj, no all the code which assumes CMoney is an integer would have to change (and there's quite a bit of that). This pull request only allows a specific subset of arithmetic functionality, and requires explicit casts (with rounding modes) to treat the money value as anything else. Explicit is better than implicit, especially when the point is to maintain code compatibility across implementations with different underlying types. |
|
I'm fine with a type to encapsulate the representation and arithmetic of monetary amounts, but does it really need to deal with conversion from/to human readable format too? Especially since it's depended on by core, that feels like functionality that belongs on a higher level. Leaving that in util makes the impact of the patch smaller too. |
|
NACK from me. I don't think this passes the cost-to-review-for-correctness versus potential-benefit-for-Bitcoin test. |
|
@sipa util.h is loaded by core.cpp. I have no problem reverting that change, if that is the consensus decision. I just thought it was cleaner this way. |
|
@gavinandresen is there anything I could do along the lines of additional tests to make it easier to be confident that it's correct? |
|
How far do we want to go to facilitate altcoins? Well, I'm fine if it is a matter of renaming types, but not if it gives us more maintenance or testing overhead. Hence I'd suggest splitting this up.
I also agree with @sipa. Don't put formatting on the number type. I'm in favor of separating data and operations on it. |
|
@laanwj I think you are underestimating the importance of semantic explicitness and type safety here. A simple Regarding formatting, I'm not sure I understand. Aren't Maybe it does belong in |
|
A ToString() method is useful for debugging, but I don't think it should be used for interaction with humans. Even for conversion of data types like CKey or CKeyId there is a separate module (base58) for conversion to the human form. We're not quite there yet, but I like to see the code evolve to a point where we can split off a "core data structures" library, which only contains definitions for structures used by the protocol and their serializations. That includes serialize, core, protocol, just the data types from script (not verification/signing), and if committed UTXOs become normative, also part of coins. No logging, no fee policy constants, and preferably no dependencies at all. |
|
@maaku I'll only accept this if the introduced money type has exactly the same behavior as int64_t. Defining division to be different, for example, or rounding, is not acceptable. If you insist on using a class that's possible (and can also be straightforward), but it's somewhat harder to verify than a simple typedef. Having a few lines scattered around the code that assume that the money type is integer is fine. For bitcoin this will never change. I'm sorry to say, but making it easier for altcoins to merge our code changes is not high on my priority list. I'm okay with renaming types but not anything that has more risk or stresses our already very limited testing and review capacity. |
|
I would be very happy if multiple altcoins would do things like changing the money type :D |
|
The debugging/user-string distinction makes sense. I reverted FormatMoney() and ParseMoney() back into util.cpp, squashed the followon commits, and broke the original commit in two: the first commit does the int64_t -> CMoney change, the second adds type safety by making CMoney a separate class. |
|
I have added documentation about EncodeDouble, and moved the CompressAmount / DecompressAmount code to money.cpp, where it is used by a new CCompressedMoney type, as per comments from sipa. I have also added some basic tests of the EncodeDouble / DecodeDouble routines. |
Provides increased type safety of monetary calculations in bitcoin. Also brings code parity between bitcoin and side chains which use a different type for representing and performing arithmetic on coin balances, e.g. Freicoin's use of the GMP library's arbitrary-precision rational number type. This greatly reduces the friction in sharing code and submitting code upsream.
|
I still don't like invoking rounding code (which, imho, is human-machine conversion stuff, not consensus code, even if it's some special mode for that purpose) from within core data structures. Can't you add an IMPLEMENT_SERIALIZE to CMoney itself, which in Bitcoin's implementation just serializes as int64_t, and READWRITE(nAmount) as before in CTxOut? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might it make sense to turn this into a CMoney::CheckRange() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. It'll increase the review burden, but is much cleaner.
|
There is no rounding code. I stripped all that out; for Bitcoin, conversion to either int64_t or double is lossless and round-trip (although the consensus risk for double is higher due to poor FPU or compiler optimization, and should not be used in consensus code). The ROUND_X modes are present for future-proofing purposes, to ensure that if/when CMoney is changed from integer to some other numeric type, the then lossy conversions to int64_t or double are performed correctly. ROUND_SIGNAL, which is what is used in the consensus code, is a non-rounding mode with the semantics: "do not round under any circumstances, if the value is not lovelessly and round-trip serializable into the target type, throw an exception." Actual rounding modes are only specified for human-machine interface code (e.g. GUI, RPC api) and non-consensus code (e.g. transaction priority). About serialization, the issue is that how it gets serialized could depend on context. In particular, serialization to int64_t will always be done in some cases, such as bitcoin-compatible nVersion=1 transactions for pegging. So when serializing an output you will need to know nVersion and sometimes inclusion height of the transaction in order to make a decision about serialization format. So anyway, that's why CTxOut needs to be the way it is, but maybe there is use for native serialization in some contexts.. I haven't thought too much about that. It wouldn't be difficult to add an IMPLEMENT_SERIALIZE to CMoney. The reason it is not there is so that if someone tries such a READWRITE(nValue), they'll get an error and realize they have to think about these things ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #3959 and commit 3aa50118ca8cc8972d1c6b6dffe10963ceb86b7d where I represent fees with a CFeeRate class. It is not type-correct to express fees (which are money-per-transaction-size) as CMoney.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads-up. I'll avoid basing pull-requests on top of pull-requests, but as soon as #3959 gets merged I'll merge accordingly.
|
On Fri, May 23, 2014 at 7:00 PM, Mark Friedenbach
Saying that serializing a CTxOut requires invoking some utility to
I'm just saying: if we're introducing a type for encapsulating money |
|
Ok a CMoneyCompressor-like class (CTxOutValueSerializer?) seems like a reasonable compromise. A little messy because it'll need the transaction metadata inside the txout serializer, but I'll see if I can make it work in a clean way. |
|
Hmm, yeah agree. Putting responsibility about knowledge of transaction versions in money.* isn't very clean. But the inherent problem is that they can't be cleanly separated if they're dependent on each other. Right now, however, this is not a problem in Bitcoin, and my reason for liking this is modularization - not future compatibility with things that Bitcoin itself are unlikely to do. |
|
To be honest I'm still not enthusiastic to merge this at all. As I've said before, it sounds like a overdesign for something that we don't need in bitcoin. |
|
I agree with @laanwj -- NACK on merging this from me, doesn't pass my cost-to-review versus benefit-to-the-codebase test. |
|
@laanwj @gavinandresen This solves actual ongoing maintenance problems being encountered by people using the bitcoin code base today, and is good software engineering for bitcoin. If your only opinion is it lacking in your own review prioritization criteria, that's fine. But please save the NACK for code that you actively want to prevent from being adopted in bitcoin, lest that word lose all meaning. Or am I misunderstanding your intent? This patch may not fix any of your own problems or pet issues, but it has significant impact on mine and others. It is solving real-world maintenance headaches that are resulting in unnecessarily divergent and less-frequently updated code bases and provides type safety to help prevent future consensus-risk logic errors. |
|
@maaku So if the codebases were less diverged we might expect to see you working upstream more? :) |
|
@gmaxwell, that's the whole point!
|
|
Speaking as a maintainer of a alto implementation, strong NACK. I'd hate to find out we accidentally added a consensus critical difference from standard into semantics that mattered; changing alto implementations to in turn match those semantics would be difficult and convoluted, especially on stuff like python where each level of abstraction has a significant performance cost. Why change what isn't broken for the sake of non-bitcoin projects? Now making a money_value_t typedef is fine and useful for type safety, but only because that doesn't change behaviour. |
|
(s/alto/alt/, stupid android keyboard) |
|
@petertodd It seems we agree. I've said so too - a typedef is fine with me. Let's pull the first commit only, which does just that. |
|
@laanwj ACK on a typedef. Its worth remembering with optional changes like this that if we create a fork accidentally miners alone end up losing something like $50k/hour, and the reputation of Bitcoin will take a hit. (particularly given the reasons for making this change) I've already spoken to mining pools that stick with 0.8 because they consider the 0.9 changes to be risky; no sense giving them even more reasons to worry. |
|
There are two very different questions here. One question is whether this is a net improvement to the codebase - independent of the risk. In my opinion, it is, if it actually encapsulated certain responsibilities out of other code. If it helps forked codebases contribute upstream, that's a nice extra benefit. However, If there is any doubt at all that this might introduce consensus changes, I don't think anyone would even think about merging. If not enough people consider that proven, it will have to wait. If people consider it not worth the review time, it may not be merged at all ever - that's just cost vs. benefits. Anyway, ACK on a typedef. Let's see where we go from there. |
|
The typedef doesn't solve the problem of the code making assumptions about the type being an int64 (which is the encapsulation/readability improvement that would directly benefit bitcoin). But merging a typedef would be simpler and it would help a little bit with forked codebases. |
|
@petertodd or anyone else, please point to a single line of code that introduces consensus risk. This patch in no way changes the user- or network-visible behavior of the client It was very carefully constructed with precisely that goal in mind. CMoney is still int64, just wrapped in a class! You'll see in the PR history that I've removed features such as operator/() which were safe but less obviously so. Making CMoney a typedef has zero type safety benefits. Sorry, but that's how C++ works. It will be treated exactly the same by the compiler as before. Merging just the first commit is better than nothing, and helps me considerably. But all of the direct type safety benefits to bitcoin are in the 2nd commit, and so I strongly recommend reviewing it. |
|
@maaku In Bitcoin even subtle compiler bugs lead to consensus problems; any change is risky. For me to properly review the CMoney class - specifically the idea that it's "just" int64 wrapped in a class - would require me to get out the C++ standard and actually check that there isn't any obscure gotchas. Again, this is a change primarily for the benefit of other currencies, not Bitcoin. Why should we take that risk just so much smaller alts - or commercial ventures with non-public business plans in the case of sidechains - can benefit? (note how you've included rounding modes, yet they aren't used in the Bitcoin code) We're the ones with tens of thousands of dollars an hour downtime costs. |
|
Since some altcoins seem to benefit from this change, maybe they can merge this first and incorperate it in their next client version. After some production experience, it might be easier to accept this change here. |
|
By encapsulating functionality this increases the quality, maintainability, readability and future extensibility of bitcoin. @leofidus In the case of Freicoin, the main problem is to rebase the few technical changes it has on top of every new bitcoin release. If Freicoin accepts this change but bitcoin doesn't, the rebase won't get any simpler, probably worse. Maybe if bitcoin accepts the typedef, Freicoin can deploy the class version before bitcoin does, but I'm not sure that will be very useful since Freicoin would actually implement another version of the class, not the one equivalent to int64 that bitcoin would use. |
|
@jtimon @leofidus Actually Freicoin has effectively had this change since the first code was written almost two years ago, although we use the GMP mpq_class directly instead of encapsulating it in CMoney, but the result is the same. Any upstream (bitcoin) change which deals with monetary values -- which is just about everything except network code -- results in a merge conflict and requires manual intervention. Likewise any Freicoin changes we want to move upstream require manual editing and re-testing, which is a barrier to all-volunteer efforts. |
|
@jtimon > "Improve bitcoin's divisibility would be a trivial hardfork if it's required in the future" I have read and written this sentence many times and this PR would actually make it true. By the time Bitcoin is worth the 6x or so orders of magnitude more required to make a divisibility hardfork necessary, changing CMoney would be the least of our worries with regard to a divisibility upgrade. As for "good software engineering practices", consensus-critical code has very different good practices than normal code. Separating the wallet code - which shouldn't change consensus behavior - from the consensus critical code was a good idea, albeit one that needed to be very carefully evaluated. Making a CMoney class might be a good idea, but there's no need to rush it; a simple typedef is a conservative change that probably will have no impacts. I understand that you want a much less conservative change for your own reasons - Freicoin - but there's simply no reason why the rest of the Bitcoin community should take on risk for your benefit. Remember that the fact that Freicoin has done this change a long while ago does not help in testing the actual concern, which is that a CMoney class may have different consensus critical behavior to int64. To test that you'd have to actually test the transition period between having int64 and CMoney; Freicoin never did that. |
|
I agree that defining a money type is somewhat 'better software engineering' when looked at it some ways. But I see the future of Bitcoin Core as just Bitcoin Core, not 'Altcoin Core'. Eventually I want to enshrine the Bitcoin-specific consensus code in a library that as is as small and independent as possible and completely specific to Bitcoin. Altcoins can create their own consensus libraries. I expect there to be little or no code sharing between these. The consensus code does not need to be general. For this goal I think inside the consensus code it is good to have more low-level concreteness, not more abstractions. Recently we already went from the CBigNum type to well-defined, concrete uint256/uint160/CScriptNum types. However in the rest of the code that interfaces with client applications (RPC), or the user, or provides some other interface, a more abstract interface that isolates from the specific representation of money inside the system is useful. |
|
Since it seems that nobody would oppose to the innocuous typedef change, I think that probably it would be wise to separate that (which is not that interesting IMO but could be easily accepted) from the class approach to encapsulate serialization and arithmetic, which is clearly much more controversial. Please tell me if I'm going off-topic or I should move this to the mailing list, I'm trying to understand the reasoning better since I feel it can help me with my own pull requests. On the other hand, I don't understand the "review-costs" arguments. If someone doesn't have time to review certain change or doesn't think it's a priority to do so, that's completely understandable. Not reviewing a PR for this reason and leaving it in limbo is one thing. But justifying a NACK to "cut review costs" seems at the very least premature to me. Again, my apologies if this is not the place for these later observations. |
|
Am I understanding this correctly that leaving consensus-code unmodified (or typedef'ed) and introducing CMoney in the rest of the codebase would be a compromise which would help everyone (bitcoin has no fork risk but better modularization, freicoin has less work merging changes)? I can at least say that that version would benefit Dogecoin, which would mean that you would also get more help from us testing it. |
|
@jtimon It's not so much about 'review costs', because review will almost certainly not find all bugs/inconsistencies. It is the costs of some unexpected change causing a fork. Years after Satoshi left people are still finding small unexpected cases in the consensus code. Just that no one can point them out from glancing at the code doesn't mean that no problems exist. Making the code (maybe) a bit more readable and consistent is just not worth that. I hope you understand. If the potential benefits don't outweigh the potential trouble no one wants to take the risk to merge it, and a NACK is in order. That's better than keeping it in limbo forever, IMO. @leofidus Yes if this could be done only outside the consensus code, it would be acceptable. That'd require an interface layer between the consensus code and the rest, which would be a good thing anyway. It is aligned with the goal of making a consensus library. |
|
Automatic sanity-testing: FAILED MERGE, see http://jenkins.bluematt.me/pull-tester/p4067_fc1cd73d0ff73e26da2548193c630b3b8a4d630b/ for test log. This pull does not merge cleanly onto current master |
|
Closing this in favor of #4234. |
|
After #4234 has been merged, a rebased version of this should be much easier to review. |
|
@jtimon You could save yourself the trouble. As I've said multiple times (also above here), I'm willing to go as far as type-defing the money type. That's what we did, so as far as I'm concerned this issue is now closed. |
|
Maybe this goes too far, but wouldn't be nice to have a class and convert the functions in utilmoneystr to methods of that class? |
|
At least please keep core data structure definition and human interaction separate. So no, don't move any utilmoneystr to amount.o or CAmount - that would imply having everything using any core datastructure at all depend on that conversion code. |
|
I'm for keeping the data type and operations such as formatting on it decoupled. Especially as formatting could be done in different ways, depending on the context, it's not an innate property of the type, and indeed not necessary by all clients. |
Provides code parity between bitcoin and side chains which use a different type for representing and performing arithmetic on coin balances, e.g. Freicoin's use of the GMP library's arbitrary-precision rational number type for increased divisibility and interest calculations. This greatly reduces the friction in sharing code and submitting code upstream. Also organizes related functionality such as FormatMoney and ParseMoney into a single class framework.