Skip to content

Conversation

@JeremyRubin
Copy link
Contributor

@JeremyRubin JeremyRubin commented Aug 8, 2019

In a few places in src/wallet/wallet.h, we use a char when semantically we want a bool.

This is kind of an issue because it means we can unserialize the same transaction with different fFromMe flags (as differing chars) and evaluate the following section in wallet/wallet.cpp

        if (wtxIn.fFromMe && wtxIn.fFromMe != wtx.fFromMe)
         {
             wtx.fFromMe = wtxIn.fFromMe;
             fUpdated = true;
         }

incorrectly (triggering an fUpdated where both fFromMe values represent true, via different chars).

I don't think this is a vulnerability, but it's just a little messy and unsemantic, and could lead to issues with stored wtxIns not being findable in a map by their hash.

The serialize/unserialize code for bool internally uses a char, so it should be safe to make this substitution.

NOTE: Technically, this is a behavior change -- I haven't checked too closely that nowhere is depending on storing information in this char. Theoretically, this could break something because after this change a tx unserialized with such a char would preserve it's value, but now it is converted to a true canonical bool.

@fanquake fanquake added the Wallet label Aug 8, 2019
@laanwj
Copy link
Member

laanwj commented Aug 8, 2019

Seems like an improvement, any idea why this was a char in the first place (or how it came to be)?

I don't think this is a vulnerability, but it's just a little messy and unsemantic, and could lead to issues with stored wtxIns not being findable in a map by their hash.

Wait, this metadata is never part of the transaction hash right?

@JeremyRubin
Copy link
Contributor Author

It seems like it is just old/crufty. It was previously proposed by @ryanofsky to be removed (dummied out), but that went nowhere. #9351

Keeping the distinction seems OK to me, but the consistency issue should be fixed in any case.

The metadata isn't a part of the transaction hash, but it could be a part of the hash (like a siphash, not a crypto hash) of the object itself, e.g., for storing in a map.

@kallewoof
Copy link
Contributor

Concept ACK

There's also this which I never got why it was a char:

std::vector<char> vfIncluded;

@JeremyRubin
Copy link
Contributor Author

Actually there is a reason for that one!

Vec bool is typically compacted, so it's probably slower running because of masking.

Maybe worth profiling to find out!

@sipa
Copy link
Member

sipa commented Aug 9, 2019

From a march 2011 mail from Satoshi to me about how the wallet tracks unspent outputs:

Everything that uses fSpent must be changed to use
vfSpent instead.  I reorganised IMPLEMENT_SERIALIZE
a little better, now that it's clear how the pattern is going. 
Debatable whether to use vector<char> or vector<bool>. 
A lot of people hate vector<bool> because it has
irregularities to trip over.

So I guess the reason was indeed some worry about the std::vector weirdness (it's special cased to be bitbacked).

@JeremyRubin
Copy link
Contributor Author

JeremyRubin commented Aug 9, 2019

Hm; may be worth it to go ahead and modify these locations to use vec<char>...

blockencodings.cpp
merkleblock.cpp
rest.cpp
script/interpreter.cpp
test/pmt_tests.cpp
txdb.cpp
wallet/coinselection.cpp

Would require some profiling to ensure that there isn't a regression.

The only place that is using vector<bool> for compactness is the cuckoocache.

@kallewoof
Copy link
Contributor

Gotta love the unexpected history lessons. :)

@laanwj
Copy link
Member

laanwj commented Aug 9, 2019

Hm; may be worth it to go ahead and modify these locations to use vec...

Why? Unless you can demonstrate specific performance improvements, where the (possible) compaction causes bottlenecks, I strongly disagree that this is a reason to go around refactoring things.

From a developer angle vector<bool> is definitely less surprising to encounter (use bools for what should be bools) and unless there's a specific reason the compaction gets in the way (serialization voodoo ?), it shouldn't be a reason for concern at all.

@JeremyRubin
Copy link
Contributor Author

Ah; I don't suspect there's a performance reason to modify them, but if there's general concern for surprising behavior with the std::vector<bool>, it would be good to avoid that.

In any case, this is a bit of a side-quest from the scope of this PR.

@fanquake fanquake changed the title Fix Char as Bool in Wallet wallet: Fix Char as Bool in Wallet Aug 14, 2019
@laanwj
Copy link
Member

laanwj commented Aug 14, 2019

Ah; I don't suspect there's a performance reason to modify them, but if there's general concern for surprising behavior with the std::vector, it would be good to avoid that.

What kind of surprising behavior, really? the implementation might be different but as far as I know, the "API contract" is exactly the same?

@sipa
Copy link
Member

sipa commented Aug 14, 2019

I think the only surprising thing about it is that you can't cast a pointer to the data to a char pointer and expect any consistent representation of the data... which you shouldn't be doing anyway, but I think the early codebase did stuff like that.

@JeremyRubin
Copy link
Contributor Author

The main one is the data pointer to a bool pointer.

There are a few other differences:

  1. Concurrent modification of different bits is not guaranteed to be safe
  2. The "API Contract" differs, such that generic code can't be written accepting vector<bool> (e.g., range for loops don't work).

@achow101
Copy link
Member

Code review ACK 2dbfb37

I checked that fFromMe is only used as a bool. I also checked bools and chars are being serialized in exactly the same way so this does not change any behavior.

@laanwj
Copy link
Member

laanwj commented Aug 15, 2019

The "API Contract" differs, such that generic code can't be written accepting vector (e.g., range for loops don't work).

FWIW this works fine here … (but maybe it's a gcc-ism?)

#include <iostream>
#include <vector>

int main() {
    std::vector<bool> test;
    test.push_back(false);
    test.push_back(true);
    test.push_back(false);
    for (auto const &x: test) {
        std::cout << x << std::endl;
    };
    return 0;
}

@practicalswift
Copy link
Contributor

practicalswift commented Aug 15, 2019

@laanwj Note that the x you get in your example is likely a "bit reference" (std::_Bit_reference) and not a bool.

The gotcha:

    std::vector<char> v1;
    for (const auto& x : v1) {
      // const char& x (as expected)
    }

    std::vector<int> v2;
    for (const auto& x : v2) {
      // const int& x (as expected)
    }

    std::vector<bool> v3;
    for (const auto& x : v3) {
      // const std::_Bit_reference& x (🎉)
    }

Also note that it is implementation-defined if any space efficiency optimisations are done.

@DrahtBot
Copy link
Contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #16624 (wallet : encapsulate transactions state by ariard)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Contributor

@meshcollider meshcollider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of the above discussion on vectors, this change itself looks safe to me, and does make things more sensible IMO.

Code review ACK 2dbfb37

(Your commit isn't linked to your GitHub account btw Jeremy)

fanquake added a commit that referenced this pull request Aug 21, 2019
2dbfb37 Fix Char as Bool in interfaces (Jeremy Rubin)

Pull request description:

  In a few places in src/wallet/wallet.h, we use a char when semantically we want a bool.

  This is kind of an issue because it means we can unserialize the same transaction with different fFromMe flags (as differing chars) and evaluate the following section in wallet/wallet.cpp
  ```c++
          if (wtxIn.fFromMe && wtxIn.fFromMe != wtx.fFromMe)
           {
               wtx.fFromMe = wtxIn.fFromMe;
               fUpdated = true;
           }
  ```
  incorrectly (triggering an fUpdated where both fFromMe values represent true, via different chars).

  I don't think this is a vulnerability, but it's just a little messy and unsemantic, and could lead to issues with stored wtxIns not being findable in a map by their hash.

  The serialize/unserialize code for bool internally uses a char, so it should be safe to make this substitution.

  NOTE: Technically, this is a behavior change -- I haven't checked too closely that nowhere is depending on storing information in this char. Theoretically, this could break something because after this change a tx unserialized with such a char would preserve it's value, but now it is converted to a ~true~ canonical bool.

ACKs for top commit:
  achow101:
    Code review ACK 2dbfb37
  meshcollider:
    Code review ACK 2dbfb37

Tree-SHA512: 8c0dc9cf672aa2276c694facbf50febe7456eaa8bf2bd2504f81a61052264b8b30cdb5326e1936893adc3d33504667aee3c7e207a194c71d87b3e7b5fe199c9d
@fanquake fanquake merged commit 2dbfb37 into bitcoin:master Aug 21, 2019
@laanwj
Copy link
Member

laanwj commented Aug 21, 2019

Regardless of the above discussion on vectors, this change itself looks safe to me, and does make things more sensible IMO.

Yes, agree, sorry it got derailed. Posthumous ACK 2dbfb37. This change is the right way around, I was arguing against going bool to char.

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 22, 2019
2dbfb37 Fix Char as Bool in interfaces (Jeremy Rubin)

Pull request description:

  In a few places in src/wallet/wallet.h, we use a char when semantically we want a bool.

  This is kind of an issue because it means we can unserialize the same transaction with different fFromMe flags (as differing chars) and evaluate the following section in wallet/wallet.cpp
  ```c++
          if (wtxIn.fFromMe && wtxIn.fFromMe != wtx.fFromMe)
           {
               wtx.fFromMe = wtxIn.fFromMe;
               fUpdated = true;
           }
  ```
  incorrectly (triggering an fUpdated where both fFromMe values represent true, via different chars).

  I don't think this is a vulnerability, but it's just a little messy and unsemantic, and could lead to issues with stored wtxIns not being findable in a map by their hash.

  The serialize/unserialize code for bool internally uses a char, so it should be safe to make this substitution.

  NOTE: Technically, this is a behavior change -- I haven't checked too closely that nowhere is depending on storing information in this char. Theoretically, this could break something because after this change a tx unserialized with such a char would preserve it's value, but now it is converted to a ~true~ canonical bool.

ACKs for top commit:
  achow101:
    Code review ACK 2dbfb37
  meshcollider:
    Code review ACK 2dbfb37

Tree-SHA512: 8c0dc9cf672aa2276c694facbf50febe7456eaa8bf2bd2504f81a61052264b8b30cdb5326e1936893adc3d33504667aee3c7e207a194c71d87b3e7b5fe199c9d
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants