Skip to content

Conversation

@kallewoof
Copy link
Contributor

This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg.
Also adds ToUpper() string version.

Additionally, it clarifies that the locale independency of the case functions is a feature and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this.

This is done in preparation for #11413 and as a general refactor. I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage.

This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg.
Also adds ToUpper() string version.
@practicalswift
Copy link
Contributor

practicalswift commented Aug 8, 2019

Concept ACK

Thanks for cleaning this up.

Will review the code.

I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage.

Very much so. (Also unclear if the pre-refactor state is a meaningful optimisation in practice.)

@Sjors
Copy link
Member

Sjors commented Aug 8, 2019

Concept ACK for improved readability and usability.

@laanwj
Copy link
Member

laanwj commented Aug 8, 2019

Concept ACK having these as functions seems better than doing the transformation in-place

Additionally, it clarifies that the locale independency of the case functions is a feature and not a limitation

Yes, please do not ever add any locale dependent code to bitcoind. Deterministic behavior is the most important thing there (it's okay for bitcoin-qt but only for UI purposes).

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

std::string ToLower(const std::string& str)
{
std::transform(str.begin(), str.end(), str.begin(), [](char c){return ToLower(c);});
std::string r;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could reserve, same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does that mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@practicalswift practicalswift Aug 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@promag I don't think using r.reserve(str.size()); necessarily results in faster execution here. I think it depends on the SSO implementation and the length of str.

If str.size() <= 15 I think not using reserve is faster when using libstdc++ or MSVC. And str.size() <= 22 for libc++.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but I'm not particularly worried about performance, i think it improves the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reserving seems sensible. It would actually slow down the code, @practicalswift ? Crazy.
I would have never expected that. I'm indifferent on this, though, and opinions are mixed so leaving as is for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kallewoof You can test this using the following Google Benchmark snippet:

#include <algorithm>
#include <string>

constexpr char ToLower(char c) {
    return (c >= 'A' && c <= 'Z' ? (c - 'A') + 'a' : c);
}

std::string ToLower_1(const std::string& str) {
    std::string r;
    for (auto ch : str) {
        r += ToLower((unsigned char)ch);
    }
    return r;
}

std::string ToLower_2(const std::string& str) {
    std::string r;
    r.reserve(str.size());
    for (auto ch : str) {
        r += ToLower((unsigned char)ch);
    }
    return r;
}

static void BenchToLower_1(benchmark::State& state) {
  for (auto _ : state) {
    const std::string s1{"HELLO"};
    benchmark::DoNotOptimize(s1);
    const std::string s2 = ToLower_1(s1);
    benchmark::DoNotOptimize(s2);
  }
}
BENCHMARK(BenchToLower_1);

static void BenchToLower_2(benchmark::State& state) {
  for (auto _ : state) {
    const std::string s1{"HELLO"};
    benchmark::DoNotOptimize(s1);
    const std::string s2 = ToLower_2(s1);
    benchmark::DoNotOptimize(s2);
  }
}
BENCHMARK(BenchToLower_2);

Notice that ToLower_1 which doesn't reserve is slightly faster for strings up to 15 chars for libstdc++ and MSVC and up to 22 chars for libc++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@practicalswift Thanks for that! Very educational. Memory allocations are tricky to optimize, I guess. :)

enum Network ParseNetwork(std::string net) {
Downcase(net);
enum Network ParseNetwork(const std::string& net_in) {
std::string net = ToLower(net_in);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be net = ToLower(net) and avoid changing the signature? Both are fine though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the signature is an improvement here IMO

@luke-jr
Copy link
Member

luke-jr commented Aug 8, 2019

While there are cases where Downcase was unwieldy used, I don't think it is unwieldy as a rule, and there's no real reason to go out of the way to make things less efficient. The new functions can be added without making other code worse.

Copy link
Contributor

@l2a5b1 l2a5b1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid that I don't see the significant developer improvement of the refactored Downcase function. Usage of the refactored ToLower function at the two call sites also seems more unwieldy than the pre-refactor state, because both call sites expected the string argument to be transformed.

Refactoring Downcase is also not required for #11413.

I think we should create a ToLower that returns the result when we actually need it (and even then - as in boost for example - it could make sense to have both functions).

enum Network ParseNetwork(std::string net) {
Downcase(net);
enum Network ParseNetwork(const std::string& net_in) {
std::string net = ToLower(net_in);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage.

It seems to me that the usage of the refactored function is more unwieldy than the pre-refactor state.

Downcase(net) straightforwardly downcases the string. The refactored-state needs two variables (const std::string& net_in, std::string net) and an assignment to achieve the same result.

}

void Downcase(std::string& str)
std::string ToLower(const std::string& str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the significant developer benefit of this change and why is the required in preparation for #11413? This function is called at just two call sites and the pre-refactor state expects that the argument is directly transformed.

}
#ifdef WIN32
std::transform(key.begin(), key.end(), key.begin(), ToLower);
key = ToLower(key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-refactor this would just be Downcase(key).

@laanwj
Copy link
Member

laanwj commented Aug 8, 2019

I think we should create a ToLower that returns the result when we actually need it (and even then - as in boost for example - it could make sense to have both functions).

Having two sets of uppercase/lowercase functions would be overkill, especially given how little they are used. Efficiency isn't a point, either, because where they are used only, like, a single time per invocation of the executable, for argument processing.

Returning their result is the normal, surprise-free way for functions to work. Changing things in-place is an optimization, unnecessary in this case. The new way is imo not more unwieldy, definitely not if you need to keep the old string.

Even with that, even if performance mattered

+ enum Network ParseNetwork(std::string net) {
+    Downcase(net);
- enum Network ParseNetwork(const std::string& net_in) {
-    std::string net = ToLower(net_in);

Is not an overall loss of efficiency. In the old case, net is copied as input argument then transformed in place. In the new case (=1 string allocation), in the new case the function takes the argument as reference but ToLower creates and returns a new string with the uppercase data (=1 string allocation).

@l2a5b1
Copy link
Contributor

l2a5b1 commented Aug 8, 2019

@laanwj thanks, appreciate the feedback.

Having two sets of uppercase/lowercase functions would be overkill, especially given how little they are used.

There are no two sets of uppercase/lowercase functions and it is also not what I did propose.

The new way is imo not more unwieldy, definitely not if you need to keep the old string.

The code at the call sites does not need to keep the old string.

As far as performance or efficiency argument is concerned, I didn't mention that in my feedback.

What I do argue is that there don't seem to be any benefits to refactoring Downcase:

  • It is not required for [wallet] [rpc] sendtoaddress/sendmany: Add explicit feerate option #11413;
  • There are no significant developer benefits, because the call sites don't need the lower cased string to be returned;
  • Use of the refactored function at the two call sites is more unwieldy than with the Downcase equivalent, because with ToLower you need to do more to achieve the same result.

Hence:

I think we should create a ToLower that returns the result when we actually need it

@laanwj
Copy link
Member

laanwj commented Aug 8, 2019

Right, okay, I'm not going to argue this any further

ACK 0481fa2 from me

@kallewoof
Copy link
Contributor Author

@l2a5b1

Thanks for your feedback. The function (Downcase()) is used in one single place, and its behavior is not as anyone would expect, because someone at some point thought in-place replacement would be an optimization worthy of going for. I disagree, and as you can see, others here disagree as well.

Generally speaking, if you have a choice between straight-forward/expected behavior and unexpected-but-optimized behavior, you usually wanna go for the former, unless the code is in some way performance critical (e.g. it slows things down significantly).

Seeing as ParseNetwork, the only function using the Downcase function in the code, is called one single time in init.cpp, it makes no sense to keep a non-intuitive optimization around.

I am changing it in preparation for #11413 because I need the uppercase variant, and having one void Downcase(std::string& str) function which performs in-place replacement, and one std::string ToUpper(const std::string& str) function which does not, would be bloody confusing for everybody.

@luke-jr Honestly, this function is called one single time. Let's not keep things confusing on purpose.

@practicalswift
Copy link
Contributor

ACK 0481fa2 -- diff looks correct

Agree fully with @kallewoof and @laanwj regarding the choice between "straight-forward/expected" vs "unexpected-but-optimized" for code paths that are not performance critical.

@l2a5b1
Copy link
Contributor

l2a5b1 commented Aug 9, 2019

@kallewoof thanks! :)

ACK 0481fa2 - Although, I think @luke-jr's feedback is spot on; Downcase is just an artifact of ParseNetwork, which has been happily downcasing net via a string argument for over 7 years; and I do recommend to add ToLower when somebody actually needs it in new work, there is no point in keeping a trivial utility function if it is not appreciated.

@promag
Copy link
Contributor

promag commented Aug 12, 2019

ACK 0481fa2.

@fanquake fanquake merged commit 0481fa2 into bitcoin:master Aug 13, 2019
fanquake added a commit that referenced this pull request Aug 13, 2019
0481fa2 util: refactor upper/lowercase functions (Karl-Johan Alm)

Pull request description:

  This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg.
  Also adds ToUpper() string version.

  Additionally, it clarifies that the locale independency of the case functions is a *feature* and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this.

  This is done in preparation for #11413 and as a general refactor. I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage.

ACKs for top commit:
  laanwj:
    ACK 0481fa2 from me
  practicalswift:
    ACK 0481fa2 -- diff looks correct
  l2a5b1:
    ACK 0481fa2 - Although, I think @luke-jr's [feedback](#16566 (comment)) is spot on; `Downcase` is just an artifact of `ParseNetwork`, which has been happily downcasing `net` via a string argument for over 7 years; and I do recommend to add `ToLower` *when* somebody actually needs it in new work, there is no point in keeping a trivial utility function if it is not appreciated.
  promag:
    ACK 0481fa2.

Tree-SHA512: 9b834ecc1b97db043e261bcbc59e42372e11e2fb9a6943688f18a835bf5c9205f68e4614f58e90ba260d1b8f0e060c6f67b390b62436c21b56891db23bc41628
@kallewoof kallewoof deleted the 2019-08-casefuns-refactor branch August 13, 2019 04:58
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 13, 2019
0481fa2 util: refactor upper/lowercase functions (Karl-Johan Alm)

Pull request description:

  This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg.
  Also adds ToUpper() string version.

  Additionally, it clarifies that the locale independency of the case functions is a *feature* and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this.

  This is done in preparation for bitcoin#11413 and as a general refactor. I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage.

ACKs for top commit:
  laanwj:
    ACK 0481fa2 from me
  practicalswift:
    ACK 0481fa2 -- diff looks correct
  l2a5b1:
    ACK 0481fa2 - Although, I think @luke-jr's [feedback](bitcoin#16566 (comment)) is spot on; `Downcase` is just an artifact of `ParseNetwork`, which has been happily downcasing `net` via a string argument for over 7 years; and I do recommend to add `ToLower` *when* somebody actually needs it in new work, there is no point in keeping a trivial utility function if it is not appreciated.
  promag:
    ACK 0481fa2.

Tree-SHA512: 9b834ecc1b97db043e261bcbc59e42372e11e2fb9a6943688f18a835bf5c9205f68e4614f58e90ba260d1b8f0e060c6f67b390b62436c21b56891db23bc41628
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jun 4, 2020
Summary:
This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg.
Also adds ToUpper() string version.

This is a backport of Core [[bitcoin/bitcoin#16566 | PR16566]]

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, jasonbcox

Reviewed By: #bitcoin_abc, jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D6376
ShengguangXiao pushed a commit to DeFiCh/ain that referenced this pull request Aug 28, 2020
f39100e util: refactor upper/lowercase functions (Karl-Johan Alm)

Pull request description:

  This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg.
  Also adds ToUpper() string version.

  Additionally, it clarifies that the locale independency of the case functions is a *feature* and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this.

  This is done in preparation for #11413 and as a general refactor. I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage.

ACKs for top commit:
  laanwj:
    ACK f39100e from me
  practicalswift:
    ACK f39100e -- diff looks correct
  l2a5b1:
    ACK f39100e - Although, I think @luke-jr's [feedback](bitcoin/bitcoin#16566 (comment)) is spot on; `Downcase` is just an artifact of `ParseNetwork`, which has been happily downcasing `net` via a string argument for over 7 years; and I do recommend to add `ToLower` *when* somebody actually needs it in new work, there is no point in keeping a trivial utility function if it is not appreciated.
  promag:
    ACK f39100e.

Tree-SHA512: 9b834ecc1b97db043e261bcbc59e42372e11e2fb9a6943688f18a835bf5c9205f68e4614f58e90ba260d1b8f0e060c6f67b390b62436c21b56891db23bc41628
kwvg added a commit to kwvg/dash that referenced this pull request Nov 1, 2021
UdjinM6 added a commit to dashpay/dash that referenced this pull request Nov 2, 2021
pravblockc pushed a commit to pravblockc/dash that referenced this pull request Nov 18, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants