-
Notifications
You must be signed in to change notification settings - Fork 38.8k
util: refactor upper/lowercase functions #16566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg. Also adds ToUpper() string version.
|
Concept ACK Thanks for cleaning this up. Will review the code.
Very much so. (Also unclear if the pre-refactor state is a meaningful optimisation in practice.) |
|
Concept ACK for improved readability and usability. |
|
Concept ACK having these as functions seems better than doing the transformation in-place
Yes, please do not ever add any locale dependent code to |
promag
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
| std::string ToLower(const std::string& str) | ||
| { | ||
| std::transform(str.begin(), str.end(), str.begin(), [](char c){return ToLower(c);}); | ||
| std::string r; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could reserve, same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does that mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@promag I don't think using r.reserve(str.size()); necessarily results in faster execution here. I think it depends on the SSO implementation and the length of str.
If str.size() <= 15 I think not using reserve is faster when using libstdc++ or MSVC. And str.size() <= 22 for libc++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but I'm not particularly worried about performance, i think it improves the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reserving seems sensible. It would actually slow down the code, @practicalswift ? Crazy.
I would have never expected that. I'm indifferent on this, though, and opinions are mixed so leaving as is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kallewoof You can test this using the following Google Benchmark snippet:
#include <algorithm>
#include <string>
constexpr char ToLower(char c) {
return (c >= 'A' && c <= 'Z' ? (c - 'A') + 'a' : c);
}
std::string ToLower_1(const std::string& str) {
std::string r;
for (auto ch : str) {
r += ToLower((unsigned char)ch);
}
return r;
}
std::string ToLower_2(const std::string& str) {
std::string r;
r.reserve(str.size());
for (auto ch : str) {
r += ToLower((unsigned char)ch);
}
return r;
}
static void BenchToLower_1(benchmark::State& state) {
for (auto _ : state) {
const std::string s1{"HELLO"};
benchmark::DoNotOptimize(s1);
const std::string s2 = ToLower_1(s1);
benchmark::DoNotOptimize(s2);
}
}
BENCHMARK(BenchToLower_1);
static void BenchToLower_2(benchmark::State& state) {
for (auto _ : state) {
const std::string s1{"HELLO"};
benchmark::DoNotOptimize(s1);
const std::string s2 = ToLower_2(s1);
benchmark::DoNotOptimize(s2);
}
}
BENCHMARK(BenchToLower_2);
Notice that ToLower_1 which doesn't reserve is slightly faster for strings up to 15 chars for libstdc++ and MSVC and up to 22 chars for libc++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@practicalswift Thanks for that! Very educational. Memory allocations are tricky to optimize, I guess. :)
| enum Network ParseNetwork(std::string net) { | ||
| Downcase(net); | ||
| enum Network ParseNetwork(const std::string& net_in) { | ||
| std::string net = ToLower(net_in); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be net = ToLower(net) and avoid changing the signature? Both are fine though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the signature is an improvement here IMO
|
While there are cases where |
l2a5b1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid that I don't see the significant developer improvement of the refactored Downcase function. Usage of the refactored ToLower function at the two call sites also seems more unwieldy than the pre-refactor state, because both call sites expected the string argument to be transformed.
Refactoring Downcase is also not required for #11413.
I think we should create a ToLower that returns the result when we actually need it (and even then - as in boost for example - it could make sense to have both functions).
| enum Network ParseNetwork(std::string net) { | ||
| Downcase(net); | ||
| enum Network ParseNetwork(const std::string& net_in) { | ||
| std::string net = ToLower(net_in); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage.
It seems to me that the usage of the refactored function is more unwieldy than the pre-refactor state.
Downcase(net) straightforwardly downcases the string. The refactored-state needs two variables (const std::string& net_in, std::string net) and an assignment to achieve the same result.
| } | ||
|
|
||
| void Downcase(std::string& str) | ||
| std::string ToLower(const std::string& str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the significant developer benefit of this change and why is the required in preparation for #11413? This function is called at just two call sites and the pre-refactor state expects that the argument is directly transformed.
| } | ||
| #ifdef WIN32 | ||
| std::transform(key.begin(), key.end(), key.begin(), ToLower); | ||
| key = ToLower(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-refactor this would just be Downcase(key).
Having two sets of uppercase/lowercase functions would be overkill, especially given how little they are used. Efficiency isn't a point, either, because where they are used only, like, a single time per invocation of the executable, for argument processing. Returning their result is the normal, surprise-free way for functions to work. Changing things in-place is an optimization, unnecessary in this case. The new way is imo not more unwieldy, definitely not if you need to keep the old string. Even with that, even if performance mattered + enum Network ParseNetwork(std::string net) {
+ Downcase(net);
- enum Network ParseNetwork(const std::string& net_in) {
- std::string net = ToLower(net_in);Is not an overall loss of efficiency. In the old case, |
|
@laanwj thanks, appreciate the feedback.
There are no two sets of uppercase/lowercase functions and it is also not what I did propose.
The code at the call sites does not need to keep the old string. As far as performance or efficiency argument is concerned, I didn't mention that in my feedback. What I do argue is that there don't seem to be any benefits to refactoring
Hence:
|
|
Right, okay, I'm not going to argue this any further ACK 0481fa2 from me |
|
Thanks for your feedback. The function ( Generally speaking, if you have a choice between straight-forward/expected behavior and unexpected-but-optimized behavior, you usually wanna go for the former, unless the code is in some way performance critical (e.g. it slows things down significantly). Seeing as I am changing it in preparation for #11413 because I need the uppercase variant, and having one @luke-jr Honestly, this function is called one single time. Let's not keep things confusing on purpose. |
|
ACK 0481fa2 -- diff looks correct Agree fully with @kallewoof and @laanwj regarding the choice between "straight-forward/expected" vs "unexpected-but-optimized" for code paths that are not performance critical. |
|
@kallewoof thanks! :) ACK 0481fa2 - Although, I think @luke-jr's feedback is spot on; |
|
ACK 0481fa2. |
0481fa2 util: refactor upper/lowercase functions (Karl-Johan Alm) Pull request description: This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg. Also adds ToUpper() string version. Additionally, it clarifies that the locale independency of the case functions is a *feature* and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this. This is done in preparation for #11413 and as a general refactor. I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage. ACKs for top commit: laanwj: ACK 0481fa2 from me practicalswift: ACK 0481fa2 -- diff looks correct l2a5b1: ACK 0481fa2 - Although, I think @luke-jr's [feedback](#16566 (comment)) is spot on; `Downcase` is just an artifact of `ParseNetwork`, which has been happily downcasing `net` via a string argument for over 7 years; and I do recommend to add `ToLower` *when* somebody actually needs it in new work, there is no point in keeping a trivial utility function if it is not appreciated. promag: ACK 0481fa2. Tree-SHA512: 9b834ecc1b97db043e261bcbc59e42372e11e2fb9a6943688f18a835bf5c9205f68e4614f58e90ba260d1b8f0e060c6f67b390b62436c21b56891db23bc41628
0481fa2 util: refactor upper/lowercase functions (Karl-Johan Alm) Pull request description: This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg. Also adds ToUpper() string version. Additionally, it clarifies that the locale independency of the case functions is a *feature* and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this. This is done in preparation for bitcoin#11413 and as a general refactor. I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage. ACKs for top commit: laanwj: ACK 0481fa2 from me practicalswift: ACK 0481fa2 -- diff looks correct l2a5b1: ACK 0481fa2 - Although, I think @luke-jr's [feedback](bitcoin#16566 (comment)) is spot on; `Downcase` is just an artifact of `ParseNetwork`, which has been happily downcasing `net` via a string argument for over 7 years; and I do recommend to add `ToLower` *when* somebody actually needs it in new work, there is no point in keeping a trivial utility function if it is not appreciated. promag: ACK 0481fa2. Tree-SHA512: 9b834ecc1b97db043e261bcbc59e42372e11e2fb9a6943688f18a835bf5c9205f68e4614f58e90ba260d1b8f0e060c6f67b390b62436c21b56891db23bc41628
Summary: This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg. Also adds ToUpper() string version. This is a backport of Core [[bitcoin/bitcoin#16566 | PR16566]] Test Plan: ninja all check-all Reviewers: #bitcoin_abc, jasonbcox Reviewed By: #bitcoin_abc, jasonbcox Differential Revision: https://reviews.bitcoinabc.org/D6376
f39100e util: refactor upper/lowercase functions (Karl-Johan Alm) Pull request description: This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg. Also adds ToUpper() string version. Additionally, it clarifies that the locale independency of the case functions is a *feature* and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this. This is done in preparation for #11413 and as a general refactor. I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage. ACKs for top commit: laanwj: ACK f39100e from me practicalswift: ACK f39100e -- diff looks correct l2a5b1: ACK f39100e - Although, I think @luke-jr's [feedback](bitcoin/bitcoin#16566 (comment)) is spot on; `Downcase` is just an artifact of `ParseNetwork`, which has been happily downcasing `net` via a string argument for over 7 years; and I do recommend to add `ToLower` *when* somebody actually needs it in new work, there is no point in keeping a trivial utility function if it is not appreciated. promag: ACK f39100e. Tree-SHA512: 9b834ecc1b97db043e261bcbc59e42372e11e2fb9a6943688f18a835bf5c9205f68e4614f58e90ba260d1b8f0e060c6f67b390b62436c21b56891db23bc41628
merge bitcoin#16566...bitcoin#18004: auxillary backports
This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg.
Also adds ToUpper() string version.
Additionally, it clarifies that the locale independency of the case functions is a feature and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this.
This is done in preparation for #11413 and as a general refactor. I don't think the optimization that the pre-refactor state gave warrants the unwieldy usage.