-
Notifications
You must be signed in to change notification settings - Fork 38.7k
Remove the boost/algorithm/string/case_conv.hpp dependency #13671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove the boost/algorithm/string/case_conv.hpp dependency #13671
Conversation
c125aee to
e5085d3
Compare
|
Concept ACK modulo …
|
Empact
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to include <algorithm>/<ctype.h> where appropriate.
src/rpc/server.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::toupper(category.front()) should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Empact, I have addressed your feedback. The code to capitalize the first letter of a string has been isolated in void Capitalize(std::string& str) in utilstrencodings.cpp.
src/netbase.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can pass tolower directly like so: std::transform(net.begin(), net.end(), net.begin(), tolower);
The same doesn't work with std::tolower fyi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could cast like so, I believe that has the same effect: std::transform(net.begin(), net.end(), net.begin(), (int (*)(int)) std::tolower);
Note tolower accepts an int but handles it as an unsigned char: https://en.cppreference.com/w/c/string/byte/tolower
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Empact!
I opted for the lambda expression, because it allowed me to pass unsigned char characters to the tolower function.
I understood from the referenced documentation that it doesn't handle the int as an unsigned char (character to be converted. If the value of ch is not representable as unsigned char and does not equal EOF, the behavior is undefined.).
I hope you like the new implementation better.
|
Concept ACK |
|
Note to reviewers: This pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
Many thanks for the review comments!
As used in this project at the moment both
bitcoin/src/rpc/blockchain.cpp Lines 1923 to 1929 in 171b03d
In this particular case I have also raised the suggestion to drop if (net == "ipv4" || net == "IPv4") return NET_IPV4;
if (net == "ipv6" || net == "IPv6") return NET_IPV6;I would even prefer this alternative if the modified external behaviour is acceptable.
I had definitely considered this but felt that it would be a potential rabbit hole without guidance. Implementing utility functions Is this good enough for our purposes? If the objective is to write custom functions that accept Either way, I am happy to give this another shot by addressing @Empact 's comments; or pursuing alternatives proposed by @practicalswift and @laanwj if I can get further guidance in which direction to take this. |
|
@251labs For say the custom locale independent
More generally: as long as your custom functions return the same value as the standard library equivalents would have done under the default "C" locale everything is fine. |
afd6b5e to
4388ad5
Compare
|
Thanks again for the feedback! Per the review comments, I have implemented custom I have tried to make the |
src/test/util_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use BOOST_CHECK_EQUAL(ToLower('@'), '@') (it gives a clearer diagnostic message in case of failure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, addressed in 0234af5
src/utilstrencodings.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a lot of code!
What about return (c >= 'A' && c <= 'Z' ? (c - 'A') + 'a' : c);?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sipa,
I asked myself the same question and I am happy to change it!
Just to share why I decided to use a mappings array:
Support for extended ASCII
Should there be a time where it becomes important to have support for characters in the extended ASCII range then it will be trivial to implement it. It will then just be a matter of adjusting entries in mappings array vs extending the implementation with control statements like if (c == 'Ü') return 'ü'; or alternative logic.
Performance
I wanted our locale independent functions to be a good citizen in terms of performance for code that requires it.
I did some naive tests to get an impression what the code would compile to and it seemed that the compiler was able to generate more efficient code in the case return mappings[c];
At all optimization levels I would get a pointer to the mappings array where mappings are performed on:
lea rax, [rip + __ZZ7ToLowerhE15tolowermappings]
movzx eax, byte ptr [rdi + rax]
Alternatively in the case of (c >= 'A' && c <= 'Z' ? (c - 'A') + 'a' : c) the compiler would generate something like this at O3 optimization:
mov eax, edi
add al, -65
cmp al, 26
jae LBB3_2
add dil, 32
LBB3_2:
movzx eax, dil
Particularly in the case of iterations I felt the compiler was able to do better optimizations, for example minimizing iterations by operating in sequences. In the case of the former it could do 4 mapping operations per sequence; while in the latter of at most two mappings per iteration.
On this basis of this I assumed the code would also perform better, but I realize I can be very wrong because I am no expert at this.
I hope this feedback is useful. It it isn't, I will gladly change the code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly prefer @sipa:s more readable shorter version. I'd bet any potential performance difference is insignificant in the grand scheme of things :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @practicalswift, will update the code per @sipa's feedback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @sipa. If we need to add extended ascii support, we can address that then, and this is not a performance-critical operation that calls for extra code to optimize it. Simple, repetitive code adds visual noise and can hide bugs as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Empact, I will update the pull request!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for the feedback. I have updated the pull request and addressed your comments in 53b3bfa.
src/utilstrencodings.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it's slightly confusing to have a version of ToUpper that operates on chars (and does not modify the argument) while the version operating on vectors does modify. Perhaps call this ConvertToUpper or so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, addressed in 0234af5
4388ad5 to
f296a93
Compare
src/rpc/server.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would return the string from Capitalize and inline that into the string construction.
src/utilstrencodings.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mu-nit: Ruby uses the terms Downcase and Upcase for these operations, which I like. Just a suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should define ConvertToUpper if we're not using it. The construction can be trivially re-created from ConvertToLower if we need it, and in the meantime it's just dead code.
src/utilstrencodings.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: constexpr
src/utilstrencodings.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: constexpr
f296a93 to
400dadf
Compare
src/test/util_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some ToUpper(...) tests as well :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @practicalswift! I was a bit too eager when I removed ToUpper(std::string&) and its tests (facepalm). Fixed in dff017f.
400dadf to
76d70e5
Compare
src/rpc/server.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this modifies the category in-place, I think this breaks the comparison logic, e.g. on line 189.
Sorry, I think my original nit was misguided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize(…) (std::string Capitalize(std::string str)) does not modify category in-place :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Empact, we should be good. When I addressed your feedback, I did change the method signature from void Capitalize(std::string&) to std::string Capitalize(std::string), which passes the argument by value (category is copied into str).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I misread - agree the argument is passed by value, so it's fine. 👍
|
utACK 76d70e5fb1923cdbc4771f438f10ae0039beccaa |
76d70e5 to
aa32fe3
Compare
This commit implements a unit test that validates the `ParseNetwork(std::string)` implementation in `netbase.cpp`.
This commit implements custom equivalents for the C and C++ `tolower` and `toupper` Standard Library functions. In addition it implements a utility function to capitalize the first letter of a string.
This commit removes the `boost/algorithm/string/case_conv.hpp` dependency from the project. It replaces the `boost::to_lower` and `boost::to_upper` functions with custom functions that are locale independent and ASCII deterministic.
aa32fe3 to
b193d5a
Compare
| BOOST_CHECK_EQUAL(ParseNetwork("TOR"), NET_ONION); | ||
|
|
||
| BOOST_CHECK_EQUAL(ParseNetwork(":)"), NET_UNROUTABLE); | ||
| BOOST_CHECK_EQUAL(ParseNetwork("tÖr"), NET_UNROUTABLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, tÖr, that's like the Swedish version of Tor right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol @laanwj, best review comment ever! I do hidden messages or easter eggs from time to time, but this isn't one of them 😂😂
|
utACK b193d5a |
merge bitcoin#13671, bitcoin#17405, bitcoin#18786, bitcoin#18792, bitcoin#20067: Deboostification
This pull request removes the
boost/algorithm/string/case_conv.hppdependency from the project.boost/algorithm/string/case_conv.hppis included for theboost::to_lowerandboost::to_uppertemplate functions.We can replace the calls to these functions with straightforward alternative implementations that use the C++ Standard Library, because the functions are called with
std::stringobjects that use standard 7-bit ASCII characters as argument.The refactored implementation should work without the explicit
static_cast<unsigned char>cast andunsigned charlambda return type. Both have been added defensively and to be explicit. Especially in case of the former, behaviour is undefined (potentially result in a crash) if thestd::toupperargument is not anunsigned char.A potential alternative, maybe even preferred, implementation to address the
boost::to_lowerfunction call inParseNetwork(std::string)could have been:This alternative implementation would however change the external behaviour of
ParseNetwork(std::string).This pull requests includes a unit test to validate the implementation of
ParseNetwork(std::string)prior and after the removal of thecase_conv.hppdependency.boost/algorithm/string/case_conv.hpphas been removed from theEXPECTED_BOOST_INCLUDESintest/lint/lint-includes.shbecause it is no longer required.