-
Notifications
You must be signed in to change notification settings - Fork 38.7k
refactor: replace remaining boost::split with SplitString #25057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: replace remaining boost::split with SplitString #25057
Conversation
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I was also working on this yesterday, but I like your version better.
|
Concept ACK |
vincenzopalazzo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Note that `SplitString` doesn't support token compression, but in this case it does not matter as empty strings are already skipped anyways. Also removes split.hpp and classification.hpp from expected includes
Also removes boost/algorithm/string.hpp from expected includes
Co-authored-by: MarcoFalke <[email protected]>
8976d15 to
f849e63
Compare
|
Rebased with fixing nits and adding fuzzing |
theStack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-review ACK f849e63
I tend to think that the single-character interface could be removed completely (especially since it uses the multi-separators version internally in the end anyways), but on the other hand this would make the diff larger as lots of calling instances had to be changed, so NBD.
Finally this removes 3 more files from lint-includes.py
🎉 🎉 🎉
…litString f849e63 fuzz: SplitString with multiple separators (Martin Leitner-Ankerl) d1a9850 http: replace boost::split with SplitString (Martin Leitner-Ankerl) 0d7efcd core_read: Replace boost::split with SplitString (Martin Leitner-Ankerl) b7ab9db Extend Split to work with multiple separators (Martin Leitner-Ankerl) Pull request description: As a followup of bitcoin#22953, this removes the remaining occurrences of `boost::split` and replaces them with our own `SplitString`. To be able to do so, this extends the function `spanparsing::Split` to work with multiple separators. Finally this removes 3 more files from `lint-includes.py`. ACKs for top commit: theStack: Code-review ACK f849e63 Tree-SHA512: f37d4dbe11cab2046e646045b0f018a75f978d521443a2c5001512737a1370e22b09247d5db0e5c9e4153229a4e2d66731903c1bba3713711c4cae8cedcc775d
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK f849e63
| auto start = it; | ||
| while (it != sp.end()) { | ||
| if (*it == sep) { | ||
| if (separators.find(*it) != std::string::npos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is doing full scan of the iterators for each input character, thus the overall complexity is O(input_length * number_of_separators). It can be optimized to O(input_length + number_of_separators) if a map of the separators is built beforehand that would allow checking whether a given character is a separator in a constant time.
This is mostly theoretical because it would matter if lots of separators are used and currently we only use a few. Anyway, out of curiosity, I tried that and compared the performance - in the current variant it is 734 ns/op while with the map it is 130 ns/op for 10 separators.
optimize Split()
template <typename T = Span<const char>>
std::vector<T> Split(const Span<const char>& sp, std::string_view separators)
{
+ std::bitset<256> m;
+ for (const auto& sep : separators) {
+ m[static_cast<unsigned char>(sep)] = true;
+ }
std::vector<T> ret;
auto it = sp.begin();
auto start = it;
while (it != sp.end()) {
- if (separators.find(*it) != std::string::npos) {
+ if (m[static_cast<unsigned char>(*it)]) {
ret.emplace_back(start, it);
start = it + 1;
}
++it;
}
ret.emplace_back(start, it);benchmark
static void Split(benchmark::Bench& bench)
{
bench.run([&] {
(void)spanparsing::Split(
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"bcdefghijk"
);
});
}
BENCHMARK(Split);Feel free to ignore.
backport: merge bitcoin#18289, bitcoin#19090, bitcoin#22859, bitcoin#18758, bitcoin#21016, bitcoin#22953, bitcoin#25027, bitcoin#20461, bitcoin#25025, bitcoin#25057, bitcoin#25068 (deboostification)
As a followup of #22953, this removes the remaining occurrences of
boost::splitand replaces them with our ownSplitString. To be able to do so, this extends the functionspanparsing::Splitto work with multiple separators. Finally this removes 3 more files fromlint-includes.py.