refactor: replace remaining boost::split with SplitString #25057

martinus · 2022-05-03T05:19:56Z

As a followup of #22953, this removes the remaining occurrences of boost::split and replaces them with our own SplitString. To be able to do so, this extends the function spanparsing::Split to work with multiple separators. Finally this removes 3 more files from lint-includes.py.
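To illustrate what splitting on multiple separators means here, the following is a minimal standalone sketch. It uses `std::string_view` and `std::string` instead of Bitcoin Core's `Span`, and the name `SplitSketch` is hypothetical; it is not the code from this PR.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a multi-separator split; not the Bitcoin Core implementation.
std::vector<std::string> SplitSketch(std::string_view input, std::string_view separators)
{
    std::vector<std::string> tokens;
    std::size_t start = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        // A character ends the current token if it matches any of the separators.
        if (separators.find(input[i]) != std::string_view::npos) {
            tokens.emplace_back(input.substr(start, i - start));
            start = i + 1;
        }
    }
    // The remainder after the last separator is always emitted, even if it is empty.
    tokens.emplace_back(input.substr(start));
    return tokens;
}

// Single-character convenience overload that forwards to the multi-separator version.
std::vector<std::string> SplitSketch(std::string_view input, char sep)
{
    return SplitSketch(input, std::string_view{&sep, 1});
}
```

With this sketch, `SplitSketch("a,b;c", ",;")` yields the three tokens `a`, `b` and `c`, and adjacent separators produce an empty token between them.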

maflcko

LGTM. I was also working on this yesterday, but I like your version better.

src/httprpc.cpp

src/util/string.h

fanquake · 2022-05-03T08:29:30Z

Concept ACK

vincenzopalazzo

Concept ACK


martinus · 2022-05-04T05:56:06Z

Rebased with fixing nits and adding fuzzing

theStack

Code-review ACK f849e63

I tend to think that the single-character interface could be removed completely (especially since it uses the multi-separators version internally in the end anyways), but on the other hand this would make the diff larger, as lots of call sites would have to be changed, so NBD.

> Finally this removes 3 more files from lint-includes.py

🎉 🎉 🎉

…litString

f849e63 fuzz: SplitString with multiple separators (Martin Leitner-Ankerl)
d1a9850 http: replace boost::split with SplitString (Martin Leitner-Ankerl)
0d7efcd core_read: Replace boost::split with SplitString (Martin Leitner-Ankerl)
b7ab9db Extend Split to work with multiple separators (Martin Leitner-Ankerl)

Pull request description:

As a followup of bitcoin#22953, this removes the remaining occurrences of `boost::split` and replaces them with our own `SplitString`. To be able to do so, this extends the function `spanparsing::Split` to work with multiple separators. Finally this removes 3 more files from `lint-includes.py`.

ACKs for top commit:

theStack: Code-review ACK f849e63

Tree-SHA512: f37d4dbe11cab2046e646045b0f018a75f978d521443a2c5001512737a1370e22b09247d5db0e5c9e4153229a4e2d66731903c1bba3713711c4cae8cedcc775d

vasild

ACK f849e63

vasild · 2022-05-05T07:15:13Z

src/util/spanparsing.h

    auto start = it;
    while (it != sp.end()) {
-        if (*it == sep) {
+        if (separators.find(*it) != std::string::npos) {


This does a full scan of the separators for each input character, so the overall complexity is O(input_length * number_of_separators). It can be reduced to O(input_length + number_of_separators) if a map of the separators is built beforehand, which allows checking whether a given character is a separator in constant time.

This is mostly theoretical because it would only matter if lots of separators were used, and currently we only use a few. Anyway, out of curiosity, I tried that and compared the performance: with 10 separators, the current variant takes 734 ns/op while the map variant takes 130 ns/op.

optimize Split()

template <typename T = Span<const char>>
std::vector<T> Split(const Span<const char>& sp, std::string_view separators)
{
+    std::bitset<256> m;
+    for (const auto& sep : separators) {
+        m[static_cast<unsigned char>(sep)] = true;
+    }
    std::vector<T> ret;
    auto it = sp.begin();
    auto start = it;
    while (it != sp.end()) {
-        if (separators.find(*it) != std::string::npos) {
+        if (m[static_cast<unsigned char>(*it)]) {
            ret.emplace_back(start, it);
            start = it + 1;
        }
        ++it;
    }
    ret.emplace_back(start, it);

benchmark

static void Split(benchmark::Bench& bench)
{
    bench.run([&] {
        (void)spanparsing::Split(
            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
            "bcdefghijk"
        );
    });
}

BENCHMARK(Split);

Feel free to ignore.
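The lookup-table idea from this comment can also be checked in isolation, without the Bitcoin Core tree. The sketch below compares only the "is this character a separator?" test of the two variants; `IsSeparatorByFind`, `SeparatorTable` and `StrategiesAgree` are hypothetical names introduced for illustration, and the code makes no claim about performance.

```cpp
#include <bitset>
#include <string_view>

// Per-character scan: the cost of each check grows with the number of separators.
bool IsSeparatorByFind(char c, std::string_view separators)
{
    return separators.find(c) != std::string_view::npos;
}

// Hypothetical helper: a 256-entry table built once, then queried in constant time.
class SeparatorTable
{
public:
    explicit SeparatorTable(std::string_view separators)
    {
        for (const char sep : separators) {
            // Cast to unsigned char so that negative char values index correctly.
            m_table[static_cast<unsigned char>(sep)] = true;
        }
    }

    bool Contains(char c) const { return m_table[static_cast<unsigned char>(c)]; }

private:
    std::bitset<256> m_table;
};

// Returns true if both strategies give the same answer for every byte value.
bool StrategiesAgree(std::string_view separators)
{
    const SeparatorTable table{separators};
    for (int i = 0; i < 256; ++i) {
        const char c = static_cast<char>(i);
        if (IsSeparatorByFind(c, separators) != table.Contains(c)) return false;
    }
    return true;
}
```

Building the table costs one pass over the separators, which is where the `+ number_of_separators` term in the improved bound comes from.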

backport: merge bitcoin#18289, bitcoin#19090, bitcoin#22859, bitcoin#18758, bitcoin#21016, bitcoin#22953, bitcoin#25027, bitcoin#20461, bitcoin#25025, bitcoin#25057, bitcoin#25068 (deboostification)

martinus changed the title ~~refactor: replace remaining boost::split with StringSplit~~ refactor: replace remaining boost::split with SplitString May 3, 2022

DrahtBot added Refactoring RPC/REST/ZMQ Utils/log/libs labels May 3, 2022

maflcko approved these changes May 3, 2022


vincenzopalazzo reviewed May 3, 2022


martinus and others added 4 commits May 4, 2022 07:34

Extend Split to work with multiple separators

b7ab9db

core_read: Replace boost::split with SplitString

0d7efcd

Note that `SplitString` doesn't support token compression, but in this case it does not matter as empty strings are already skipped anyways. Also removes split.hpp and classification.hpp from expected includes
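As a rough illustration of why the missing token compression does not matter when the caller skips empty strings, here is a standalone sketch. `SplitKeepEmpty` and `NonEmptyTokens` are hypothetical helpers written for this example, not functions from Bitcoin Core or Boost.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split on any of `separators`, keeping empty tokens
// (that is, without token compression).
std::vector<std::string> SplitKeepEmpty(std::string_view input, std::string_view separators)
{
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = input.find_first_of(separators, start);
        if (pos == std::string_view::npos) {
            // No further separator: the rest of the input is the last token.
            tokens.emplace_back(input.substr(start));
            return tokens;
        }
        tokens.emplace_back(input.substr(start, pos - start));
        start = pos + 1;
    }
}

// Caller-side filtering: dropping empty tokens removes the ones that
// adjacent separators would otherwise leave behind.
std::vector<std::string> NonEmptyTokens(std::string_view input, std::string_view separators)
{
    std::vector<std::string> tokens = SplitKeepEmpty(input, separators);
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string& t) { return t.empty(); }),
                 tokens.end());
    return tokens;
}
```

Splitting `"a  b"` on space and tab gives `a`, an empty token, and `b`; after filtering, only `a` and `b` remain, which is what a compressing split would have produced for this input.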

http: replace boost::split with SplitString

d1a9850

Also removes boost/algorithm/string.hpp from expected includes

fuzz: SplitString with multiple separators

f849e63

Co-authored-by: MarcoFalke <[email protected]>

martinus force-pushed the 2022-04-boost-split-exorcism branch from 8976d15 to f849e63 Compare May 4, 2022 05:55

maflcko approved these changes May 4, 2022


fanquake requested a review from theStack May 4, 2022 14:05

theStack approved these changes May 4, 2022


fanquake merged commit bde5836 into bitcoin:master May 4, 2022

martinus deleted the 2022-04-boost-split-exorcism branch May 4, 2022 18:50

vasild reviewed May 5, 2022


fanquake mentioned this pull request Aug 8, 2022

Reduce boost dependency (boost/algorithm/string/split) #22683

Closed

kwvg added a commit to kwvg/dash that referenced this pull request Jan 19, 2023

merge bitcoin#25057: replace remaining boost::split with SplitString

43d657f

bitcoin locked and limited conversation to collaborators May 5, 2023
