-
Notifications
You must be signed in to change notification settings - Fork 38.8k
Prepare string and net utils for future HTTP operations #34242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34242. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
2026-01-13 |
src/netbase.cpp
Outdated
| CService addr_bind; | ||
| struct sockaddr_storage sockaddr_bind; | ||
| socklen_t sockaddr_bind_len = sizeof(sockaddr_bind); | ||
| if (!sock.GetSockName((struct sockaddr*)&sockaddr_bind, &sockaddr_bind_len)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetSockName() currently forwards the raw getsockname() return value (0 on success), which forces callers to use an inverted check (!GetSockName(...)).
Can we normalize it so it returns true on success?
bool Sock::GetSockName(sockaddr* name, socklen_t* name_len) const
{
return getsockname(m_socket, name, name_len) == 0;
}
It would make the code slightly more readable and maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, but this might be better addressed in a separate PR. The entire Sock class wraps POSIX socket management syscalls that return 0 on success (connect, bind, listen, close, etc...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind the Sock methods is that they are mockable versions of the OS functions. Zero learning curve for newcomers (assuming familiarity with the OS interface). Sock::Foo(x, y, z) is equivalent to calling the OS's foo(x, y, z).
inverted check
the function returns an integer error code, not a boolean. A check like if (foo() == 0) looks better (less "inverted").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks @vasild. It would be nice to have more documentation on the return codes then. And +1 on the == 0 suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line gets "modernized" in the next commit, including a comment and == 0, so I think it's cool for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"returns an integer error code" -- oh, I meant this:
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno
is set to indicate the error.
or the check could be if (foo() != -1).
|
ACK ea993ff I reviewed the same 6 commits as part of #32061 and all my feedback from there has been addressed. |
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK ea993ff
src/netbase.cpp
Outdated
| sockaddr_storage storage; | ||
| socklen_t len = sizeof(storage); | ||
|
|
||
| auto sa = static_cast<sockaddr*>(static_cast<void*>(&storage)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inner cast is unnecessary?
| auto sa = static_cast<sockaddr*>(static_cast<void*>(&storage)); | |
| auto sa = static_cast<sockaddr*>(&storage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works with a reinterpret_cast because sockaddr_storage and sockaddr are technically unrelated types even though they overlap... I'll clean it up, thanks. I think I had static_cast there before because of hints from Sonarcloud on corecheck.dev.
src/util/string.cpp
Outdated
| // that means the line we are currently reading is too long, and we throw. | ||
| if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader"); | ||
| } | ||
| const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count); | |
| const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken, thanks
| std::string FormatRFC1123DateTime(int64_t time) | ||
| { | ||
| if (time < -62167219200 || 253402300799 < time) { | ||
| // RFC7231 mandates 4-digit year, so only support years 0 to 9999 | ||
| return ""; | ||
| } | ||
| const std::chrono::sys_seconds secs{std::chrono::seconds{time}}; | ||
| const auto days{std::chrono::floor<std::chrono::days>(secs)}; | ||
| const auto w{days.time_since_epoch().count() % 7}; // will be in the range [-6, 6] | ||
| std::string_view weekday{weekdays.at(w >= 0 ? w : w + 7)}; | ||
| const std::chrono::year_month_day ymd{days}; | ||
| std::string_view month{months.at(unsigned{ymd.month()} - 1)}; | ||
| const std::chrono::hh_mm_ss hms{secs - days}; | ||
| // examples: Mon, 27 Jul 2009 12:28:53 GMT | ||
| // Fri, 31 May 2024 19:18:04 GMT | ||
| return strprintf("%03s, %02u %03s %04i %02i:%02i:%02i GMT", weekday, unsigned{ymd.day()}, month, signed{ymd.year()}, hms.hours().count(), hms.minutes().count(), hms.seconds().count()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think C++20 has a short version of this (no extra arrays needed):
std::string FormatRFC1123DateTime(int64_t time)
{
if (time < -62167219200 || 253402300799 < time) {
// RFC7231 mandates 4-digit year, so only support years 0 to 9999
return "";
}
const std::chrono::sys_seconds secs{std::chrono::seconds{time}};
// Example: "Mon, 27 Jul 2009 12:28:53 GMT"
return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs);
}maybe you already tried it and it doesn't work on all our supported platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be locale-independent. I think it is:
the default "C" locale if L is not present in the format specification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/home/admin/actions-runner/_work/_temp/src/util/time.cpp:129:17: error: no member named 'format' in namespace 'std'
129 | return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs);
| ~~~~~^
😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, will revert on next push.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too good to be true
furszy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK ea993ff
src/util/string.cpp
Outdated
| while (it != end) { | ||
| // Read a character from the incoming buffer and increment the iterator | ||
| auto c = static_cast<char>(*it); | ||
| ++it; | ||
| ++count; | ||
| // If the character we just consumed was \n, the line is terminated | ||
| if (c == '\n') break; | ||
| // If we are at the end of the incoming buffer, the line is terminated | ||
| if (it == end) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ea993ff:
Seems you can remove this last line. You already do a while (it != end)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Youre right thanks, its gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it is incremented inside the body of the loop, after the check in while (it != end). So after the increment it could be equal to end.
const std::vector<std::byte> input{StringToBuffer("ab")};
LineReader reader(input, /*max_read=*/1);
std::optional<std::string> line{reader.ReadLine()};Before this change the above snippet would have set line to ab and after the change it throws std::runtime_error: max_read exceeded by LineReader. I guess the new behavior is the correct? Maybe worth adding such a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vasild I added this as a test case, and am scratching my head how it slipped through. Removing the line is not only cleaner code, it is a bug fix. LineReader should not return a line longer than max_read
src/util/string.cpp
Outdated
| // If the character we just consumed was \n, the line is terminated | ||
| if (c == '\n') break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit:
could make the termination character customizable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to leave this for now, next dev who needs a custom EOL character can add it in the future ;-)
The function logic is moved-only from net.cpp to netbase.cpp and redeclared (as non-static) in netbase.h
Replace the C-style casting with C++ reinterpret_cast
https://httpwg.org/specs/rfc9110.html#rfc.section.5.1 Field names in HTTP headers are case-insensitive. These structs will be used in the headers map to search by key. In libevent field names are also converted to lowercase for comparison: evhttp_find_header() evutil_ascii_strcasecmp() EVUTIL_TOLOWER_()
HTTP 1.1 responses require a timestamp header with a format specified (currently) by: https://datatracker.ietf.org/doc/html/rfc9110#section-5.6.7 This specific format is defined in RFC1123: https://www.rfc-editor.org/rfc/rfc1123#page-55 The libevent implementation can be referenced in evutil_time.c evutil_date_rfc1123()
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
This is a helper struct to parse HTTP messages from data in buffers from sockets. HTTP messages begin with headers which are CRLF-terminated lines (\n or \r\n) followed by an arbitrary amount of body data. Whitespace is trimmed from the field lines but not the body. https://httpwg.org/specs/rfc9110.html#rfc.section.5
| // Check three lines terminated by \n, \r\n, and end of buffer, trimming whitespace | ||
| const std::vector<std::byte> input{StringToBuffer("once upon a time\n there was a dog \r\nwho liked food")}; | ||
| LineReader reader(input, /*max_read=*/128); | ||
| std::optional<std::string> line1{reader.ReadLine()}; | ||
| BOOST_CHECK_EQUAL(reader.Left(), 33); | ||
| std::optional<std::string> line2{reader.ReadLine()}; | ||
| BOOST_CHECK_EQUAL(reader.Left(), 14); | ||
| std::optional<std::string> line3{reader.ReadLine()}; | ||
| std::optional<std::string> line4{reader.ReadLine()}; | ||
| BOOST_CHECK(line1); | ||
| BOOST_CHECK(line2); | ||
| BOOST_CHECK(line3); | ||
| BOOST_CHECK(!line4); | ||
| BOOST_CHECK_EQUAL(line1.value(), "once upon a time"); | ||
| BOOST_CHECK_EQUAL(line2.value(), "there was a dog"); | ||
| BOOST_CHECK_EQUAL(line3.value(), "who liked food"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 1d24816:
Tiny, non-blocking, idea to see if it makes sense to simplify the LineReader class.
(you can reinterpret std::byte to char in order to make it work with std::istringstream)
// Check three lines terminated by \n, \r\n, and end of buffer, trimming whitespace
const std::string input = "once upon a time\n there was a dog \r\nwho liked food";
std::istringstream stream(input);
std::string line1, line2, line3, line4;
std::getline(stream, line1);
std::getline(stream, line2);
std::getline(stream, line3);
BOOST_CHECK(!std::getline(stream, line4));
BOOST_CHECK_EQUAL(line1, "once upon a time");
BOOST_CHECK_EQUAL(line2, "there was a dog \r"); // TODO: trim whitespaces at the end.
BOOST_CHECK_EQUAL(line3, "who liked food");
This is a component of removing libevent as a dependency of the project. It is the first six commits of #32061 and provides a string-parsing utility (
LineReader) that is also consumed by #34158.These are the functions that are added / updated for HTTP and Torcontrol:
GetBindAddress(): Given a socket, provides the bound address as a CService. Currently used by p2p but moved fromnettonetbaseso other modules can call it.ToIntegral(): Already used to parse numbers from strings, added new argumentbase = 10so it can also be used to parse hexadecimal integers. HTTP chunked transfer-encoding uses hex-encoded integers to specify payload size: https://datatracker.ietf.org/doc/html/rfc7230.html#section-4.1AsciiCaseInsensitivecomparators: Needed to store HTTP headers in anunordered_map. Headers are key-value pairs that are parsed with case-insensitive keys: https://httpwg.org/specs/rfc9110.html#rfc.section.5.1FormatRFC1123DateTime(): The required datetime format for HTTP headers (e.g.Fri, 31 May 2024 19:18:04 GMT)LineReader: Fields in HTTP requests are newline-terminated. This struct is given an input buffer and provides methods to read lines as strings.