-
Notifications
You must be signed in to change notification settings - Fork 38.7k
httpserver, rest: improving URI validation #27253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
httpserver, rest: improving URI validation #27253
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
50e18c0 to
ec14b3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, nice catch.
As for the approach, I think there is a simpler one that requires no overhead for every endpoint using GetQueryParameter(). We could move uri_parsed to a private HTTPRequest member instead of a local GetQueryParameterFromUri variable, and have http_request_cb set it (for example after checking unknown HTTP methods) - or return an early 400 if null. Subsequently, we can assume that uri_parsed is valid and non-null, simplifying the rest code (and avoiding multiple evhttp_uri_parse calls when accessing multiple query params). What do you think?
brunoerg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
As @stickies-v explained to me outside this PR: "... by moving the uri parsing to http_request_cb we're failing super early, and even when the query parameter would never be called by the endpoint... ", this makes more sense. I'm working on this approach which I think it's better as well. |
02cbb58 to
213554c
Compare
|
Updated changes:
|
213554c to
7318e23
Compare
test 2023-03-16T20:06:03.567000Z TestFramework (ERROR): Unexpected exception caught during testing
Traceback (most recent call last):
File "/private/var/folders/v7/fs2b0v3s0lz1n57gj9y4xb5m0000gn/T/cirrus-ci-build/ci/scratch/build/bitcoin-arm64-apple-darwin/test/functional/test_framework/test_framework.py", line 132, in main
self.run_test()
File "/private/var/folders/v7/fs2b0v3s0lz1n57gj9y4xb5m0000gn/T/cirrus-ci-build/ci/scratch/build/bitcoin-arm64-apple-darwin/test/functional/interface_rest.py", line 372, in run_test
resp = self.test_rest_request("/mempool/contents", ret_type=RetType.OBJ, status=400, query_params={"verbose": "true", "mempool_sequence": "true"})
File "/private/var/folders/v7/fs2b0v3s0lz1n57gj9y4xb5m0000gn/T/cirrus-ci-build/ci/scratch/build/bitcoin-arm64-apple-darwin/test/functional/interface_rest.py", line 86, in test_rest_request
assert_equal(resp.status, status.value)
AttributeError: 'int' object has no attribute 'value' |
|
@fanquake that's something perhaps I've introduced with my last push, I'm on it, thanks. Weird, it's not failing locally for me: I think this was the issue: Python Enums were introduced in version 3.4, and the value attribute was added in version 3.6. I'm fixing it. |
0115dcc to
9b0a0dc
Compare
|
Updated changes:
Thanks @stickies-v for your prompt response and your support on quick workarounds on issues raised. |
stickies-v
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the commit message could benefit from additional information on the nature of the bug and how it's fixed
2bf34b7 to
36be160
Compare
|
Updates:
|
src/httpserver.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's great that we're not doing the address-based checks before anything else. Since we're now adding validation into the constructor, how about we do it consistently and do it for all validation, so we can also control what we validate first? I'm still feeling a bit uneasy about coupling the construction of HTTPRequest with validation, but if we're doing it we might as well do it more consistently? Curious to hear @vasild's thoughts on this too.
git diff
diff --git a/src/httpserver.cpp b/src/httpserver.cpp
index f7dc344f4..10f8ec24d 100644
--- a/src/httpserver.cpp
+++ b/src/httpserver.cpp
@@ -235,6 +235,7 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
}
std::unique_ptr<HTTPRequest> hreq;
+ // Initializing the HTTPRequest also performs basic validation such as address, method and URI checks
try {
hreq = std::make_unique<HTTPRequest>(req);
} catch (const std::runtime_error& e) {
@@ -242,22 +243,6 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
return;
}
- // Early address-based allow check
- if (!ClientAllowed(hreq->GetPeer())) {
- LogPrint(BCLog::HTTP, "HTTP request from %s rejected: Client network is not allowed RPC access\n",
- hreq->GetPeer().ToStringAddrPort());
- hreq->WriteReply(HTTP_FORBIDDEN);
- return;
- }
-
- // Early reject unknown HTTP methods
- if (hreq->GetRequestMethod() == HTTPRequest::UNKNOWN) {
- LogPrint(BCLog::HTTP, "HTTP request from %s rejected: Unknown HTTP request method\n",
- hreq->GetPeer().ToStringAddrPort());
- hreq->WriteReply(HTTP_BAD_METHOD);
- return;
- }
-
const std::string strURI{hreq->GetURI()};
LogPrint(BCLog::HTTP, "Received a %s request for %s from %s\n",
RequestMethodString(hreq->GetRequestMethod()), SanitizeString(strURI, SAFE_CHARS_URI).substr(0, 100), hreq->GetPeer().ToStringAddrPort());
@@ -545,13 +530,7 @@ HTTPRequest::HTTPRequest(struct evhttp_request* _req) :
m_uri{req != nullptr ? evhttp_request_get_uri(req) : nullptr},
m_uri_parsed{m_uri != nullptr ? evhttp_uri_parse(m_uri) : nullptr}
{
- if (m_uri_parsed == nullptr) {
- const char* err{"URI parsing failed, it likely contained RFC 3986 invalid characters"};
- const std::string peerAddress{GetPeer().ToStringAddr()};
- WriteReply(HTTP_BAD_REQUEST, err);
- throw std::runtime_error(
- strprintf("HTTP request from %s rejected: %s", peerAddress, err));
- }
+ Validate();
}
HTTPRequest::~HTTPRequest()
@@ -565,6 +544,29 @@ HTTPRequest::~HTTPRequest()
evhttp_uri_free(m_uri_parsed);
}
+void HTTPRequest::Validate()
+{
+ const std::string rejected{strprintf("HTTP request from %s rejected: ", GetPeer().ToStringAddrPort())};
+ // Early address-based allow check
+ if (!ClientAllowed(GetPeer())) {
+ WriteReply(HTTP_FORBIDDEN);
+ throw std::runtime_error(rejected + "Client network is not allowed RPC access");
+ }
+
+ // Early reject unknown HTTP methods
+ if (GetRequestMethod() == HTTPRequest::UNKNOWN) {
+ WriteReply(HTTP_BAD_METHOD);
+ throw std::runtime_error(rejected + "Unknown HTTP request method");
+ }
+
+ // Ensure valid URI
+ if (m_uri_parsed == nullptr) {
+ const std::string err{"URI parsing failed, it likely contained RFC 3986 invalid characters"};
+ WriteReply(HTTP_BAD_REQUEST, err);
+ throw std::runtime_error(rejected + err);
+ }
+}
+
std::pair<bool, std::string> HTTPRequest::GetHeader(const std::string& hdr) const
{
const struct evkeyvalq* headers = evhttp_request_get_input_headers(req);
diff --git a/src/httpserver.h b/src/httpserver.h
index 77a249a6a..3c5af5b0b 100644
--- a/src/httpserver.h
+++ b/src/httpserver.h
@@ -64,6 +64,9 @@ private:
/// The parsed URI, as returned by `evhttp_uri_parse()`. Will never be `nullptr`.
evhttp_uri* const m_uri_parsed;
+ // Perform early validity checks. Send a response and throw std::runtime_error if invalid.
+ void Validate();
+
public:
explicit HTTPRequest(struct evhttp_request* req);
~HTTPRequest();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your worries on coupling the validation within the constructor (separation of/ different concerns, error handling, flexibility, etc) but since those validations/ "early checks" also would impact into the state of the object itself it would make sense to me to put them together, consistently and cleaner. Happy to make the change if @vasild also agree with it.
f70a6b2 to
c9c19c5
Compare
Removing what it was originally a hack/ workaround only for fuzz test 'http_request', no other client/ consumer was using it, to deal with an underlying buffering issue with WriteReply being called from the destructor, which was also fixed here not allowing buffering when conn is not valid.
Non-functional optimisation: moving the logic up in the stack, specifically into the constructor of the HTTPRequest object, this way we catch the invalid uri earlier, removing the exception handling from the rest enpoints as we catch the exception raised from the constructor within the request call back function where we return an http bad request to consumers.
c9c19c5 to
31dd51c
Compare
|
Updates:
|
|
Updates:
|
|
After a chat with @stickies-v where we were discussing different approaches on this enhancement and other details regarding fuzz testing, libevent functionality and each commit intention, I've decided to put this onto draft. Firstly we would need to define the purpose of the |
|
Are you still working on this? |
I do |
|
🐙 This pull request conflicts with the target branch and needs rebase. |
|
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
|
Closing for now. @pablomartin4btc feel free to ping for a re-open when you are actively working on this again. |
| BOOST_CHECK(!QueryParameterExists("/rest/endpoint/someresource.json&p1=v1&p2=v2", "p1")); | ||
|
|
||
| // URI with invalid characters (%) raises a runtime error regardless of which query parameter is queried | ||
| uri = "/rest/endpoint/someresource.json&p1=v1&p2=v2%"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed this while reviewing #27468 (comments locked on that PR) for my own work on HTTPRequest -- this test vector doesn't have a ? so even if the URI didn't contain invalid characters it would still return null for the query parameter request. It doesn't really matter for the invalid-URI-throws-error test but it should matter for this follow up, part of which I'm trying to integrate.
So, my question is - should the test vector always have been:
uri = "/rest/endpoint/someresource.json?p1=v1&p2=v2%";
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything in your observation looks correct to me, yeah. It's fine in master but not in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
I must have copied that URI (without the ?) from the previous check (// Invalid query string syntax is the same as not having parameters).
But for this test in this PR it doesn't really matter I think (we can have even 2, one with ? and one without it - or just clarify it in the comment to avoid confusion) as the URI would be invalid anyways (it includes invalid char %).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for this test in this PR it doesn't really matter I think
I think it does, previously we had three states (value, no value, exception) which kinda accidentally still made the test work, but in this PR there are only two (value, no value). So this test does not actually test what it claims it does, because if QueryParameterExists were to be changed to allow invalid characters, this test would still pass because there aren't any query parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously we had three states (value, no value, exception) which kinda accidentally still made the test work
I think the URI validation was performed before checking the params, so the exception was raised and caught by the test?
// URI with invalid characters (%) raises a runtime error regardless of which query parameter is queried
uri = "/rest/endpoint/someresource.json&p1=v1&p2=v2%";
BOOST_CHECK_EXCEPTION(GetQueryParameterFromUri(uri.c_str(), "p1"), std::runtime_error, HasReason("URI parsing failed, it likely contained RFC 3986 invalid characters"));
So this test does not actually test what it claims it does,
// URI with invalid characters (%) won't return any values regardless of which query parameter is queried
Perhaps it should be clarified also for this incorrect query syntax case, but the entire PR should be revisited anyways.
if QueryParameterExists were to be changed to allow invalid characters, this test would still pass because there aren't any query parameters.
It's not the current state of this closed/ inactive PR but it could be so I agree with your statement in that sense (and with your states analysis as well) and that the PR should be updated to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the URI validation was performed before checking the params, so the exception was raised and caught by the test?
It seems that the libevent parser doesn't differentiate if the URI is invalid whether the query syntax is wrong (missing the '?') or there's an invalid char within the URI (% not followed by 2 hexa chars - RFC 3986 and libevent only interprets it if the encoded chars are withing the variable's value, not in the path).
11422cc bugfix: rest: avoid segfault for invalid URI (pablomartin4btc) Pull request description: Minimal fix to get it promptly into 25.0 release (suggested by [stickies-v](bitcoin#27253 (review)) and supported by [vasild](bitcoin#27253 (review)) ) Please check bitcoin#27253 for reviewers comments and acks regarding this PR and read the commit comment message body for more details about the fix. ACKs for top commit: achow101: ACK 11422cc stickies-v: re-ACK 11422cc Tree-SHA512: 5af6b53fb266a12b463f960910556d5e97bc88b3c2a4f437ffa343886b38749e1eb058cf7bc64d62e82e1acf6232a186bddacd8f3b4500c87bf9e550a0153386
11422cc bugfix: rest: avoid segfault for invalid URI (pablomartin4btc) Pull request description: Minimal fix to get it promptly into 25.0 release (suggested by [stickies-v](bitcoin#27253 (review)) and supported by [vasild](bitcoin#27253 (review)) ) Please check bitcoin#27253 for reviewers comments and acks regarding this PR and read the commit comment message body for more details about the fix. ACKs for top commit: achow101: ACK 11422cc stickies-v: re-ACK 11422cc Tree-SHA512: 5af6b53fb266a12b463f960910556d5e97bc88b3c2a4f437ffa343886b38749e1eb058cf7bc64d62e82e1acf6232a186bddacd8f3b4500c87bf9e550a0153386
This PR contains an isolated enhancement of the segfault bugfix #27468 (already merged), improving the way we handle the URI validation, which will be performed only once instead of on each query parameter of a rest service endpoint.