Skip to content

Test host_regexp against all PTR records instead of only one#37827

Merged
tavplubix merged 31 commits intoClickHouse:masterfrom
arthurpassos:host_regexp_multiple_domains
Jul 21, 2022
Merged

Test host_regexp against all PTR records instead of only one#37827
tavplubix merged 31 commits intoClickHouse:masterfrom
arthurpassos:host_regexp_multiple_domains

Conversation

@arthurpassos
Copy link
Copy Markdown
Contributor

@arthurpassos arthurpassos commented Jun 3, 2022

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

In #17202 was reported that host_regexp was being tested against only one of the possible PTR responses. This PR makes the necessary changes so that host_regexp is applied against all possible PTR responses and validate if any matches.

c-ares is being used because the vanilla getnameinfo is limited to one PTR.

Integration tests for this feature seemed to be an overkill because of the amount of effort required. It would require setting up a DNS server with the necessary resolutions and tweak clickhouse node host DNS settings so it points to the DNS server so it can finally be tested.

I have tested it locally using a setup similar to the gist provided by the OP in the issue. The scenario is:

$client_ip_address resolving to test1.example.com test2.example.com test3.example.com

test1.example.com resolving to $client_ip_address
test2.example.com resolving to $client_ip_address
test3.example.com resolving to $client_ip_address

host_regexp set to <host_regexp>test3\.example\.com$</host_regexp>

Previously, it would fail to auth. With the changes in this PR, it allows the request.

Closes #17202

@robot-ch-test-poll robot-ch-test-poll added pr-feature Pull request with new product feature submodule changed At least one submodule changed in this PR. labels Jun 3, 2022
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jun 3, 2022
@alexey-milovidov alexey-milovidov marked this pull request as draft June 5, 2022 12:00
@alexey-milovidov
Copy link
Copy Markdown
Member

Most likely it needs to be added to docker/test/fasttest.

@alexey-milovidov
Copy link
Copy Markdown
Member

url = [email protected]:YukiWorkshop/cpp-dns.git
All the submodules should be from https://github.com/

@arthurpassos
Copy link
Copy Markdown
Contributor Author

Just updated it to use https url. Can you re-approve the run @alexey-milovidov?

In the meantime, I am working on the fasttest local setup so I can iterate quicker without having to push changes to try things out, even tho this might be the last necessary change for fasttest...

@arthurpassos
Copy link
Copy Markdown
Contributor Author

Kind reminder for CI re-run @alexey-milovidov

@alexey-milovidov alexey-milovidov marked this pull request as ready for review June 15, 2022 01:54
@arthurpassos
Copy link
Copy Markdown
Contributor Author

arthurpassos commented Jun 15, 2022

Just fixed style case and DNS error message. Kind ping for CI re-run. @alexey-milovidov

@arthurpassos
Copy link
Copy Markdown
Contributor Author

Hi. I checked the CI failures and it seems like it's not related to the changes in this PR. Plus, checking other recent merged PRs, same failure was present. I believe it's ready for review, please let me know if there is anything else missing.

@arthurpassos
Copy link
Copy Markdown
Contributor Author

Kind ping, is there anything missing here? @alexey-milovidov

@arthurpassos
Copy link
Copy Markdown
Contributor Author

Just resolved conflicts

@alexey-milovidov
Copy link
Copy Markdown
Member

Someone should take it for review. I did not review it yet.

@rschu1ze rschu1ze self-assigned this Jun 28, 2022
Copy link
Copy Markdown
Member

@rschu1ze rschu1ze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, thanks a lot. I added just some minor comments after which we can merge.

Comment on lines 143 to 164
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, why do we need "async DNS resolver" if the only place where it's used actually works synchronously?
Also I don't like the namespace name YukiWorkshop.
Can we get rid of cpp-dns library?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a library that returns multiple PTR records for a given ip address. The problem is that most libraries return only one, including getnameinfo. I am also not a big fan of the namespace, but that's the only working library I could find.

The easiest path is to keep cpp-dns, but I believe we could get rid of it if we implemented something similar, it'll just take more time.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a library that returns multiple PTR records for a given ip address.

Yes, and udns does the job, cpp-dns looks like useless wrapper. Am I missing something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't consider it to be useless. It's much more idiomatic and easier to use than libudns. libudns is cryptic and hard to use, I believe that we need a wrapper. The question is if we want to build a wrapper or use an existing one.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, I can implement the wrapper myself and get rid of cpp-dns. Would that be enough?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I mean ares_gethostbyaddr, not ares_getnameinfo

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just tested ares_gethostbyaddr and it indeed works. I will build the wrapper around it then, ok?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so both udns and c-ares support multiple PTR records and work fine. But c-ares looks more popular, does not look abandoned and its interface is quite similar to system functions. So I think yes, we should use c-ares library.

Also there's alive c-ares repo on GitHub. I've created a fork in ClickHouse organization (just in case we will need change something a bit): https://github.com/ClickHouse/c-ares

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClickHouse already builds c-ares because grpc depends on it. I believe that whenever -DENABLE_GRPC is set to ON, c-ares is built as well. Adding c-ares as a submodule as it is fails because it tries to add yet another target called c-ares and it conflicts with the one built by grpc. Based on that, I see a couple of paths forward:

  1. Build c-ares as a submodule and link grpc against it. Requires changes in grpc, which is not in the scope of this PR, but it would be necessary.
  2. Link clickhouse_common_io against c-ares built by grpc. If c-ares target doesn't exist because grpc wasn't built, build c-ares submodule. Dependecy graph becomes a bit messy and not clear, but leaves grpc untouched.
  3. Build c-ares as a submodule, but with different name. I don't see this one as a good option, but leaving it on the table.

Which one do you guys prefer and what do you suggest?

Btw, I have a draft implementation of a c-ares wrapper and already tested it on clickhouse with approach #2 (without the conditional build, tho). It worked!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, so we don't need to add extra libraries! The first option is the best, it makes sense to try it first.

@arthurpassos arthurpassos force-pushed the host_regexp_multiple_domains branch from b1f6747 to 5c00dcd Compare July 4, 2022 13:08
@arthurpassos
Copy link
Copy Markdown
Contributor Author

arthurpassos commented Jul 15, 2022

CI run was all green: https://github.com/ClickHouse/ClickHouse/actions/runs/2666225397, aside from style check, which I just pushed a fix for.

I'll see what I can do about the integ tests. I might reach out for help since it's my first time looking at integ tests code.

@arthurpassos
Copy link
Copy Markdown
Contributor Author

Alright, that took some effort :). Just published integ tests, please let me know if you want to change anything @tavplubix .

@arthurpassos
Copy link
Copy Markdown
Contributor Author

I see three failures:

  1. Style. I will fix it tomorrow morning.
  2. CI machines failing like before. A re-run should help.
  3. BuilderDebSplitted. Doesn't seem to be related to this PR.

Kind ping for review and re-run @tavplubix

@Felixoid
Copy link
Copy Markdown
Member

Felixoid commented Jul 21, 2022

I restarted it. I'm trying to dig into the issue with a rerun

This particular time the behavior is valid, the ci has status Cancelled, and we don't restart it automatically intentionally.

@arthurpassos
Copy link
Copy Markdown
Contributor Author

I have the style fix ready, but will wait until CI run is finished to avoid yet another long run. Btw, I am curious, are you satisfied with the integ tests or you haven't had a chance to review it yet?

@tavplubix
Copy link
Copy Markdown
Member

Tests look fine, thanks!

@arthurpassos
Copy link
Copy Markdown
Contributor Author

Just pushed what I believe to have fixed the style check, couldn't confirm it since I don't know how to run it locally.

@tavplubix
Copy link
Copy Markdown
Member

Style Check (actions) — Style check success
Other checks passed on previous commit
Integration tests (asan, actions) [3/3] - test_postgresql_replica_database_engine_1 is broken in master, cc: @kssenii

@tavplubix tavplubix merged commit 9e9969c into ClickHouse:master Jul 21, 2022
# Force use of c-ares inet_net_pton instead of libresolv one
set(HAVE_INET_NET_PTON OFF CACHE BOOL "" FORCE)

add_subdirectory("../c-ares/" "../c-ares/")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't allow reusing third-party CMake files.

@alexey-milovidov
Copy link
Copy Markdown
Member

@tavplubix You've merged prematurally. Need to revert or fix it quickly.

@alexey-milovidov
Copy link
Copy Markdown
Member

@rschu1ze this goes in the opposite direction of #35630

add_contrib (sqlite-cmake sqlite-amalgamation)
add_contrib (s2geometry-cmake s2geometry)
add_contrib (base-x-cmake base-x)
add_contrib(c-ares-cmake c-ares)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent style.

@rschu1ze
Copy link
Copy Markdown
Member

@rschu1ze this goes in the opposite direction of #35630

Yes, and sorry for missing that, I have not looked at the latest versions of this PR.

Looking at contrib/cares/CMakeLists.txt, there is some complexity but nothing impossible. The MS Win build and the build of "tools" can be omitted. The file "lib/ares_config.h" can be generated by running the original build system on Linux (x86 + ARM + PPC LE), FreeBSD and Mac and checking in the result below "contrib/c-ares-cmake/" (see contrib/vectorscan-cmake/x86_64/config.h for a similar situation).

@tavplubix
Copy link
Copy Markdown
Member

tavplubix commented Jul 25, 2022

@alexey-milovidov, we already reuse third-party cmake files for grpc, our "own" cmake file simply calls it:

set(_gRPC_SOURCE_DIR "${ClickHouse_SOURCE_DIR}/contrib/grpc")

add_subdirectory("${_gRPC_SOURCE_DIR}" "${_gRPC_BINARY_DIR}")

Seems like we had our own cmake files for grpc, but it was changed back to third-party cmakes in #15491, cc: @vitlibar

Before this PR c-ares used to be built inside grpc as kind of recursive submodule. This PR does not add new libraries and new third-party cmake files, it only moves c-ares from contrib/grpc/third_party/cares/cares to contrib/c-ares, so there's nothing to fix quickly (and it's too late to revert #15491).

I will fix style inconsistency and remove contrib/grpc/third_party/cares

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ClickHouse use getaddrinfo for reverse resolve and apply only one PTR response value for <host_regexp> compare

7 participants