Skip to content

resolved: correctly handle non-address RR types with /etc/hosts lookups#4808

Merged
martinpitt merged 1 commit intosystemd:masterfrom
martinpitt:resolved
Dec 22, 2016
Merged

resolved: correctly handle non-address RR types with /etc/hosts lookups#4808
martinpitt merged 1 commit intosystemd:masterfrom
martinpitt:resolved

Conversation

@martinpitt
Copy link
Contributor

@martinpitt martinpitt commented Dec 2, 2016

Fix wrong condition test in manager_etc_hosts_lookup(), which caused it to
return an IPv4 answer when an IPv6 question was asked, and vice versa.
Also only return success if we actually found any A or AAAA record.

In systemd-resolved.service(8), point out that /etc/hosts mappings only
affect address-type lookups, not other types.

Fixes #4801

@poettering
Copy link
Member

Hmm, so let me get this right: with this change we'd "merge" information from two different sources when the same hostname is looked up, right?

That sounds really strange to me. I think it makes a lot more sense that if an entry is defined in /etc/hosts that it then defines it in its entirety. If glibc traditionally would merge IPv4 and IPv6 info from different sources then that appears really questionnable I must say. I am pretty sure a specific hostname/zone should be served by exactly one source, and no per-RR-type merging should take place, if you follow me.

Or did I misunderstand this all?

@martinpitt
Copy link
Contributor Author

If glibc traditionally would merge IPv4 and IPv6 info from different sources then that appears really questionnable I must say.

Yeah, I was quite surprised too, but this is what happens, and as I got a bug report about it apparently some people rely on it. But even if we ignore this "mixed" mode, it seems strange/wrong that if you explicitly ask the _etc_hosts_*() stuff about an AAAA record (e. g. with ping -6 or systemd-resolve -6) that it would answer with an Arecord -- it should at least say "nothing found" then. I believe that this is due to the part inresolved-etc-hosts.c`. I'll stare at this again, and I need to fix the test suite anyway (typo there).

@martinpitt
Copy link
Contributor Author

Fixed the test case and made the change a bit more logical -- instead of second-guessing manager_etc_hosts_lookup() in the caller when it succeeds (return > 0), I fixed its return code to be 0 if there are no records of the requested family in /etc/hosts. Regardless of whether or not we want to emulate the glibc behaviour, returning an IPv4 address for an IPv6 request or returning success with an empty answer make little sense?

@martinpitt
Copy link
Contributor Author

The "default" Fedora test seems broken :-( -- @zonque, do you know what's up there? Is there a way that this can be restarted?

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, two comments...

if ((found_a && bn->items[i]->family != AF_INET) &&
if ((found_a && bn->items[i]->family != AF_INET) ||
(found_aaaa && bn->items[i]->family != AF_INET6))
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, the old check was wrong, but the new one, too ;-)

found_a is true if the question supplied by the user included an "A" query, and found_aaaa if it included an "AAAA" query.

Now, we should skip over an answer record if:

  1. it is an A answer, but we had no A query

as well as if:

  1. it is an AAAA answer, but we had no AAAA query.

The correct check hence needs to be:

(bn->items[i]->family == AF_INET && !found_a) ||
(bn->items[i]->family == AF_INET6 && !found_aaaa)

Following the cumulative law and De Morgan's law this could also be written as:

(found_a || b->items[i]->family != AF_INET) &&
(found_aaaa || b->items[i]->family != AF_INET6)

But neither the way the original code put it, nor in the version of the patch...

Does this make sense?

}

return 1;
return dns_answer_isempty(*answer) ? 0 : 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change isn't right. Note that hostname resolution generally distuingishes NODATA from NXDOMAIN. Translated to this case this means: If you try to resolve a name, and the name in general exists, but not the RR type/class you were looking for, then you return a successful but empty response — which is the case that is usually called NODATA. However NXDOMAIN is the other case: where there whole domain doesn't exist, and hence neither RRs of teh requested type/class nor of any other name.

hence, we should return 1 here in this case, as we already bailed out much earlier with 0 if we didn't find anything at ll.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That rationale makes sense to me API-wise. But we need that check, as without it this entirely breaks family specific lookups (systemd-resolve -4 or -6). So you liked the previous version where it puts the dns_answer_isempty() check into dns_query_try_etc_hosts() (the caller) better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow? What do you mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that if you do an IPv6 query (systemd-resolve -6 heise.de) then manager_etc_hosts_lookup() will return an empty response as it didn't find any AAAA record, but it will still return with 1. But that would mean that the entire lookup will fail unless we check for emptiness (i. e. to differ between NXDOMAIN and NODATA).

This still won't get us both the IPv4 and IPv6 address if you ask for "any family" (systemd-resolve heise.de), but that's within your boundaries of "if one source has any RR it should be considered sufficient".

@martinpitt martinpitt added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed ci-failure-appears-unrelated labels Dec 6, 2016
@martinpitt martinpitt removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Dec 8, 2016
@martinpitt
Copy link
Contributor Author

I reworked the patch to fix the condition check harder (thanks for pointing out!), moved the NXDOMAIN/NODATA check back into dns_query_try_etc_hosts(), and adjusted the test case (as we don't get an IPv4 address from /etc/hosts and IPv6 from DNS at the same time, and I now agree that this is correct behaviour).

@poettering
Copy link
Member

the if check looks great now.

hmm, the NXDOMAIN/NODATA part i don't grok i must say. If /etc/hosts lists only an IPv4 address for a name "foo", but you ask for IPv6, then this should return an empty, but successful DNS reply, which is what the current code is doing. With your change however, this would turn NODATA into "not listed in /etc/hosts" for this case, but I thought we agreed that that was undesirable?

I think if a domain is listed in /etc/hosts then we should consider that file authoritative for it in its entirety, and not read IPv4 info from that file for some name but read IPv6 info from DNS for the same name!

Or wasn't thta what we agreed on?

@martinpitt
Copy link
Contributor Author

Hm, my understanding was that if we find any matching RR in /etc/hosts then we stop searching DNS. I. e. in the above case, systemd-resolve heise.de would only deliver an IPv4 address, but no IPv6. But if we find no IPv4 RR, then we should fall back to DNS so that systemd-resolve -6 heise.de will ask DNS instead of failing with does not have any RR of the requested type. That's still compatible with "take all RRs from the same source".

With only the if condition fixed and without the isempty() check, then asking for anything else than an IPv4 record is still broken:

$ systemd-resolve -6 heise.de
heise.de: resolve call failed: 'heise.de' does not have any RR of the requested type
$ systemd-resolve -t MX heise.de
heise.de: resolve call failed: Name 'heise.de' does not have any RR of the requested type

While the first failure could still be argued to be somewhat correct (although I see no good reason to break compatibility with glibc's dns), the second one is an outright bug -- breaking other resource type lookups just because the admin provides a shortcut for looking up an IPv4 address seems wrong.

I'll extend the test case to also query for an MX record, which might make this even clearer than just with IPv4/6?

@martinpitt
Copy link
Contributor Author

I added the "MX" test case and clarified the commit message in that regard, and also rebased to current master.

@martinpitt
Copy link
Contributor Author

@poettering: Is this still unclear and we need to discuss further?

@martinpitt
Copy link
Contributor Author

Summary from discussion with Lennart:

  • If hosts has any address, consider this authoritative and also don't fall back to DNS for other families.
  • Explain this in the manpage: "note that /etc/hosts is considered authoritative for address lookups, but lookups for other RR types are not affected by it"
  • Make manager_etc_hosts_lookup() return 0 if neither found_a nor found_aaaa and drop the is_empty check again (adjust the tests accordingly)

@martinpitt martinpitt added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Dec 20, 2016
@martinpitt martinpitt self-assigned this Dec 20, 2016
@martinpitt martinpitt removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Dec 21, 2016
@martinpitt
Copy link
Contributor Author

Reworked as discussed. Now MX queries work fine, but /etc/hosts is completely authoritative once it has any address for a domain. I also extended the test cases to check a different domain name and PTR lookups (i. e. IP → domain name).

@martinpitt martinpitt changed the title resolved: correctly handle address families with /etc/hosts lookups resolved: correctly handle non-address RR types with /etc/hosts lookups Dec 21, 2016
@poettering
Copy link
Member

lgtm! thanks for working on this!

@poettering poettering added the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Dec 21, 2016
@martinpitt
Copy link
Contributor Author

It seems the '-t MX' test fails in CI as that seems to stumble over some DNSSEC bug. I didn't see that locally as I currently have DNSSEC disabled, but I can reproduce this as well when enabling it. This is an unrelated bug, but I'll look into working around this in the tests.

@martinpitt martinpitt added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Dec 21, 2016
Fix wrong condition test in manager_etc_hosts_lookup(), which caused it to
return an IPv4 answer when an IPv6 question was asked, and vice versa.
Also only return success if we actually found any A or AAAA record.

In systemd-resolved.service(8), point out that /etc/hosts mappings only
affect address-type lookups, not other types.

The test case currently disables DNSSEC in resolved, as there is a bug
where "-t MX" fails due to "DNSSEC validation failed" even after
"downgrading to non-DNSSEC mode". This should be dropped once that bug
gets fixed.

Fixes systemd#4801
@martinpitt martinpitt removed the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Dec 21, 2016
@martinpitt
Copy link
Contributor Author

I added a workaround to the test now that disables DNSSEC in resolved during that test. Once this lands and the test is easily available, I'll file an issue for this, as this is an actual bug:

systemd-resolved[21055]: Server 192.168.5.1 does not support DNSSEC, downgrading to non-DNSSEC mode
systemd-resolved[21055]: DNSSEC validation failed for question other.intranet IN A: failed-auxiliary

I. e. resolved already (correctly) determined that the test dnsmasq server doesn't support DNSSEC and disabled it, but then the query fails anyway. I've seen a couple of different bug reports about similar issues and they weren't easy to reproduce -- seems this new test just happens to do that (without the workaround) :-) But this is a completely unrelated issue.

@martinpitt martinpitt merged commit 4050e04 into systemd:master Dec 22, 2016
@martinpitt martinpitt deleted the resolved branch December 22, 2016 06:58
@poettering
Copy link
Member

@martinpitt i wonder if that dnssec failure actually is caused by the /etc/hosts change, i.e. because two zones are mixed here... Any chance you can provide some debug log output for that lookup?

@martinpitt
Copy link
Contributor Author

I filed issue #5029 for tracking this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed resolve

Development

Successfully merging this pull request may close these issues.

resolved assumes that /etc/hosts entry for one RR type means it doesn't ask DNS for another

2 participants