Skip to content

MINOR: Fix now that kafka.apache.org resolves to 3 IP addresses#9294

Merged
mumrah merged 2 commits intoapache:trunkfrom
jolshan:IPfix
Sep 17, 2020
Merged

MINOR: Fix now that kafka.apache.org resolves to 3 IP addresses#9294
mumrah merged 2 commits intoapache:trunkfrom
jolshan:IPfix

Conversation

@jolshan
Copy link
Copy Markdown
Member

@jolshan jolshan commented Sep 16, 2020

ClusterConnectStatesTest and ClientUtilsTest were failing because they expected kafka.apache.org to resolve to 2 IP addresses. This updates the tests so they reflect that DNS resolves to 3 addresses.

Copy link
Copy Markdown

@abbccdda abbccdda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are really fragile, instead of verifying the number of addresses, could we also check the pattern of addresses or something similar to verify the correctness?

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Sep 16, 2020

@abbccdda I agree that they are fragile. Could you give a little more detail about checking the patterns? I'm not sure if I follow.

@dajac
Copy link
Copy Markdown
Member

dajac commented Sep 17, 2020

Netty used a in-memory DNS server for such tests: https://github.com/netty/netty/blob/master/resolver-dns/src/test/java/io/netty/resolver/dns/TestDnsServer.java. I suppose that we could do something similar in AK as well.

What about merging this one and filing a JIRA to improve this in the future? All these flaky tests due to this are quite annoying so we could already fix them. What do you think?

It would be great if we could backport it to previous versions as well.

@mumrah
Copy link
Copy Markdown
Member

mumrah commented Sep 17, 2020

Agreed with @dajac's suggestion. Maybe for this change we could have the assertion as > 1 rather than equalling a specific value

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Sep 17, 2020

I also like @dajac's suggestion. I've updated the PR so that the tests assert more than 1 address as @mumrah said.

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Sep 17, 2020

@Test
public void testResolveDnsLookup() throws UnknownHostException {
// Note that kafka.apache.org resolves to 2 IP addresses
// Note that kafka.apache.org resolves to at least 2 IP addresses
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment here and below refer to 2 IP addresses still.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying to change it back? I thought it was fair to say "at least 2" to reflect we are asserting greater than 1.


connectionStates.connecting(nodeId1, time.milliseconds(), hostTwoIps, ClientDnsLookup.USE_ALL_DNS_IPS);
InetAddress addr3 = connectionStates.currentAddress(nodeId1);
assertSame(addr1, addr3);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so previously (and also with this change) we are assuming that we will get different resolved IPs for each connection to apache.kafka.org? This seems to rely on round-robbin DNS resolution that we can't really control. I think these assertions will always be prone to failure unless we can control the DNS server like @dajac suggested.

I guess we can commit this to unblock the tests, but we should definitely prioritize a better fix.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I was wondering if we should even keep this test in the current form, and I'm hoping that a DNS we can control would resolve this issue.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is verifying the round robin that our code does, not the DNS server. We should ensure we still have that coverage in the meantime.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying, @ijuma. Seems like we could verify the round-robin behavior without relying on specific number of IPs.

Copy link
Copy Markdown
Member Author

@jolshan jolshan Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I changed it to show that the first three are different but I didn't include the check that it loops back to the first IP since that part seems the most fragile.

Copy link
Copy Markdown
Member

@mumrah mumrah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests looking much better, just the one or two flaky tests as usual rather than dozens of failures :)

@mumrah mumrah merged commit 1443f24 into apache:trunk Sep 17, 2020
@mumrah
Copy link
Copy Markdown
Member

mumrah commented Sep 17, 2020

Thanks for the patch @jolshan!

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Sep 18, 2020

Have we cherry picked this to older branches? It seems like they will be broken otherwise.

mumrah pushed a commit that referenced this pull request Sep 18, 2020
@mumrah
Copy link
Copy Markdown
Member

mumrah commented Sep 18, 2020

2.6 backport applied cleanly, kicked off a build here https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/15/

2.5 had a conflict, and one test fails for me locally. I'll look into it along with other branches this evening
Edit: looks like the behavior of ClientUtils#resolve changed in 2.6 (84020bf) so the fix will be slightly different for older versions. How far back should we backport? I see jenkins jobs back to 2.0

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Sep 30, 2020

@mumrah I think 2.0 and higher would be a good target. We don't need separate PRs as long as the cherry pick is clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants