Skip to content

Conversation

@murong00
Copy link
Contributor

Motivation

Fixes #5597

Add backoff retries when getting partitioned metadata from brokers.

The change in #5734 (copy from #5603) used the wrong time unit when inited Backoff which failed to trigger the retry logic as expected.

Modifications

Correct the time unit and add some useful log.

Copy link
Member

@jiazhai jiazhai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @murong00 for figuring out the issue. Would you please help add a unit test for the retry logic to make the code protected.

@jiazhai
Copy link
Member

jiazhai commented Dec 16, 2019

@murong00 Seems there was a test case testGetPartitionedTopicDataTimeout() , maybe we should improve it to make it really protecting the code.

@murong00
Copy link
Contributor Author

@jiazhai ok, will improve it later.

@murong00
Copy link
Contributor Author

@jiazhai fix the unit test, please help to review again.

String urlsWithUnreached = "http://localhost:51000,localhost:" + BROKER_WEBSERVICE_PORT;
if (isTcpLookup) {
url = "pulsar://localhost:51000,localhost:51001";
urlsWithUnreached = "pulsar://localhost:51000,localhost::" + BROKER_PORT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

localhost::" + BROKER_PORT . there are 2 ":" here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks.


@Test (timeOut = 4000)
public void testGetPartitionedTopicDataTimeout() {
@Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we create a new test case, the original one is to test timeout, while your new test is try to test retry and success. also how about add some comments or change the test case name to make it more clear? e.g. according to the test steps, the name like 'testMultiHostUrlRetrySuccess' would be better. And some of the test cases are not related with PulsarBrokerStatsClientTest, It would be better to move them to or create a more related test suite, such as PulsarMultiHostClientTest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice advice, have fixed it.

@jiazhai
Copy link
Member

jiazhai commented Dec 19, 2019

Thanks @murong00 for the help, over all looks good to me. left some minor comments.

@murong00
Copy link
Contributor Author

run java8 tests
run cpp tests

@sijie
Copy link
Member

sijie commented Dec 20, 2019

run java8 tests
run cpp tests

@sijie sijie added area/client type/bug The PR fixed a bug or issue reported a bug labels Dec 20, 2019
@sijie sijie added this to the 2.5.0 milestone Dec 20, 2019
@murong00
Copy link
Contributor Author

need fix NPE in ConnectionTimeoutTest:

2019-12-20\T\05:33:18.457 [ERROR] testLowTimeout(org.apache.pulsar.client.impl.ConnectionTimeoutTest)  Time elapsed: 31.139 s  <<< FAILURE!
java.lang.NullPointerException
	at org.apache.pulsar.client.impl.ConnectionTimeoutTest.testLowTimeout(ConnectionTimeoutTest.java:53)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:54)
	at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:44)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@murong00
Copy link
Contributor Author

run java8 tests
run cpp tests

@sijie
Copy link
Member

sijie commented Dec 24, 2019

run cpp tests
run java8 tests

1 similar comment
@sijie
Copy link
Member

sijie commented Dec 24, 2019

run cpp tests
run java8 tests

@murong00
Copy link
Contributor Author

@sijie The retry logic may skip the limit of lookup throttle which can be set on client or server side (e.g. maxConcurrentLookupRequest,maxLookupRequests), for example unit test testLookupThrottlingForClientByClient() may fail since all consumer is connected.

@murong00
Copy link
Contributor Author

murong00 commented Dec 25, 2019

Meanwhile, fixed some unit test by setting operationTimeout to 1s (default 30s) in order to reduce unit testing time. Please help to take a look.

@sijie
Copy link
Member

sijie commented Dec 25, 2019

retest this please

@murong00
Copy link
Contributor Author

run java8 tests

@jiazhai
Copy link
Member

jiazhai commented Dec 30, 2019

run java8 tests

error:
org.apache.pulsar.io.PulsarFunctionE2ETest.testPulsarFunctionStats
org.apache.pulsar.io.PulsarFunctionE2ETest.testPulsarSinkStatsWithFile
org.apache.pulsar.io.PulsarFunctionE2ETest.testPulsarSinkStatsWithUrl
org.apache.pulsar.io.PulsarFunctionE2ETest.testPulsarSourceStatsWithFile
org.apache.pulsar.io.PulsarFunctionE2ETest.testPulsarSourceStatsWithUrl

@sijie
Copy link
Member

sijie commented Jan 1, 2020

run integration tests

@murong00
Copy link
Contributor Author

murong00 commented Jan 2, 2020

run java8 tests

4 similar comments
@sijie
Copy link
Member

sijie commented Jan 2, 2020

run java8 tests

@murong00
Copy link
Contributor Author

murong00 commented Jan 3, 2020

run java8 tests

@tuteng
Copy link
Member

tuteng commented Jan 3, 2020

run java8 tests

@jiazhai
Copy link
Member

jiazhai commented Jan 3, 2020

run java8 tests

@tuteng tuteng merged commit c7094c9 into apache:master Jan 3, 2020
@murong00 murong00 deleted the branch-5844 branch March 8, 2020 02:19
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
…pache#5844)

Motivation
Fixes apache#5597

Add backoff retries when getting partitioned metadata from brokers.

The change in apache#5734 (copy from apache#5603) used the wrong time unit when inited Backoff which failed to trigger the retry logic as expected.

Modifications
Correct the time unit and add some useful log.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/client type/bug The PR fixed a bug or issue reported a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[pulsar-client-java] retry when getPartitionedTopicMetadata failed

4 participants