Skip to content

Conversation

@ltamber
Copy link
Contributor

@ltamber ltamber commented Nov 10, 2019

Motivation

Fixes #5597

When using a multi-broker service url to create a producer, if the connection to the first broker failed, the creation will fail.

Modification

Add backoff retries when getting partitioned metadata from brokers.

Copy link
Member

@jiazhai jiazhai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ltamber for the help. It would be better if you could add some test for this fix.

if (isTcpLookup) {
url = "pulsar://localhost:51000,localhost:51001";
}
// PulsarClient client = newPulsarClient(url, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove it if the code is not needed any more.

.operationTimeout(3, TimeUnit.SECONDS)
.build();

Consumer<byte[]> consumer = client.newConsumer().topic(topicName).subscriptionName(subscriptionName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 181 will never be executed, no?

.build();

Consumer<byte[]> consumer = client.newConsumer().topic(topicName).subscriptionName(subscriptionName)
.acknowledgmentGroupTime(0, TimeUnit.SECONDS).subscribe();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add faill() after creating a consumer since the consumer will never be created, no?

consumer.close();
producer.close();
client.close();
} catch (PulsarClientException pce) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert the exception time.

TopicName topicName = TopicName.get(topic);
metadataFuture = lookup.getPartitionedTopicMetadata(topicName);
AtomicLong opTimeoutMs = new AtomicLong(conf.getOperationTimeoutMs());
Backoff backoff = new BackoffBuilder()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not direct related to this change. but we should consider adding backoff for other places as well.

@sijie sijie added area/client type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages labels Nov 11, 2019
@sijie sijie added this to the 2.5.0 milestone Nov 11, 2019
@sijie sijie changed the title Add retry when create producer async [Issue 5597][pulsar-client-java] retry when getPartitionedTopicMetadata failed Nov 11, 2019
@sijie
Copy link
Member

sijie commented Nov 11, 2019

@ltamber it is always a good practice to copy the description from issues to the pull request. so it will provide enough context to review your pull request. I have updated your description. Please take a look and let me know if you have any questions.

@ltamber
Copy link
Contributor Author

ltamber commented Nov 13, 2019

@sijie thanks a lot, I will fix my code soon.

Comment on lines -130 to -131
String requestUrl = new URL(serviceNameResolver.resolveHostUri().toURL(), path).toString();
String remoteHostName = serviceNameResolver.resolveHostUri().getHost();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sijie maybe a bug here, the host in requestUrl not equal with remoteHostName if multi-broker service url provided, and will step forward two service url every request. please take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I don't quite get this. Can you show me an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PulsarServiceNameResolver.resolveHostUri() called twice(every time it will return the next address) inside HttpClient.get() ,
if service url is http://ip1:port1,ip2:port2, we assume requestUrl is ip1:port1, the remoteHostName will be ip2, and the next time call HttpClient.get() the requestUrl is still ip1:port1, the remoteHostName is ip2. so HttpClient.get() will only request ip1:port1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see. yes. the original code has a bug there.

@sijie sijie modified the milestones: 2.5.0, 2.4.3 Nov 19, 2019
@sijie sijie merged commit ee11e10 into apache:branch-2.4 Nov 20, 2019
@ltamber ltamber deleted the branch-2.4 branch November 22, 2019 02:52
@wolfstudy wolfstudy modified the milestones: 2.4.3, 2.4.2 Nov 22, 2019
wolfstudy pushed a commit that referenced this pull request Nov 23, 2019
…ta failed (#5603)

### Motivation

Fixes #5597 

When using a multi-broker service url to create a producer, if the connection to the first broker failed, the creation will fail.

### Modification

Add backoff retries when getting partitioned metadata from brokers.

(cherry picked from commit ee11e10)
wolfstudy added a commit that referenced this pull request Nov 25, 2019
wolfstudy added a commit that referenced this pull request Nov 25, 2019
tuteng pushed a commit that referenced this pull request Jan 3, 2020
…5844)

Motivation
Fixes #5597

Add backoff retries when getting partitioned metadata from brokers.

The change in #5734 (copy from #5603) used the wrong time unit when inited Backoff which failed to trigger the retry logic as expected.

Modifications
Correct the time unit and add some useful log.
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
…pache#5844)

Motivation
Fixes apache#5597

Add backoff retries when getting partitioned metadata from brokers.

The change in apache#5734 (copy from apache#5603) used the wrong time unit when inited Backoff which failed to trigger the retry logic as expected.

Modifications
Correct the time unit and add some useful log.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/client type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants