Skip to content
This repository was archived by the owner on Jan 24, 2024. It is now read-only.

Conversation

@BewareMyPower
Copy link
Collaborator

@BewareMyPower BewareMyPower commented Jun 8, 2021

Fixes #544

Motivation

After the debugging, when a FETCH request contains many partitions, for each partition, KafkaTopicConsumerManager#remove will be executed in the same pulsar-io-<suffix> thread, which will eventually call MessageIdUtils#getPositionForOffset to get PositionImpl of the offset.

However, getPositionForOffset is a synchronous method that waits until a CompletableFuture is done. The future is returned by ManagedLedgerImpl#asyncFindPosition that calls ManagedLedgerImpl#asyncReadEntry internally. If the producer is publishing messages at the same time, the asyncAddEntry method could be called for the same ManagedLedgerImpl object. It will somehow cause the deadlock.

Modifications

  • Just use ManagedLedgerImpl#asyncFindPosition instead of the synchronous call in MessageIdUtils#getPositionForOffset.
  • Caches the future of Pair<ManagedCursor, Long> in KafkaTopicConsumerManager instead of the Pair<ManagedCursor, Long>. Then in MessageFetchContext, use the ManagedCursor in future's callback .
  • Add a unit test to ensure after this change, the non-durable cursor will be still created once for each consumer. This condition is easily to break in a refactor like this PR and it may cause performance problem because cursor might be created each time a FETCH request arrived. So here I added a test to avoid the case.

@BewareMyPower BewareMyPower requested a review from jiazhai as a code owner June 8, 2021 16:50
@BewareMyPower BewareMyPower force-pushed the bewaremypower/fix-find-position-deadlock branch from 424d369 to b5f76bb Compare June 8, 2021 16:53
@BewareMyPower BewareMyPower self-assigned this Jun 8, 2021
@BewareMyPower BewareMyPower changed the title [WIP] Fix deadlock caused by synchronous asyncFindPosition Fix deadlock caused by synchronous asyncFindPosition Jun 9, 2021
@jiazhai jiazhai merged commit 05b778d into streamnative:master Jun 9, 2021
@BewareMyPower BewareMyPower deleted the bewaremypower/fix-find-position-deadlock branch June 9, 2021 07:16
Copy link
Collaborator

@hangc0276 hangc0276 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good job

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] KoP cannot support a FETCH request with many partitions

3 participants