ListOffsets loop of failed requests on leader epoch change until timeout happens #4620

@emasab

Description

ListOffsets requests for partitions with no committed offsets can be retried repeatedly, until the request times out, if the partition's leader epoch has changed. This happens because the same request buffer is retried instead of being recreated with the new CurrentLeaderEpoch received from the Metadata refresh call.

How to reproduce

Start consuming partitions that have no committed offset, or seek to the latest offset. A partition leader change should then happen that raises the current leader epoch above the cached value. The ListOffsets request fails with FENCED_LEADER_EPOCH and the client refreshes Metadata, but it then retries the same buffer with the old CurrentLeaderEpoch, leading to a loop of failed requests.
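
The sequence above can be illustrated with a small toy simulation. This is only a sketch of the described behavior, not librdkafka code: `Broker`, `Client`, `list_offsets_with_retries` and the `rebuild_on_retry` flag are hypothetical names invented for the illustration, the broker only models the FENCED_LEADER_EPOCH case, and `max_attempts` stands in for the request timeout.

```python
from dataclasses import dataclass

FENCED_LEADER_EPOCH = "FENCED_LEADER_EPOCH"
NO_ERROR = "NO_ERROR"


@dataclass
class Broker:
    """Toy partition leader (hypothetical): fences requests with a stale epoch."""
    leader_epoch: int

    def list_offsets(self, current_leader_epoch: int) -> str:
        # Only the stale-epoch case is modeled here.
        if current_leader_epoch < self.leader_epoch:
            return FENCED_LEADER_EPOCH
        return NO_ERROR


@dataclass
class Client:
    """Toy client (hypothetical) holding the cached leader epoch."""
    cached_epoch: int

    def refresh_metadata(self, broker: Broker) -> None:
        # A Metadata refresh updates the cached leader epoch.
        self.cached_epoch = broker.leader_epoch


def list_offsets_with_retries(client, broker, rebuild_on_retry, max_attempts=5):
    """Return the attempt number that succeeded, or None if all attempts failed.

    max_attempts is an assumption standing in for the request timeout.
    """
    # The epoch is "serialized" into the request buffer when it is first built.
    request_epoch = client.cached_epoch
    for attempt in range(1, max_attempts + 1):
        if broker.list_offsets(request_epoch) == NO_ERROR:
            return attempt
        client.refresh_metadata(broker)
        if rebuild_on_retry:
            # Recreate the request with the refreshed CurrentLeaderEpoch.
            request_epoch = client.cached_epoch
        # Otherwise the old buffer, with the stale epoch, is sent again.
    return None
```

With `rebuild_on_retry=False` and a broker epoch higher than the cached one, every attempt is fenced and the function returns `None`, which mirrors the reported loop. With `rebuild_on_retry=True` the second attempt succeeds, since the request is rebuilt after the Metadata refresh.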

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): 2.1.0+
  • Apache Kafka version: any
  • librdkafka client configuration: any
  • Operating system: any
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
