Fix loop of OffsetForLeaderEpoch requests on quick leader changes#4433
Emanuele Sabellico (emasab) merged 6 commits into master
Thanks for the PR! We have a pretty reliable scenario for triggering #4425: we do a reassignment on a small topic so that there are at least two partition state changes in short succession, after which the first and all subsequent OffsetForLeaderEpoch requests fail with FencedLeaderEpoch. We ran the consumers from this branch and confirmed that we don't see the issue with this patch.
Thanks for confirming it with some independent testing, Martin Dickson (@mjd95)!
Pranav Rathi (pranavrth) left a comment
LGTM! Just a question and a nit.
rd_kafka_mock_broker_push_request_error_rtts(
    mcluster, 2, RD_KAFKAP_OffsetForLeaderEpoch, 1,
    RD_KAFKA_RESP_ERR_KAFKA_STORAGE_ERROR, 900);

rd_kafka_mock_broker_push_request_error_rtts(
    mcluster, 2, RD_KAFKAP_OffsetForLeaderEpoch, 1,
    RD_KAFKA_RESP_ERR_NO_ERROR, 1000);
We can merge these two.
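As a rough illustration of the suggestion: the mock helper takes a count followed by that many (error, RTT) pairs, so two calls with a count of 1 should queue the same entries as one call with a count of 2. The sketch below does not use librdkafka; `err_queue_t`, `push_error_rtts`, `push_separately`, `push_merged`, `queues_equal`, and the error constants are hypothetical stand-ins that only model the "count plus pairs" calling convention.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-ins: placeholder error codes, not librdkafka's values. */
enum { ERR_NO_ERROR = 0, ERR_KAFKA_STORAGE_ERROR = 1 };

#define ERR_QUEUE_MAX 8

typedef struct {
        int err;    /* stand-in for an rd_kafka_resp_err_t value */
        int rtt_ms; /* simulated round-trip time in milliseconds */
} err_rtt_t;

typedef struct {
        err_rtt_t entries[ERR_QUEUE_MAX];
        size_t len;
} err_queue_t;

/* Appends up to cnt (err, rtt_ms) pairs to the queue, in argument order.
 * Returns the number of pairs actually appended. */
size_t push_error_rtts(err_queue_t *q, size_t cnt, ...) {
        va_list ap;
        size_t i, pushed = 0;

        assert(q != NULL);
        va_start(ap, cnt);
        for (i = 0; i < cnt && q->len < ERR_QUEUE_MAX; i++) {
                q->entries[q->len].err    = va_arg(ap, int);
                q->entries[q->len].rtt_ms = va_arg(ap, int);
                q->len++;
                pushed++;
        }
        va_end(ap);
        return pushed;
}

/* Shape of the two calls as written in the test under review. */
void push_separately(err_queue_t *q) {
        push_error_rtts(q, 1, ERR_KAFKA_STORAGE_ERROR, 900);
        push_error_rtts(q, 1, ERR_NO_ERROR, 1000);
}

/* Merged form: one call, count 2, both pairs in the same order. */
void push_merged(err_queue_t *q) {
        push_error_rtts(q, 2,
                        ERR_KAFKA_STORAGE_ERROR, 900,
                        ERR_NO_ERROR, 1000);
}

/* Returns 1 if both queues hold the same pairs in the same order. */
int queues_equal(const err_queue_t *a, const err_queue_t *b) {
        size_t i;

        if (a->len != b->len)
                return 0;
        for (i = 0; i < a->len; i++) {
                if (a->entries[i].err != b->entries[i].err ||
                    a->entries[i].rtt_ms != b->entries[i].rtt_ms)
                        return 0;
        }
        return 1;
}
```

In the real test, the merged form would presumably be a single `rd_kafka_mock_broker_push_request_error_rtts` call with the count changed from 1 to 2 and both pairs listed in the original order.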
 *
 * See #4425.
 */
static void do_test_two_leader_changes(void) {
Can you confirm that this test fails with the old code?
Yep, it fails with a timeout after we enter the infinite loop.
* upstream/master:
  librdkafka v2.3.0 (confluentinc#4455)
  Fix for idempotent producer fatal errors, triggered after a possibly persisted message state (confluentinc#4438)
  Move can_q_contain_fetched_msgs inside q_serve (confluentinc#4431)
  [KIP-580] Exponential Backoff with Mock Broker Changes to Automate Testing. (confluentinc#4422)
  Update only the mklove version of OpenSSL to 3.0.11 (confluentinc#4454)
  Permanent errors during offset validation should be retried (confluentinc#4447)
  Increased flexver request size for Metadata request to include topic_id size (confluentinc#4453)
  Fix loop of OffsetForLeaderEpoch requests on quick leader changes (confluentinc#4433)
  Fix for stored offsets not being committed if they lacked the leader epoch (confluentinc#4442)
  Add leader epoch to control messages (confluentinc#4434)
  Refactored tmpabuf and fixed an insufficient buffer allocation (confluentinc#4449)
  Work around KIP-700 restrictions for DescribeCluster [KIP-430] [admin] KIP-430: Add authorized operations to describe API
  Fix segfault if assignor state is NULL, (confluentinc#4381)
Emanuele Sabellico (@emasab), do you have an ETA on when 2.3 will be released? We believe this might be the issue we are experiencing.
Fixes #4425