ListOffsetsRequest should only be sent to the leader replica #4616
Kyle Phelps (kphelps) wants to merge 4 commits into confluentinc:master
It's correct to send the ListOffsets request to the preferred replica. The loop probably comes from this discovered bug:
With debug logs enabled, could you check whether it's receiving FENCED_LEADER_EPOCH errors?
Nope, I'm seeing
Aha, from KIP-392:
Looks like we unconditionally set the replica id to
Looks like the Java client opts to just always send to the leader. WDYT?
Kyle Phelps (@kphelps) The error (librdkafka/src/rdkafka_request.c, line 935 in a6d85bd)
Is it possible to reproduce the issue and send a log with
dfb9e3e to 341c62d
Found a test that was silently failing due to this issue.
Thanks Kyle Phelps (@kphelps). I checked this issue more in depth and understood the problem: it's different from the one I linked and, as you said, it could be solved in two ways: by sending the request to the follower with replica id -2, or by sending it to the leader as the Java client does. The downside of sending it to the leader is that, if the follower is lagging behind, the consumer could hit further offset resets when fetching from it, until the follower has caught up. I've checked the broker code and tried using -2 by changing the mock cluster implementation, and that works too. I'll also ask for an opinion internally before deciding between the two solutions.
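We can't fix it by sending the request to the follower, because of how the broker handles ListOffsets requests:

```scala
val fetchOnlyFromLeader = offsetRequest.replicaId != ListOffsetsRequest.DEBUGGING_REPLICA_ID
val isClientRequest = offsetRequest.replicaId == ListOffsetsRequest.CONSUMER_REPLICA_ID
val isolationLevelOpt = if (isClientRequest)
  Some(offsetRequest.isolationLevel)
else
  None
```

A consumer request (replica id -1) can only be served by the leader, while a request with the debugging replica id (-2) can be served by a follower but is not treated as a client request, so its isolation level is ignored.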
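A hypothetical C rendering of the broker check above (not actual broker or librdkafka code) shows why neither replica id works for a follower. The function `classify_list_offsets` and the struct are illustrative names; only the two replica id values (-1 for consumers, -2 for debugging) come from the Kafka protocol.

```c
#include <assert.h>
#include <stdbool.h>

/* Replica id values carried in ListOffsets requests (Kafka protocol). */
#define CONSUMER_REPLICA_ID  (-1)
#define DEBUGGING_REPLICA_ID (-2)

typedef enum { READ_UNCOMMITTED = 0, READ_COMMITTED = 1 } isolation_level_t;

/* Hypothetical mirror of the three values the Scala snippet computes. */
typedef struct {
    bool fetch_only_from_leader;       /* fetchOnlyFromLeader */
    bool isolation_level_applied;      /* isolationLevelOpt is Some(...) */
    isolation_level_t isolation_level; /* only meaningful when applied */
} list_offsets_handling_t;

static list_offsets_handling_t classify_list_offsets(int replica_id,
                                                     isolation_level_t requested) {
    assert(requested == READ_UNCOMMITTED || requested == READ_COMMITTED);
    list_offsets_handling_t h;
    /* Every replica id except -2 restricts the request to the leader. */
    h.fetch_only_from_leader = replica_id != DEBUGGING_REPLICA_ID;
    /* Only -1 counts as a client request, so only then is the
     * requested isolation level kept. */
    bool is_client_request = replica_id == CONSUMER_REPLICA_ID;
    h.isolation_level_applied = is_client_request;
    h.isolation_level = is_client_request ? requested : READ_UNCOMMITTED;
    return h;
}
```

Calling it with `CONSUMER_REPLICA_ID` yields a leader-only request that honours `READ_COMMITTED`; calling it with `DEBUGGING_REPLICA_ID` lets a follower answer but drops the isolation level, which is the trade-off described above.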
removed test import
0c86070 to 90d269e
/sem-approve
/sem-approve
Kyle Phelps (@kphelps) sorry, given that we're having an issue with the public CI, I've created this internal branch with your changes: #4754
When using fetch-from-follower, it is currently possible for a consumer to get stuck in a loop sending `ListOffsetsRequest` when we go through the `rd_kafka_offset_reset` path, because the request is sent to the preferred replica. Instead, always send it to the leader.