Fatal error "out of order sequence number" makes idempotent producer not functional anymore #3584

@kenneth-jia

Description

Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ

We hit a fatal error with an idempotent producer after the Kafka cluster went down. From then on, the producer could not send any more messages.
The problem still occurs with the latest librdkafka release (v1.8.2), so it is not the same problem as #3577.

Here is the log from when the error occurred:

[2021-10-19 18:47:34.885129]EMERG KafkaProducer[producer-140691259498496] FATAL | [thrd:127.0.0.1:29092/bootstrap]: Fatal error: Broker: Broker received an out of order sequence number: ProduceRequest for topicForTest [0] with 1 message(s) failed due to sequence desynchronization with broker 1 (PID{Id:8000,Epoch:0}, base seq 0, idemp state change 1383467ms ago, last partition error NOT_LEADER_FOR_PARTITION (actions Refresh,MsgNotPersisted, base seq 0..0, base msgid 1, 107ms ago)
Met error: Broker: Broker received an out of order sequence number [45] fatal | ProduceRequest for topicForTest [0] with 1 message(s) failed due to sequence desynchronization with broker 1 (PID{Id:8000,Epoch:0}, base seq 0, idemp state change 1383467ms ago, last partition error NOT_LEADER_FOR_PARTITION (actions Refresh,MsgNotPersisted, base seq 0..0, base msgid 1, 107ms ago)
Exception thrown by producer: 2021-10-19 18:47:34.885264: Broker: Broker received an out of order sequence number [45] (/home/winner/Repo/modern-cpp-kafka-3/include/./kafka/KafkaProducer.h:435)

(However, there seem to be no related log entries on the broker side.)
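For readers unfamiliar with the error, here is a minimal C++ sketch of the kind of per-producer sequence check that an "out of order sequence number" response refers to. It is a toy model under stated assumptions, not broker or librdkafka code, and not a diagnosis of this issue: `ToyPartitionLog` and its `append` method are hypothetical names, sequence wraparound is ignored, and the only rule modelled is that the next batch must start at the last accepted sequence plus one (or at 0 for a producer the log has not seen yet).

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical toy model of one partition's idempotence bookkeeping.
// Not broker code: it only illustrates the "next base sequence must be
// last accepted sequence + 1" rule, and ignores sequence wraparound.
class ToyPartitionLog {
public:
    enum class Result { Accepted, OutOfOrderSequence };

    // pid/epoch identify the producer; baseSeq is the sequence number of
    // the first message in the batch; count is the batch size (>= 1).
    Result append(std::int64_t pid, std::int16_t epoch,
                  std::int32_t baseSeq, std::int32_t count) {
        const auto key = std::make_pair(pid, epoch);
        const auto it = lastSeq_.find(key);
        // A producer with no recorded state is expected to start at 0.
        const std::int32_t expected =
            (it == lastSeq_.end()) ? 0 : it->second + 1;
        if (baseSeq != expected) {
            return Result::OutOfOrderSequence;
        }
        lastSeq_[key] = baseSeq + count - 1;
        return Result::Accepted;
    }

private:
    // Last accepted sequence number per (producer id, epoch).
    std::map<std::pair<std::int64_t, std::int16_t>, std::int32_t> lastSeq_;
};
```

In this model, a producer that has had sequences 0..2 accepted and then sends a batch with base sequence 0 again under the same PID and epoch is rejected, which is the general shape of a sequence desynchronization between client and broker.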

How to reproduce

  1. Start the Kafka cluster (v2.8.0)
  2. Start an idempotent producer (with librdkafka v1.8.2), send a few messages and then wait
  3. Kill the ZooKeeper and Kafka broker processes
  4. Wait ~20 min
  5. Re-start the Kafka cluster
  6. Try to send a message with the same producer. It will probably fail with the "out of order sequence number" fatal error, after which the producer is no longer functional.

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version: 1.8.2
  • Apache Kafka version: kafka_2.13-2.8.0
  • librdkafka client configuration: enable.idempotence=true
  • Operating system: Ubuntu 20.04 x64
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
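
For the librdkafka logs requested above, one possible client configuration is sketched below. Only `enable.idempotence=true` comes from this report. The `debug` line is an assumption about which librdkafka debug contexts would be useful for an idempotence problem, not the configuration that was actually used.

```
# Client configuration from the report
enable.idempotence=true
# Assumed debug contexts for capturing idempotent-producer state changes
debug=eos,msg,broker,protocol
```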
