Skip to content

Sarama has stuck on the batch with zero records length. #2053

@pachmu

Description

@pachmu
Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.

Sarama Kafka Go
Version 1.29.1 AWS Managed Kafka 1.17
Configuration

What configuration values are you using for Sarama and Kafka?

                 cfg.Net.MaxOpenRequests = 1
		// consume all messages
		cfg.Consumer.IsolationLevel = sarama.ReadUncommitted
		// require producer idempotency
		cfg.Producer.Idempotent = true
		// require consistency on Kafka side before considering the message produced
		cfg.Producer.RequiredAcks = sarama.WaitForAll
		// specify network keep-alive to reduce 'broken pipe' errors on non-KMS environments
		cfg.Net.KeepAlive = time.Second * 30
		// increase processing time from 1s (default) to 5s
		cfg.Consumer.MaxProcessingTime = time.Second * 5
Logs

When filing an issue please provide logs from Sarama and Kafka if at all
possible. You can set sarama.Logger to a log.Logger to capture Sarama debug
output.

logs: CLICK ME

2021/10/22 10:38:26.384689 lvl=INF component=kafka records batch &{5805968 49 2 snappy 0 false false 0 2021-09-30 11:45:34.617 +0200 CEST 2021-09-30 11:45:34.617 +0200 CEST 82609 0 1563 [0xc00030e150] false false [] 1326}
2021/10/22 10:38:26.384705 lvl=INF component=kafka records batch &{5806083 49 2 snappy 0 false false 0 2021-09-30 12:00:01.188 +0200 CEST 2021-09-30 12:00:01.188 +0200 CEST 82609 0 1678 [0xc00030e1c0] false false [] 733}
2021/10/22 10:38:26.384728 lvl=INF component=kafka records batch &{5806313 49 2 snappy 0 false false 0 2021-09-30 12:29:55.542 +0200 CEST 2021-09-30 12:29:55.542 +0200 CEST 82609 0 1908 [0xc00030e230] false false [] 10546}
2021/10/22 10:38:26.384751 lvl=INF component=kafka records batch &{5806399 49 2 snappy 0 false false 0 2021-09-30 12:42:07.755 +0200 CEST 2021-09-30 12:42:07.755 +0200 CEST 82609 0 1994 [0xc00030e2a0] false false [] 1130}
2021/10/22 10:38:26.384772 lvl=INF component=kafka records batch &{5806689 49 2 snappy 0 false false 0 2021-09-30 13:25:25.119 +0200 CEST 2021-09-30 13:25:25.119 +0200 CEST 82609 0 2284 [0xc00030e380] false false [] 2934}
2021/10/22 10:38:26.384790 lvl=INF component=kafka records batch &{5807057 49 2 snappy 0 false false 0 2021-09-30 14:24:12.229 +0200 CEST 2021-09-30 14:24:12.229 +0200 CEST 82609 0 2652 [0xc00030e3f0] false false [] 3124}
2021/10/22 10:38:26.384807 lvl=INF component=kafka records batch &{5807106 49 2 snappy 0 false false 0 2021-09-30 14:31:33.337 +0200 CEST 2021-09-30 14:31:33.337 +0200 CEST 82609 0 2701 [0xc00030e460] false false [] 2335}
2021/10/22 10:38:26.384824 lvl=INF component=kafka records batch &{5807150 49 2 snappy 0 false false 0 2021-09-30 14:38:49.18 +0200 CEST 2021-09-30 14:38:49.18 +0200 CEST 82609 0 2745 [0xc00030e4d0] false false [] 2917}
2021/10/22 10:38:26.384858 lvl=INF component=kafka records batch &{5807374 49 2 snappy 0 false false 0 2021-09-30 15:12:07.587 +0200 CEST 2021-09-30 15:12:07.587 +0200 CEST 82609 0 2969 [0xc00030e540] false false [] 5744}
2021/10/22 10:38:26.384933 lvl=INF component=kafka records batch &{5807388 49 2 snappy 0 false false 0 2021-09-30 15:14:10.358 +0200 CEST 2021-09-30 15:14:10.358 +0200 CEST 82609 0 2983 [0xc00030e5b0] false false [] 31539}
2021/10/22 10:38:26.384958 lvl=INF component=kafka records batch &{5807391 49 2 snappy 0 false false 0 2021-09-30 15:14:44.664 +0200 CEST 2021-09-30 15:14:44.664 +0200 CEST 82609 0 2986 [0xc00030e620] false false [] 2135}
2021/10/22 10:38:26.385027 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.418552 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.450115 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.481958 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.522805 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.557499 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.593690 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.625917 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.657295 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.775601 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.806612 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.840360 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.871158 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}

Problem Description

I'm using Goka (Sarama based library) and have some problems with reading from the one partition of 4. The consumer has stuck on the one particular batch and rereads it again and again.

Here is some info:
This is batch struct:

type RecordBatch struct {
	FirstOffset           int64
	PartitionLeaderEpoch  int32
	Version               int8
	Codec                 CompressionCodec
	CompressionLevel      int
	Control               bool
	LogAppendTime         bool
	LastOffsetDelta       int32
	FirstTimestamp        time.Time
	MaxTimestamp          time.Time
	ProducerID            int64
	ProducerEpoch         int16
	FirstSequence         int32
	Records               []*Record
	PartialTrailingRecord bool
	IsTransactional       bool

	compressedRecords []byte
	recordsLen        int // uncompressed records size
}

I added some logs for all decoded batches and firstly I see this:

2021/10/22 10:38:26.384671 lvl=INF component=kafka records batch &{5805856 49 2 snappy 0 false false 0 2021-09-30 11:29:46.662 +0200 CEST 2021-09-30 11:29:46.662 +0200 CEST 82609 0 1451 [0xc00030e0e0] false false [] 1735}
2021/10/22 10:38:26.384689 lvl=INF component=kafka records batch &{5805968 49 2 snappy 0 false false 0 2021-09-30 11:45:34.617 +0200 CEST 2021-09-30 11:45:34.617 +0200 CEST 82609 0 1563 [0xc00030e150] false false [] 1326}
2021/10/22 10:38:26.384705 lvl=INF component=kafka records batch &{5806083 49 2 snappy 0 false false 0 2021-09-30 12:00:01.188 +0200 CEST 2021-09-30 12:00:01.188 +0200 CEST 82609 0 1678 [0xc00030e1c0] false false [] 733}
2021/10/22 10:38:26.384728 lvl=INF component=kafka records batch &{5806313 49 2 snappy 0 false false 0 2021-09-30 12:29:55.542 +0200 CEST 2021-09-30 12:29:55.542 +0200 CEST 82609 0 1908 [0xc00030e230] false false [] 10546}
2021/10/22 10:38:26.384751 lvl=INF component=kafka records batch &{5806399 49 2 snappy 0 false false 0 2021-09-30 12:42:07.755 +0200 CEST 2021-09-30 12:42:07.755 +0200 CEST 82609 0 1994 [0xc00030e2a0] false false [] 1130}
2021/10/22 10:38:26.384772 lvl=INF component=kafka records batch &{5806689 49 2 snappy 0 false false 0 2021-09-30 13:25:25.119 +0200 CEST 2021-09-30 13:25:25.119 +0200 CEST 82609 0 2284 [0xc00030e380] false false [] 2934}
2021/10/22 10:38:26.384790 lvl=INF component=kafka records batch &{5807057 49 2 snappy 0 false false 0 2021-09-30 14:24:12.229 +0200 CEST 2021-09-30 14:24:12.229 +0200 CEST 82609 0 2652 [0xc00030e3f0] false false [] 3124}
2021/10/22 10:38:26.384807 lvl=INF component=kafka records batch &{5807106 49 2 snappy 0 false false 0 2021-09-30 14:31:33.337 +0200 CEST 2021-09-30 14:31:33.337 +0200 CEST 82609 0 2701 [0xc00030e460] false false [] 2335}
2021/10/22 10:38:26.384824 lvl=INF component=kafka records batch &{5807150 49 2 snappy 0 false false 0 2021-09-30 14:38:49.18 +0200 CEST 2021-09-30 14:38:49.18 +0200 CEST 82609 0 2745 [0xc00030e4d0] false false [] 2917}
2021/10/22 10:38:26.384858 lvl=INF component=kafka records batch &{5807374 49 2 snappy 0 false false 0 2021-09-30 15:12:07.587 +0200 CEST 2021-09-30 15:12:07.587 +0200 CEST 82609 0 2969 [0xc00030e540] false false [] 5744}
2021/10/22 10:38:26.384933 lvl=INF component=kafka records batch &{5807388 49 2 snappy 0 false false 0 2021-09-30 15:14:10.358 +0200 CEST 2021-09-30 15:14:10.358 +0200 CEST 82609 0 2983 [0xc00030e5b0] false false [] 31539}
2021/10/22 10:38:26.384958 lvl=INF component=kafka records batch &{5807391 49 2 snappy 0 false false 0 2021-09-30 15:14:44.664 +0200 CEST 2021-09-30 15:14:44.664 +0200 CEST 82609 0 2986 [0xc00030e620] false false [] 2135}

Looks ok. But then I see:

2021/10/22 10:38:26.385027 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.418552 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.450115 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.481958 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.522805 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.557499 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.593690 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.625917 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.657295 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.775601 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.806612 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.840360 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.871158 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.901792 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.934249 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}

and it keeps going indefinitely.
I dug into the code and figured out that there is no exit from such situation when the records len is 0. Sarama just reads this batch again and again. A high watermark was not reached.
We use AWS Managed Kafka in this case.
Please help. We are firing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    needs-investigationIssues that require followup from maintainersstale/exemptIssues and pull requests that should never be closed as stale

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions