Description
Currently we have a low value for queued.max.messages.kbytes for our consumers: it is set to 25. On v0.9.5 (without change 8321e37), our consumer sustained a high consumption rate. After moving to v1.3.0 (which includes this backoff change), we see that when the toppar reaches queued.max.messages.kbytes, an immediate backoff of 1000ms is applied. As a result the consumer slows down: it fetches from the Kafka broker only once (25KB) and then waits until the next second. On v0.9.5 it did the opposite, fetching aggressively to keep the queue full, apparently more often than every 100ms (this is what the logs suggest; I have not checked the code for it), 100ms being the standard fetch timeout. We have fetch.error.backoff.ms set to 0, but that does not help in this scenario. The only ways out are to increase queued.max.messages.kbytes to a value the app can consume in one second, so enough data is available until the consumer goes back to fetch the next second, or to remove the property altogether. A larger queued.max.messages.kbytes increases the app's memory footprint, but since throughput matters more to us here, that trade-off works alright for us.
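The sizing argument above (the queue must hold at least one second's worth of consumption to ride out the 1000ms backoff) can be sketched as a back-of-envelope calculation. The consumption rate and safety factor below are made-up example numbers, not values from our setup:

```python
# If the fetcher backs off for 1000 ms once queued.max.messages.kbytes is
# reached, the queue must hold at least one second's worth of the app's
# consumption, or the app starves while the fetcher sleeps.
consume_rate_kb_per_sec = 5_000   # hypothetical: app drains ~5 MB/s
backoff_ms = 1_000                # backoff observed on v1.3.0
safety_factor = 2                 # headroom for bursts (arbitrary)

required_kbytes = consume_rate_kb_per_sec * (backoff_ms / 1000) * safety_factor
print(required_kbytes)  # → 10000.0, i.e. queued.max.messages.kbytes=10000
```

With the default queued.max.messages.kbytes this is a non-issue; it only bites when the value is tuned far below one second of throughput, as in our case (25).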
I was wondering if you had any suggestions around this, please? Ideally, shouldn't setting fetch.error.backoff.ms to 0 work here?
Checklist
Please provide the following information:
[fetch.error.backoff.ms] : [0]
[enable.partition.eof] : [true]
[queued.max.messages.kbytes] : [25]
[fetch.message.max.bytes] : [25600]
[debug] : [all]
[statistics.interval.ms] : [1000]
[queue.buffering.max.ms] : [1]
[socket.timeout.ms] : [300000]
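For reference, the settings listed above translate into a consumer configuration like the following sketch. The bootstrap.servers and group.id values are placeholders, and queue.buffering.max.ms is omitted since it is a producer property; the dict could be passed to a librdkafka-based client such as confluent-kafka's Consumer:

```python
# Consumer configuration mirroring the reported settings.
conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder
    "group.id": "example-group",            # placeholder
    "fetch.error.backoff.ms": 0,
    "enable.partition.eof": True,
    "queued.max.messages.kbytes": 25,       # the low value in question
    "fetch.message.max.bytes": 25600,
    "statistics.interval.ms": 1000,
    "socket.timeout.ms": 300000,
}

# e.g. with confluent-kafka: consumer = Consumer(conf)
print(conf["queued.max.messages.kbytes"])  # → 25
```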
Provide logs (with debug=.. as necessary) from librdkafka