Skip to content

Prevent offsets from being committed manually when a rebalance is in progress and partitions are owned by consumer#4089

Closed
Roxane (roxelo) wants to merge 6 commits intoconfluentinc:masterfrom
roxelo:only-commit-when-we-dont-think-there-is-a-rebalance
Closed

Prevent offsets from being committed manually when a rebalance is in progress and partitions are owned by consumer#4089
Roxane (roxelo) wants to merge 6 commits intoconfluentinc:masterfrom
roxelo:only-commit-when-we-dont-think-there-is-a-rebalance

Conversation

@roxelo
Copy link
Copy Markdown

See #4059

When using cooperative sticky, attempting to commit offsets during a rebalance can trigger a follow up rebalance.
One way of reducing the risk of this happening, is to add the same check that is included in the auto commit function in the manual commit method

…n a rebalance is in progress and partitions are owned by consumer
Comment thread src/rdkafka_offset.c Outdated

/* Don't attempt auto commit when rebalancing or initializing since
* the rkcg_generation_id is most likely in flux. */
if (rkcg->rkcg_subscription &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not thread safe (this is on the application thread, but rkcg is local to the rdkafka main thread).

Check should probably be performed in rd_kafka_cgrp_offsets_commit().

But we need to make sure that it is ok to commit during the phase of the rebalance when the rebalance callback/event is triggered.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, I also removed the same check from rd_kafka_cgrp_offset_commit_tmr_cb since the function will also call rd_kafka_cgrp_offsets_commit. I am not sure if this is the ideal behavior for auto commit though as we would no longer "silently" skip committing offsets...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure to run make style-fix (on linux, since it si broken on osx)

@@ -0,0 +1,140 @@
/*
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bravo on test!

Comment thread tests/0137-cooperative_commit_rebalance.c Outdated
@wmorgan6796
Copy link
Copy Markdown

Roxane (@roxelo) whats the status of this PR? Any help needed?

@roxelo
Copy link
Copy Markdown
Author

William Morgan (@wmorgan6796) Some of the tests failed and I haven't had a chance to look at it further. If you have some time to investigate, another set of eyes would help

@wmorgan6796
Copy link
Copy Markdown

William Morgan (wmorgan6796) commented Mar 14, 2023

William Morgan (@wmorgan6796) Some of the tests failed and I haven't had a chance to look at it further. If you have some time to investigate, another set of eyes would help

I believe I found the reason for the test failure. We fundamentally change how offset committing occurs by short circuiting even sending the OffsetCommitRequest to the brokers. When we send the request to the brokers, we will get an RD_KAFKA_RESP_ERR_UNKNOWN_MEMBER_ID from the (mock) brokers. But since the request never makes it to the brokers, we end up changing the error returned to RD_KAFKA_RESP_ERR_REBALANCE_IN_PROGRESS.

I think the correct thing to do is allow for the error returned to change, but ultimately would leave it up to the maintainers to decide is ok. In the mean time I'll bring this branch up to date with the latest librdkafka

@wmorgan6796
Copy link
Copy Markdown

#4089 (comment)

Milind L (@milindl) Emanuele Sabellico (@emasab) regarding this, what are your thoughts?

@roxelo
Copy link
Copy Markdown
Author

I am closing this PR since it's out of date and William Morgan (@wmorgan6796) opened #4220 to fix the issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants