SOL-143245: Updated librdkafka to 2.12.1#6

Merged
solace-dberezhnoy merged 165 commits into master from
solace-dberezhnoy/librdkafka/master/SOL-143245/0
Jan 21, 2026

Conversation

@solace-dberezhnoy

I got conflicts in:

        both modified:   src/rdkafka.c
        both modified:   src/rdkafka_admin.c
        both modified:   src/rdkafka_broker.c
        both modified:   src/rdkafka_conf.c
        both modified:   src/rdkafka_conf.h
        both modified:   src/rdkafka_metadata.c
        both modified:   src/rdkafka_metadata_cache.c
        both modified:   src/rdkafka_sasl_oauthbearer_oidc.c
        both modified:   src/rdkafka_topic.c
        both modified:   src/rdmap.c
        both modified:   src/rdunittest.c

emasab and others added 30 commits October 30, 2024 13:44
…a instances (confluentinc#4724)

A circular dependency from a partition fetch queue message to the same partition blocked the destruction of an instance. This happened when the partition was removed from the cluster while it was being consumed. Solved by purging the internal partition queue after it has been stopped and removed, so that the reference count can reach zero and trigger the destroy.

Purging internal fetch queue on removing the partition only for the consumer.
* Security upgrade for OpenSSL and Curl, CVEs fixed:

OpenSSL
- CVE-2024-2511
- CVE-2024-4603
- CVE-2024-4741
- CVE-2024-5535
- CVE-2024-6119

CURL
- CVE-2024-8096
- CVE-2024-7264
- CVE-2024-6874
- CVE-2024-6197

* Fix for curl configure failure caused by curl/curl#14373

must be equal to the server sent nonce, that already contains the client side nonce. librdkafka was incorrectly concatenating the client side nonce again, leading to this fix being made on AK side, released in 3.8.1, with endsWith instead of equals.
apache/kafka@0a00456
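
The nonce rule in this commit message can be sketched as follows. Under RFC 5802, the `r=` attribute of the client-final message repeats the nonce received in the server-first message, which already begins with the client nonce. The function name `scram_build_client_final_without_proof` and its buffer handling are hypothetical and are not librdkafka's code; `c=biws` assumes no channel binding.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not librdkafka's implementation.
 * Builds the SCRAM client-final-message-without-proof (RFC 5802).
 * Returns 0 on success, -1 if the server nonce does not start with
 * the client nonce or if the output buffer is too small. */
static int scram_build_client_final_without_proof(const char *client_nonce,
                                                  const char *server_nonce,
                                                  char *out,
                                                  size_t out_size) {
        size_t cn_len;
        int n;

        assert(client_nonce && server_nonce && out);
        cn_len = strlen(client_nonce);

        /* The server-sent nonce must begin with the client nonce. */
        if (strncmp(server_nonce, client_nonce, cn_len) != 0)
                return -1;

        /* Echo the server nonce unchanged: the client nonce is NOT
         * concatenated again. "biws" is base64("n,,"). */
        n = snprintf(out, out_size, "c=biws,r=%s", server_nonce);
        if (n < 0 || (size_t)n >= out_size)
                return -1;
        return 0;
}
```

With a client nonce of `abc` and a server nonce of `abcXYZ`, the helper writes `c=biws,r=abcXYZ`, which a broker comparing with equals (rather than endsWith) would accept.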
except the Style check job because it needs clang format 10
Fix to inherit javac path, needed by test 0098
Mock handler implementation
Rename current consumer protocol from generic to classic
Mock handler with automatic or manual assignment
More consumer group metadata getters
Test helpers
Configurable session timeout and HB interval
Fix mock handler ListOffsets response
LeaderEpoch instead of CurrentLeaderEpoch
Integration tests passing with AK trunk
Improve documentation and KIP 848 specific mock tests
Add mock tests for unknown topic id in metadata request and partial reconciliation
Make test 0147 more reliable
Fix test 0106 after HB timeout change
Exclude test case with AK trunk
Rename rd_kafka_buf_write_tags to rd_kafka_buf_write_tags_empty
Trivup 0.12.5 can run a KafkaCluster directly with KRaft and AK trunk
Trivup 0.12.6 build with a specific commit
Trivup 0.12.7 with fixes for AK 3.8.0 and Py 3.12
New version of trivup 0.12.7 to fix an issue with apache/kafka#16464 on AK > 3.8.0
Static group membership mock tests
Move test 0147 to a different PR
Disable interactive "needsrestart" prompt
* test_read_file can read binary files too
* Trivup 0.12.8
* Read certificate CA chain when set using a configuration setter with PEM format. Test that CA with untrusted chain fails authentication.
* Test untrusted certificate signed with an intermediate CA
* Remove private key and duplicate certs from pem client certificate
* Print logs sent as events
* Trivup now already inherits the environment in interactive mode
* Use namespace to avoid conflicts on TestEventCb
…h with client certificate chain (confluentinc#4900)

Failing test: expect the error code that is received when no certificate is sent instead of the one received when it's sent but not trusted.
Client cert callback to check if trusted certificate authorities match with client certificate chain.
Log a warning when client certificate isn't sent


---------

Co-authored-by: trnguyencflt <[email protected]>
… leader epoch (confluentinc#4901)

Failing tests including for confluentinc#4796 and confluentinc#4804
Closes confluentinc#4796 and confluentinc#4804
CHANGELOG
Fix for the correct expected RPC code in test 0139
Apply same fix to metadata update operation too
Don't change rktp state to active when there's no leader, but wait until it's available to validate it
Comment about excluded -1 value
An incorrect assumption is made that libssl is built with support for
the (now-deprecated) ENGINE API if it is provided by OpenSSL >= 1.1.0 or
LibreSSL. OPENSSL_NO_ENGINE is defined by OpenSSL and all of its forks
if the ENGINE API was disabled at compile-time - ensure that the
definition of OPENSSL_NO_ENGINE is taken into account when using ENGINE
features.
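
A minimal sketch of the kind of guard this commit message describes, assuming a hypothetical build flag `SKETCH_HAVE_OPENSSL` and hypothetical names `SKETCH_WITH_ENGINE` and `sketch_engine_probe` (none of them are librdkafka identifiers). The ENGINE branch is compiled only when the library version is new enough and `OPENSSL_NO_ENGINE` is not defined.

```c
#include <assert.h>
#include <stddef.h>

#ifdef SKETCH_HAVE_OPENSSL /* hypothetical flag set by the build system */
#include <openssl/opensslv.h>    /* OPENSSL_VERSION_NUMBER */
#include <openssl/opensslconf.h> /* OPENSSL_NO_ENGINE, if ENGINE is disabled */
#endif

/* A version check alone is not enough: ENGINE support may have been
 * compiled out, which OpenSSL and its forks signal via OPENSSL_NO_ENGINE. */
#if defined(SKETCH_HAVE_OPENSSL) && OPENSSL_VERSION_NUMBER >= 0x10100000L && \
    !defined(OPENSSL_NO_ENGINE)
#include <openssl/engine.h>
#define SKETCH_WITH_ENGINE 1
#else
#define SKETCH_WITH_ENGINE 0
#endif

/* Returns 0 if the named engine can be looked up, -1 otherwise
 * (including when ENGINE support is not compiled in). */
static int sketch_engine_probe(const char *engine_id) {
        assert(engine_id != NULL);
#if SKETCH_WITH_ENGINE
        ENGINE *e = ENGINE_by_id(engine_id);
        if (!e)
                return -1;
        ENGINE_free(e);
        return 0;
#else
        (void)engine_id;
        return -1;
#endif
}
```

Built without the flag, or against a libssl that defines `OPENSSL_NO_ENGINE`, the probe compiles to the fallback branch and never references ENGINE symbols.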
…and client is using SASL authentication only (confluentinc#4936)

without any client certificate set
* removing generated internal project.yml

* removing generated public project.yml

---------

Co-authored-by: service-bot-app[bot] <189278048+service-bot-app[bot]@users.noreply.github.com>
* removing generated internal project.yml

* removing generated public project.yml

---------

Co-authored-by: service-bot-app[bot] <189278048+service-bot-app[bot]@users.noreply.github.com>
* Verify Ubuntu 24.04 and arm64 packages
* Add Semaphore task for verifying
as it's not used anymore. Was used for
AppVeyor CI builds.
…tinc#4908)

Closes: confluentinc#4059.

Commits during a rebalance could cause the assignment to be lost if the
generation id was bumped by a second join group request.
Solved by not re-joining the group in case an illegal generation error happens
during a rebalance.

Happening since v1.6.0.
Semaphore pipeline linked to a task that runs the full test suite with customizable parameters.
Contains a promotion to run it automatically on master commits only.
as timeout and checking after wakeups if it's been reached,
Avoids yielding earlier than requested because of spurious wakeups.
Fix flakiness in many tests, especially 0080
because of the fetch backoff left from previous broker.
Resets the fetch backoff when the partition joins a
new broker.
due to latency increase applying to all RPCs,
including ApiVersions, leading to the timeout happening
before the produce request is sent.
The error is IN_QUEUE instead of IN_FLIGHT, and the
status becomes NOT_PERSISTED instead of POSSIBLY_PERSISTED.
Fixed using the mock cluster instead of sockem and applying
the latency only to the Produce request.
emasab and others added 21 commits September 2, 2025 09:28
…e with big-endian architectures (confluentinc#5183)

* Fix compression types read issue in GetTelemetrySubscriptions response for big-endian architectures
* Decrease allocated buffer size in `rd_kafka_PushTelemetryRequest` and explicitly cast the enum
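
The commit text does not spell out the cause, so the following is only an assumption about how this class of bug typically arises: a one-byte wire field is written directly into the storage of a wider enum, which sets the low-order byte on little-endian hosts but the high-order byte on big-endian ones. The sketch reads each value into a correctly sized temporary and converts it with an explicit cast. The enum and function names are hypothetical; the numeric values follow the Kafka protocol's compression type ids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum for this sketch. */
typedef enum {
        SKETCH_COMP_NONE   = 0,
        SKETCH_COMP_GZIP   = 1,
        SKETCH_COMP_SNAPPY = 2,
        SKETCH_COMP_LZ4    = 3,
        SKETCH_COMP_ZSTD   = 4
} sketch_comp_type_t;

/* Decode cnt int8 compression type ids from the wire into an enum array. */
static void sketch_read_comp_types(const uint8_t *wire,
                                   size_t cnt,
                                   sketch_comp_type_t *out) {
        size_t i;

        assert(wire && out);
        for (i = 0; i < cnt; i++) {
                /* Read into a one-byte temporary first... */
                int8_t v = (int8_t)wire[i];
                /* ...then convert by value with an explicit cast, which is
                 * independent of the host byte order. */
                out[i] = (sketch_comp_type_t)v;
        }
}
```

Converting by value rather than through a pointer to the enum's storage gives the same result on little-endian and big-endian architectures.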
…roupHeartbeat not updating member epoch in a case (confluentinc#4672)

[KIP-848] Fixed a condition where error was being raised in commit due to old error in the topic partition
[KIP-848] Fix discarding heartbeat response without epoch update when leaving during inflight HB
Re-bootstrap is now triggered only after metadata.recovery.rebootstrap.trigger.ms
have passed since first metadata refresh request after last successful
metadata response. Previously the interval was measured from the last successful
metadata response, so it could overlap with the periodic topic.metadata.refresh.interval.ms
and cause a re-bootstrap even when not needed.
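
The timing rule described in this commit message can be sketched as a small state machine. All type, field and function names here are hypothetical and not taken from librdkafka; `trigger_ms` stands in for the value of `metadata.recovery.rebootstrap.trigger.ms`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for this sketch. */
typedef struct {
        /* Time of the first metadata refresh request sent since the last
         * successful response, or -1 if none is outstanding. */
        int64_t ts_first_request_ms;
} sketch_md_state_t;

/* Record a refresh request; only the first one after a success counts. */
static void sketch_on_metadata_request(sketch_md_state_t *st, int64_t now_ms) {
        if (st->ts_first_request_ms < 0)
                st->ts_first_request_ms = now_ms;
}

/* A successful response clears the reference point. */
static void sketch_on_metadata_success(sketch_md_state_t *st) {
        st->ts_first_request_ms = -1;
}

/* Re-bootstrap only once trigger_ms has passed since the first
 * unanswered request, not since the last successful response. */
static int sketch_should_rebootstrap(const sketch_md_state_t *st,
                                     int64_t now_ms,
                                     int64_t trigger_ms) {
        assert(st && trigger_ms >= 0);
        if (st->ts_first_request_ms < 0)
                return 0;
        return now_ms - st->ts_first_request_ms >= trigger_ms;
}
```

While the client is idle between periodic refreshes, no request is outstanding, so the idle time never counts toward the trigger, however long the refresh interval is.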
…m them (confluentinc#4931)

* Fetched committed offsets should be validated
before starting to consume from it.
Failing test and mock handler implementation
for returning the committed offset leader epoch
instead of current leader epoch.

* Validate the offsets before starting to fetch assigned partitions

* Add more test cases for partition assignment
offset validation

* Fix for test 0139 subtest `do_test_store_offset_without_leader_epoch` . When fetching an offset it returns the leader epoch used when committing, not the current
leader epoch.
Given the mock cluster fix the test needs to be changed.

* Fix test `0139` subtest `do_test_list_offsets_leader_change`:
use a cloned partition list for listing offsets, to avoid the fake leader epoch then being used for validation when assigning.

Fix ListOffsets mock handler for logging the correct returned leader epoch.

* Changelog entry

* Reduce number of tests in quick mode

* Add a new fetch state when finishing validating and starting to seek after a truncation,
to avoid a second repeated validation and possibly duplicated messages.

* Increase single test timeout

* Fix to leave the group in `rd_kafka_cgrp_incr_unassign_done` if terminate was requested, as done in `rd_kafka_cgrp_unassign_done` and `rd_kafka_cgrp_consumer_incr_unassign_done`

* Mock cluster, set the group as empty when last member leaves
instead of triggering a rebalance

* Test 0139 with mock cluster marked as local.
Doesn't delete topic if tests are local only as it's
possible there's no cluster to connect to
and it speeds up completing the test

* Resume the partition before fetch start or before validation
* Revert setting timeout to infinity

* style fix

* Changelog change

* Changelog changes

* Changelog change
* Fix flakiness in test 0085
* Errors that cause a refresh coordinator
like NOT_COORDINATOR during an offset fetch
should not be propagated to the application.
…al promotions (confluentinc#5191)

* Pipeline improvements about machine types and auto-cancel
* Use cached docker image for integration tests, style checks, docs build
* vcpkg cache
* msys2 cache
* Upgrade macOS agents
…fluentinc#5155)

* Implementation of OAUTHBEARER/OIDC metadata based authentication, initially supporting the Azure UAMI method.
* Tests with trivup 0.14.0 supporting metadata based authentications
* Add documentation and changelog entry
* Rename `azure` value to `azure_imds` and replace UAMI that is the identity with IMDS that is the authentication service
* Extract authentication URL and rename internal function and enums
* Changes to name the configuration property "query" instead of "params" as in other implementations and to make it optional if the default endpoint is overridden.
…odes (confluentinc#5194)

* Add test cases for new OffsetCommit and OffsetFetch Error Codes
* Testcase for discarding the member epoch in a consumer group heartbeat response when leaving with an inflight HB
confluentinc#5214)

* Changelog changes and some modification to the KIP-848 migration guide
* Add that KIP-848 is not enabled by default and other PR comments
* Downgrade min supported OSX version to 13
* Version upgrade to v2.12.1
@gitstream-cm

gitstream-cm Bot commented Nov 14, 2025

Please mark whether you used Copilot to assist coding in this PR

  • Copilot Assisted

Comment thread src/rdkafka_broker.c
rttinfo[0] = 0;

rkb->rkb_c.skip_broker_down = rd_true;
rd_kafka_broker_fail(rkb, LOG_WARNING,

Confluent's librdkafka has changed this to rd_kafka_broker_planned_fail(...). As far as I can see, the difference between their call to rd_kafka_broker_planned_fail(...) and this merge of

                         rkb->rkb_c.skip_broker_down = rd_true;
                         rd_kafka_broker_fail(rkb, LOG_WARNING, ...

is just the log level: this code will result in LOG_WARNING, rd_kafka_broker_planned_fail(...) will result in a LOG_DEBUG.

I think we should take Confluent's change to rd_kafka_broker_planned_fail(...). Originally, rdkafka_broker.c was full of LOG_ERR, which caused AFW and QA to freak out. So, I did a bulk lowering of many of rdkafka_broker.c's LOG_ERR to LOG_WARNING. It looks like Confluent has gone even further, lowering LOG_ERR to LOG_DEBUG. I think we should take Confluent's judgement on the log severity.

"Expected token as a string value");
goto fail;
}


Prior to the Confluent merge, there would have been code here like:

        if (rk->rk_conf.debug_sensitive) {
                rd_kafka_dbg(rk, SECURITY, "OIDC",
                             "Received JWT token \"%s\"", jwt_token);
        }

This merge has stripped this out. (With the code re-arrangement, I don't think rk is even in scope in this function.) I think I'm OK with that; we can add it back later if need be.

@kwdubuc kwdubuc left a comment

I reviewed only the merge collisions, and have only one suggested change in rdkafka_broker.c.

@solace-dberezhnoy solace-dberezhnoy merged commit e9b009e into master Jan 21, 2026