fluent-bit 3.2.7, 3.2.8 and 3.2.9 segfault when used in combination with the systemd input. Let's revert to 3.2.6 for now.

Upstream bug: fluent/fluent-bit#10139

Note that fluent-bit 3.2.7 fixes two high-severity CVEs, which we are now reintroducing. However, they are only exploitable if you are using the OpenTelemetry input or the Prometheus Remote Write input.

OpenTelemetry input: [CVE-2024-50609](https://nvd.nist.gov/vuln/detail/CVE-2024-50609)

Prometheus Remote Write input: [CVE-2024-50608](https://nvd.nist.gov/vuln/detail/CVE-2024-50608)

The problem is as follows:

3.2.7 started vendoring a copy of `libzstd` in tree and statically linking against it. Also, the fluent-bit binary exports the symbols of the static libraries it links against.

This is a problem because `libzstd` gets `dlopen()`ed by `libsystemd` when enumerating the journal (as journal logs are zstd compressed), and `libzstd` in Nixpkgs is built with `-DZSTD_LEGACY_SUPPORT=0`, which causes `struct ZSTD_DCtx` to be 16 bytes smaller than without this flag: https://github.com/facebook/zstd/blob/dev/lib/decompress/zstd_decompress_internal.h#L183-L187

`libsystemd` calls [`sym_ZSTD_createDCtx()`](https://github.com/systemd/systemd/blob/1e79a2923364b65fc9f347884dd5b9b2087f6e32/src/basic/compress.c#L480), which calls the function pointer returned by `dlsym()`. That pointer leads into the `libzstd` that comes with Nixpkgs, which thus allocates a struct that is 16 bytes smaller.

Later, `sym_ZSTD_freeDCtx()` is called. However, because fluent-bit has `zstd` in its global symbol table, any functions that `sym_ZSTD_freeDCtx()` calls resolve to the functions in the vendored fluent-bit copy of the library, which expects the larger struct. This causes enough heap corruption to trigger a segfault.

E.g. the subsequent calls to `ZSTD_clearDict(dctx)` and `ZSTD_customFree(dctx->inBuff)` in https://github.com/facebook/zstd/blob/dev/lib/decompress/zstd_decompress.c#L324 will be working on a struct that is 16 bytes smaller than the one they expect, so they are probably modifying pieces of memory that they shouldn't, and will cause a segfault at some point:

    (gdb) bt
    #0 0x00007f10e7e9916c in __pthread_kill_implementation () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
    #1 0x00007f10e7e40e86 in raise () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
    #2 0x00007f10e7e2893a in abort () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
    #3 0x000000000046a938 in flb_signal_handler ()
    #4 <signal handler called>
    #5 0x00007f10e7ea42b7 in unlink_chunk.isra () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
    #6 0x00007f10e7ea45cd in _int_free_create_chunk () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
    #7 0x00007f10e7ea5a1c in _int_free_merge_chunk () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
    #8 0x00007f10e7ea5dc9 in _int_free () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
    #9 0x00007f10e7ea8613 in free () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
    #10 0x00007f10e80ad3b5 in ZSTD_freeDCtx () from /nix/store/wy0slah6yvchgra8nhp6vgrqa6ay72cq-zstd-1.5.6/lib/libzstd.so.1
    #11 0x00007f10e8c90f6b in decompress_blob_zstd () from /nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/libsystemd.so.0
    #12 0x00007f10e8bf0efe in journal_file_data_payload () from /nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/libsystemd.so.0
    #13 0x00007f10e8c00f74 in sd_journal_enumerate_data () from /nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/libsystemd.so.0
    #14 0x00000000004eae2f in in_systemd_collect ()
    #15 0x00000000004eb5a0 in in_systemd_collect_archive ()
    #16 0x000000000047aa18 in flb_input_collector_fd ()
    #17 0x0000000000495223 in flb_engine_start ()
    #18 0x000000000046f304 in flb_lib_worker ()
    #19 0x00007f10e7e972e3 in start_thread () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
    #20 0x00007f10e7f1b2fc in __clone3 () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6

Reverts 7310ab3

Reverts 4fbc6cf
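To make the mechanism described above easier to follow, here is a minimal Python sketch of its two ingredients: a context struct whose size depends on a build flag, and a symbol lookup in which the executable's exported symbols win over the copy inside a `dlopen()`ed library. Everything in it is an assumption made for illustration: the struct layouts are mocks and not the real `ZSTD_DCtx`, and `resolve` is a toy model of default ELF lookup order, not how the dynamic linker is implemented.

```python
import ctypes


class DCtxWithoutLegacy(ctypes.Structure):
    """Mock context as built with legacy support disabled (hypothetical layout)."""
    _fields_ = [
        ("in_buff", ctypes.c_void_p),
        ("in_buff_size", ctypes.c_size_t),
    ]


class DCtxWithLegacy(ctypes.Structure):
    """Mock context as built with legacy support enabled: three extra fields."""
    _fields_ = [
        ("legacy_context", ctypes.c_void_p),
        ("previous_legacy_version", ctypes.c_uint32),
        ("legacy_version", ctypes.c_uint32),
        ("in_buff", ctypes.c_void_p),
        ("in_buff_size", ctypes.c_size_t),
    ]


def resolve(symbol, global_scope, library_scope):
    """Toy lookup: the global scope is searched before the library's own copy."""
    if symbol in global_scope:
        return global_scope[symbol]
    return library_scope[symbol]


def describe_mismatch():
    """Compare what one build allocates with what the other build assumes."""
    allocated = ctypes.sizeof(DCtxWithoutLegacy)  # size the allocating build uses
    expected = ctypes.sizeof(DCtxWithLegacy)      # size the freeing build assumes
    return {
        "allocated": allocated,
        "expected": expected,
        "shortfall": expected - allocated,
        # The larger layout places in_buff beyond the end of the smaller allocation.
        "in_buff_out_of_bounds": DCtxWithLegacy.in_buff.offset >= allocated,
    }


if __name__ == "__main__":
    # fluent-bit exports its vendored zstd, so those symbols sit in the global scope.
    global_scope = {"ZSTD_clearDict": "vendored copy in fluent-bit"}
    library_scope = {
        "ZSTD_freeDCtx": "Nixpkgs libzstd",
        "ZSTD_clearDict": "Nixpkgs libzstd",
    }
    # dlsym() on the library handle yields the library's own entry point ...
    print("entry point:", library_scope["ZSTD_freeDCtx"])
    # ... but calls made from inside it are bound through the global scope.
    print("inner call :", resolve("ZSTD_clearDict", global_scope, library_scope))
    print(describe_mismatch())
```

On a typical 64-bit Linux platform the mock layouts differ by 16 bytes by construction, which mirrors the size difference reported above. The sketch only illustrates why code compiled against the larger layout reads and writes past the end of an allocation made for the smaller one.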
An alternative solution is to try out the suggestion in:
I will merge this as is and then collaborate with upstream on a proper fix. We can then bump to the fixed version at a later date.
arianvp added a commit to arianvp/nixpkgs that referenced this pull request on Apr 4, 2025

arianvp added a commit that referenced this pull request on Apr 7, 2025
arianvp added a commit to arianvp/nixpkgs that referenced this pull request on May 27, 2025

fluent-bit now dynamically links against libzstd, sqlite and msgpack

This means that we will no longer run into the issue that caused us to roll back from 3.2.9 to 3.2.6 (NixOS#395128), as there shouldn't be two incompatible versions of libzstd loaded at the same time.

Fixes fluent/fluent-bit#10139

Is this eligible for back-porting even though it's a major version bump? In my opinion: yes. We can't keep maintaining 3.x, as all the builds after 3.2.6 have the same issue, so we are missing out on critical vulnerability fixes. In the meantime, none of the following links mention any backwards-incompatible changes relative to 3.2.6:

* https://fluentbit.io/announcements/v4.0.0/
* https://fluentbit.io/announcements/v4.0.1/
* https://fluentbit.io/announcements/v4.0.2/
* https://docs.fluentbit.io/manual/installation/upgrade-notes/
arianvp added a commit to arianvp/nixpkgs that referenced this pull request on Jun 2, 2025

fluent-bit now dynamically links against libzstd, sqlite and msgpack

This means that we will no longer run into the issue that caused us to roll back from 3.2.9 to 3.2.6 (NixOS#395128), as there shouldn't be two incompatible versions of libzstd loaded at the same time.

Fixes fluent/fluent-bit#10139

Is this eligible for back-porting even though it's a major version bump? In my opinion: yes. We can't keep maintaining 3.x, as all the builds after 3.2.6 have the same issue, so we are missing out on critical vulnerability fixes. In the meantime, none of the following links mention any backwards-incompatible changes relative to 3.2.6:

* https://fluentbit.io/announcements/v4.0.0/
* https://fluentbit.io/announcements/v4.0.1/
* https://fluentbit.io/announcements/v4.0.2/
* https://fluentbit.io/announcements/v4.0.3/
* https://docs.fluentbit.io/manual/installation/upgrade-notes/
nixpkgs-ci bot pushed a commit that referenced this pull request on Jun 3, 2025

fluent-bit now dynamically links against libzstd, sqlite and msgpack

This means that we will no longer run into the issue that caused us to roll back from 3.2.9 to 3.2.6 (#395128), as there shouldn't be two incompatible versions of libzstd loaded at the same time.

Fixes fluent/fluent-bit#10139

Is this eligible for back-porting even though it's a major version bump? In my opinion: yes. We can't keep maintaining 3.x, as all the builds after 3.2.6 have the same issue, so we are missing out on critical vulnerability fixes. In the meantime, none of the following links mention any backwards-incompatible changes relative to 3.2.6:

* https://fluentbit.io/announcements/v4.0.0/
* https://fluentbit.io/announcements/v4.0.1/
* https://fluentbit.io/announcements/v4.0.2/
* https://fluentbit.io/announcements/v4.0.3/
* https://docs.fluentbit.io/manual/installation/upgrade-notes/

(cherry picked from commit cd90fbd)
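One way to sanity-check the claim that the binary no longer carries its own zstd is to look for defined `ZSTD_*` symbols in its dynamic symbol table. The following is a rough sketch, assuming GNU binutils' `nm` is available on `PATH` and prints the usual three columns (address, type, name) for defined symbols. The helper names and the sample text are made up for illustration and do not come from a real fluent-bit build.

```python
import subprocess


def exported_zstd_symbols(nm_output):
    """Return the defined ZSTD_* names found in `nm -D --defined-only` style text."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols are printed as three columns: address, type letter, name.
        if len(parts) != 3:
            continue
        name = parts[2].split("@")[0]  # drop a symbol version suffix if present
        if name.startswith("ZSTD_"):
            symbols.append(name)
    return sorted(symbols)


def check_binary(path):
    """Run nm on a binary (hypothetical helper; requires nm to be installed)."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return exported_zstd_symbols(result.stdout)


if __name__ == "__main__":
    # Made-up sample resembling a binary that exports a vendored zstd.
    sample = (
        "0000000000401000 T ZSTD_createDCtx\n"
        "0000000000401200 T ZSTD_freeDCtx\n"
        "0000000000402000 T flb_engine_start\n"
    )
    print(exported_zstd_symbols(sample))
```

An empty result from `check_binary` for a given build would suggest that the binary defines no `ZSTD_*` symbols of its own, which is what one would expect from a dynamically linked build. It is a heuristic and not proof that no second copy of the library is loaded at runtime.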