fluent-bit: 3.2.9 -> 3.2.6 #395128

Merged
edef1c merged 1 commit into master from revert-fluent-bit
Apr 2, 2025

@arianvp arianvp commented Apr 1, 2025

fluent-bit 3.2.7, 3.2.8 and 3.2.9 segfault when used in combination with the systemd input. Let's
revert to 3.2.6 for now.

Upstream bug: fluent/fluent-bit#10139

Note that fluent-bit 3.2.7 fixes two high-severity CVEs, which this revert reintroduces. However, they are only exploitable if you are using the OpenTelemetry input or the Prometheus Remote Write input.

OpenTelemetry input: CVE-2024-50609 (https://nvd.nist.gov/vuln/detail/CVE-2024-50609)
Prometheus Remote Write input: CVE-2024-50608 (https://nvd.nist.gov/vuln/detail/CVE-2024-50608)

The problem is as follows:

3.2.7 started vendoring a copy of libzstd in tree and statically linking against it. Also, the fluent-bit binary exports the symbols of static libraries it links against.

This is a problem because libzstd gets dlopen()ed by libsystemd when enumerating the journal (as journal logs are zstd compressed), and libzstd in Nixpkgs is built with -DZSTD_LEGACY_SUPPORT=0, which causes struct ZSTD_DCtx to be 16 bytes smaller than without this flag: https://github.com/facebook/zstd/blob/dev/lib/decompress/zstd_decompress_internal.h#L183-L187
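To make the size difference concrete, here is a minimal Python `ctypes` sketch. It is not the real `ZSTD_DCtx`: the class names and every field name are illustrative stand-ins, and the shape of the legacy-only fields (one pointer plus two 32-bit integers) is an assumption based on the header lines linked above. On a typical 64-bit Linux ABI that shape should account for a 16-byte gap.

```python
import ctypes

# Illustrative stand-ins only: this is NOT the real ZSTD_DCtx layout.
COMMON_HEAD = [("stage", ctypes.c_uint32), ("expected", ctypes.c_size_t)]
COMMON_TAIL = [("in_buff", ctypes.c_void_p), ("in_buff_size", ctypes.c_size_t)]

# Assumed shape of the fields that exist only when legacy support is compiled in.
LEGACY_ONLY = [
    ("legacy_context", ctypes.c_void_p),
    ("previous_legacy_version", ctypes.c_uint32),
    ("legacy_version", ctypes.c_uint32),
]


class DCtxWithoutLegacy(ctypes.Structure):
    """Layout as seen by a build with legacy support disabled."""
    _fields_ = COMMON_HEAD + COMMON_TAIL


class DCtxWithLegacy(ctypes.Structure):
    """Layout as seen by a build with legacy support enabled."""
    _fields_ = COMMON_HEAD + LEGACY_ONLY + COMMON_TAIL


# How much larger the struct gets, and how far later fields are pushed back.
size_gap = ctypes.sizeof(DCtxWithLegacy) - ctypes.sizeof(DCtxWithoutLegacy)
offset_shift = DCtxWithLegacy.in_buff.offset - DCtxWithoutLegacy.in_buff.offset

print(f"without legacy: {ctypes.sizeof(DCtxWithoutLegacy)} bytes")
print(f"with legacy:    {ctypes.sizeof(DCtxWithLegacy)} bytes")
print(f"size gap: {size_gap} bytes, in_buff moved by {offset_shift} bytes")
```

The point of the sketch is that every field after the legacy block sits at a different offset in the two builds, so code compiled against one layout cannot safely touch an object allocated with the other.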

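libsystemd calls sym_ZSTD_createDCtx() (https://github.com/systemd/systemd/blob/1e79a2923364b65fc9f347884dd5b9b2087f6e32/src/basic/compress.c#L480), which is the function pointer returned by dlsym(). That pointer leads into the libzstd that comes with Nixpkgs, which therefore allocates a struct that is 16 bytes smaller.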

Later, sym_ZSTD_freeDCtx() is called. However, because fluent-bit has the zstd symbols in its global symbol table, any functions that sym_ZSTD_freeDCtx() calls resolve to the functions in the vendored fluent-bit copy of the library, which expects the larger struct. This then causes enough heap corruption to cause a segfault.
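The lookup order behind this can be sketched as a toy model in Python. This is a simplification under stated assumptions, not the real dynamic linker: the function names `resolve_reference` and `dlsym_lookup` are hypothetical, the symbol sets are made up for illustration, and it ignores things like `RTLD_DEEPBIND`, `-Bsymbolic` and symbol visibility. Whether a particular zstd function is actually exported and interposable depends on how the library was built.

```python
def resolve_reference(symbol, global_scope, local_scope):
    """Resolve a symbol referenced from inside a dlopen()ed library.

    Toy model of the default order: the global scope (the executable and
    its load-time dependencies) is searched before the library's own scope.
    """
    for provider, exported in global_scope + local_scope:
        if symbol in exported:
            return provider
    raise LookupError(f"undefined symbol: {symbol}")


def dlsym_lookup(symbol, local_scope):
    """Toy model of dlsym(handle, symbol): only the handle's scope is searched."""
    for provider, exported in local_scope:
        if symbol in exported:
            return provider
    raise LookupError(f"undefined symbol: {symbol}")


# Scope of the dlopen()ed library (symbol names taken from the description above).
libzstd_scope = [
    ("libzstd.so.1", {"ZSTD_createDCtx", "ZSTD_freeDCtx",
                      "ZSTD_clearDict", "ZSTD_customFree"}),
]

# 3.2.6: the executable exports no zstd symbols.
global_3_2_6 = [("fluent-bit", {"flb_engine_start"})]

# 3.2.7+: the executable also exports its vendored, statically linked zstd.
global_3_2_7 = [("fluent-bit", {"flb_engine_start",
                                "ZSTD_clearDict", "ZSTD_customFree"})]

entry_point = dlsym_lookup("ZSTD_freeDCtx", libzstd_scope)
helper_old = resolve_reference("ZSTD_clearDict", global_3_2_6, libzstd_scope)
helper_new = resolve_reference("ZSTD_clearDict", global_3_2_7, libzstd_scope)

print(f"dlsym entry point comes from: {entry_point}")
print(f"helper with 3.2.6 resolves to: {helper_old}")
print(f"helper with 3.2.7+ resolves to: {helper_new}")
```

In this model the entry point obtained through `dlsym()` always comes from the shared libzstd, while the helper it calls silently switches to the executable's copy once that copy is exported, which is the mismatch described above.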

E.g. the subsequent calls to ZSTD_clearDict(dctx) and ZSTD_customFree(dctx->inBuff) in https://github.com/facebook/zstd/blob/dev/lib/decompress/zstd_decompress.c#L324 will be working on a struct that is 16 bytes smaller than the layout they were compiled for. They are therefore probably reading and modifying pieces of memory that they shouldn't, which causes a segfault at some point.

(gdb) bt
#0  0x00007f10e7e9916c in __pthread_kill_implementation () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
#1  0x00007f10e7e40e86 in raise () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
#2  0x00007f10e7e2893a in abort () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
#3  0x000000000046a938 in flb_signal_handler ()
#4  <signal handler called>
#5  0x00007f10e7ea42b7 in unlink_chunk.isra () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
#6  0x00007f10e7ea45cd in _int_free_create_chunk () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
#7  0x00007f10e7ea5a1c in _int_free_merge_chunk () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
#8  0x00007f10e7ea5dc9 in _int_free () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
#9  0x00007f10e7ea8613 in free () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
#10 0x00007f10e80ad3b5 in ZSTD_freeDCtx () from /nix/store/wy0slah6yvchgra8nhp6vgrqa6ay72cq-zstd-1.5.6/lib/libzstd.so.1
#11 0x00007f10e8c90f6b in decompress_blob_zstd () from /nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/libsystemd.so.0
#12 0x00007f10e8bf0efe in journal_file_data_payload () from /nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/libsystemd.so.0
#13 0x00007f10e8c00f74 in sd_journal_enumerate_data () from /nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/libsystemd.so.0
#14 0x00000000004eae2f in in_systemd_collect ()
#15 0x00000000004eb5a0 in in_systemd_collect_archive ()
#16 0x000000000047aa18 in flb_input_collector_fd ()
#17 0x0000000000495223 in flb_engine_start ()
#18 0x000000000046f304 in flb_lib_worker ()
#19 0x00007f10e7e972e3 in start_thread () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
#20 0x00007f10e7f1b2fc in __clone3 () from /nix/store/rmy663w9p7xb202rcln4jjzmvivznmz8-glibc-2.40-66/lib/libc.so.6
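The backtrace ends inside glibc's free(), which fits the layout mismatch described earlier. As a rough, safe illustration of what reading an object through the wrong layout looks like, the sketch below views one buffer through two `ctypes` structs. All class and field names are hypothetical stand-ins rather than the real zstd definitions, and the trailing guard bytes stand in for whatever the allocator happened to place after the smaller allocation.

```python
import ctypes

# Hypothetical layouts; NOT the real ZSTD_DCtx.
HEAD = [("stage", ctypes.c_uint32), ("expected", ctypes.c_size_t)]
TAIL = [("in_buff", ctypes.c_void_p), ("in_buff_size", ctypes.c_size_t)]
LEGACY = [("legacy_context", ctypes.c_void_p),
          ("previous_legacy_version", ctypes.c_uint32),
          ("legacy_version", ctypes.c_uint32)]


class Small(ctypes.Structure):
    """Layout used by the code that allocated the object."""
    _fields_ = HEAD + TAIL


class Large(ctypes.Structure):
    """Layout assumed by the code that later frees the object."""
    _fields_ = HEAD + LEGACY + TAIL


GUARD = 0xAA

# One buffer: the first sizeof(Small) bytes play the role of the allocation,
# the remaining bytes play the role of unrelated neighbouring heap memory.
raw = (ctypes.c_ubyte * ctypes.sizeof(Large))()
for i in range(ctypes.sizeof(Small), ctypes.sizeof(Large)):
    raw[i] = GUARD

# The allocating side stores a recognisable pointer value in in_buff.
small_view = Small.from_buffer(raw)
small_view.in_buff = 0x1234

# The freeing side interprets the same bytes with the larger layout.
large_view = Large.from_buffer(raw)
guard_pointer = int.from_bytes(
    bytes([GUARD]) * ctypes.sizeof(ctypes.c_void_p), "little")

print(f"large layout sees in_buff = {large_view.in_buff:#x}")
print(f"large layout sees legacy_context = {large_view.legacy_context:#x}")
```

Under the larger layout, the value that was stored as `in_buff` shows up in a different field, and `in_buff` itself is read from bytes outside the original allocation. Passing such a value to a free routine would be enough to corrupt the heap.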

Reverts 7310ab3
Reverts 4fbc6cf

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.


arianvp commented Apr 1, 2025

An alternative solution is to try out the suggestion in:

fluent/fluent-bit#10139 (comment)

@arianvp arianvp requested review from edef1c and fpletz April 1, 2025 09:39
@github-actions github-actions bot added the 10.rebuild-darwin: 1, 10.rebuild-darwin: 1-10, 10.rebuild-linux: 1 and 10.rebuild-linux: 1-10 labels Apr 1, 2025
@arianvp arianvp added this to the 25.05 milestone Apr 1, 2025
@arianvp arianvp marked this pull request as ready for review April 1, 2025 15:30
@arianvp arianvp requested review from 9999years and lf- April 2, 2025 11:46
@arianvp arianvp requested a review from flokli April 2, 2025 14:23

arianvp commented Apr 2, 2025

I will merge this as is and then collaborate with upstream for a proper fix. We can then bump to the fixed version at a later date.

@edef1c edef1c left a comment

LGTM

@edef1c edef1c merged commit b09b796 into master Apr 2, 2025
38 of 39 checks passed
@edef1c edef1c deleted the revert-fluent-bit branch April 2, 2025 14:25
arianvp added a commit to arianvp/nixpkgs that referenced this pull request Apr 4, 2025
@arianvp arianvp mentioned this pull request May 22, 2025
13 tasks
arianvp added a commit to arianvp/nixpkgs that referenced this pull request May 27, 2025
fluent-bit now dynamically links against libzstd, sqlite and msgpack

This means that we will not run into the issue that caused us
to roll back from 3.2.9 to 3.2.6 anymore (NixOS#395128)
as there shouldn't be two incompatible versions of libzstd loaded at the same time.

Fixes fluent/fluent-bit#10139

Is this eligible for back-porting even though it's a major version bump?  In my
opinion: yes.  We can't keep maintaining 3.x as all the builds after 3.2.6 have
the same issue, so we are missing out on critical vulnerability fixes. In the
meantime, none of the following links mention any backwards incompatibilities with
3.2.6:

* https://fluentbit.io/announcements/v4.0.0/
* https://fluentbit.io/announcements/v4.0.1/
* https://fluentbit.io/announcements/v4.0.2/
* https://docs.fluentbit.io/manual/installation/upgrade-notes/
@arianvp arianvp mentioned this pull request Jun 2, 2025
13 tasks
arianvp added a commit to arianvp/nixpkgs that referenced this pull request Jun 2, 2025
fluent-bit now dynamically links against libzstd, sqlite and msgpack

This means that we will not run into the issue that caused us
to roll back from 3.2.9 to 3.2.6 anymore (NixOS#395128)
as there shouldn't be two incompatible versions of libzstd loaded at the same time.

Fixes fluent/fluent-bit#10139

Is this eligible for back-porting even though it's a major version bump?  In my
opinion: yes.  We can't keep maintaining 3.x as all the builds after 3.2.6 have
the same issue, so we are missing out on critical vulnerability fixes. In the
meantime, none of the following links mention any backwards incompatibilities with
3.2.6:

* https://fluentbit.io/announcements/v4.0.0/
* https://fluentbit.io/announcements/v4.0.1/
* https://fluentbit.io/announcements/v4.0.2/
* https://fluentbit.io/announcements/v4.0.3/
* https://docs.fluentbit.io/manual/installation/upgrade-notes/
nixpkgs-ci bot pushed a commit that referenced this pull request Jun 3, 2025

(cherry picked from commit cd90fbd)