Compiled-in page size makes jemalloc binaries unportable #467

Closed
cuviper opened this issue Oct 6, 2016 · 19 comments

@cuviper

cuviper commented Oct 6, 2016

I see that configure uses sysconf(_SC_PAGESIZE) at compile time to determine the page size. However, this can vary depending on kernel configuration, so a jemalloc compiled on one machine may have issues running on some other machine. For instance, Debian aarch64 uses 4k pages, but Fedora uses 64k, so a Debian jemalloc effectively can't run correctly on Fedora.
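
For illustration, a minimal sketch of querying the page size when the program runs rather than when it is built. This is not jemalloc code; `runtime_page_size`, `page_size_matches`, and `compiled_lg_page` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Ask the running kernel for its page size instead of baking it in. */
static size_t runtime_page_size(void) {
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    return (size_t)ps;
}

/* Return nonzero if a build-time lg(page size) matches the running kernel.
 * `compiled_lg_page` is a hypothetical stand-in for a configure-time value. */
static int page_size_matches(unsigned compiled_lg_page) {
    return runtime_page_size() == ((size_t)1 << compiled_lg_page);
}
```

A binary built on a 4k-page system would see `page_size_matches(12)` return zero when started on a 64k-page kernel, which is the mismatch described above.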

The concrete issue I encountered was in trying to bootstrap Rust aarch64 on Fedora. Because Rust is self-hosting, I have to use upstream binaries to get started. I was getting strange crashes, and when I tried strace I saw this:

2 madvise(0x3ff7a398000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7a3cd000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7a465000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7970b000, 65536, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7a6d6000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7a7aa000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7a7e0000, 8192, MADV_DONTNEED) = 0
2 madvise(0x3ff7a927000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7aab0000, 8192, MADV_DONTNEED) = 0
2 madvise(0x3ff79659000, 86016, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7aacf000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff778ae000, 28672, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7ad07000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7aeb7000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff7b0c2000, 8192, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 madvise(0x3ff779ba000, 200704, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
2 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---

So it is clearly expecting 4k pages. Most of the madvise calls were not on real 64k page boundaries, so they get EINVAL. A few of them happen to align, so they succeed -- but the requested length was less than 64k and the kernel rounds up! Thus it will dump more memory than intended, and I suspect the SEGV at addr=0x8 is a dereference of a nulled pointer from the cleared page.
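
The alignment arithmetic behind that reading of the trace can be checked with a small sketch. The helper names below are hypothetical; the addresses and lengths are taken from the strace output above, and the rounding helper only models the "kernel rounds up" behaviour described in this comment.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if addr sits on a page boundary; page_size must be a power of two. */
static int is_page_aligned(uint64_t addr, uint64_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr & (page_size - 1)) == 0;
}

/* Round a length up to a whole number of pages. */
static uint64_t round_up_to_page(uint64_t len, uint64_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (len + page_size - 1) & ~(page_size - 1);
}
```

With these helpers, `0x3ff7a398000` is aligned for 4096-byte pages but not for 65536-byte pages (a failing call), while `0x3ff7a7e0000` is aligned for both (a call that returned 0). An 8192-byte request rounded up to 64k pages covers 65536 bytes, eight times what was asked for.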

I have managed to get around this with: export MALLOC_CONF=lg_dirty_mult:-1. But I think it would be better if jemalloc read the page size at runtime, and FWIW Firefox's forked "mozjemalloc" does exactly that.

@jasone
Member

jasone commented Oct 6, 2016

You can use e.g. --with-lg-page=16 to cross-compile to a target that uses 64 KiB pages. jemalloc used to check page size during bootstrapping, but this had two shortcomings. First, it was difficult to avoid recursive allocation on all platforms, and second, making the page size dynamic prevented some compiler optimizations. IIRC Solaris is the only platform I could find on which it was conceivable for a single system to use multiple page sizes (thus warranting a run-time check), but even there it was not a practical concern.

@cuviper
Author

cuviper commented Oct 6, 2016

I think in this case, Rust would like to have binaries that run everywhere, not subject to the configuration whims of different distros. So --with-lg-page=16 just makes a different non-portable binary, rather than something universal.

Is it feasible to run jemalloc compiled with the 64k page size even on systems that are only 4k? I guess these madvise calls would be fine, but general mmap will probably be surprising if it returns addresses aligned less than expected. And Mozilla bug 1091515 showed real problems with munmap (on mozjemalloc).

I'm not surprised that dynamic page size reduces compiler optimizations. Perhaps some of that can be mitigated with different expressions, e.g. instead of page mask, use a shr+shl combo to strip bits. But I'm sure that's a lot of code to deal with.
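
As a rough sketch of the "different expressions" idea, here is a compile-time mask next to a run-time shift pair. The `EX_` macros and function names are made up for this example and are not jemalloc's; whether the shift form recovers the lost optimizations is not something this sketch measures.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time variant: the constant lets the compiler fold the mask. */
#define EX_LG_PAGE 12
#define EX_PAGE_MASK (((uintptr_t)1 << EX_LG_PAGE) - 1)

static uintptr_t page_floor_static(uintptr_t a) {
    return a & ~EX_PAGE_MASK;
}

/* Run-time variant: shift right, then left, to strip the low bits. */
static uintptr_t page_floor_dynamic(uintptr_t a, unsigned lg_page) {
    assert(lg_page < sizeof(uintptr_t) * 8);
    return (a >> lg_page) << lg_page;
}

/* Run-time round-up built from the same shift pair. */
static uintptr_t page_ceil_dynamic(uintptr_t a, unsigned lg_page) {
    assert(lg_page < sizeof(uintptr_t) * 8);
    uintptr_t mask = ((uintptr_t)1 << lg_page) - 1;
    return ((a + mask) >> lg_page) << lg_page;
}
```

The dynamic versions need only the log2 of the page size in a variable, so no mask has to be materialized per call site.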

FWIW a single system could have different page sizes with different kernels -- e.g. a kernel developer who normally runs the distro kernel, but sometimes reboots into their own build. Obviously they would not want to swap out all userspace each time. But maybe mixed pages are impractical for other reasons. There may be other programs with static page sizes too, and that's normally just fine in a controlled distro.

@jasone
Member

jasone commented Oct 6, 2016

Earlier this year I tried making it possible to specify a larger page size than the true system page size (245ae60), but I had to revert it (05a9e4a) because it could cause VM map fragmentation.

@cuviper
Author

cuviper commented Oct 6, 2016

What about --with-lg-page-sizes? That sounds like it's supposed to support a hybrid configuration.

@jasone
Member

jasone commented Oct 6, 2016

--with-lg-page-sizes is only useful when generating headers that will be imported into another build system that doesn't run configure again, e.g. in FreeBSD's libc, which in turn can be built for a variety of architectures.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 27, 2016
Sounds like jemalloc is broken on systems which differ in page size than the
host it was compiled on (unless an option was passed). This unfortunately
reduces the portability of binaries created and can often make Rust segfault by
default. For now let's patch over this by disabling jemalloc until we can figure
out a better solution.

Closes rust-lang#36994
Closes rust-lang#37320
cc jemalloc/jemalloc#467
bors added a commit to rust-lang/rust that referenced this issue Oct 29, 2016
Disable jemalloc on aarch64/powerpc
bors added a commit to rust-lang/rust that referenced this issue Oct 30, 2016
Disable jemalloc on aarch64/powerpc
@jasone jasone closed this as completed Nov 2, 2016
@glandium
Contributor

More than a cross-distro problem, it can cause problems in a single distro, where the user switches the page size in their kernel.

Now, something that /could/ work is configuring --with-lg-page for the biggest page size you can expect a kernel to use. The problem is that it then breaks chunk_alloc_mmap_slow when the runtime page size is actually smaller, and you end up with chunks actually smaller than the chunk size. Fun ensues.

@joshlf

joshlf commented Apr 15, 2017

How expensive would it be (in code bloat, etc) to have multiple copies of the code compiled for different constants (to still get the benefit of compiler optimizations which are aware of the page size)? So you basically compile three different versions of the code - the 4K page version, the 64K page version, and, if you're being really careful, the X page version (where X must be determined at runtime). Then, at init time when a process starts, you decide which one you're going to use.
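
A toy sketch of that multi-version idea, assuming a single page-rounding helper stands in for "the code". Every name here is hypothetical; a real allocator would have to specialize far more than one function.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*page_ceil_fn)(size_t);

/* Stamp out one specialized copy per compile-time page size. */
#define DEFINE_PAGE_CEIL(lg)                              \
    static size_t page_ceil_lg##lg(size_t n) {            \
        const size_t mask = ((size_t)1 << (lg)) - 1;      \
        return (n + mask) & ~mask;                        \
    }

DEFINE_PAGE_CEIL(12) /* 4 KiB pages */
DEFINE_PAGE_CEIL(16) /* 64 KiB pages */

/* Generic fallback for any other power-of-two page size. */
static size_t g_page_size;

static size_t page_ceil_generic(size_t n) {
    const size_t mask = g_page_size - 1;
    return (n + mask) & ~mask;
}

/* Pick the implementation once, at init time. */
static page_ceil_fn select_page_ceil(size_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    switch (page_size) {
    case (size_t)1 << 12: return page_ceil_lg12;
    case (size_t)1 << 16: return page_ceil_lg16;
    default:
        g_page_size = page_size;
        return page_ceil_generic;
    }
}
```

At startup a process could call `select_page_ceil((size_t)sysconf(_SC_PAGESIZE))` and keep the returned pointer. The indirect call is itself a cost, which is part of the trade-off the question is asking about.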

@jasone jasone self-assigned this Apr 16, 2017
@jasone jasone added this to the 5.0.0 milestone Apr 16, 2017
@jasone jasone reopened this Apr 16, 2017
jasone added a commit to jasone/jemalloc that referenced this issue Apr 16, 2017
All mappings continue to be PAGE-aligned, even if the system page size
is smaller.  This change is primarily intended to provide a mechanism
for supporting multiple page sizes with the same binary; smaller page
sizes work better in conjunction with jemalloc's design.

This resolves jemalloc#467.
jasone added a commit to jasone/jemalloc that referenced this issue Apr 16, 2017
@jasone
Member

jasone commented Apr 17, 2017

Page size is baked into various data structures at compile time, so we would have to make substantial changes to directly support multiple page sizes at run time. However, it is possible to support a system page size smaller than the page size that is baked in. #769 implements such support.
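
One generic way to obtain mappings aligned to a larger, baked-in page size on a kernel with smaller pages is to over-allocate and trim. The sketch below only illustrates that technique; it is not the code from #769, and `map_aligned` is a hypothetical helper. It assumes both `size` and `alignment` are multiples of the real system page size.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map `size` bytes whose start address is a multiple of `alignment`.
 * Assumes size and alignment are multiples of the system page size and
 * that alignment is a power of two. Returns NULL on failure. */
static void *map_aligned(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    size_t total = size + alignment; /* room to slide to an aligned start */
    if (total < size) {
        return NULL; /* size_t overflow */
    }
    char *raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) {
        return NULL;
    }
    uintptr_t start = ((uintptr_t)raw + alignment - 1) &
                      ~((uintptr_t)alignment - 1);
    size_t lead = (size_t)(start - (uintptr_t)raw);
    size_t trail = total - lead - size;
    if (lead != 0) {
        munmap(raw, lead); /* drop the unaligned head */
    }
    if (trail != 0) {
        munmap((char *)start + size, trail); /* drop the unused tail */
    }
    return (void *)start;
}
```

For example, `map_aligned(65536, 65536)` on a 4k-page kernel returns a 64 KiB region on a 64 KiB boundary, at the cost of two extra `munmap` calls in the worst case.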

@jasone jasone closed this as completed in da4cff0 Apr 19, 2017
@cuviper
Author

cuviper commented Apr 19, 2017

Nice!

@devnoname120

it is actively choosing to sabotage the ARM64 Linux ecosystem

@marcan IMO this comment is abusive. As much as I like your work (I even sponsor you), I don't think it's OK to insinuate malevolence.

@dzaima

dzaima commented Aug 24, 2024

As far as I understand, on ARM, assuming the compile-time page size is equivalent to the runtime one is exactly as non-portable as doing -march=native.

That is to say, in any scenario where a non-64K page size is hard-coded, architecturally speaking, -march=native is completely safe to add. So here's a free potential perf boost to add in the default jemalloc aarch64 linux configuration that does not add any additional portability concerns - compile with -march=native :) (might be slightly less trivial if you wanted nice error messages on mismatched compile-time vs runtime arch; and on aarch64 the difference between native and baseline isn't that large; but this is for making a point rather than being a serious suggestion ¯\_(ツ)_/¯)

I might not quite call this active sabotage, but it's a pretty simple fact that essentially doing -march=native on a single architecture while not having equivalent behavior on others is a rather unusual stance for the default configuration.

@Kamayuq

Kamayuq commented Nov 11, 2024

I can understand the compiler optimization. But the page size is always a power of two, so you can do the masking for div/mod manually, as the compiler would. (Non-pow2 numbers need an additional mad AFAIK; for more info on this see libdivide.) I don’t think it’s malicious, just a bit too much focus on performance that got in the way of practicality on ARM.
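
A small sketch of doing that div/mod by hand, assuming the page size is only known at run time. The function names are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* log2 of a power-of-two value, computed once at startup. */
static unsigned lg_pow2(size_t x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned lg = 0;
    while (x > 1) {
        x >>= 1;
        lg++;
    }
    return lg;
}

/* offset / page_size, as a shift. */
static size_t page_index(size_t offset, unsigned lg_page) {
    return offset >> lg_page;
}

/* offset % page_size, as a mask. */
static size_t page_offset(size_t offset, unsigned lg_page) {
    return offset & (((size_t)1 << lg_page) - 1);
}
```

Taking the 200704-byte length from the strace output earlier in the thread: with 4k pages it is exactly 49 pages with no remainder, while with 64k pages it is 3 pages plus 4096 bytes.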

@hramrach

Linux now has a boot-time dynamic page size switch.

That makes page size a runtime selection, not something that is fixed, even for one OS distribution.

ArthurSonzogni pushed a commit to ArthurSonzogni/gn that referenced this issue Mar 5, 2025
jemalloc's default configuration is broken on arm64 and causes
runtime failures on machines with non-4k page sizes. See
jemalloc/jemalloc#467

Change-Id: Ia243d6e43fcc9eaad893e51d6fa90febd7c0f344
Reviewed-on: https://gn-review.googlesource.com/c/gn/+/18300
Commit-Queue: Dirk Pranke <[email protected]>
Commit-Queue: Takuto Ikuta <[email protected]>
Reviewed-by: Takuto Ikuta <[email protected]>
Reviewed-by: Dirk Pranke <[email protected]>
raboof added a commit to raboof/nixpkgs that referenced this issue Apr 2, 2025
raboof added a commit to raboof/nixpkgs that referenced this issue Apr 3, 2025