
Asynchronous sandbox cleanup leads to inode exhaustion with defaulted --experimental_sandbox_async_tree_delete_idle_threads=4 #20965

@jeremiahar


Description of the bug:

It seems that Bazel does not clean up the sandbox directories fast enough, which eventually leads to running out of inodes when performing a build.

Which category does this issue belong to?

Local Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I'm not sure what the simplest way to reproduce this bug is. It might be because I am running a 96-core machine with

build --jobs=92 --local_cpu_resources=92 --sandbox_base=/dev/shm

and the C++ compilation creates lots of symlinks on tmpfs. The documentation hints that deletion threads are limited to 1 during the build, which is probably the problem.
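As a rough illustration of why a single deletion thread might fall behind many concurrent jobs: if sandboxes consume inodes faster than the deleter frees them, the backlog grows until the filesystem runs out. The sketch below is only a back-of-envelope model. The function name and every rate in it are hypothetical and were not measured on this build.

```python
def seconds_until_exhaustion(free_inodes, created_per_sec, deleted_per_sec):
    """Return seconds until free inodes reach zero, or None if deletion keeps up.

    Assumes constant rates, which a real build will not have.
    """
    net_growth = created_per_sec - deleted_per_sec
    if net_growth <= 0:
        return None  # deletion keeps pace, so the backlog never grows
    return free_inodes / net_growth


# Hypothetical numbers for illustration only, not measurements.
print(seconds_until_exhaustion(free_inodes=95_000_000,
                               created_per_sec=500_000,
                               deleted_per_sec=100_000))
```

The point is only that any sustained gap between creation and deletion eventually exhausts a fixed inode budget, no matter how large that budget is.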

Which operating system are you running Bazel on?

Linux 6.7.0-arch3-1 #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000 x86_64 GNU/Linux

What is the output of bazel info release?

release 7.0.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Probably d169556 which changed the default of experimental_sandbox_async_tree_delete_idle_threads to 4. I did not actually run the bisection, but I will note that 7.0.0-pre.20231011.2 did not have this problem and 7.0.1 does have the issue.

Have you found anything relevant by searching the web?

#19266 (comment)

Any other information, logs, or outputs that you want to share?

It fails with an error message such as:

ERROR: /home/jeremiah/projects/REDACTED/BUILD:3:18: Compiling REDACTED.cpp failed: I/O exception during sandboxed execution: /dev/shm/bazel-sandbox.215dc078d692e667b99e0907ffd4a9aa9a5f6df28dfed5a7e2f2dcdc3fa54f19/linux-sandbox/8034/execroot/_main/external/boost/boost/metaparse/v1/error (No space left on device)

Inodes after failure:

Filesystem  Inodes  IUsed  IFree  IUse%  Mounted on
tmpfs          95M    95M   1.7K   100%  /dev/shm

Inodes after manually removing sandbox directory:

Filesystem  Inodes  IUsed  IFree  IUse%  Mounted on
tmpfs          95M    543    95M     1%  /dev/shm
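For anyone trying to reproduce this, the inode figures above can also be sampled programmatically while a build runs. This is a minimal sketch using Python's standard `os.statvfs`; the helper name `inode_usage` is made up for illustration.

```python
import os


def inode_usage(path):
    """Return (total, used, free) inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    return total, total - free, free


if __name__ == "__main__":
    # Sampled on the current directory here; pass the sandbox base
    # (for example /dev/shm) to watch the filesystem from this report.
    total, used, free = inode_usage(".")
    print(f"total={total} used={used} free={free}")
```

Calling `inode_usage("/dev/shm")` every few seconds during the build should show the used count climbing if cleanup is lagging. Some filesystems report zero for these fields, so the numbers are only meaningful where `df -i` also reports them.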

I am using --sandbox_base=/dev/shm which is mounted as tmpfs.

Filesystem  Size  Used  Avail  Use%  Mounted on
tmpfs       378G  746M   377G    1%  /dev/shm

After the build fails, the sandbox directory is slowly cleaned up unless Bazel is shut down, which is what originally alerted me to the cause.
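One way to watch that slow cleanup is to count the entries left under the sandbox directory at intervals. The sketch below assumes nothing about Bazel internals; `count_entries` is a hypothetical helper that walks a directory tree without following symlinks.

```python
import os


def count_entries(root):
    """Count files, symlinks, and directories below root without following symlinks."""
    count = 0
    # os.walk ignores directories that vanish mid-walk, which suits a tree
    # that is being deleted concurrently.
    for _dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        count += len(dirnames) + len(filenames)
    return count
```

The count only approximates inode usage, since hard links share an inode, but a number that keeps shrinking after the build stops would match the behaviour described here.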

It seems that this was probably caused by d169556 which enabled asynchronous sandbox cleanup by changing the default of experimental_sandbox_async_tree_delete_idle_threads to 4.

I confirmed that adding --experimental_sandbox_async_tree_delete_idle_threads=0 fixes the issue for me.

Labels: P1, team-Local-Exec, type: bug