Description
Describe the bug
I have seen multiple cases of "schema is corrupt" error messages in a production environment. This tends to happen on NixOS systems that have unexpected power cuts.
$ nix store verify --all
error: '/mnt/nix/var/nix/db/schema' is corrupt
In this case, it's an ext4 file system and the schema file is empty.
Steps To Reproduce
I have a minimal test case that simulates a power cut with NixOS tests and reproduces the problem here: https://github.com/squalus/nix-durability-tests. It can be run on several different file systems.
nix -L build github:squalus/nix-durability-tests#corrupt-schema-tests.xfs
This will hopefully print a "schema is corrupt" error message.
Expected behavior
The schema file should never be invalid, even if there's an unexpected power cut.
nix-env --version output
nix-env (Nix) 2.8.1
Additional context
Some possible causes:
1. Errors from close(2) are ignored in nix::writeFile. (From man close: Failing to check the return value when closing a file may lead to silent loss of data.)
2. fsync(2) is not run on the file after writing the contents. This means the data may not be fully flushed to disk.
3. fsync(2) is not run on the parent directory after closing the file. This means the directory may have outdated contents. (This wouldn't cause an empty file, but it could cause a mismatch. I haven't yet observed this problem.)
4. The file is not written atomically. It could instead be written with a temporary file and a call to rename(2), like in https://github.com/google/renameio.
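To make the four points above concrete, here is a minimal sketch of a write that addresses all of them. It is in Python rather than C++ for brevity and is not Nix's implementation. The helper name write_file_durably is hypothetical, and the sketch assumes a POSIX system where a directory can be opened and passed to fsync(2).

```python
import os
import tempfile


def write_file_durably(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` with `data` atomically and durably."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so that the
    # final rename(2) never crosses a file system boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        try:
            view = memoryview(data)
            while view:
                # os.write may write fewer bytes than requested.
                written = os.write(fd, view)
                view = view[written:]
            # Flush the file contents to disk before the rename.
            os.fsync(fd)
        finally:
            # os.close raises OSError if close(2) reports an error,
            # so a failed close is not silently ignored.
            os.close(fd)
        # Atomically swap the new file into place (rename(2) on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Flush the parent directory so the new directory entry is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this ordering, a reader after a crash should see either the complete old file or the complete new file, assuming the file system honors fsync(2) and rename(2) semantics. Note that mkstemp creates the file with mode 0600, so a real implementation would also need to set the intended permissions before the rename.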
Point 2 was addressed in this PR, but it was never merged: #1956
More background: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/