Random multi-second slowdowns opening /nix/var/nix/db/db.sqlite on ZFS #13515

@edolstra

Describe the bug

On ZFS on Linux (kernel 6.1.141), opening the Nix database is usually instantaneous but occasionally takes several seconds, which randomly slows down CLI invocations.

Example

$ while true; do command time nix store info --store /tmp/nix 2>&1 | grep elapsed; done
0.02user 0.00system 0:00.03elapsed 100%CPU (0avgtext+0avgdata 18624maxresident)k
0.02user 0.00system 0:00.03elapsed 100%CPU (0avgtext+0avgdata 18624maxresident)k
0.02user 0.00system 0:00.03elapsed 96%CPU (0avgtext+0avgdata 18624maxresident)k
...
0.01user 0.00system 0:06.16elapsed 0%CPU (0avgtext+0avgdata 18620maxresident)k
...
0.01user 0.00system 0:03.54elapsed 0%CPU (0avgtext+0avgdata 18620maxresident)k
...
0.01user 0.00system 0:05.14elapsed 0%CPU (0avgtext+0avgdata 18620maxresident)k
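
Since the slow runs are rare, it can help to print only the outliers instead of scanning every line of output. The following is a minimal sketch of such a filter; the helper name `slow_runs` and its arguments are hypothetical, and it assumes GNU `date` (for `%N` nanoseconds), as found on typical Linux systems.

```shell
# slow_runs THRESHOLD_MS COUNT CMD [ARGS...]   (hypothetical helper)
# Runs CMD COUNT times and prints only the runs that took at least
# THRESHOLD_MS milliseconds. The body is a subshell, so its variables
# do not leak into the calling shell.
slow_runs() (
  threshold_ms=$1
  count=$2
  shift 2
  i=1
  while [ "$i" -le "$count" ]; do
    start=$(date +%s%N)            # nanoseconds since the epoch (GNU date)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
    if [ "$elapsed_ms" -ge "$threshold_ms" ]; then
      echo "run $i: ${elapsed_ms} ms"
    fi
    i=$((i + 1))
  done
)
```

For example, `slow_runs 1000 200 nix store info --store /tmp/nix` would run the command from the loop above 200 times and report only the invocations that took a second or longer.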

The stall occurs inside SQLite while it truncates /nix/var/nix/db/db.sqlite-shm:

#0  0x00007ffff6f063db in ftruncate64 () from /nix/store/g3s0z9r7m1lsfxdk8bj88nw8k8q3dmmg-glibc-2.40-66/lib/libc.so.6
#1  0x00007ffff6447a67 in unixLockSharedMemory () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#2  0x00007ffff647eab3 in unixShmMap () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#3  0x00007ffff643a5d6 in walIndexPageRealloc () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#4  0x00007ffff6477d5f in walIndexReadHdr () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#5  0x00007ffff647883b in walTryBeginRead () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#6  0x00007ffff64a4570 in sqlite3PagerSharedLock () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#7  0x00007ffff64a51f8 in btreeBeginTrans () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#8  0x00007ffff64e1314 in sqlite3InitOne () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#9  0x00007ffff64f88cc in sqlite3Init () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#10 0x00007ffff64f891f in sqlite3ReadSchema () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#11 0x00007ffff6526b99 in sqlite3Pragma () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#12 0x00007ffff64dd00e in yy_reduce.isra () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#13 0x00007ffff64de517 in sqlite3RunParser () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#14 0x00007ffff64defe5 in sqlite3Prepare () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#15 0x00007ffff64df4a3 in sqlite3LockAndPrepare () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#16 0x00007ffff64df856 in sqlite3_prepare_v2 () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#17 0x00007ffff64ed420 in sqlite3_exec () from /nix/store/nff52xidq9f2qf93x5r5sr8zc8iwwa5j-sqlite-3.48.0/lib/libsqlite3.so.0
#18 0x00007ffff7d91303 in operator() (__closure=<optimized out>) at /nix/store/24sdvjs6rfqs69d21gdn437mb3vc0svh-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/basic_string.h:227
#19 nix::retrySQLite<void, nix::SQLite::exec(const std::string&)::<lambda()> > (fun=...) at ../src/libstore/include/nix/store/sqlite.hh:177
#20 nix::SQLite::exec (this=0x49ee18, stmt=...) at ../src/libstore/sqlite.cc:103
#21 0x00007ffff7d19765 in nix::LocalStore::openDB (this=this@entry=0x49edd0, state=..., create=create@entry=false) at ../src/libstore/local-store.cc:494
#22 0x00007ffff7d1f502 in nix::LocalStore::LocalStore (this=<optimized out>, config=..., this=<optimized out>, config=...) at ../src/libstore/local-store.cc:316
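
Because the trace ends in `ftruncate64`, one way to check whether the filesystem rather than Nix or SQLite is responsible is to time truncations of a scratch file on the same dataset. The sketch below does that with the coreutils `truncate` command; the helper name `probe_truncate` is hypothetical, and GNU `date` is assumed. It does not memory-map the file the way SQLite does with the `-shm` file, so it may not trigger the same stall.

```shell
# probe_truncate DIR COUNT THRESHOLD_MS   (hypothetical helper)
# Repeatedly appends to a scratch file in DIR, truncates it, and prints
# the truncations that took at least THRESHOLD_MS milliseconds.
probe_truncate() (
  dir=$1
  count=$2
  threshold_ms=$3
  probe=$(mktemp "$dir/truncate-probe.XXXXXX") || exit 1
  i=1
  while [ "$i" -le "$count" ]; do
    printf 'probe data' >> "$probe"    # append, so the write itself does not truncate
    start=$(date +%s%N)
    truncate -s 3 "$probe"             # shrink the file back to 3 bytes
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
    if [ "$elapsed_ms" -ge "$threshold_ms" ]; then
      echo "truncate $i: ${elapsed_ms} ms"
    fi
    i=$((i + 1))
  done
  rm -f "$probe"
)
```

Each measurement includes the cost of spawning `truncate`, which is typically a few milliseconds and therefore negligible next to a multi-second stall. Running the probe against a directory on ZFS and again on another filesystem gives a rough comparison.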

This appears to be a known ZFS issue: openzfs/zfs#14290

Workaround:

echo 0 > /sys/module/zfs/parameters/zfs_txg_timeout
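
The command above changes the module parameter until it is set again or the module is reloaded. To try the workaround for a single command only, a small wrapper can save the current value and restore it afterwards. This is a sketch with a hypothetical helper name, `with_param`; writing to the real parameter file requires root.

```shell
# with_param FILE VALUE CMD [ARGS...]   (hypothetical helper)
# Writes VALUE to FILE, runs CMD, then restores FILE's previous content
# and returns CMD's exit status.
with_param() (
  param=$1
  value=$2
  shift 2
  saved=$(cat "$param") || exit 1      # remember the current setting
  echo "$value" > "$param" || exit 1
  "$@"
  status=$?
  echo "$saved" > "$param"             # restore the previous setting
  exit "$status"
)
```

For example, as root: `with_param /sys/module/zfs/parameters/zfs_txg_timeout 0 nix store info --store /tmp/nix`.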

Probably not much we can do about this in Nix (except maybe open the DB in the background?), but I thought I'd make an issue about it for the record.
