Investigate memory-related unstable latencies in performance tests #11121

@akuzm

A list for me not to forget.

I've seen two main sources of unpredictable memory-related latency:

  • Some pages we receive from the allocator may be newly obtained from the OS and not yet backed by physical memory (or still be copy-on-write references to the zero page). On the first write to such a page, a page fault occurs, and the kernel allocates a new writable page and fills it with zeros.
  • Sometimes jemalloc decides that it's time to purge an arena and performs MADV_DONTNEED on some pages. The next write to these pages faults again.
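
Both effects can be observed without jemalloc at all. Below is a minimal Linux-only sketch that counts minor page faults while writing to a fresh anonymous mapping, writing to it again, and writing once more after MADV_DONTNEED. The names measureFaults, touchPages and FaultCounts are made up for this illustration, and the exact counts depend on the kernel configuration, so no expected numbers are given.

```cpp
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Minor (no disk I/O) page faults taken by this process so far.
long minorFaults()
{
    rusage usage{};
    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_minflt;
}

// Write one byte to every page; return how many minor faults that caused.
long touchPages(volatile char * mem, size_t bytes, size_t page)
{
    const long before = minorFaults();
    for (size_t offset = 0; offset < bytes; offset += page)
        mem[offset] = 1;
    return minorFaults() - before;
}

// Hypothetical result type for this sketch.
struct FaultCounts
{
    long first_touch;     // fresh pages from the OS
    long second_touch;    // pages that are already resident
    long after_dontneed;  // pages dropped with MADV_DONTNEED
};

FaultCounts measureFaults(size_t pages)
{
    const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    const size_t bytes = pages * page;

    void * raw = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(raw != MAP_FAILED);

    // Ask for regular pages so that faults are counted per page, not per
    // huge page. The call may fail on kernels without THP; that is fine.
    madvise(raw, bytes, MADV_NOHUGEPAGE);

    auto * mem = static_cast<volatile char *>(raw);
    FaultCounts counts{};
    counts.first_touch = touchPages(mem, bytes, page);
    counts.second_touch = touchPages(mem, bytes, page);

    // This is what an allocator purge does: the mapping stays valid,
    // but the physical pages are returned to the kernel.
    madvise(raw, bytes, MADV_DONTNEED);
    counts.after_dontneed = touchPages(mem, bytes, page);

    munmap(raw, bytes);
    return counts;
}
```

On a typical Linux machine, first_touch and after_dontneed should be close to the number of pages and second_touch close to zero, which is the latency difference a query sees depending on the state of the memory it gets.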

What can we do about this:

  • Increase (or maybe decrease instead?) the decay time for arenas, governed by the arenas.dirty_decay_ms and arenas.muzzy_decay_ms parameters. Read the docs carefully, because it's not obvious what they influence. I tried to do this once, but ran into some weirdness with our cmake config that prevented me from using mallctl (USE_JEMALLOC is not defined for Server.cpp). A sample snippet for Server::initialize:
+
+#if USE_JEMALLOC
+    /// Needs <jemalloc/jemalloc.h> and <string> included at the top of the file.
+    /// The decay parameters are ssize_t, and mallctl() takes a non-const pointer to the new value.
+    ssize_t decay_time_ms = 10 * 60 * 1000;
+    logger().information("Will set jemalloc decay time to " + std::to_string(decay_time_ms) + " ms");
+
+    /// arenas.* sets the default for arenas created later; existing arenas keep their arena.<i>.* values.
+    int err = mallctl("arenas.dirty_decay_ms", nullptr, nullptr, &decay_time_ms, sizeof(decay_time_ms));
+    if (err != 0)
+        logger().error("Failed to set 'arenas.dirty_decay_ms' with code " + std::to_string(err));
+
+    err = mallctl("arenas.muzzy_decay_ms", nullptr, nullptr, &decay_time_ms, sizeof(decay_time_ms));
+    if (err != 0)
+        logger().error("Failed to set 'arenas.muzzy_decay_ms' with code " + std::to_string(err));
+#endif
+
  • Decay the arenas explicitly using arena.<i>.decay with i = MALLCTL_ARENAS_ALL. This could be exposed as a hidden system query, e.g. system reset memory, which the performance tests would call before starting the test runs of each query.
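
A rough sketch of what the explicit decay call might look like. Several things here are assumptions and not part of our code: the helper names arenaDecayName and decayAllArenas are invented, the value 4096 for MALLCTL_ARENAS_ALL is what jemalloc 5.x defines (real code should use the macro from <jemalloc/jemalloc.h>), and the weak declaration assumes GCC or Clang on an ELF platform with jemalloc built without a symbol prefix. The weak declaration is only there so that the snippet also builds when jemalloc is not linked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Weak declaration: points to jemalloc's mallctl if it is linked in,
// and is null otherwise.
extern "C" int mallctl(const char * name, void * oldp, size_t * oldlenp,
                       void * newp, size_t newlen) __attribute__((weak));

// Assumption: the value of MALLCTL_ARENAS_ALL in jemalloc 5.x.
constexpr unsigned ASSUMED_MALLCTL_ARENAS_ALL = 4096;

// Build the control name, e.g. "arena.0.decay" for arena 0.
std::string arenaDecayName(unsigned arena)
{
    assert(arena <= ASSUMED_MALLCTL_ARENAS_ALL);
    return "arena." + std::to_string(arena) + ".decay";
}

// Trigger decay-based purging of unused dirty and muzzy pages in all arenas.
// Returns 0 on success, a positive jemalloc error code on failure,
// or -1 if mallctl is not available in this binary.
int decayAllArenas()
{
    if (!mallctl)
        return -1;

    const std::string name = arenaDecayName(ASSUMED_MALLCTL_ARENAS_ALL);
    // arena.<i>.decay takes no input and produces no output.
    return mallctl(name.c_str(), nullptr, nullptr, nullptr, 0);
}
```

A system query handler could call decayAllArenas() and report a nonzero result as an error. Whether this makes latencies more stable still has to be measured, since the pages purged here have to be faulted in again by the next query.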
