perf(binding): enable mimalloc v3 to reduce idle memory#9349

Merged: shulaoda merged 2 commits into main from 05-11-perf_binding_enable_mimalloc_v3_to_release_dev-server_memory_to_the_os on May 13, 2026
shulaoda (Member) commented May 11, 2026

Summary

  • Upgrades mimalloc-safe from 0.1.52 to 0.1.59 at the workspace level.
  • Enables the v3 feature on every mimalloc-safe dependency block inside crates/rolldown_binding/Cargo.toml so the binding ships against mimalloc v3.
  • No Rust source changes — the feature is purely a libmimalloc-sys2/build.rs switch between c_src/mimalloc/ (v2) and c_src/mimalloc3/ (v3).

Why

Per the investigation on #9330, the steady-state memory of vite dev on a non-trivial workload (lobe-chat, lobehub-ui) is dominated by mimalloc retaining pages it has already freed. v2 on macOS effectively never returns them: a single live object pins a 64 MB segment, and the purge-delay path is conservative. v3 reworks segments into smaller sub-pages and simplifies the purge timer, so empty regions actually get madvise(MADV_FREE_REUSABLE)-released on the order of seconds.

Measured impact

Process model

  • vite 7 + esbuild runs as two processes: the Vite Node process and an esbuild Go child process (used for dep prebundling). Both must be accounted for. measure.mjs recursively walks every descendant of the Vite server and sums Physical footprint / RSS at each sample, so both processes are included in every row below.
  • vite 8 + [email protected] and vite 8 + local + v3 run as a single Node process — rolldown is an in-process Rust napi addon, so no child process is spawned.

Σ per-proc peak caveat
The Σ per-proc peak row sums each process's lifetime Physical footprint (peak) field reported by vmmap. It is a mathematical upper bound, not a real instantaneous peak — different processes may reach their per-process peaks at different moments. This especially inflates the vite 7 + esbuild column, because esbuild typically peaks during dep prebundling while the Vite Node process peaks later during request handling; the two peaks never co-occur but get added together. Single-process columns ([email protected], local + v3) are not affected — for them Σ per-proc peak equals the real instantaneous peak.
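
To make the upper-bound point concrete, here is a minimal sketch with made-up numbers. The two sample series and the helper names `sumOfPeaks` and `peakOfSums` are hypothetical and are not taken from `measure.mjs`; the sketch only compares the sum of per-process lifetime peaks with the largest total seen at any single sample.

```js
// Hypothetical footprint samples in MiB, taken at the same instants for two processes.
// The values are invented for illustration; they are not measurements from this PR.
const esbuild = [900, 1200, 300, 100]; // peaks early, during dep prebundling
const viteNode = [200, 400, 800, 850]; // peaks later, during request handling

// Σ per-proc peak: add up each process's own lifetime maximum.
const sumOfPeaks = (series) =>
  series.reduce((acc, s) => acc + Math.max(...s), 0);

// Real instantaneous peak: add the processes sample by sample, then take the maximum.
const peakOfSums = (series) => {
  const totals = series[0].map((_, i) => series.reduce((acc, s) => acc + s[i], 0));
  return Math.max(...totals);
};

const upperBound = sumOfPeaks([esbuild, viteNode]); // 1200 + 850
const realPeak = peakOfSums([esbuild, viteNode]); // largest per-sample total
console.log({ upperBound, realPeak });
```

With a single series the two helpers return the same value, which matches the statement that single-process columns are not inflated.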

[cijiugechu/rolldown-9330-repro](https://github.com/cijiugechu/rolldown-9330-repro)

| Metric | vite 7 + esbuild | vite 8 + [email protected] | vite 8 + local + v3 |
|--------|------------------|--------------------------|----------------------|
| Σ per-proc peak (upper bound) | ~2.058 G | ~2.100 G | ~1.800 G |
| Physical footprint (after browser close) | ~1.971 G | ~2.000 G | ~0.971 G |
| Physical footprint (90s idle) | ~1.876 G | ~1.800 G | ~0.780 G |
| Physical footprint (180s idle) | ~1.876 G | ~1.800 G | ~0.780 G |
| Physical footprint (270s idle) | ~0.281 G | ~1.800 G | ~0.780 G |
| Physical footprint (360s idle) | ~0.281 G | ~1.800 G | ~0.780 G |
| RSS (after browser close) | ~2.018 G | ~2.182 G | ~2.040 G |
| RSS (90s idle) | ~1.932 G | ~2.005 G | ~1.857 G |
| RSS (180s idle) | ~1.932 G | ~2.003 G | ~1.857 G |
| RSS (270s idle) | ~1.932 G | ~2.003 G | ~1.857 G |
| RSS (360s idle) | ~1.932 G | ~2.003 G | ~1.857 G |

[lobehub/lobe-chat](https://github.com/lobehub/lobe-chat)

| Metric | vite 7 + esbuild | vite 8 + [email protected] | vite 8 + local + v3 |
|--------|------------------|--------------------------|----------------------|
| Σ per-proc peak (upper bound) | ~6.900 G | ~7.200 G | ~7.800 G |
| Physical footprint (after browser close) | ~6.800 G | ~7.000 G | ~3.700 G |
| Physical footprint (90s idle) | ~5.887 G | ~5.700 G | ~2.400 G |
| Physical footprint (180s idle) | ~5.860 G | ~5.700 G | ~2.300 G |
| Physical footprint (270s idle) | ~0.882 G | ~5.700 G | ~2.300 G |
| Physical footprint (360s idle) | ~0.882 G | ~5.700 G | ~2.300 G |
| RSS (after browser close) | ~7.057 G | ~8.642 G | ~8.385 G |
| RSS (90s idle) | ~6.187 G | ~7.505 G | ~7.311 G |
| RSS (180s idle) | ~6.156 G | ~7.505 G | ~7.241 G |
| RSS (270s idle) | ~6.156 G | ~7.462 G | ~7.241 G |
| RSS (360s idle) | ~6.156 G | ~7.462 G | ~7.241 G |

[lobehub/lobehub](https://github.com/lobehub/lobehub)

| Metric | vite 7 + esbuild | vite 8 + [email protected] | vite 8 + local + v3 |
|--------|------------------|--------------------------|----------------------|
| Σ per-proc peak (upper bound) | ~7.100 G | ~7.500 G | ~7.700 G |
| Physical footprint (after browser close) | ~7.100 G | ~7.300 G | ~3.300 G |
| Physical footprint (90s idle) | ~3.236 G | ~6.000 G | ~2.200 G |
| Physical footprint (180s idle) | ~3.236 G | ~6.000 G | ~2.200 G |
| Physical footprint (270s idle) | ~0.884 G | ~6.000 G | ~2.200 G |
| Physical footprint (360s idle) | ~0.884 G | ~6.000 G | ~2.200 G |
| RSS (after browser close) | ~7.252 G | ~8.767 G | ~8.183 G |
| RSS (90s idle) | ~6.367 G | ~7.548 G | ~7.220 G |
| RSS (180s idle) | ~6.367 G | ~7.548 G | ~7.220 G |
| RSS (270s idle) | ~6.313 G | ~7.505 G | ~7.178 G |
| RSS (360s idle) | ~6.313 G | ~7.505 G | ~7.178 G |
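
As a reading aid for the three tables above, the sketch below computes how much lower the idle Physical footprint is in the `local + v3` column than in the other vite 8 column. The numbers are the `Physical footprint (360s idle)` rows copied from the tables; the `idle360` array and the `reductionPct` helper are names introduced only for this illustration.

```js
// Physical footprint at 360s idle, in the tables' "G" units.
// "before" is the vite 8 column without v3, "after" is vite 8 + local + v3.
const idle360 = [
  { repo: 'cijiugechu/rolldown-9330-repro', before: 1.8, after: 0.78 },
  { repo: 'lobehub/lobe-chat', before: 5.7, after: 2.3 },
  { repo: 'lobehub/lobehub', before: 6.0, after: 2.2 },
];

// Relative reduction in percent.
const reductionPct = (before, after) => (1 - after / before) * 100;

for (const { repo, before, after } of idle360) {
  console.log(`${repo}: ${reductionPct(before, after).toFixed(1)}% lower`);
}
```

On these rows the idle footprint is roughly 57%, 60% and 63% lower, while the RSS rows in the same tables move far less.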

Caveats

  • Windows is not covered by this PR. libmimalloc-sys2/build.rs has a separate build_mimalloc_win() path that hard-codes ./c_src/mimalloc/ (v2) regardless of the v3 feature. The Cargo manifest still requests v3 so it activates automatically once upstream fixes the Windows branch; until then Windows users continue on v2.
scripts/measure.mjs

```js
import { chromium } from '@playwright/test';
import { execFile, spawn } from 'node:child_process';
import { existsSync } from 'node:fs';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// Background sampling cadence, and how long / how often to sample after the browser closes.
const SAMPLE_INTERVAL_MS = 250;
const IDLE_AFTER_CLOSE_MS = 90_000;
const IDLE_SAMPLES = 4;

// Start the Vite dev server; --force makes the dep optimizer ignore its cache.
const server = spawn(
  'pnpm',
  ['exec', 'vite', '--port', '9876', '--host', '127.0.0.1', '--force'],
  { stdio: ['ignore', 'pipe', 'pipe'] },
);

// Resolves once Vite prints its "ready in" banner; rejects on timeout or early exit.
const waitReady = new Promise((resolve, reject) => {
  const timeout = setTimeout(() => reject(new Error('vite ready timeout')), 90_000);
  server.stdout.on('data', (data) => {
    const text = data.toString();
    process.stdout.write(text);
    if (text.includes('ready in')) {
      clearTimeout(timeout);
      resolve();
    }
  });
  server.stderr.on('data', (data) => process.stderr.write(data));
  server.on('exit', (code) => reject(new Error(`vite exited before ready: ${code}`)));
});

// The spawned pnpm process is only a wrapper, so look up the PID that listens on the port.
const getVitePid = async () => {
  const { stdout } = await execFileAsync('lsof', ['-tiTCP:9876', '-sTCP:LISTEN']);
  const pid = Number(stdout.trim().split('\n')[0]);
  if (!pid) throw new Error('cannot find vite pid listening on 9876');
  return pid;
};

// Walk the process tree with `pgrep -P` and return the root plus every descendant PID.
const allDescendants = async (rootPid) => {
  const result = [rootPid];
  const stack = [rootPid];
  while (stack.length) {
    const p = stack.pop();
    try {
      const { stdout } = await execFileAsync('pgrep', ['-P', String(p)]);
      const children = stdout.trim().split('\n').filter(Boolean).map(Number);
      for (const c of children) {
        result.push(c);
        stack.push(c);
      }
    } catch { /* no children */ }
  }
  return result;
};

// Convert a size string with an optional K/M/G suffix to MiB; a bare number is treated as bytes.
const parseSizeMB = (text) => {
  const match = text.match(/([\d.]+)\s*([KMG])?/);
  if (!match) return 0;
  const num = parseFloat(match[1]);
  const unit = match[2];
  if (unit === 'G') return num * 1024;
  if (unit === 'K') return num / 1024;
  if (unit === 'M') return num;
  return num / (1024 * 1024);
};

// Collect RSS (from ps, in KiB) and Physical footprint (from vmmap) for one process.
const procInfo = async (pid) => {
  try {
    const [vm, psOut] = await Promise.all([
      execFileAsync('vmmap', ['-summary', String(pid)]),
      execFileAsync('ps', ['-o', 'ppid=,rss=,command=', '-p', String(pid)]),
    ]);
    const [, ppidStr, rssStr, command] = psOut.stdout.trim().match(/^\s*(\d+)\s+(\d+)\s+(.*)$/);
    const ppid = Number(ppidStr);
    const rss = Number(rssStr);
    const lines = vm.stdout.split('\n');
    const fpLine = lines.find((l) => l.trim().startsWith('Physical footprint:')) || '';
    const peakLine = lines.find((l) => l.trim().startsWith('Physical footprint (peak):')) || '';
    return {
      pid,
      ppid,
      alive: true,
      cmd: (command.split(/\s+/)[0] || '').split('/').pop(),
      command,
      rssMB: rss / 1024,
      footprintMB: parseSizeMB(fpLine.split(':')[1] || ''),
      peakMB: parseSizeMB(peakLine.split(':')[1] || ''),
    };
  } catch {
    // The process exited between discovery and inspection, or a tool failed.
    return { pid, alive: false };
  }
};

// One sample: every live process in the tree plus the summed Physical footprint.
const sampleOnce = async (rootPid) => {
  const pids = await allDescendants(rootPid);
  const procs = (await Promise.all(pids.map(procInfo))).filter((p) => p.alive);
  const total = procs.reduce((s, p) => s + p.footprintMB, 0);
  return { ts: Date.now(), procs, total };
};

// Background sampler that remembers the sample with the highest total footprint.
// Note: peakSample is recorded here but never printed by this script.
let peakSample = null;
let stop = false;
let vitePid = null;
const samplerLoop = async () => {
  while (!stop) {
    if (vitePid) {
      const s = await sampleOnce(vitePid).catch(() => null);
      if (s && (!peakSample || s.total > peakSample.total)) {
        peakSample = s;
      }
    }
    if (!stop) await sleep(SAMPLE_INTERVAL_MS);
  }
};

const fmtGB = (mb) => `${(mb / 1024).toFixed(3).padStart(8)} GB`;

// Print per-process figures followed by totals for one sample.
const dumpSample = (label, sample) => {
  console.log(`\n=== ${label} ===`);
  console.log(`  ts=${new Date(sample.ts).toISOString()}  procs=${sample.procs.length}`);
  for (const p of sample.procs) {
    console.log(`\n  pid=${p.pid} ppid=${p.ppid} (${p.cmd})`);
    console.log(`    command              : ${p.command}`);
    console.log(`    rss                  : ${fmtGB(p.rssMB)}`);
    console.log(`    Physical footprint   : ${fmtGB(p.footprintMB)}`);
    console.log(`    Physical footprint^  : ${fmtGB(p.peakMB)}`);
  }
  const totalRss = sample.procs.reduce((s, p) => s + p.rssMB, 0);
  const totalPeak = sample.procs.reduce((s, p) => s + p.peakMB, 0);
  console.log(`\n  TOTAL`);
  console.log(`    rss                  : ${fmtGB(totalRss)}   (shared pages double-counted)`);
  console.log(`    Physical footprint   : ${fmtGB(sample.total)}   <-- accurate`);
  console.log(`    Σ per-proc peak      : ${fmtGB(totalPeak)}   (upper bound, not a real instant)`);
};

const samplerPromise = samplerLoop();

// Main flow: wait for Vite, load the page once, close the browser, then sample while idle.
try {
  await waitReady;
  vitePid = await getVitePid();

  const chromePath = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome';
  const browser = await chromium.launch({
    args: ['--no-sandbox', '--disable-gpu'],
    executablePath: existsSync(chromePath) ? chromePath : undefined,
    headless: true,
  });
  const page = await browser.newPage({ viewport: { width: 1280, height: 800 } });
  await page.goto('http://127.0.0.1:9876/', { waitUntil: 'load', timeout: 120_000 });
  await page.waitForTimeout(25_000);
  await browser.close();

  console.log(`\nsample interval: ${SAMPLE_INTERVAL_MS} ms`);

  dumpSample('after browser.close()', await sampleOnce(vitePid));

  for (let i = 1; i <= IDLE_SAMPLES; i++) {
    await sleep(IDLE_AFTER_CLOSE_MS);
    const label = `steady state (after ${(IDLE_AFTER_CLOSE_MS * i) / 1000}s idle)`;
    dumpSample(label, await sampleOnce(vitePid));
  }

  stop = true;
  await samplerPromise;
} finally {
  stop = true;
  server.kill('SIGTERM');
}
```
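
As a quick check of the size parsing used in the script above, the sketch below copies `parseSizeMB` and feeds it a few strings. The sample inputs are assumptions about what the text after `Physical footprint:` may look like; they were not captured from a real `vmmap -summary` run.

```js
// Copied from scripts/measure.mjs: converts a size string such as "2G" into MiB.
const parseSizeMB = (text) => {
  const match = text.match(/([\d.]+)\s*([KMG])?/);
  if (!match) return 0;
  const num = parseFloat(match[1]);
  const unit = match[2];
  if (unit === 'G') return num * 1024;
  if (unit === 'K') return num / 1024;
  if (unit === 'M') return num;
  return num / (1024 * 1024); // no suffix: the number is treated as bytes
};

// Assumed sample inputs, including leading whitespace and an empty string.
const samples = ['  2G', ' 512K', ' 128M', ' 1048576', ''];
for (const s of samples) {
  console.log(JSON.stringify(s), '->', parseSizeMB(s), 'MiB');
}
```

An empty string yields 0, which is why a missing `Physical footprint` line shows up as a zero rather than as an error.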

Why switching to mimalloc v3 helps Rolldown

TL;DR

Switching mimalloc-safe to the v3 feature drops Physical footprint by roughly 47% in our dev-server reproduction (1946 MiB → 1024 MiB), with no observable downside on the same workload. The improvement is not specific to dev-server. Any Rolldown workload that involves multiple worker threads, multiple build phases, or repeated builds in one process benefits.

Why v3 wins, in one paragraph

mimalloc v2 keeps every mimalloc page strictly thread-private. A page can only ever be reused by the thread that allocated it. The only cross-thread sharing path is "thread exits, abandon the whole 32 MiB segment to a global list", which almost never fires in long-running processes that use a worker pool (tokio, rayon).

mimalloc v3 changes this. As soon as a page becomes full, the default code path (mi_page_to_full calls _mi_page_abandon) puts the page into a global pages_abandoned[bin] bitmap, where any other thread looking for that size class can claim it via an atomic CAS. The upstream comment in page.c:382 says it directly: "this is the usual case in order to allow for sharing of memory between theaps".
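
The claim step can be pictured with a small sketch. This is not mimalloc's code: the one-word-per-bin bitmap, the `abandonPage` and `claimPage` names, and the use of JavaScript `Atomics` are all assumptions, chosen only to show how a compare-and-swap lets exactly one claimant win a given bit.

```js
// One 32-bit word per size-class bin; bit i set means "abandoned page i is available".
const BINS = 4;
const bitmap = new Int32Array(new SharedArrayBuffer(BINS * Int32Array.BYTES_PER_ELEMENT));

// Publish page `slot` (0..31) of `bin` as abandoned by setting its bit.
const abandonPage = (bin, slot) => {
  Atomics.or(bitmap, bin, 1 << slot);
};

// Try to claim any abandoned page in `bin`. Returns the slot index, or -1 if none is available.
const claimPage = (bin) => {
  for (;;) {
    const seen = Atomics.load(bitmap, bin);
    if (seen === 0) return -1; // nothing abandoned for this bin
    const slot = 31 - Math.clz32(seen & -seen); // index of the lowest set bit
    const next = seen & ~(1 << slot);
    // The CAS succeeds only if no other thread changed the word since it was read.
    if (Atomics.compareExchange(bitmap, bin, seen, next) === seen) return slot;
    // Otherwise another claimant got there first: reload and retry.
  }
};

abandonPage(2, 5);
abandonPage(2, 9);
const first = claimPage(2);
const second = claimPage(2);
const third = claimPage(2);
console.log({ first, second, third });
```

Because the bit is cleared by the same atomic operation that checks it, two threads calling `claimPage` for the same bin can never both receive the same slot.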

The end result is that the per-process committed memory converges from sum(historical peak per thread) (v2) toward max(global concurrent active working set) (v3). The reduction factor approaches the worker thread count when workloads are imbalanced.
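
A toy model makes the two expressions easier to compare. Everything here is assumed for illustration: the per-worker usage numbers are invented, and `committedV2` and `committedV3` are hypothetical names for the two formulas in the paragraph above, not measurements or mimalloc APIs.

```js
// usage[t][k] = MiB of live pages that worker t needs during phase k (invented numbers).
// The load is imbalanced: each worker is busy in a different phase.
const usage = [
  [400, 10, 10, 10],
  [10, 400, 10, 10],
  [10, 10, 400, 10],
  [10, 10, 10, 400],
];

// v2-style bound: pages stay with their thread, so each thread retains its own historical peak.
const committedV2 = (u) => u.reduce((acc, thread) => acc + Math.max(...thread), 0);

// v3-style bound: pages can move between threads, so only the largest concurrent total matters.
const committedV3 = (u) =>
  Math.max(...u[0].map((_, k) => u.reduce((acc, thread) => acc + thread[k], 0)));

const v2 = committedV2(usage); // 4 workers * 400
const v3 = committedV3(usage); // 400 + 3 * 10 in the busiest phase
console.log({ v2, v3, factor: (v2 / v3).toFixed(2) });
```

In this model the ratio stays below the worker count (4 here) and moves toward it as the off-phase usage shrinks, which is the sense in which the reduction factor approaches the number of worker threads.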

Rolldown workloads that benefit

Every Rolldown invocation has these traits to varying degrees:

  1. Multi-threaded by default: tokio + rayon worker pools sized around CPU cores
  2. Phased allocation: parser, AST, transforms, codegen each touch different size classes
  3. Wide size class coverage: from 16-byte tokens to 64+ KiB chunks
  4. Sparse long-lived survivors: module cache, Arc'd info, interned strings
  5. Multiple builds per process in dev-server, watch, test suites, library bundling pipelines

Whether it is the Rolldown CLI, Vite production builds, the dev-server with HMR, or watch mode, these traits hold. Dev-server simply pushes every dimension to the extreme, which is why the improvement is most dramatic there.

shulaoda changed the title from "perf(binding): enable mimalloc v3 to release dev-server memory to the OS" to "perf(binding): enable mimalloc v3 to reduce idle memory" on May 11, 2026
shulaoda force-pushed the 05-11-perf_binding_enable_mimalloc_v3_to_release_dev-server_memory_to_the_os branch 2 times, most recently from d0499b1 to 541f287, on May 12, 2026 03:55
shulaoda marked this pull request as ready for review on May 12, 2026 03:55
shulaoda force-pushed the 05-11-perf_binding_enable_mimalloc_v3_to_release_dev-server_memory_to_the_os branch from 541f287 to 4ec47bc on May 12, 2026 12:57
hyf0 (Member) commented May 13, 2026

Can you add a design doc about why we choose mimalloc V3?

hyf0 (Member) commented May 13, 2026

Also can you run benchmark to compare if rolldown is affected by two different versions of mimalloc?

shulaoda merged commit 6fb4811 into main on May 13, 2026 (31 checks passed)
shulaoda deleted the 05-11-perf_binding_enable_mimalloc_v3_to_release_dev-server_memory_to_the_os branch on May 13, 2026 09:24
rolldown-guard (bot) mentioned this pull request on May 13, 2026
shulaoda added a commit that referenced this pull request May 13, 2026
## [1.0.1] - 2026-05-13

### 🚀 Features

- experimental/lazy-barrel: advice on oversized barrel modules (#9236) by @shulaoda
- rolldown: inline optional-chain enum access (#9379) by @Dunqing
- chunk-optimization: dedupe already-loaded dynamic deps (#9305) by @IWANABETHATGUY
- binding: call moduleParsed hook in ParallelJsPlugin (#9318) by @jaehafe

### 🐛 Bug Fixes

- transform: enable `enum_eval` for `transformSync` and vite TS transform (#9325) by @Dunqing
- error: remove severity prefix from diagnostic messages (#9262) by @Kyujenius
- deps: pin pnpm to 10.23.0 to work around catalog mismatch on Netlify (#9364) by @shulaoda
- ci: pin mimalloc-safe to 0.1.58 (#9361) by @shulaoda
- dev/lazy: fix exports of lazy requests in lazy chunks (#9249) by @h-a-n-a
- rolldown_plugin_vite_resolve: handle errors in `resolveSubpathImports` callback (#9355) by @sapphi-red
- rolldown_plugin_lazy_compilation: use loadExports for fetched proxy to preserve original export names (#9132) by @h-a-n-a
- common: include offending index in HybridIndexVec panic message (#9296) by @SAY-5

### 🚜 Refactor

- ecmascript: extract semantic_builder_for_transform helper (#9326) by @Dunqing
- test: extract reusable static-import-cycle helper (#9332) by @IWANABETHATGUY

### 📚 Documentation

- clarify scope of `topLevelVar` (#9380) by @IWANABETHATGUY
- meta/design: add ast-mutation design doc (#9338) by @hyf0
- feat: add ai policy in contribution guide (#9315) by @mdong1909

### ⚡ Performance

- binding: enable mimalloc v3 to reduce idle memory (#9349) by @shulaoda

### 🧪 Testing

- mcs: cover require() in `$initial` group (#9376) by @hyf0
- add regression for CJS facade chunk merge into entry (#9351) by @IWANABETHATGUY

### ⚙️ Miscellaneous Tasks

- switch prepare-release to manual dispatch with version input (#9383) by @shulaoda
- migrate `@rolldown/pluginutils` to `rolldown/plugins` (#9317) by @shulaoda
- deps: pin libmimalloc-sys2 to 0.1.54 (#9372) by @shulaoda
- replace `igorskyflyer/action-readfile` with `cat` (#9369) by @sapphi-red
- deps: update test262 submodule for tests (#9371) by @rolldown-guard[bot]
- use app token for test dep update PRs (#9368) by @sapphi-red
- replace some actions with gh commands (#9367) by @sapphi-red
- replace action-semantic-pull-request with inline regex (#9366) by @sapphi-red
- remove pull_request_target workflows (#9188) by @Boshen
- deps: upgrade oxc to 0.130.0 (#9360) by @shulaoda
- deps: update github actions (major) (#9348) by @renovate[bot]
- deps: update github actions (#9341) by @renovate[bot]
- deps: update rust crates (#9344) by @renovate[bot]
- deps: update crate-ci/typos action to v1.46.1 (#9357) by @renovate[bot]
- deps: update npm packages (#9343) by @renovate[bot]
- deps: update pnpm to v10.33.4 (#9347) by @renovate[bot]
- deps: update dependency rolldown-plugin-dts to ^0.25.0 (#9346) by @renovate[bot]
- .claude: add rolldown-repl encoder, rename decode skill (#9352) by @IWANABETHATGUY
- deps: update crate-ci/typos action to v1.46.0 (#9345) by @renovate[bot]
- deps: update napi to v3.8.6 (#9342) by @renovate[bot]
- deps: update dependency vite-plus to v0.1.20 (#9340) by @renovate[bot]
- enable rollup chunking-form test (#9335) by @IWANABETHATGUY
- typo: fix typo in watcher options comment (#9324) by @thescripted

### ❤️ New Contributors

* @Kyujenius made their first contribution in [#9262](#9262)
* @SAY-5 made their first contribution in [#9296](#9296)
* @thescripted made their first contribution in [#9324](#9324)

Co-authored-by: shulaoda <[email protected]>
IWANABETHATGUY pushed a commit that referenced this pull request May 18, 2026
IWANABETHATGUY pushed a commit that referenced this pull request May 18, 2026