fix: stabilize chunk assignment across parallel file reads #6362
Pull request overview
Fixes nondeterministic chunk assignment/hashing when parallel file reads complete in different orders (resolves #5902) by making chunk assignment inputs and merge ordering deterministic.
Changes:
- Derive entry indices from a stable sorted entry list and convert per-entry tracking to be module-keyed before indexing.
- Add deterministic ordering/tie-breakers for chunk grouping, partitioning, merging, and final output ordering.
- Add a regression test that runs equivalent builds with different file-read completion orders and asserts identical output.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/utils/chunkAssignment.ts | Makes entry indexing and multiple chunk ordering/merge steps deterministic to stabilize chunk relationships and hashes. |
| test/misc/misc.js | Adds a memfs-based regression test that varies read completion order and asserts stable hashed outputs. |
|
Good catch. I consolidated the bitmask signature logic into a single helper to avoid drift. I also moved the deterministic sort key computation out of the sort comparator so signatures and module-id keys are computed once per sort pass. |
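To illustrate the point about computing sort keys once per sort pass, here is a minimal sketch of the precompute-then-sort pattern. `ChunkLike`, `chunkSortKey`, and `sortByPrecomputedKey` are hypothetical names for illustration and are not taken from `src/utils/chunkAssignment.ts`; the key format is likewise an assumption.

```typescript
// Hypothetical chunk descriptor: a dependent-entries bitmask plus module ids.
interface ChunkLike {
  signature: number;
  moduleIds: string[];
}

// Counts key computations so the effect of precomputing is observable.
let keyCalls = 0;

// Assumed key format: signature, then the sorted module ids.
// The key only needs to be stable; it does not order signatures numerically.
function chunkSortKey(chunk: ChunkLike): string {
  keyCalls++;
  return `${chunk.signature}|${chunk.moduleIds.slice().sort().join(",")}`;
}

// Compute each key exactly once, sort by the cached key, then drop the keys.
function sortByPrecomputedKey<T>(items: readonly T[], getKey: (item: T) => string): T[] {
  const keyed = items.map(item => ({ item, key: getKey(item) }));
  keyed.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return keyed.map(entry => entry.item);
}

const chunks: ChunkLike[] = [
  { signature: 3, moduleIds: ["b.js", "a.js"] },
  { signature: 1, moduleIds: ["c.js"] },
  { signature: 3, moduleIds: ["a.js"] }
];

const sortedChunks = sortByPrecomputedKey(chunks, chunkSortKey);
```

Calling `chunkSortKey` inside the comparator instead would recompute it on every comparison, which is both slower and a second place where the key logic could drift.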
04bfe7e to f40903b
Rollup stores parsed modules in a Map<string, Module> (modulesById). With parallel file reads, modules are inserted in whichever order their resolveId hooks complete, making iteration order non-deterministic across builds. Two code paths depended on this order:
- Bundle.assignManualChunks iterated modulesById.values() directly, so a stateful manualChunks function produced different chunk assignments on different runs. Fix: sort modules alphabetically by ID before iterating.
- assignExportsToMangledNames and assignExportsToNames iterated the chunk exports Set in insertion order. Fix: sort exports before assigning aliases using a stable comparator: module ID → source declaration position → base name → export key → constructor name. Synthetic variables (NamespaceVariable etc.) have no source declaration and receive Number.MAX_SAFE_INTEGER as their position so they always sort after regular variables, preventing them from stealing natural export names in CJS/AMD/UMD formats.

Adds a regression test that builds the same graph twice with different resolveId delay patterns and asserts identical chunk names, hashes, and code.

Updates five chunking-form snapshots whose previous expected output reflected arbitrary insertion order rather than the new deterministic sort (ES and System formats only).
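The first fix can be sketched as follows. This is a simplified illustration, not Rollup's code: `ModuleLike`, `assignChunks`, and `makeStatefulManualChunks` are hypothetical names, and the stateful function is a contrived example of a `manualChunks` callback whose result depends on call order.

```typescript
// Hypothetical stand-in for Rollup's Module; only the id matters here.
interface ModuleLike {
  id: string;
}

type ManualChunks = (id: string) => string | null;

// A stateful manualChunks function: its result depends on how often it was called.
function makeStatefulManualChunks(): ManualChunks {
  let calls = 0;
  return () => `group-${calls++ % 2}`;
}

// Iterate modules sorted by id instead of in Map insertion order.
function assignChunks(
  modulesById: Map<string, ModuleLike>,
  manualChunks: ManualChunks
): Map<string, string> {
  const sorted = Array.from(modulesById.values()).sort((a, b) =>
    a.id < b.id ? -1 : a.id > b.id ? 1 : 0
  );
  const assignment = new Map<string, string>();
  for (const mod of sorted) {
    const chunkName = manualChunks(mod.id);
    if (chunkName !== null) assignment.set(mod.id, chunkName);
  }
  return assignment;
}

// The same modules inserted in two different orders, as if reads completed differently.
const ids = ["/src/a.js", "/src/b.js", "/src/c.js"];
const toEntry = (id: string): [string, ModuleLike] => [id, { id }];
const runA = new Map<string, ModuleLike>(ids.map(toEntry));
const runB = new Map<string, ModuleLike>(ids.slice().reverse().map(toEntry));

const resultA = assignChunks(runA, makeStatefulManualChunks());
const resultB = assignChunks(runB, makeStatefulManualChunks());
```

Without the sort, `runB` would hand `/src/c.js` to the callback first and the two runs would disagree on which module lands in which group.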
f40903b to 772ac46
|
I carefully debugged the repro today and found that the inconsistent hash names come from the linked repro calling Since the file read order can affect the order in which I think sorting the exports is a reasonable approach, but it is only needed when generating mangled names, because I have updated your PR directly with the description above and added a regression test. |
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #6362 +/- ##
=======================================
Coverage 98.78% 98.78%
=======================================
Files 274 274
Lines 10793 10795 +2
Branches 2882 2883 +1
=======================================
+ Hits 10662 10664 +2
Misses 89 89
  Partials 42 42
|
|
@TrickyPi That makes sense, and the root cause analysis is more precise: emitFile call order from resolveId/transform is indeed the source of the non-determinism, and sorting by name in assignExportsToMangledNames is the right place to fix it. The simplified comparator and the focused regression test are cleaner. |
|
|
|
@TrickyPi is there anything pending on this PR before it can be merged? |
|
I don’t think there’s anything pending from my side. It should be ready to merge whenever @lukastaegert has time. |
|
Hi, thanks for the PR and sorry for the wait. I added another test and a small change to make the sorting more stable. I will merge this once I have sorted out CI... |
|
I'm soooo happy this is fixed; is this also a problem in rolldown? |
|
This PR has been released as part of [email protected]. You can test it via |
`codeSplitting.groups[].name` was called in the order of `ModuleIdx`, which is assigned in module-load completion order. This is a problem if `codeSplitting.groups[].name` behaves differently depending on the call order. While a stateful `codeSplitting.groups[].name` is not a good idea, it is also difficult to detect, and I think it's better to handle this on the Rolldown side. This PR makes `codeSplitting.groups[].name` be called in `stable_id` order so that the order is deterministic. This PR is similar to the first part of rollup/rollup#6362. (The second part was already handled.)
Description
Rollup stores parsed modules in a `Map<string, Module>` (`modulesById`). With parallel file reads, modules are inserted in whichever order their `resolveId` hooks complete, making the Map's iteration order non-deterministic across builds. Two places depended on this order:
- `Bundle.assignManualChunks` iterated `modulesById.values()` directly, so a stateful `manualChunks` function (one whose return value depends on how many times it has been called) would produce different chunk assignments on different runs. Fixed by sorting the modules alphabetically by ID before iterating.
- `assignExportsToMangledNames` / `assignExportsToNames` in `src/utils/exportNames.ts` iterated the chunk's `exports` Set in insertion order. Because the Set is populated during module graph traversal, its order could also vary. Fixed by sorting the exports before assigning aliases, using a stable comparator: module ID → source declaration position → base name → export key → constructor name. Synthetic variables such as `NamespaceVariable` have no source declaration, so they receive `Number.MAX_SAFE_INTEGER` as their position and always sort after regular variables within the same module, which prevents them from stealing natural export names in CJS/AMD/UMD formats.

The regression test builds the same graph twice with different `resolveId` delay patterns (so modules resolve in different orders) and asserts that chunk file names, hashes, and generated code are identical. Five chunking-form snapshot fixtures are updated to reflect the new deterministic sort order for ES and System output formats.
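A reduced sketch of the export comparator described above, covering only three of the tie-breakers (module ID, declaration position, name). `ExportLike`, `position`, and `compareExports` are hypothetical names and do not mirror the actual code in `src/utils/exportNames.ts`; the `declarationStart` field is an assumed stand-in for however the source position is obtained.

```typescript
// Hypothetical, simplified view of an exported variable.
interface ExportLike {
  moduleId: string;
  // Offset of the declaration in the source; undefined for synthetic variables.
  declarationStart?: number;
  name: string;
}

function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Synthetic variables have no declaration, so they sort after all declared ones.
function position(variable: ExportLike): number {
  return variable.declarationStart === undefined
    ? Number.MAX_SAFE_INTEGER
    : variable.declarationStart;
}

// Tie-breakers in order: module id, declaration position, name.
function compareExports(a: ExportLike, b: ExportLike): number {
  return (
    compareStrings(a.moduleId, b.moduleId) ||
    position(a) - position(b) ||
    compareStrings(a.name, b.name)
  );
}

// The same exports in two different insertion orders.
const exportsA: ExportLike[] = [
  { moduleId: "/src/lib.js", name: "ns" }, // synthetic, no declaration
  { moduleId: "/src/lib.js", declarationStart: 40, name: "bar" },
  { moduleId: "/src/lib.js", declarationStart: 7, name: "foo" },
  { moduleId: "/src/dep.js", declarationStart: 0, name: "dep" }
];
const exportsB = exportsA.slice().reverse();

const namesA = exportsA.slice().sort(compareExports).map(v => v.name);
const namesB = exportsB.slice().sort(compareExports).map(v => v.name);
```

Both orders sort to `dep`, `foo`, `bar`, `ns`, so any aliases assigned by walking the sorted list no longer depend on how the Set was populated.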