Conversation
(force-pushed 807410d to 401f259)
I've finished padding out the tests now, including full dynamic import chunking cases for CJS and AMD output.
(force-pushed 7621017 to 92afe1b)
(force-pushed ad69de0 to 5626db6)
(force-pushed 4717559 to df6dec2)
We would be very interested in using this within StencilJS. I have been working on a way to accomplish this using a custom tool to identify bundling use cases, based on https://github.com/samthor/srcgraph. It appears that this PR will mitigate many of the issues I have had.
(force-pushed 1e03702 to afe6288)
(force-pushed ed46590 to abad2be)
This is killer work @guybedford. If you need any help with approaching use cases that we've already covered, let me know if we can lend a hand.
Is it possible to add a sort of manifest file to the output? This would let consumers implement preloading in browsers, or have it converted to HTTP headers for H2 push.
@Munter the bundle result includes `bundle.chunks` of this shape:

```ts
{
  [chunkName: string]: {
    name: string, // same as chunkName above
    imports: string[],
    exports: string[],
    modules: string[]
  }
}
```
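Reading such a manifest, a consumer could generate preload tags. Here is a hypothetical sketch; the `ChunkInfo` type, `preloadTags` helper, and the example manifest values are illustrative only, not part of rollup's API:

```typescript
// Hypothetical consumer of a bundle.chunks manifest (shape shown above):
// emit one <link rel="modulepreload"> tag per generated chunk.
interface ChunkInfo {
  name: string;
  imports: string[];
  exports: string[];
  modules: string[];
}

function preloadTags(chunks: Record<string, ChunkInfo>, baseUrl: string): string[] {
  return Object.keys(chunks).map(
    name => `<link rel="modulepreload" href="${baseUrl}/${name}">`
  );
}

// Example manifest (assumed values for illustration)
const tags = preloadTags(
  {
    'main.js': { name: 'main.js', imports: ['chunk1.js'], exports: ['default'], modules: ['main.js'] },
    'chunk1.js': { name: 'chunk1.js', imports: [], exports: ['mult'], modules: ['dep.js'] }
  },
  '/dist'
);
console.log(tags);
```

The same list could be serialized into `Link: </dist/main.js>; rel=preload` headers for H2 push instead of HTML tags.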
```ts
return Promise.all(promises).then(() => {
  return mapSequence(
    graph.plugins.filter(plugin => plugin.onwrite), (plugin: Plugin) =>
      Promise.resolve(plugin.onwrite(assign({ bundle: chunk }, outputOptions), chunk))
```
Should onwrite also receive the chunkName as a parameter somewhere? Previously in "single bundle mode", onwrite could figure out what files were written using the outputOptions, but now with chunks, I think it needs the chunkName to be able to re-construct the written file paths.
The chunk object above actually does include the name as well, so I believe that would be passed to the plugins ok?
Ah okay, must've misread it. Sounds good!
(force-pushed afe6288 to ba10f04)
(force-pushed abad2be to e7c2c80)
```ts
  });
}

if (!codeSplitting) return graph.buildSingle(inputOptions.input)
```
When the codeSplitting flag is true, the TS types returned by generate will be different. Currently TS believes that the same types are returned whether or not this is set. I am not exactly sure of the right approach to resolve this, other than using a union type in the OutputBundle's generate result.
There are basically two overloads of the generic rollup function from a typing perspective:

```ts
interface SingleInputOptions extends InputOptions {
  input: string;
}
interface MultipleInputOptions extends InputOptions {
  input: string[];
}
function rollup(options: SingleInputOptions): Promise<BundleSingle>;
function rollup(options: MultipleInputOptions): Promise<BundleChunks>;
```

(force-pushed e7c2c80 to 605b12c)
(force-pushed 605b12c to f76db6b)
I am using this branch in a local version of StencilJS to build out Ionic. In my testing so far I have not run into any issues. The chunking logic works exactly as expected.
lukastaegert left a comment:
This is just the first part of the review where I checked functionality and looked at all the tests. There are already some really impressive things you can do here!
I find the behaviour of the entrypoint-facade test a little confusing - why is the default export moved from main2 to the shared chunk? But the result is certainly correct.
I fear I will need to continue my review tomorrow with the main logic.
```js
module.exports = {
  description: 'simple chunking',
```

Better describe the purpose of this specific test?
```js
module.exports = {
  description: 'simple chunking',
```

Better describe the purpose of this specific test?
```js
module.exports = {
  description: 'simple chunking',
```

Better describe the purpose of this specific test?
```js
module.exports = {
  description: 'simple chunking',
```

Better describe the purpose of this specific test?
```js
module.exports = {
  description: 'simple chunking',
```

Better describe the purpose of this specific test?
```js
  return num + multiplier;
}

export { mult };
```
So I assume the point of this test is to show that the unused imports from dep2.js are properly removed. But maybe we can do better by not emitting this chunk at all? This might also be a possible future enhancement.
For instance, the logic could be that a chunk is not emitted when:
- there are no synchronous imports from it (otherwise we should probably include it for potential side-effects, until we find a way to improve the side-effect detection logic to determine whether a chunk has side-effects in its initialization code; actually, this could probably be done by checking whether any of the top-level statements are included or whether one of the imports has side-effects, but again, maybe a future enhancement)
- none of the dynamic imports of it are included
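The two skip conditions above could be sketched as a simple predicate. This is a toy model; the `ChunkModel` type and field names are assumptions for illustration, not rollup's internals:

```typescript
// Toy model of the "skip emitting this chunk" rule proposed above:
// emit only if some chunk imports it synchronously, or at least one
// dynamic import of it survived treeshaking.
interface ChunkModel {
  hasSyncImporters: boolean;       // another chunk imports it statically
  includedDynamicImports: number;  // dynamic imports of it still included
}

function shouldEmitChunk(chunk: ChunkModel): boolean {
  return chunk.hasSyncImporters || chunk.includedDynamicImports > 0;
}

// A chunk with no synchronous importers and no surviving dynamic imports
// would be dropped under this rule.
console.log(shouldEmitChunk({ hasSyncImporters: false, includedDynamicImports: 0 }));
```

A fuller version would also keep chunks whose initialization code has side-effects, per the caveat above.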
Another interesting fact I stumbled upon that we might improve in the future is that when
- a chunk exports a variable and
- this chunk is imported by two other chunks both of which import this variable and
- only one of the chunks actually uses this variable,
the imported variable is retained for both chunks. The reason is that there is only one variable entity and as soon as it is included, all imports of this variable are included. If none of the chunks use the variable, it is properly removed. I think it is more of a cosmetic issue but still.
You've exactly understood the two limitations of the approach I put together: the chunking algorithm is run on the graph before treeshaking, and the export boundaries between chunks aren't properly pruned. I originally aimed to handle both of these cases (and my first few attempts were in this direction), but hit some roadblocks.
When running chunking on the graph after treeshaking, execution order still has to be maintained as an invariant. In addition we need to work out which chunks can be considered pure in order to skip emission like you mention.
If I remember correctly, the main issue I hit when implementing this (after fixing up export tracing etc etc) was that the concept of "marked" or "used" must then be considered uniquely for each specific chunk, which wasn't a simple capability to add. I can't remember what the other issues were, but if you're interested I could try trawling through my notes for future reference.
Note that in many of my practical investigations of the algorithm, these assumptions ended up not seeming so bad at all. The majority of cases work really well without these features. I think as we use this feature we'll get a better idea of how bad "pure" chunk relations and unpruned boundaries might end up looking, and can then either consider specific fixes or take the more general route.
```js
  options: {
    input: ['main1.js', 'main2.js']
  }
};
```

I guess the proper term should indeed be "facade" without the "s" in both the description and folder name.
Thanks for the initial review. I've made the changes and fixed up the namespace test.

To explain the entry point facade concept: when an entry point module is contained in a chunk that has other exports as well, those other exports would effectively pollute the entry point module's exports if we treated that chunk as the entry point itself. So we first imagine all entry points as being "facades" to their actual exports, in whatever chunk their actual code happens to be in. Those export names might then be changed and bundled with other exports, but the facade module allows us to just re-export the right interface. In the case where a chunk contains an entry point, no name changes are necessary, and no other exports are provided from the chunk, we make the optimization that the entry point facade is no longer necessary, as the chunk matches the facade. We can then just treat the chunk as the facade (as happens in most of the tests). Let me know if anything is still unclear on that.
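The facade idea can be illustrated with a runnable toy model. All names here (the chunk's internal export names, the values) are hypothetical; a real facade would be a generated ES module re-exporting from the chunk file:

```typescript
// Toy model of an entry point facade. A shared chunk's namespace may carry
// renamed exports from several modules; the facade re-exposes only the
// entry point's own interface, hiding the chunk's other exports.
const sharedChunk = {
  a: 'main2-default-value', // main2's default export, renamed inside the chunk
  b: 'shared-helper-value'  // another module's export bundled into the same chunk
};

// Facade for main2: maps the public name back to the chunk's internal name,
// equivalent to a generated `export { a as default } from './chunk.js'`.
const main2Facade = { default: sharedChunk.a };

console.log(Object.keys(main2Facade)); // only main2's interface is visible
```

When the chunk's exports already match the entry's interface exactly, this indirection is dropped and the chunk itself serves as the facade, as described above.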
I gave the code in this branch a try with a mid-size Angular application. In my setup I used Rollup code splitting to generate lazy modules. Next I implemented lazy loading in Angular using the router, aided by systemjs url mappings. Here is a summary of my findings: http://www.syntaxsuccess.com/viewarticle/lazy-loading-with-rollup. Here is the config: https://github.com/thelgevold/angular-2-samples/blob/master/rollup/rollup-lazy-config.js
Nice!
lukastaegert left a comment:
@guybedford yes, I initially had the misconception that the case you described could never happen. But of course you are right: if A and B are entry points, A imports B, and both A and B import D, then there will be two chunks: the A chunk and the BD chunk. And of course as both are directly imported, BD needs to have the exports of both B and D.
I was especially curious about this as I was wondering whether code-splitting could not be used/extended to solve @kellyselden's ember-cli problem, cf. #1878. My thought was that maybe, if the algorithm regarded each module as an entry point and was told not to bundle chunks, the only issue would be to provide proper output paths, especially for bundled external dependencies.
As nonsensical as this idea may sound at first (i.e. running rollup without bundling), I see it becoming more and more relevant as we further improve the tree-shaking and add advanced code transformations that others may want to profit from without the need for bundling. Just some food for thought.
```js
if (command._.length === 1) {
  command.input = command._;
} else if (command._.length === 1) {
  if (command.input) {
```
It seems this could be simplified to handle the error only once and merge conditions, i.e. handle the error if command._.length >= 1 and then do the rest in a nested if clause.
src/Graph.ts (outdated):

```ts
import path from 'path';
import GlobalScope from './ast/scopes/GlobalScope';
import xor from 'buffer-xor';
import * as crypto from 'crypto';
```
When building rollup, these imports create a lot of noise and especially the path, buffer-xor and crypto imports break browser compatibility (which is really important for the REPL to work; note that we cannot have unresolved dependencies for the browser build)!
You will also notice there is another import of path related functionality from ./utils/path further up. In a browser build, this import is replaced via an ad-hoc plugin in /rollup.config.js with a polyfill in /browser/path.ts. I guess a similar route should be taken for the other path-related methods or otherwise we should consider switching to rollup-plugin-node-builtins which promises to also solve the buffer related problems that npm run build lists.
For crypto, maybe a browser-compatible drop-in can be found that we might swap in similar to the path approach? Otherwise, as you are basically only using the hashing and the xor, maybe a more specialised library can be found?
On that note, maybe we need a test for the browser build that at least checks for externals or unresolved dependencies.
Sure, will see what I can do.
Ok, I've replaced it with native functions here and gone with numbered chunks. Hashing could be useful, but it should at least be proper hashing of the source that can cover all file names in the build, as a separate feature.
```ts
  });
}

buildChunks (entryModuleIds: string[]): Promise<{ [name: string]: Chunk }> {
```
There is a lot of overlap between buildSingle and this function that should probably be extracted if we want to keep both functions; though of course it would be even better if buildChunks could just handle the buildSingle case as well if we slightly modify it.
Also, this function is really long. I think, brain-overload-wise but also with regard to separation of concerns, it would benefit if the final chunk preparation could be extracted. Doing this first might also help us determine more easily whether the non-chunking case could become a special case of the chunking version. But maybe this does not make sense, as you have obviously put much more thought into this. Still, some splitting up of this >120-line method certainly would not hurt maintainability.
At least it's an improvement on the buildSingle function from where it was before as the main build method.
I'm still wary of over-abstracting though; it can be just as much or even more of a maintenance issue having to jump around a file to work out exactly what a method is doing. Seeing the low-level details in one place is a powerful way to understand something, even if it does mean you have to stare at it for longer than 5 minutes :P
Also, the idea is that the code comments provide the high level: reading through the comments gives an algorithmic overview, better than some clever method names might.
First a promise, as this is going to be a little longer (but only this one time): this is the last time for this review that I will try to make a point that smaller functions are a godsend for maintainability, and I will happily merge this PR without any changes here!
I also want to stress that I have the deepest respect for you as well as the tremendous work you did here. Nevertheless, I like to hold it like this tweet and hope others do towards me and my code as well: https://twitter.com/wolframkriesing/status/951375642876145664 Especially I want to obey commandment 8 in the given link, as hard as it feels.
That being said, I really do hope to eventually be able to convince you about the value of smaller, well-named functions. Maybe not for this PR but for PRs to come.
Code is read many more times than it is written
When I develop new functionality, I too find it easier to write a monolithic function first because it enables me to see everything together, find parallels, etc. But this is the initial development, and I am the person who developed the code, so I really know my way around. Other people, however, will need to read and understand my code, and for each of them it will be, time and again, these 5 minutes of staring at the code, only to find out that most of it is irrelevant for their problem. And this is not just others, but my future self as well after a few months. Estimates are of a ratio of 1-to-10 writing vs. reading, but it surely depends on the code. I would expect this rate to be even higher for such a high-profile OSS project.
This excellently sums it up; if you only want to read one thing, read this: http://sam-koblenski.blogspot.de/2014/01/functions-should-be-short-and-sweet-but.html
The important argument for readability here is the "drilling down" aspect that allows you to get to the relevant parts quickly. Modern IDEs let you jump to function and method definitions with a single click, and if you really need to see it in context, inlining a function or method is usually just a keyboard shortcut away. As well as extracting it.
Long functions make people wary of refactorings
It is much harder in a long function to switch out part of the functionality. Because variables have long scopes and there are interdependencies between different parts of the function that make people wary. So instead of just switching out the part that needs switching, people start piling up additional functionality. Because they always need to understand the whole function to be able to change the relevant part and this costs time and greatly increases cognitive overload. See here: https://www.sandimetz.com/blog/2017/6/1/the-half-life-of-code
Code that isn't easy to replace doesn't get replaced, instead it gets expanded
Comments are a bad way of documenting code
This is thoroughly described here: https://visualstudiomagazine.com/articles/2013/06/01/roc-rocks.aspx
But the real problem is that, even if that miracle of accuracy and completeness were achieved, there's no way for someone reading the comments to know that accuracy and completeness have been achieved
A much shorter to-the-point summary of how to write self-documenting code (which is what I strongly advocate!) can be found here: https://gist.github.com/tharumax/a78883d198217ec3cdaf
Maintainability is key for rollup to be ahead of the competition
I have spent several years coding in tightly integrated teams with highly talented and experienced colleagues, constant code reviews and high enforced standards of software craftsmanship. What I am proposing is not solely my personal feeling but also the result of countless years of experience of many software developers on how to keep development speed high and bugs short-lived when coding in a team as the complexity of an application grows. Even though not strictly about JavaScript, Clean Code is still one of the most comprehensive books about the issue, listing countless examples and counter-examples as well as more background to understand the value these concepts have for efficient software development. And I have myself seen over and over that even the most convinced proponents of longer functions at some point accepted the wisdom of better abstraction.
Admittedly, good function names are key when extracting functionality. But in contrast to comments, function names can easily be adjusted via your IDE (and they should be). And a reviewer will always be happy to point out where a name is misleading so that a better one can be found.
But now I want to be silent, finish the review and obey commandment 8.
src/Graph.ts (outdated):

```ts
} = {};

const entryChunkNames: string[] = [];
function generateUniqueEntryPointChunkName (id: string): string {
```

If you provide entryChunkNames as a parameter, this function could be extracted out of this method (and even out of this class), which I feel could help readability.
src/Graph.ts (outdated):

```ts
// one more xor for circular entry points to ensure distinction
// to ensure one entry point per chunk property
if (module.isEntryPoint)
  module.entryPointsHash = xor(module.entryPointsHash, curEntryHash);
```
Maybe there is an if-condition missing? Because if you have this situation:
- A is the only entry point
- A imports B
- B imports A
then two separate chunks will be created for no reason at all since B could easily be included into the common AB chunk. Maybe you want this second xor to only happen if the module is different from the module with which you started the current iteration (e.g. by providing it via a second parameter)? Or am I missing something?
Also, I discovered an issue, which may or may not be related to this, where rollup is not properly bundling an entry point; I will create a test PR to illustrate it.
Excellent catch - yes we must avoid xoring with itself!
Ok changed the algorithm slightly :)
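The fixed colouring can be modelled with a runnable toy. Numeric hashes stand in for rollup's hash buffers, and all names (`Mod`, `colourFrom`) are illustrative, not the actual implementation:

```typescript
// Toy model of entry-hash colouring with the guard against xoring an
// entry point's hash into itself (the bug caught in the review above).
interface Mod {
  id: string;
  isEntryPoint: boolean;
  entryPointsHash: number;
  deps: Mod[];
}

function colourFrom(start: Mod, entryHash: number): void {
  const seen = new Set<string>();
  const visit = (m: Mod) => {
    if (seen.has(m.id)) return;
    seen.add(m.id);
    m.entryPointsHash ^= entryHash;
    // The extra xor keeps other entry points distinct, but must be skipped
    // for the entry we started from, or it cancels its own colour.
    if (m.isEntryPoint && m !== start) m.entryPointsHash ^= entryHash;
    m.deps.forEach(visit);
  };
  visit(start);
}

// Scenario from the review: A is the only entry point, A imports B, B imports A.
const a: Mod = { id: 'A', isEntryPoint: true, entryPointsHash: 0, deps: [] };
const b: Mod = { id: 'B', isEntryPoint: false, entryPointsHash: 0, deps: [a] };
a.deps.push(b);
colourFrom(a, 7);
// With the guard, A and B share a colour, so they land in one AB chunk.
console.log(a.entryPointsHash === b.entryPointsHash); // true
```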
```ts
  return;
allSeen[module.id] = true;

module.execIndex = ordered.length;
```
Nice idea with the execution order!
This was the main part of the review; everything that is missing is the finalisers plus some small files here and there. I'm confident that I will conclude this tomorrow and am looking forward to releasing this! Great stuff! Note that we definitely need to find a solution for the browser bundle first, though, see my comments.
(force-pushed 5d5b4ed to 4dca6bd)
(force-pushed c7f0d3d to e148229)
lukastaegert left a comment:
Done, all reviewed! One last point (see one of the comments) is whether we shouldn't make the transition from non-chunking to chunking smoother. That might also enable us to remove some duplicate code. But it might also be done once we move this out of "experimental".
```ts
legacy?: boolean;
watch?: WatcherOptions;
experimentalDynamicImport?: boolean;
experimentalCodeSplitting?: boolean;
```
As I understand it in its current implementation, code splitting is a different mode of operation for rollup, as with code-splitting:
- the `dir` property is required (even if you only specify one entry point)
- the `file` property is ignored in favour of the original file name (+ possible deconflicting)

If we want to keep it like that, I would suggest removing the "experimental" from the flag. However, I would actually prefer to not have a second mode of operation (and keep the "experimental" in the first iteration). What if instead it worked like this:
- If you only provide one entry point (either like `"input": "./main.js"` or `"input": ["./main.js"]`), then
  - if you only provide "output.file" but not "output.dir", this file is used as the bundle file, as in this situation there will always be a single output file (or am I wrong?)
  - if you only provide "output.dir" but not "output.file", the same naming scheme as in the chunking version of the algorithm will be used
  - if you provide both, a warning is displayed that you cannot provide both "output.file" and "output.dir" and the latter will be ignored (or the other way around, I am not partial here)
- If you provide several entry points and "output.dir" is provided, rollup will run the chunking version of the algorithm and use the original file names as a basis for the generated ones
  - if you additionally provide "output.file", a warning is displayed that "output.file" will be ignored for multiple entry points
  - if you do not provide "output.dir", an error is thrown (as is currently the case, but only if you actually have multiple entry points)

What do you think? Or do you think this is too complicated for the first iteration?
The difference here is that dynamic imports are inlined into one file for the single-file output, while dynamic imports can generate chunks for the directory output.
So I think it's perhaps clearer to consider `input: []` + `dir` and `input` + `file` as completely distinct cases. We could possibly consider allowing `input` as a string for the `dir` case.
test/chunking-form/index.js (outdated):

```js
  expectedCode = 'missing file';
}

/*try {
```
Would be nice to have a test with actual sourcemaps, but of course it is kind of hard to verify they are correct. Not sure about this; commented code tends to become obsolete quickly. Maybe we check sourcemaps if they are present in the _expected folder and just add them to the basic test?
Got this going :)
src/Graph.ts (outdated):

```ts
export type ResolveDynamicImportHandler = (specifier: string | Node, parentId: string) => Promise<string | void>;

function generateChunkName (id: string, curEntryChunkNames: string[], startAtTwo = false): string {
```
As the code below does not seem to depend on curEntryChunkNames being an array, maybe we should replace it with a Set (or a blank() object) to slightly improve lookup performance when there are many chunks (i.e. Set.has instead of Array.indexOf)?
src/Graph.ts (outdated):

```ts
while (curEntryChunkNames.indexOf(uniqueEntryName) !== -1)
  uniqueEntryName = entryName + uniqueIndex++;
curEntryChunkNames.push(uniqueEntryName);
return uniqueEntryName + ext;
```
Btw. I love the new chunk names! Looks much better!
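The Set-based deconflicting suggested above could look like this. The function name and naming details (extension stripping, the `.js` suffix) are illustrative, not rollup's actual code:

```typescript
// Sketch of chunk-name deconflicting using a Set for O(1) lookups
// instead of Array.indexOf, per the review suggestion.
function generateChunkName(id: string, usedNames: Set<string>): string {
  // derive a base name from the module id: strip extension, take last segment
  const base = id.replace(/\.[^.]+$/, '').split(/[\\/]/).pop() || 'chunk';
  let name = base;
  let index = 1;
  while (usedNames.has(name)) name = base + index++;
  usedNames.add(name);
  return name + '.js';
}

const used = new Set<string>();
console.log(generateChunkName('src/main.js', used)); // main.js
console.log(generateChunkName('lib/main.js', used)); // main1.js
```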
```ts
  }
});
private warnCycle (_entryModule: Module, _ordered: Module[]) {
  // TODO: reinstate
```

```ts
  return result;
});

return graph.buildChunks(inputOptions.input)
```
My hope is that if we can manage to make code-splitting always on (as outlined above), we could basically remove the first version of generate.
Perhaps a middle ground for always-on could be that one input plus an output file would output chunks into the same folder as the output file, possibly separating the dynamic import inlining option out as `inlineDynamicImports: true` or similar?
Ah I see, I was forgetting about this. I guess a flag would definitely make sense as otherwise the behaviour would be inconsistent.
So maybe:
- If we have a single entry point:
  - if `output.file` is provided, it is the name of the chunk generated for the entry point
  - if not, but we have `output.dir`, we use the new naming algorithm
  - if neither is provided, we show an error that either is missing
  - if a dynamic import is encountered + `inlineDynamicImports: true`, it is of course inlined
  - if a dynamic import is encountered + `inlineDynamicImports: false`:
    - if `output.dir` is provided, additional generated chunks are placed there
    - otherwise, an error is shown
- If we have multiple entry points:
  - if `output.dir` is missing, we show an error
  - if `output.file` is provided, we show a warning that it is ignored

?
I noticed the experimentalCodeSplitting flag isn't really behaving like an "experimental" flag here, since it's guiding the whole process. I've made the following change for now:
- If `input` is a string, do a single-file build, assuming `file` and warning otherwise
- If `input` is an array and the flag is enabled, do a code-splitting build, assuming `dir` and warning otherwise

We can potentially look at different arrangements of this before flag removal, but at least the flag itself is not part of the final decision now.
(force-pushed b841276 to 6ab004d)
lukastaegert left a comment:
Cool! Gonna release this Sunday or Monday.
```ts
while (chunkNames[uniqueName])
  uniqueName = name + uniqueIndex++;
chunkNames[uniqueName] = true;
return uniqueName + ext;
```
Very good, also like the stream-lined names!
lukastaegert left a comment:
Good catch, must have totally overlooked this 👍
Also like the new way the feature flag works now.
I will put this into a release branch first, which I will merge into master once we are sure everything is there that should be there :)
This was a great story to read.
Finally got the first draft of this feature working (#372). This is feature complete, passing tests, and ready for review. It deals with the edge cases comprehensively.
This is written as a PR against the dynamic import inlining PR, so I'll update this as that gets reviewed in as well (let's get that one going first too).
The API for this is to take `input` as an array of entry points. The bundle returned can then be written with a `dir` argument instead of a `file` argument.

Code splitting supports the output formats `cjs`, `es` and `amd`, but not iife or umd, due to the inability to get good dependency support there.

The above would output the files `dist/main1.js`, `dist/main2.js` and `dist/feature1.js`, where those files would not contain any duplicated modules between them. Instead, where modules are duplicated, a separate chunk is created with a hash suffix and both modules will load from that. The algorithm to do this is basically a graph colouring algorithm: every entry point is given a unique colour (hash buffer), and we then iterate all reachable modules from that entry point, colouring in the graph. As we iterate the other entry points the colours mix, and at the end, the chunks are the modules that have the same colours. Detected dynamic imports are treated as "discovered" entry points in this process, thereby sharing the chunking.
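The final step of the colouring, grouping modules with equal colours into chunks, can be sketched as follows. Numeric colours stand in for the xor'd hash buffers and all names are illustrative:

```typescript
// Sketch of the chunk-assignment step: after colouring, modules that
// ended up with the same colour form one chunk.
function groupByColour(modules: { id: string; colour: number }[]): Map<number, string[]> {
  const chunks = new Map<number, string[]>();
  for (const m of modules) {
    const group = chunks.get(m.colour) || [];
    group.push(m.id);
    chunks.set(m.colour, group);
  }
  return chunks;
}

const chunks = groupByColour([
  { id: 'main1.js', colour: 1 },
  { id: 'main2.js', colour: 2 },
  { id: 'shared.js', colour: 3 } // 1 ^ 2: reachable from both entry points
]);
console.log(chunks.size); // 3
```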
There is no registry used in any of this - we simply wire up the import and export bindings, carefully managing execution order consistency and then adapting the finalisers and deshadowing process to deal with the new deshadowing problems between these boundaries.
The wiring gets complex quickly in these cases, so I had to make some architectural changes to the deshadowing and finaliser process. Basically:
- `getName()` is now simply `return this.safeName || this.name` on all Variable classes.
- `setRenderIdentifiers` now includes the previous getName logic, i.e. setting an identifier safeName to `exports.exportName` where necessary and everything. We could even consider changing `safeName` to `bundleName`, `outputName` or `renderName` to make this clearer.