
Faster passing ASTs from Rust to JS #2409

@overlookmotel

Description

Currently OXC's parser is extremely fast, but using it from NodeJS is not. The primary cause is the overhead of the JS/Rust boundary - specifically serializing/deserializing large AST structures, in order to pass them between the two "worlds".

Right now, it's not a problem, as OXC is mainly consumed as a Rust lib. However, I suspect that as OXC's transformer, linter, and minifier are built out and gain popularity, this may become a bottleneck, because people will be asking for a way to write transformer/linter/etc plugins in JavaScript, and the performance will not be up to their expectations.

Currently OXC uses JSON as the serialization format. There's a POC implementation using Flexbuffers, which I imagine is much faster.

However, I believe that OXC is uniquely placed to go one better, and cut the overhead of serialization/deserialization practically to zero - in a way that no other current tool that I'm aware of will be able to match.

Apologies in advance, this is going to be a long one...

Background: Why I think this is important

JavaScript as we know it today is the result of a great spurt of innovation over the past decade (particularly around ES6). Babel was pivotal in that process. Many of the new language features (e.g. array destructuring) are essentially syntax sugar, and a working implementation as a Babel plugin became both a requirement of the TC39 process, and an important part of the process of developing and refining features - allowing people to test them out and suggest improvements etc.

At this point, the trend towards tooling written in native languages like Rust is irreversible. This is great for DX. However, it does have the unfortunate side effect of making those tools less accessible to JavaScript developers who only "speak" JS. And of course it's JS programmers who are most familiar with the language, most aware of what its rough edges are, and most motivated to play a role in improving the language.

I believe that to enable the continued evolution of JS, it's important to ensure that, as Babel fades into the distance, the new crop of tools replacing it also fulfil the role Babel has played up until now, allowing JS developers to prototype new language features in the language they know best - JavaScript.

Therefore I feel it's important that transformer plugins written in JS continue to be a thing.

More "selfishly", from the point of view of OXC, I think there is also a real opportunity here. Most people's needs will be mostly met by the most common plugins which OXC will offer as standard, implemented in Rust.

However, I would bet that there's a very long tail of projects/companies who rely on at least one less popular Babel/ESLint plugin, and are therefore currently blocked from migrating from Babel/ESLint to OXC/SWC/etc. This is likely a major pain point for them.

Pursuing a goal of satisfying every developer's needs by re-implementing every plugin that has any user base would be an immense maintenance burden. And many companies/developers will not have the capability to do it themselves in Rust. If OXC can offer a solution for plugins in JS, and unlock their path to much faster builds, it could be a significant driver to adoption.

How to do it?

I attempted to tackle exactly this problem on SWC a couple of years ago (swc-project/swc#2175).

My first prototype, using rkyv as the serializer, did show solid performance gains vs JSON - around 4x. I had the beginnings of a 2nd version, based on a much faster serializer, which was way faster again. But performance was still in roughly the same ballpark as Babel, rather than the order-of-magnitude improvement I was hoping for.

I came to the conclusion that the only way to achieve that kind of improvement was to remove serialization from the equation entirely, and that this could only be achieved by using an arena allocator. It became clear that SWC's maintainers did not feel JS plugins were a priority, and so would not consider that kind of fundamental re-architecting of the project to support them. So I abandoned the effort.

OXC, of course, already has an arena allocator at its core, so the largest problem is already solved.

How to destroy the overhead

It's really simple.

A serialization format only needs to be reasonably space-efficient and well-specified. Such a format already exists in OXC - the native Rust types for AST nodes.

So don't serialize at all!

OXC stores the entire AST in an arena. Rust can transfer the arena allocator's memory blocks via napi-rs to JavaScript, where they become NodeJS Buffer objects. This transfer just passes pointers, involves no memory copying, and its overhead is close to zero.

On the JS side, you need a deserializer which understands the memory layout of the Rust types. This is the tricky part, but the deserializer code can be generated from a schema, or even from analysis of the type layouts within Rust itself (layout_inspect is a prototype of the latter approach).
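As a rough sketch of what such generated deserializer code could look like - the `Span` struct and its field offsets here are hypothetical stand-ins, not OXC's real layout, and a real deserializer would be generated from the actual Rust type layouts:

```typescript
// Hypothetical #[repr(C)] struct on the Rust side:
//   struct Span { start: u32, end: u32 }   // 8 bytes, no padding
interface Span {
  start: number;
  end: number;
}

function deserializeSpan(buf: DataView, offset: number): Span {
  // Little-endian reads, matching Rust's in-memory representation
  // on x86-64 / aarch64.
  return {
    start: buf.getUint32(offset, true),
    end: buf.getUint32(offset + 4, true),
  };
}

// Simulate an arena buffer containing one Span { start: 5, end: 17 }.
const bytes = new ArrayBuffer(8);
const view = new DataView(bytes);
view.setUint32(0, 5, true);
view.setUint32(4, 17, true);

const span = deserializeSpan(view, 0);
// span is now { start: 5, end: 17 }
```

Because every such function reads fixed offsets from a `DataView` and builds an object with a fixed shape, the generated code is completely monomorphic - which is likely why V8 optimizes it so well.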

(side note: TS type defs can also be auto-generated at same time)

From my experiments on SWC, the JS deserializer can be surprisingly performant (see graph here). Deserializing on JS side was twice as fast as Rust-side serialization with rkyv. I suspect that because the deserializer code is so simple and completely monomorphic, V8 is able to optimize it very effectively.

It's also possible to do the same in reverse. JS passes Buffers back to Rust, you reconstruct the arena, and just cast a pointer back to a &mut Program. Again, this is only possible because of the arena, and because all the AST node types are non-drop.

Complications

Enabling this would require some changes to OXC's internals, some of which are a bit annoying. So there are some trade-offs, and it might only be workable if the project feels it's appropriate to make JS plugins a "first class citizen" of OXC.

Stable type layouts

  1. All AST node types would need to be #[repr(C)] to ensure a stable layout. That's not a big deal in itself, I think, but the annoyance would be that e.g. bool fields would need to move to the last fields of types, to avoid excess padding.

  2. All AST enums would likely need to be #[repr(u8)] with explicit discriminators.

  3. Maybe there'd be a problem maintaining the niche optimization for Options, as the deserializer needs to know the niche value for None, which Rust does not expose (I say "maybe" as I can see potential solutions to that).

These annoyances could be largely negated by using proc macros, but at the cost of increased compile times (not sure to what degree).
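To illustrate point 2 with a fabricated two-variant enum (not a real OXC type): given `#[repr(u8)]` with explicit discriminants, the JS deserializer can dispatch on the first byte of the value:

```typescript
// Hypothetical Rust enum:
//   #[repr(C, u8)]
//   enum Expression { Identifier(..) = 0, NumericLiteral(..) = 1 }
// The discriminant occupies the first byte; the variant payload
// follows at a generated, statically-known offset.
const DISCRIMINANT = {
  Identifier: 0,
  NumericLiteral: 1,
} as const;

function expressionKind(buf: DataView, offset: number): string {
  const tag = buf.getUint8(offset);
  switch (tag) {
    case DISCRIMINANT.Identifier:
      return "Identifier";
    case DISCRIMINANT.NumericLiteral:
      return "NumericLiteral";
    default:
      throw new Error(`unknown discriminant ${tag}`);
  }
}

// Simulate an arena containing a NumericLiteral (discriminant 1).
const buf = new DataView(new ArrayBuffer(16));
buf.setUint8(0, 1);
// expressionKind(buf, 0) is "NumericLiteral"
```

Without explicit discriminants, the values the compiler assigns are an implementation detail, so the generated lookup table above could silently go stale - hence the requirement.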

Strings

Two problems here:

  1. All the data for the AST must be in the arena, or part of the source text, so JS can access it. This imposes some constraints on what you can put in an Atom.

  2. Decoding strings from UTF-8 is the most costly part of the JS deserializer. Each decode involves a call across the JS/native boundary, which is a major slowdown. So by far the most efficient way to handle it is to ensure all strings are stored together in one buffer, decode the whole lot in one go, and then slice up the resulting JS string to get each individual string. The allocator would probably need a separate StringStore arena. NB: This does not apply to strings which are already in the source text, as JS has that as a string already.

I don't think either of these are a big problem in the parser, but maybe they are in transformer or minifier?
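The batched decode described in point 2 could look something like this - a simplified sketch assuming ASCII contents, since UTF-8 byte offsets and JS string indices only coincide for ASCII (real code would need to map byte offsets to UTF-16 indices for non-ASCII strings):

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8");

// Simulated StringStore arena holding "foo" and "barBaz" back to back,
// each referenced from the AST by a (byteOffset, byteLength) pair.
const stringStore: Uint8Array = encoder.encode("foobarBaz");

// One costly call across the JS/native boundary for the whole store...
const whole: string = decoder.decode(stringStore);

// ...then each individual string is a cheap pure-JS slice.
function getString(offset: number, len: number): string {
  return whole.slice(offset, offset + len);
}
// getString(0, 3) yields "foo"; getString(3, 6) yields "barBaz"
```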

Pointers

Box and Vec contain 64-bit pointers. On the JS side, the deserializer needs to be able to convert a pointer to an offset into a Buffer, but JS has no native u64 type. A further complication is that the arena is composed of multiple buffers.

This is doable without any changes to OXC's allocator. But making it really fast might require a new arena allocator implementation which e.g. aligns buffers on 4 GiB memory boundaries, so only the bottom 32 bits of memory addresses are relevant. Or a 2nd allocator implementation which uses a WebAssembly.Memory as the backing storage for the arena. WASM Memory in V8 already has the 4 GiB alignment property, and can be extended dynamically up to 4 GiB without memory copies, so the entire arena could be a single buffer.
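Assuming the 4 GiB-aligned arena described above, pointer-to-offset conversion on the JS side reduces to reading the low half of the u64. A sketch with a fabricated pointer value:

```typescript
// With the arena base aligned to a 4 GiB boundary, the low 32 bits of
// any 64-bit pointer into the arena ARE the offset within the buffer,
// so the high half can be ignored entirely - no BigInt needed.
function pointerToOffset(buf: DataView, ptrFieldOffset: number): number {
  // Read only the low 32 bits of the little-endian u64 pointer field.
  return buf.getUint32(ptrFieldOffset, true);
}

// Simulate a pointer value 0x0000_0002_0000_0010 stored in the arena:
// arena base 0x2_0000_0000, pointee at offset 0x10.
const buf = new DataView(new ArrayBuffer(8));
buf.setUint32(0, 0x10, true); // low half (the offset)
buf.setUint32(4, 0x02, true); // high half (ignored by the reader)
// pointerToOffset(buf, 0) is 0x10
```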

In my opinion, replacing bumpalo could be a gain in itself anyway, as I don't think it's quite as optimized as it could be for OXC's types. But obviously that's significant work.

Further optimizations

Lazy deserialization

The above assumes that the entire AST needs to be deserialized on the JS side. But in most cases, a plugin only cares about a few AST node types, which comprise a small subset of the entire AST. Lazy deserialization could reduce the overhead of deserialization to only the parts of the AST which are actually needed.
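A rough sketch of the idea, using the same hypothetical 8-byte Span layout as earlier and a plain getter for laziness:

```typescript
// Nothing is read from the buffer until the property is first accessed;
// the result is then cached, so repeat accesses are plain object reads.
function lazyNode(buf: DataView, offset: number) {
  let span: { start: number; end: number } | null = null;
  return {
    get span() {
      if (span === null) {
        // Deserialized on first access only.
        span = {
          start: buf.getUint32(offset, true),
          end: buf.getUint32(offset + 4, true),
        };
      }
      return span;
    },
  };
}

const view = new DataView(new ArrayBuffer(8));
view.setUint32(0, 5, true);
view.setUint32(4, 17, true);
const node = lazyNode(view, 0);
// No deserialization has happened yet; it occurs on first `node.span`.
```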

Updating the AST

A transformer visitor on the JS side could make whatever changes it wants to the AST by directly mutating the data in the buffer. No need to convert to JS objects and then serialize it all back to a buffer. The user-facing API would hide this behind a "facade" of AST node objects with getters/setters, or Proxies.
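A minimal sketch of such a facade - the node layout is hypothetical, and a real facade would be generated alongside the deserializer:

```typescript
// Getters/setters read and write the arena bytes directly, so JS-side
// mutations never leave the buffer and nothing is serialized back.
class NumericLiteralFacade {
  constructor(private buf: DataView, private offset: number) {}

  // Hypothetical layout: an f64 `value` field at byte 0 of the node.
  get value(): number {
    return this.buf.getFloat64(this.offset, true);
  }
  set value(v: number) {
    this.buf.setFloat64(this.offset, v, true);
  }
}

// Simulate an arena containing a NumericLiteral with value 1.5.
const buf = new DataView(new ArrayBuffer(8));
buf.setFloat64(0, 1.5, true);

const node = new NumericLiteralFacade(buf, 0);
node.value = 2.5; // mutates the arena bytes in place
```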

This would be difficult to make work without breaking Rust's aliasing rules, as JavaScript allows shared mutable references. And the JS code writing to the buffer would essentially be fiddling with bytes in Rust's memory, so would need to be absolutely bullet-proof to ensure no UB.

This would be a real challenge, but the reward would be extreme speed. JS plugins will never be as fast as native, but my guess is that this could get them at least into the same ballpark.

I would not propose that this be part of the v1 implementation, but the potential is I think worth considering when weighing up whether this effort overall is worthwhile or not.

WASM traverser

The number-crunching of following pointers and traversing the AST could be performed in WASM, with WASM returning control to JS only when it's found the next node the visitor wants. WASM is faster than JS for this kind of work, and crossing the JS/WASM boundary can, in some circumstances, be very low cost.

Conclusion

In my personal opinion:

  • This could be a very performant solution to a common need.
  • This feature could be an opportunity for OXC to differentiate itself from other JS tools. Because most other tools don't use arena allocators, they could not do this even if they wanted to.
  • I believe everything I've outlined above is technically achievable.
  • But there are significant challenges, and it would be a large effort.
  • Implementation could proceed in incremental steps. A working first version would only require a subset of the above.
  • In a few cases, some trade-offs with OXC's other aims might be required.

My questions are:

  • Do you see any potential in this?
  • If there are trade-offs required, would they be worth it?

Hopefully it goes without saying that if you are willing to consider something along these lines, I would be keen to work on it.

One last thing: I'm not sure if there's currently a solution for linter plugins on the table, but if not, perhaps this could be it?
