Commit bc6f33d

fix(parser): correct capacity for tokens Vec
1 parent 68eecce commit bc6f33d

File tree

1 file changed: +6 additions, −3 deletions

  • crates/oxc_parser/src/lexer

crates/oxc_parser/src/lexer/mod.rs

Lines changed: 6 additions & 3 deletions
```diff
@@ -131,15 +131,18 @@ impl<'a, C: Config> Lexer<'a, C> {
         let source = Source::new(source_text, unique);

         // If collecting tokens, allocate enough space so that the `Vec<Token>` will not have to grow during parsing.
-        // `source_text.len()` is almost always a large overestimate of number of tokens, but it's impossible to have
-        // more than N tokens in a file which is N bytes long, so it'll never be an underestimate.
+        // `source_text.len() + 1` is almost always a large overestimate of number of tokens, but it's impossible to
+        // have more than N + 1 tokens in a file which is N bytes long, so it'll never be an underestimate.
+        //
+        // + 1 is to account for the final `Eof` token. Without adding 1, the capacity could be too small for
+        // minified files which have no space between any tokens. It would also be too small for empty files.
         //
         // Our largest benchmark file `binder.ts` is 190 KB, and `Token` is 16 bytes, so the `Vec<Token>`
         // would be ~3 MB even in the case of this unusually large file. That's not a huge amount of memory.
         //
         // However, we should choose a better heuristic based on real-world observation, and bring this usage down.
         let tokens = if config.tokens() {
-            ArenaVec::with_capacity_in(source_text.len(), allocator)
+            ArenaVec::with_capacity_in(source_text.len() + 1, allocator)
         } else {
             ArenaVec::new_in(allocator)
         };
```
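The capacity bound the commit relies on can be demonstrated with a plain `Vec`. This is a minimal sketch, not the oxc code: `max_token_count` is a hypothetical helper named here for illustration, and `u32` stands in for the real 16-byte `Token`. It shows that reserving `len + 1` slots covers the worst case (a fully minified file plus the trailing `Eof`) without the buffer ever growing, and that `len` alone would under-allocate for an empty file.

```rust
/// Upper bound on the number of tokens a source of `n` bytes can produce:
/// every token except the final `Eof` consumes at least one byte,
/// so the count can never exceed `n + 1`. (Hypothetical helper, for illustration.)
fn max_token_count(n: usize) -> usize {
    n + 1
}

fn main() {
    // Worst case from the commit message: a fully minified file where
    // every byte is its own token, e.g. `a=b;` -> `a`, `=`, `b`, `;`, Eof.
    let source_text = "a=b;";

    // Reserve `len + 1` up front, as the fixed code does.
    let mut tokens: Vec<u32> = Vec::with_capacity(max_token_count(source_text.len()));
    let ptr_before = tokens.as_ptr();

    for (i, _byte) in source_text.bytes().enumerate() {
        tokens.push(i as u32); // one token per source byte
    }
    tokens.push(u32::MAX); // stand-in for the trailing `Eof` token

    // The buffer never reallocated: the allocation pointer is unchanged.
    assert_eq!(tokens.as_ptr(), ptr_before);
    assert_eq!(tokens.len(), source_text.len() + 1);

    // An empty file still produces one token (`Eof`), which is why
    // `source_text.len()` alone (0 here) would be too small a capacity.
    assert_eq!(max_token_count(0), 1);
}
```

The same reasoning applies to the arena-backed `ArenaVec` in the patch: since the reserved capacity is a hard upper bound on token count, the vector is guaranteed never to grow (and thus never to copy) during parsing.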
