feat(ast): add loc field support for AST nodes#13285
mertcanaltin wants to merge 37 commits into oxc-project:main
Conversation
Pull Request Overview
This PR adds line and column information support to AST serialization by implementing the loc field for ESTree compatibility. The implementation extends the UTF8ToUtf16 converter with line tracking capabilities and integrates this functionality throughout the serialization pipeline.
- Adds line tracking to the UTF8ToUtf16 converter with support for all line break types
- Implements SourceLocation and Position structs for ESTree-compatible loc fields
- Extends serialization methods to support optional loc information
- Integrates loc functionality with NAPI parser bindings
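The pieces above can be sketched roughly as follows. This is an illustrative sketch only — the struct and function names (`Position`, `SourceLocation`, `offset_to_position`) are assumptions for illustration, not the PR's actual definitions. Offset-to-line-column conversion is typically a binary search over recorded line-start offsets:

```rust
// Illustrative sketch, not the PR's actual API.

/// ESTree convention: `line` is 1-based, `column` is 0-based.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Position {
    line: u32,
    column: u32,
}

#[derive(Debug, PartialEq, Clone, Copy)]
struct SourceLocation {
    start: Position,
    end: Position,
}

/// Convert an offset to a `Position` via binary search over the offsets
/// at which each line starts (line 1 starts at offset 0).
fn offset_to_position(line_starts: &[u32], offset: u32) -> Position {
    let line_index = match line_starts.binary_search(&offset) {
        Ok(i) => i,      // offset is exactly at a line start
        Err(i) => i - 1, // offset falls inside the previous line
    };
    Position {
        line: line_index as u32 + 1,
        column: offset - line_starts[line_index],
    }
}

fn main() {
    // "ab\ncd": line 2 starts at offset 3.
    let line_starts = [0, 3];
    assert_eq!(offset_to_position(&line_starts, 0), Position { line: 1, column: 0 });
    assert_eq!(offset_to_position(&line_starts, 4), Position { line: 2, column: 1 });
}
```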
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Summary per file:
| File | Description |
|---|---|
| napi/parser/src/types.rs | Adds loc field to ParserOptions struct for controlling line/column output |
| napi/parser/src/lib.rs | Integrates loc support into parser flow and AST serialization |
| crates/oxc_napi/src/lib.rs | Extends UTF8ToUtf16 conversion to support line tracking |
| crates/oxc_estree/src/serialize/structs.rs | Adds SourceLocation/Position structs and loc serialization methods |
| crates/oxc_estree/src/serialize/mod.rs | Extends ESTreeSerializer with loc support |
| crates/oxc_estree/src/serialize/config.rs | Updates Config structs to track loc option |
| crates/oxc_ast_visit/tests/test_loc_integration.rs | Adds integration tests for loc functionality |
| crates/oxc_ast_visit/src/utf8_to_utf16/translation.rs | Implements line break detection and tracking logic |
| crates/oxc_ast_visit/src/utf8_to_utf16/mod.rs | Adds line/column conversion methods to Utf8ToUtf16 |
| crates/oxc_ast_visit/src/utf8_to_utf16/converter.rs | Adds utility method for offset-to-line-column conversion |
| crates/oxc_ast/src/serialize/mod.rs | Extends Program serialization methods with loc support |
CodSpeed Performance Report: merging #13285 will degrade performance by 43.59%.
Obviously this is incomplete at present - nothing actually calls `serialize_span_with_loc` at the moment, so the JSON won't include locs (as per the TODO comments).
Please let me know if you need any help getting the translation table into the serializer.
I've made a couple of comments in the meantime - apologies if they're premature, and you just haven't got to this yet.
One other broader point: I don't think we need separate `*_with_loc` methods. `serialize_span_with_loc` shouldn't be needed. Ditto `new_with_loc`, and the extra `to_estree_ts_json_with_loc` etc. methods. The existing methods can just have an additional `loc: bool` param added.
We're not at 1.0 yet, and regularly make breaking changes. Please don't worry about changing existing APIs.
I've marked the PR as draft for now.
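The flag-based shape the reviewer suggests could look like this toy sketch. All names here (`Span`, `serialize_span`) are hypothetical illustrations of the API shape, not the crate's real types or methods:

```rust
// Toy illustration of the suggested API shape: one method taking a
// `loc: bool` flag, instead of duplicated `*_with_loc` variants.
struct Span {
    start: u32,
    end: u32,
}

fn serialize_span(span: &Span, loc: bool) -> String {
    if loc {
        // Real code would emit line/column from the translation table;
        // here we just echo the offsets to keep the sketch self-contained.
        format!(
            r#"{{"start":{},"end":{},"loc":{{"start":{},"end":{}}}}}"#,
            span.start, span.end, span.start, span.end
        )
    } else {
        format!(r#"{{"start":{},"end":{}}}"#, span.start, span.end)
    }
}

fn main() {
    let span = Span { start: 0, end: 5 };
    assert_eq!(serialize_span(&span, false), r#"{"start":0,"end":5}"#);
    assert!(serialize_span(&span, true).contains(r#""loc""#));
}
```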
```rust
let mut index = 0;
while index < slice.len() {
    let byte = slice[index];

    // Handle line breaks
    if track_lines {
        let mut is_line_break = false;
        let mut line_break_len = 1;

        match byte {
            b'\n' => is_line_break = true,
            b'\r' => {
                is_line_break = true;
                // Check for \r\n
                if index + 1 < slice.len() && slice[index + 1] == b'\n' {
                    line_break_len = 2;
                }
            }
            // Unicode line separators LS (\u2028) and PS (\u2029)
            // LS: E2 80 A8, PS: E2 80 A9
            0xE2 if index + 2 < slice.len()
                && slice[index + 1] == 0x80
                && (slice[index + 2] == 0xA8 || slice[index + 2] == 0xA9) =>
            {
                is_line_break = true;
                line_break_len = 3;
            }
            _ => {}
        }

        if is_line_break {
            let line_end_offset = start_offset + index + line_break_len;
            if line_end_offset < source_text.len() {
                #[expect(clippy::cast_possible_truncation)]
                let utf8_offset = line_end_offset as u32;
                lines
                    .as_mut()
                    .unwrap()
                    .push(LineTranslation { utf8_offset, utf16_difference });
            }
            index += line_break_len;
            continue;
        }
    }
```
I can see a few problems here.

1. `process_slice` is only called in the main search loop (`while ptr < body_end_ptr { ... }`) if the chunk contains non-ASCII characters. So line breaks won't be recorded unless they're at the very start or end of the source text, or they happen to have a non-ASCII char close to them.

   The `chunk.contains_unicode()` check in that loop needs to be replaced with a check for "chunk contains unicode OR `\r` / `\n`".

   Or maybe better: check the chunk for non-ASCII bytes and call `process_slice` if any are found (as now). Then do a 2nd check for any `\n` / `\r` bytes in the chunk, and call a different function if any are found. That 2nd function can be simpler and faster, as it doesn't have to deal with PS / LS (because they'd be caught by the non-ASCII check).

2. `process_slice` is passed a small subset of the source text (in the main loop, 32 bytes). If `0xE2` is the 31st byte, then PS / LS won't be identified, because the last 2 bytes are after the end of the slice.

3. Irregular line breaks are non-ASCII, so they need a `Translation` record as well as a `LineTranslation`. The simplest way to do that is to skip the `continue` statement if `line_break_len == 3`.

Also, a small optimization: no need for the 2 vars `is_line_break` and `line_break_len`. Just `line_break_len` would do, with 0 representing "not a line break".
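The suggested single-variable form might look like this sketch, folded into a standalone helper for illustration (the function name is assumed; the full-source-text boundary checks raised elsewhere in the review are omitted here for brevity):

```rust
/// Sketch of the suggested optimization: return the byte length of the
/// line break starting at `index`, with 0 meaning "not a line break".
/// This folds `is_line_break` and `line_break_len` into one value.
fn line_break_len(slice: &[u8], index: usize) -> usize {
    match slice[index] {
        b'\n' => 1,
        b'\r' => {
            if slice.get(index + 1) == Some(&b'\n') { 2 } else { 1 }
        }
        // LS (U+2028) = E2 80 A8, PS (U+2029) = E2 80 A9
        0xE2 if slice.get(index + 1) == Some(&0x80)
            && matches!(slice.get(index + 2), Some(&0xA8) | Some(&0xA9)) => 3,
        _ => 0,
    }
}

fn main() {
    assert_eq!(line_break_len(b"a\nb", 1), 1);
    assert_eq!(line_break_len(b"a\r\nb", 1), 2);
    assert_eq!(line_break_len("a\u{2028}b".as_bytes(), 1), 3);
    assert_eq!(line_break_len(b"abc", 1), 0);
}
```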
I'm afraid I had to make a few changes to the UTF-8 -> UTF-16 converter, so it could also be used to convert in the opposite direction, UTF-16 -> UTF-8. That'll be the cause of the merge conflicts. The changes are fairly small, so hopefully not too difficult to rebase on top of them.
Force-pushed from 9cf5a92 to d8f7d8a
@overlookmotel You were spot on - fixed everything.
overlookmotel left a comment
I've pushed a couple of fixes just to get more of it to compile.
But still some compilation failures, and some tests are failing.
All of these should pass:

```shell
cargo test --all-features
just test-estree
cd napi/parser; pnpm run build-test; RUN_RAW_TESTS=true pnpm test; cd ../..
```

I've made a few comments. I don't know if they explain the test failures or not.
Just to say (and I hope you don't consider this rude): Obviously we all rely on AI these days, but in my opinion our would-be robot overlords are not quite up to the task yet. So they need a bit of human hand-holding and checking of their work.
Could you please make sure the code compiles and tests pass before next round of review?
Of course, if you run into problems or want to discuss anything, very happy to help.
```rust
if bytes_from_start_of_range <= self.range_len_utf8 {
    // Offset is within current range.
    // `wrapping_add` because `range_start_utf16` can be `u32::MAX`.
    *offset = self.range_start_utf16.wrapping_add(bytes_from_start_of_range);
```
```rust
let full_offset = start_offset + index;
let has_ls_ps = if index + 2 < slice.len() {
    // Can check within this slice
    slice[index + 1] == 0x80
        && (slice[index + 2] == 0xA8 || slice[index + 2] == 0xA9)
} else if full_offset + 2 < source_text.len() {
    // Need to check in full source text (boundary case)
    let source_bytes = source_text.as_bytes();
    source_bytes[full_offset + 1] == 0x80
        && (source_bytes[full_offset + 2] == 0xA8
            || source_bytes[full_offset + 2] == 0xA9)
} else {
    false
};
```
This can be simplified:

- Always check against the full source text, to remove a branch.
- Don't check if there are 2 more bytes. `source_text` is a `&str`, which is guaranteed valid UTF-8. So if a byte is `0xE2`, it's guaranteed there are 2 more bytes after it. So just `assert!(full_offset + 2 < source_text.len())` and then check the bytes.

That assertion can never fail. Later on, once we're completely confident of the implementation, we can remove that `assert!` and use unsafe indexing to get the next 2 bytes without bounds checks. But for now, it's safer to keep the assertion.

Note: I'm suggesting an `assert!` as opposed to relying on the implicit bounds checks in `source_bytes[full_offset + 1]` and `source_bytes[full_offset + 2]` because the assertion will do 1 bounds check instead of 2.
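As a sketch, the simplified check could be written like this (the `has_ls_ps` name and standalone signature are assumptions for illustration, not the PR's code):

```rust
/// Check for LS (U+2028) / PS (U+2029) at `full_offset` in the full
/// source text. Caller guarantees the byte at `full_offset` is `0xE2`.
fn has_ls_ps(source_text: &str, full_offset: usize) -> bool {
    let bytes = source_text.as_bytes();
    debug_assert_eq!(bytes[full_offset], 0xE2);
    // Valid UTF-8 guarantees 2 continuation bytes after a 0xE2 lead byte,
    // so this assertion can never fail; it replaces 2 implicit bounds
    // checks with 1 explicit one.
    assert!(full_offset + 2 < source_text.len());
    bytes[full_offset + 1] == 0x80
        && (bytes[full_offset + 2] == 0xA8 || bytes[full_offset + 2] == 0xA9)
}

fn main() {
    assert!(has_ls_ps("a\u{2028}b", 1)); // LS
    assert!(has_ls_ps("a\u{2029}b", 1)); // PS
    assert!(!has_ls_ps("a\u{2030}b", 1)); // U+2030 (E2 80 B0) is not a line break
}
```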
Thanks for the optimization, I've edited it.
```rust
b'\r' => {
    // Check for \r\n
    if index + 1 < slice.len() && slice[index + 1] == b'\n' { 2 } else { 1 }
}
```
I think the same problem exists here as with LS / PS. What if `\r` is the last byte in the chunk, and `\n` is in the next chunk? Need to check the full source text to avoid that problem.
(But unlike PS / LS, you can't guarantee that `\r` isn't the last byte in the file, so it shouldn't be asserted.)
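A minimal sketch of the fix, checking the full source text rather than the chunk (the function name and standalone signature are assumed for illustration):

```rust
/// Byte length of the line break starting at `full_offset`, where the
/// byte there is known to be `\r`. Checking the full source text means a
/// `\r\n` split across two chunks is still detected, and `get` safely
/// handles `\r` being the final byte of the file (so no assertion here).
fn cr_line_break_len(source_text: &str, full_offset: usize) -> usize {
    debug_assert_eq!(source_text.as_bytes()[full_offset], b'\r');
    if source_text.as_bytes().get(full_offset + 1) == Some(&b'\n') { 2 } else { 1 }
}

fn main() {
    assert_eq!(cr_line_break_len("a\r\nb", 1), 2);
    assert_eq!(cr_line_break_len("a\r", 1), 1); // `\r` at end of file
}
```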
```rust
// SAFETY: `ptr` is equal to or after `start_ptr`. Both are within bounds of `bytes`.
// `ptr` is derived from `start_ptr`.
let offset = unsafe { ptr.offset_from_unsigned(start_ptr) };
```
Move this into the `if has_unicode || has_line_breaks { ... }` block. If there are no line breaks or unicode chars in the block, we don't use this value, so avoid calculating it in that case.
Thank you for your feedback. I have tried to address the issues you mentioned. You're absolutely right about the robot overlords needing human guidance 😄. I appreciate your patience and thorough review.
Does not compile. Please see CI failures.
Force-pushed from 3bfda73 to 2962db4
Force-pushed from 3bfda73 to 566397d
Force-pushed from c15dd0e to 3f3efaf
Hey @overlookmotel, I've sent new commits addressing the comments. Whenever you have time, could you take a look and resolve the open conversations? I really appreciate your patience and help throughout this PR!
Add line and column information to AST serialization:

- extend UTF8ToUtf16 converter with line tracking
- support all line break types (`\n`, `\r`, `\r\n`, LS, PS)
- add loc options to ESTree serializer configs
- implement efficient offset-to-line-column conversion
- add SourceLocation struct for ESTree compatibility
- integrate with NAPI parser bindings
fyi: #10307