feat(code-review): entity-level semantic diff in the code review panel by monotykamary · Pull Request #9917 · warpdotdev/warp

monotykamary · 2026-05-02T15:39:43Z

Description

Add entity-level (semantic) diff understanding to the Code Review panel, so diffs are presented in terms of code entities (functions, classes, structs, methods) rather than raw lines.

What changes:

New crates/languages/src/semantic_diff.rs module (~860 lines) that extracts named code entities from two versions of a file using existing tree-sitter infrastructure and classifies changes between them
SemanticDiff feature flag (dogfood-enabled) gating all new behavior — no impact when disabled
Entity summary pills rendered in the file sidebar row: fn foo (modified) · bar → baz (renamed) · struct X (added)
Entity-aware stats in the file header: +5 • -3 (2 modified, 1 renamed, 1 added)
Entity diff computed at FileDiffAndContent construction time alongside existing line-based diff

Why this matters:

Line-based diffs with ±4 context don't align with how humans read code (by function, by class)
A renamed function shows as "+5 -5" rather than "renamed" — this PR classifies it correctly
Formatting-only changes (cargo fmt, prettier) are indistinguishable from logic changes — this PR classifies them as "unchanged"
No major editor has shipped this as a built-in feature yet (JetBrains issue open for 21 years, SemanticDiff is a VS Code extension only) — Warp would be the first

How it works:

Reuses existing tree-sitter parsing (arborium) and identifiers.scm queries — no new parser dependencies
Three-phase entity matching: exact name → structural hash (detects renames) → fuzzy Jaccard similarity (>80% token overlap)
Body hash skips the signature line so renames produce matching hashes
Whitespace-normalized hashing so cargo fmt / prettier runs classify entities as "unchanged"
Falls back silently to standard line-based diff for: unsupported languages (no identifiers.scm), parse failures, binary files, new/untracked files
Works on all platforms including WASM (entity_diff is None when local_fs is unavailable)

Future phases (not in this PR, but the architecture supports them):

Entity-scoped context (show entire function instead of ±4 lines)
Rename/move visual indicators in the diff gutter
AI context integration (send entities instead of line hunks)

Linked Issue

The linked issue is labeled ready-to-spec or ready-to-implement.
- Closes Feature Request: Entity-level (semantic) diff review in the Code Review panel #9898

Screenshots / Videos

This is a feature-flagged change with no visual impact when SemanticDiff is disabled. When enabled, the file sidebar shows entity pills and the file header shows augmented stats. Screenshots will be added once the feature is enabled in a dogfood build.

Testing

New unit tests (19 in crates/languages/src/semantic_diff.rs):

Entity extraction: Rust, Python, JavaScript, unsupported language
Entity matching: modified, unchanged, renamed, added, deleted, moved, mixed
Rename detection end-to-end (Rust function rename)
Formatting-only changes classified as Unchanged
Body hash whitespace normalization
Tokenization and Jaccard similarity
Language without identifiers.scm returns None
Empty entity lists produce None

Existing tests continue to pass:

75 code_review tests
20 diff_state tests
22 languages tests
Test fixture in code_review_view_tests.rs updated with entity_diff: None

No integration test added — this feature is behind a feature flag and has no user-visible flow when disabled. The entity extraction and matching are purely computational (no UI interaction to test end-to-end). Integration coverage can be added when the feature is promoted beyond dogfood.

Agent Mode

Warp Agent Mode - This PR was created via Warp's AI Agent Mode

Co-Authored-By: Warp [email protected]

Add entity-level (semantic) diff understanding to the Code Review panel, so diffs are presented in terms of code entities (functions, classes, structs) rather than raw lines. What: - New semantic_diff module in crates/languages with entity extraction, 3-phase matching (exact name → structural hash → fuzzy similarity), and change classification (modified/renamed/added/deleted/moved/unchanged) - SemanticDiff feature flag (dogfood-enabled) gating all new behavior - Entity summary pills in file sidebar (e.g. 'fn foo (modified) · bar → baz (renamed) · struct X (added)') - Entity-aware stats in file header (e.g. '+5 -3 (2 modified, 1 renamed)') - Entity diff computed at FileDiffAndContent construction from base and current file content using existing tree-sitter infrastructure Why: - Line-based diffs with ±4 context don't align with how humans read code - Renamed functions show as +5 -5 rather than 'renamed' - Formatting-only changes are indistinguishable from logic changes - Upstream issue warpdotdev#9898 requests this capability How: - Reuses existing tree-sitter parsing (arborium) and identifiers.scm queries — no new parser dependencies - Body hash skips signature line so renames produce matching hashes - Whitespace-normalized hashing so formatting-only changes classify as Unchanged - Falls back to standard line-based diff for unsupported languages (no identifiers.scm) or parse failures - Works on all platforms including WASM (entity_diff is None when local_fs is unavailable) Tests: 19 unit tests covering extraction, matching, rename detection, formatting-only classification, and multi-language support (Rust, Python, JavaScript). All existing code_review tests (75) and diff_state tests (20) continue to pass. Co-Authored-By: Warp <[email protected]>

oz-for-oss · 2026-05-02T15:40:23Z

@monotykamary

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

oz-for-oss

Overview

This PR adds entity-level semantic diff extraction, stores the result with code review file diffs, and renders semantic summaries in the code review UI behind a feature flag.

Concerns

The UI change has no attached screenshot or video. For faster review, please upload screenshots or a video of the feature working end to end.
The current implementation can leave semantic summaries stale after watcher-driven per-file invalidations.
Entity extraction currently misses bare @definition captures and can include non-definition captures from existing identifiers.scm files.
The body hash skips all content for single-line entities, so real one-line logic changes can be classified as unchanged.
The implementation adds synchronous full-file reads and tree-sitter parsing without the existing size/local-fs guardrails.

Security

The semantic diff path parses repository-controlled file contents without a size cap before invoking tree-sitter, which can create unbounded CPU/memory work in the code review load path.

Verdict

Found: 0 critical, 7 important, 0 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

zachlloyd · 2026-05-02T15:54:54Z

could i see a screencast of this?

@Definition

Fix 6 issues from review on PR warpdotdev#9917: 1. fix(code_review_view): update entity_diff on watcher refresh - Add current.entity_diff = diff.entity_diff.clone() alongside current.file_diff = diff.file_diff in update_from_single_file_diff_result so entity summary doesn't go stale after edits 2. fix(diff_state): guard entity diff computation with diff-size limits - Skip entity diff for DiffSize::Large/Unrenderable files - Cap current file content at 4.375 MB before parsing - Prevents unbounded CPU/memory on large repo-controlled files 3. fix(diff_state): gate fs::read_to_string behind local_fs cfg - Wrap entity diff computation in #[cfg(feature = "local_fs")] with None fallback for non-local-fs/WASM builds - Both construction sites (diff_state_against_head and file_diff_for_path) are now properly gated 4. fix(semantic_diff): accept bare @Definition captures - Previous filter skipped bare @Definition (used by 18+ languages including C, C++, C#, Java, JavaScript) and leaked @reference.call - Now accepts any capture starting with "definition", rejects everything else - type_prefix extracted via strip_prefix("definition.") 5. fix(semantic_diff): mask name token instead of skipping signature - Skipping the first line made single-line entities hash the empty string (fn foo() { 1 } and fn foo() { 2 } classified as Unchanged) - Now masks (replaces with spaces) only the name token's byte range in the entity body, preserving signature structure while making the hash invariant to renames - Adds tests for rename-invariant hashing and single-line entity body changes 6. fix(semantic_diff): use neighbor-based Moved classification - Absolute line-shift heuristic caused false Moved when lines were inserted above an entity - Now compares immediate neighbors in the entity ordering: both neighbors must change for Moved, preserving robustness to insertions/deletions above - Adds test_insertion_above_does_not_cause_moved Co-Authored-By: Warp <[email protected]>

monotykamary · 2026-05-02T16:50:58Z

Oh yes, give me a hot moment 🚀

…diffing Add SemanticDiff feature flag (Dogfood) that enables two complementary improvements to the code review diff panel: 1. Entity classification (crates/languages/src/semantic_diff.rs) - Extract semantic entities (functions, methods, structs, impls) from tree-sitter using existing identifiers.scm queries. - 3-phase matching across base/current: exact name → structural body hash → fuzzy Jaccard similarity (>80%). - Classifies changes as: Unchanged, Modified, Renamed{old_name}, Added, Deleted, Moved (via neighbor position shift detection). - Body hashes use name-token masking so renames alone don't affect the hash. - Gated behind #[cfg(feature = "local_fs")] with 4.375 MB file size cap and DiffSize::Large/Unrenderable guard. 2. Syntax-token diff (replacing line-based hunks for supported languages) - Parse both files with tree-sitter, collect leaf tokens (DFS). - Comments and strings treated as opaque (atomic tokens) to avoid fragmented partial matches within rewritten comment blocks. - LCS diff on token hashes via similar::TextDiff (same Patience algorithm as line diff, but on tokens). - Map token-level ops to line-level SyntaxHunks. - Multi-line rewrites → clean Deletion + Addition blocks (no inline noise). - Single-line changes → Replacement with per-word inline highlights (e.g. renamed function names). - Falls back to line-based diff for unsupported languages or parse failures. 3. Entity-scoped context (app/src/code/editor/model.rs) - calculate_hidden_lines() expands visible ranges to cover entire entities containing changes instead of fixed ±4 context lines. - Added EntityRange { start_line, end_line, kind } to CodeEditorModel, wired from diff_state entity_diff data. 4. removed visual noise per user feedback - Removed sidebar entity summary pills (render_entity_summary). - Removed file header entity counts (render_file_stats_with_entities reverted to render_file_stats). - These were duplicating existing diff information and adding clutter. 5. Infrastructure - Added file_path to DiffModel for language detection in syntax diff. - Added similar.workspace = true to crates/languages/Cargo.toml. - Added 30 unit tests covering entity extraction, matching, classification, syntax-token diff, word-level highlight alignment, and comment opacity. Refs warpdotdev#9898

monotykamary · 2026-05-02T19:50:11Z

It's currently a bit naive, but currently experimenting what would look the best:

Current:

Upcoming:

The work before the visuals are closer to sem, but my colleague wanted something closer to difftastic. I'll continue experimenting a clean up the PR.

zachlloyd · 2026-05-02T19:53:18Z

@kevinyang372 and @alokedesai will be the most knowledgeable here

kevinyang372 · 2026-05-03T04:29:47Z

@monotykamary I am a bit confused by what I am looking at with your attached screenshot:

There are overlapping addition and deletion highlights
Line 17 should be a replacement from "match currency {" --> "let symbol = match currency {" but it's shown as a pure addition
Looks like there should be a deletion after line 31 but that no longer shows up in the screenshot below

Also looks like there are gaps between gutter diff highlighting and the line diff highlighting

I would love to see better semantic highlighting in Warp too! But my understanding is this is a much bigger change that changes the underlining meaning of the diff highlights. We should spec out what the right product behavior is first

monotykamary · 2026-05-03T05:01:11Z

Oh that's right, my approach is much too naive for what really good semantic diffs would offer for the code reviewer.

I'll leave this PR here for experimentation (and update the issue to capture the scope a bit better). Token based diffing is easy to test and verify, but surprisingly bad to look at. I understand why dax has been saying diffs are hard.

cla-bot Bot added the cla-signed label May 2, 2026

github-actions Bot added the external-contributor Indicates that a PR has been opened by someone outside the Warp team. label May 2, 2026

monotykamary marked this pull request as ready for review May 2, 2026 15:40

oz-for-oss Bot requested changes May 2, 2026

View reviewed changes

monotykamary marked this pull request as draft May 2, 2026 17:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(code-review): entity-level semantic diff in the code review panel#9917

feat(code-review): entity-level semantic diff in the code review panel#9917
monotykamary wants to merge 3 commits intowarpdotdev:masterfrom
monotykamary:monotykamary/semantic-diff-review

monotykamary commented May 2, 2026 •

edited

Loading

Uh oh!

oz-for-oss Bot commented May 2, 2026 •

edited

Loading

Uh oh!

oz-for-oss Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd commented May 2, 2026

Uh oh!

monotykamary commented May 2, 2026

Uh oh!

monotykamary commented May 2, 2026

Uh oh!

zachlloyd commented May 2, 2026

Uh oh!

kevinyang372 commented May 3, 2026

Uh oh!

monotykamary commented May 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

monotykamary commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Linked Issue

Screenshots / Videos

Testing

Agent Mode

Uh oh!

oz-for-oss Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oz-for-oss Bot left a comment

Choose a reason for hiding this comment

Overview

Concerns

Security

Verdict

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd commented May 2, 2026

Uh oh!

monotykamary commented May 2, 2026

Uh oh!

monotykamary commented May 2, 2026

Uh oh!

zachlloyd commented May 2, 2026

Uh oh!

kevinyang372 commented May 3, 2026

Uh oh!

monotykamary commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

monotykamary commented May 2, 2026 •

edited

Loading

oz-for-oss Bot commented May 2, 2026 •

edited

Loading

monotykamary commented May 3, 2026 •

edited

Loading