Skip to content

[2.10] [MOD-11233] - fix: empty string token should be counted in byteOffset…#6972

Merged
JoanFM merged 2 commits into2.10from
backport-6938-to-2.10
Oct 7, 2025
Merged

[2.10] [MOD-11233] - fix: empty string token should be counted in byteOffset…#6972
JoanFM merged 2 commits into2.10from
backport-6938-to-2.10

Conversation

@JoanFM
Copy link
Collaborator

@JoanFM JoanFM commented Oct 3, 2025

Note

Ensure byte offsets are written for empty tokens to fix cross-field highlighting; update forward index token API and add highlight tests.

  • Core indexing/highlighting:
    • Always write byte offsets during tokenization, including for empty tokens, via writeByteOffsets in src/document.c to preserve correct cross-field highlighting.
    • Move byte-offset writing responsibility out of forwardIndexTokenFunc into caller; compute indexesEmpty near token loop.
    • Change forwardIndexTokenFunc signature to int forwardIndexTokenFunc(ForwardIndexTokenizerCtx *tokCtx, const Token *tokInfo) and add detailed docs; remove internal allOffsets writes.
  • Headers:
    • Update declaration in src/forward_index.h to match new signature.
  • Tests:
    • Add tests/pytests/test_highlight.py covering highlighting across empty fields, arrays, field order, INDEXEMPTY, and empty-string queries.

Written by Cursor Bugbot for commit 975346a. This will update automatically on new commits. Configure here.

…s for highlighting (#6938)

* initial commit

* add newline at end of file

* add debug prints

* test: fix testing index being awaited

* add test from the MOD-11233

* test: simplify tests to be easier to handle/debug

* test: update test

* test: improve test assertion

* add test with some debug prints

* fix: fix tokenization empty

* remove useless field from tokenizer

* fix: simpler fix

* fix: improved behavior for forward index

* fix spell check

* change test name and var name

* change test text

* refactor: byteoffset handling independent

* test: add more tests with highlighting and empty combination

* apply changes as per code review

---------

Co-authored-by: jonathan keinan <[email protected]>
(cherry picked from commit 521847a)
@JoanFM JoanFM requested a review from kei-nan October 3, 2025 14:40
@JoanFM JoanFM changed the title [MOD-11233] - fix: empty string token should be counted in byteOffset… [2.10] [MOD-11233] - fix: empty string token should be counted in byteOffset… Oct 3, 2025
cursor[bot]

This comment was marked as outdated.

@codecov
Copy link

codecov bot commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.29%. Comparing base (ba1eb00) to head (975346a).
⚠️ Report is 2 commits behind head on 2.10.

Additional details and impacted files
@@            Coverage Diff             @@
##             2.10    #6972      +/-   ##
==========================================
- Coverage   89.32%   89.29%   -0.03%     
==========================================
  Files         207      207              
  Lines       36165    36167       +2     
==========================================
- Hits        32304    32297       -7     
- Misses       3861     3870       +9     
Flag Coverage Δ
flow 83.81% <100.00%> (-0.16%) ⬇️
unit 42.72% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JoanFM JoanFM added this pull request to the merge queue Oct 6, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 6, 2025
@JoanFM JoanFM added this pull request to the merge queue Oct 6, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 6, 2025
@JoanFM JoanFM added this pull request to the merge queue Oct 6, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 6, 2025
@JoanFM JoanFM added this pull request to the merge queue Oct 7, 2025
Merged via the queue into 2.10 with commit 9f8b4b6 Oct 7, 2025
13 checks passed
@JoanFM JoanFM deleted the backport-6938-to-2.10 branch October 7, 2025 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants