fix: search not found with single char #8904
Conversation
Greptile Overview
Greptile Summary
Fixed a critical tokenizer inconsistency where o2_collect_tokens wasn't applying the same length filters (min 2 chars, max 64 chars) that are applied during indexing, causing search queries with single-character tokens to fail with "not found" results.
Key Changes:
- Replaced direct instantiation of `SimpleTokenizer`/`O2Tokenizer` in `o2_collect_tokens` with a call to `o2_tokenizer_build()`
- Both indexing and search operations now use identical tokenization logic with `RemoveShortFilter` and `RemoveLongFilter` (sketched below, after the Impact list)
- Ensures queries like `match_all('INFO actix_web...')` work correctly by filtering single-character tokens during search, matching the index behavior
Impact:
- Resolves search failures when queries contain single-character tokens
- Maintains consistency between index creation and search query processing
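For illustration, here is a minimal sketch of the shared-tokenizer approach, assuming tantivy's builder API. The function names mirror the PR, but the bodies are assumptions rather than the repo's actual code; tantivy ships `RemoveLongFilter`, while `RemoveShortFilter` appears to be custom to this repo, so the minimum-length check is approximated inline:

```rust
use tantivy::tokenizer::{RemoveLongFilter, SimpleTokenizer, TextAnalyzer};

// Sketch: one builder shared by indexing and search.
fn o2_tokenizer_build() -> TextAnalyzer {
    TextAnalyzer::builder(SimpleTokenizer::default())
        // RemoveLongFilter keeps tokens *shorter* than the limit,
        // so limit(65) keeps tokens of at most 64 bytes (assumed cutoff).
        .filter(RemoveLongFilter::limit(65))
        .build()
}

// Sketch: the search path now uses the same builder, so length
// filtering matches index-time behavior.
fn o2_collect_tokens(text: &str) -> Vec<String> {
    let mut analyzer = o2_tokenizer_build();
    let mut stream = analyzer.token_stream(text);
    let mut tokens = Vec::new();
    while let Some(token) = stream.next() {
        // Minimum length of 2 chars, standing in for the repo's
        // custom RemoveShortFilter.
        if token.text.chars().count() >= 2 {
            tokens.push(token.text.to_lowercase());
        }
    }
    tokens
}
```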
Confidence Score: 5/5
- This PR is safe to merge with minimal risk
- The fix is a simple, well-targeted change that solves a clear tokenizer inconsistency bug. It replaces manual tokenizer instantiation with a call to the existing `o2_tokenizer_build()` function, ensuring search and indexing use identical token processing. The change is minimal (7 lines removed, 1 line added), has no side effects, and improves correctness without introducing new complexity or risks.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| src/config/src/utils/tantivy/tokenizer/mod.rs | 5/5 | Fixed tokenizer consistency issue where o2_collect_tokens wasn't applying length filters during search, causing single-character token mismatches |
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant SearchAPI
    participant o2_collect_tokens
    participant o2_tokenizer_build
    participant RemoveShortFilter
    participant RemoveLongFilter
    participant Index

    User->>SearchAPI: match_all('INFO actix_web...')
    SearchAPI->>o2_collect_tokens: Tokenize search query
    Note over o2_collect_tokens: BEFORE: Used SimpleTokenizer/O2Tokenizer directly<br/>(no length filtering)
    Note over o2_collect_tokens: AFTER: Uses o2_tokenizer_build()<br/>(applies length filtering)
    o2_collect_tokens->>o2_tokenizer_build: Build tokenizer with filters
    o2_tokenizer_build->>RemoveShortFilter: Apply min_token_length >= 2
    RemoveShortFilter->>RemoveLongFilter: Apply max_token_length <= 64
    RemoveLongFilter-->>o2_collect_tokens: Configured tokenizer
    o2_collect_tokens->>o2_collect_tokens: Process tokens with filters
    Note over o2_collect_tokens: Single char tokens removed
    o2_collect_tokens-->>SearchAPI: Filtered tokens
    SearchAPI->>Index: Query with filtered tokens
    Index-->>User: Consistent results
```
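Assuming the sketch above, a quick check of the post-fix behavior on a query like the one in the diagram (note that `SimpleTokenizer` splits on non-alphanumeric characters, so `actix_web` becomes two tokens; the repo's `O2Tokenizer` may split differently):

```rust
#[test]
fn search_tokens_match_index_filtering() {
    // Single-char tokens ("1", "a") are dropped at search time,
    // mirroring what was stored in the index.
    let tokens = o2_collect_tokens("INFO actix_web 1 a");
    assert_eq!(tokens, vec!["info", "actix", "web"]);
}
```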
1 file reviewed, no comments
Test results across successive CI runs:

| Run | Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|---|
| 1 | All tests passed | 366 | 341 | 0 | 19 | 6 | 93% | 4m 39s |
| 2 | All tests passed | 366 | 342 | 0 | 19 | 5 | 93% | 4m 39s |
| 3 | All tests passed | 366 | 344 | 0 | 19 | 3 | 94% | 4m 57s |
| 4 | All tests passed | 366 | 346 | 0 | 19 | 1 | 95% | 4m 38s |
| 5 | All tests passed | 366 | 342 | 0 | 19 | 5 | 93% | 4m 39s |
| 6 | All tests passed | 366 | 346 | 0 | 19 | 1 | 95% | 4m 39s |
| 7 | All tests passed | 366 | 345 | 0 | 19 | 2 | 94% | 4m 39s |
| 8 | All tests passed | 366 | 343 | 0 | 19 | 4 | 94% | 7m 5s |
merge to rc9 Co-authored-by: Hengfei Yang <[email protected]>
When we search for this, the result comes back "not found":
The problem is that at index time we remove tokens whose length is less than 2 or greater than 64, but we didn't apply the same filtering at search time.
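To make the failure concrete, here is a hedged sketch of the pre-fix search path (simplified; per the review above, the actual code instantiated `SimpleTokenizer`/`O2Tokenizer` directly):

```rust
use tantivy::tokenizer::{SimpleTokenizer, TextAnalyzer};

// BEFORE (sketch): search-side tokens were collected without any
// length filtering, so single-char tokens leaked into the query.
fn o2_collect_tokens_old(text: &str) -> Vec<String> {
    let mut analyzer = TextAnalyzer::builder(SimpleTokenizer::default()).build();
    let mut stream = analyzer.token_stream(text);
    let mut tokens = Vec::new();
    while let Some(token) = stream.next() {
        tokens.push(token.text.to_lowercase());
    }
    tokens
}
```

For a document containing "worker a", the index stores only ["worker"] because the index-time filters drop "a", yet the old search path produced ["worker", "a"]; a query requiring every token to match therefore came back "not found". Routing search through `o2_tokenizer_build()` makes both sides produce ["worker"].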