Indexing and searching use the same analyzer, but the query returns no hits #1851
Closed
Description
The following is the official Lucene 9.7 example; only the stored value was changed.
@org.junit.jupiter.api.Test
public void test3() throws IOException, ParseException {
    Analyzer analyzer = new HanLPAnalyzer();
    Path indexPath = Files.createTempDirectory("tempIndex");
    Directory directory = FSDirectory.open(indexPath);
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "中国人";
    doc.add(new TextField("fieldname", text, Field.Store.YES));
    iwriter.addDocument(doc);
    iwriter.close();

    // Now search the index:
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser("fieldname", analyzer);
    Query query = parser.parse(text);
    ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    StoredFields storedFields = isearcher.storedFields();
    for (int i = 0; i < hits.length; i++) {
        Document hitDoc = storedFields.document(hits[i].doc);
        assertEquals("中国人", hitDoc.get("fieldname"));
    }
    ireader.close();
    directory.close();
    IOUtils.rm(indexPath);
}
Result:
org.opentest4j.AssertionFailedError:
Expected :1
Actual :0
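While debugging, I found that at query time the analyzer splits 中国人 into 中国 and 人, so the query finds nothing.
However, indexing and searching use the same analyzer.
When I change the search string to "A 中国人", the document is found; in that case the query is tokenized as expected, into A and 中国人.
Is this a bug or a feature?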
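To make the reasoning above concrete, here is a minimal sketch of why the two queries behave differently. It is a toy model, not Lucene's or HanLP's implementation: `TokenMismatchDemo` is a hypothetical class, the token lists are typed in by hand, and it assumes the indexed document holds the single term 中国人 (inferred from the fact that the query "A 中国人" hits). A document matches if it contains at least one query term.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TokenMismatchDemo {

    // Toy inverted index: term -> ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index a document that has already been tokenized.
    public void add(int docId, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    // Return the ids of documents containing at least one query term.
    public Set<Integer> search(List<String> queryTokens) {
        Set<Integer> hits = new HashSet<>();
        for (String token : queryTokens) {
            hits.addAll(postings.getOrDefault(token, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        TokenMismatchDemo index = new TokenMismatchDemo();
        // Assumption: at index time the text was kept as one term.
        index.add(0, List.of("中国人"));

        // Query tokens as observed in the debugger for "中国人".
        System.out.println(index.search(List.of("中国", "人")));   // prints []
        // Query tokens as observed for "A 中国人".
        System.out.println(index.search(List.of("A", "中国人")));  // prints [0]
    }
}
```

Neither 中国 nor 人 exists as a term in the toy index, so the first lookup is empty, while the second finds the document through the term 中国人. This only models term lookup; it does not explain why the same analyzer tokenizes the two inputs differently.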
System information
- WIN11
- HanLP-portable:1.8.4
- hanlp-lucene-plugin:1.1.7
- I've completed this form and searched the web for solutions.