Java version: some Chinese sentences containing semicolons are not segmented correctly #1936

@Leon406

Describe the bug
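With the Java version (portable-1.8.5), some Chinese sentences containing full-width semicolons (;) are not segmented correctly: the semicolon is attached to the word that follows it instead of being emitted as a separate token.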

Code to reproduce the issue

        List<Term> terms = HanLP.newSegment().seg("重度,她,芹菜,大,不,旅游;未;重度;她,芹菜;大;不;旅游;未");
        for (Term term : terms) {
            System.out.printf("%s\t%s \r\n", term.word, term.nature);
        }

Describe the current behavior

The output is as follows:

重度	b 
,	w 
她	r 
,	w 
芹菜	n 
,	w 
大	a 
,	w 
不	d 
,	w 
旅游	vn 
;未	d 
;重度	b 
;她	rr 
,	w 
芹菜	n 
;大	a 
;不	d 
;旅游	vn 
;未	d 

Some tokens include the semicolon, for example ";未" and ";重度".
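
To make the symptom easy to spot in a longer run, a small check like the one below can list every token in which a semicolon is fused with other characters. This is only a sketch: the class and method names (SemicolonTokenCheck, fusedTokens) are made up for illustration and are not part of HanLP, and the input is assumed to be the list of term.word values collected from the loop above. Characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HanLP.
class SemicolonTokenCheck {

    // Returns the tokens that contain a semicolon (full-width U+FF1B or ASCII)
    // together with at least one other character.
    static List<String> fusedTokens(List<String> words) {
        List<String> fused = new ArrayList<>();
        for (String w : words) {
            boolean hasSemicolon = w.indexOf('\uFF1B') >= 0 || w.indexOf(';') >= 0;
            if (hasSemicolon && w.length() > 1) {
                fused.add(w);
            }
        }
        return fused;
    }

    public static void main(String[] args) {
        // Three tokens modeled on the reported output:
        // U+65C5 U+6E38 (a normal word), U+FF1B U+672A (semicolon fused with a word),
        // U+FF0C (a full-width comma on its own).
        List<String> words = Arrays.asList("\u65C5\u6E38", "\uFF1B\u672A", "\uFF0C");
        System.out.println(fusedTokens(words));
    }
}
```

A semicolon that is emitted on its own is not reported, so an empty result corresponds to the expected behavior.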

Expected behavior
The tokens should not contain the semicolon.

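
Until the segmenter itself is fixed, one possible workaround is to cut the text at semicolons first and segment each piece separately. The sketch below is untested against HanLP and is not a fix for the underlying bug. SemicolonPreSplit and its methods are hypothetical names, and the segmenter is passed in as a plain function so the example runs without the library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical workaround, not part of HanLP.
class SemicolonPreSplit {

    // Splits text at full-width (U+FF1B) and ASCII semicolons, emits each
    // semicolon as its own token, and passes the text in between to the segmenter.
    static List<String> segment(String text, Function<String, List<String>> segmenter) {
        List<String> out = new ArrayList<>();
        StringBuilder chunk = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\uFF1B' || c == ';') {
                flush(chunk, segmenter, out);
                out.add(String.valueOf(c));
            } else {
                chunk.append(c);
            }
        }
        flush(chunk, segmenter, out);
        return out;
    }

    // Segments the pending chunk, if any, and resets the buffer.
    private static void flush(StringBuilder chunk,
                              Function<String, List<String>> segmenter,
                              List<String> out) {
        if (chunk.length() > 0) {
            out.addAll(segmenter.apply(chunk.toString()));
            chunk.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Stand-in segmenter that returns each chunk unchanged.
        Function<String, List<String>> identity = Collections::singletonList;
        // Input is U+65C5 U+6E38, U+FF1B, U+672A (the tail of the reported sentence).
        System.out.println(segment("\u65C5\u6E38\uFF1B\u672A", identity));
    }
}
```

In a real project the stand-in would presumably be replaced by a function that calls HanLP.newSegment().seg(chunk) and collects term.word from each Term. This variant returns only the words, so the nature tags would have to be carried along separately.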
System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 11 23H2
  • Python version:
  • HanLP version: portable-1.8.5

Other info / logs

  • I've completed this form and searched the web for solutions.
