Java version: some Chinese sentences containing semicolons are not segmented correctly #1936
Closed
Description
Describe the bug
A clear and concise description of what the bug is.
Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
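// Segment a sentence that mixes commas and semicolons, then print each token with its part-of-speech tag.
List<Term> terms = HanLP.newSegment().seg("重度,她,芹菜,大,不,旅游;未;重度;她,芹菜;大;不;旅游;未");
for (Term term : terms) {
    System.out.printf("%s\t%s \r\n", term.word, term.nature);
}
Describe the current behavior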
A clear and concise description of what happened.
The output is as follows:
重度 b
, w
她 r
, w
芹菜 n
, w
大 a
, w
不 d
, w
旅游 vn
;未 d
;重度 b
;她 rr
, w
芹菜 n
;大 a
;不 d
;旅游 vn
;未 d
Some of the resulting tokens contain the semicolon (;).
Expected behavior
A clear and concise description of what you expected to happen.
No token should contain the semicolon (;).
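Until the root cause is identified, one possible stopgap is to post-process the segmented words and split any semicolon out into its own token. The sketch below is only an illustration under stated assumptions: `SemicolonSplitter` and `splitSemicolons` are hypothetical names, not part of HanLP, and the helper works on plain strings (for example, the `term.word` values collected from the loop above), so it does not preserve or reassign part-of-speech tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SemicolonSplitter {
    // Hypothetical helper, not part of HanLP: splits every ASCII (;) or
    // full-width (;) semicolon out of a word so it becomes its own token.
    static List<String> splitSemicolons(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            StringBuilder buffer = new StringBuilder();
            for (int i = 0; i < word.length(); i++) {
                char c = word.charAt(i);
                if (c == ';' || c == '\uFF1B') {
                    // Flush the text collected so far, then emit the semicolon alone.
                    if (buffer.length() > 0) {
                        result.add(buffer.toString());
                        buffer.setLength(0);
                    }
                    result.add(String.valueOf(c));
                } else {
                    buffer.append(c);
                }
            }
            // Emit whatever text remains after the last semicolon.
            if (buffer.length() > 0) {
                result.add(buffer.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Words copied from the reported output, where ';' is fused with the next word.
        List<String> words = Arrays.asList("旅游", ";未", ";重度", ";她");
        System.out.println(splitSemicolons(words));
    }
}
```

This only hides the symptom in the word list; it does not explain why the segmenter attaches the semicolon to the following word in the first place.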
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 11 23H2
- Python version:
- HanLP version: portable-1.8.5
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
- I've completed this form and searched the web for solutions.