Description
DeepPavlov version: 1.1.1
Python version: 3.10
Operating system: Ubuntu
Issue:
I am using the ner_ontonotes_bert_mult model to predict entities in text. For sentences where an entity contains punctuation, it gives unexpected results. Before the 1.0.0 release, I also used the DeepPavlov Docker image with the ner_ontonotes_bert_mult config, and I did not encounter these issues with that older version of DeepPavlov.
Content or a name of a configuration file:
[ner_ontonotes_bert_mult](https://github.com/deeppavlov/DeepPavlov/blob/1.0.2/deeppavlov/configs/ner/ner_ontonotes_bert_mult.json)
Command that led to the unexpected results:
```python
from deeppavlov import build_model

deeppavlov_model = build_model(
    "ner_ontonotes_bert_mult",
    install=True,
    download=True)

sentence = 'Today at 13:10 we had a meeting'
output = deeppavlov_model([sentence])

print(output[0])
# [['Today', 'at', '13', ':', '10', 'we', 'had', 'a', 'meeting']]
print(output[1])
# [['O', 'O', 'B-TIME', 'O', 'B-TIME', 'O', 'O', 'O', 'O']]
```

As you can see, 13:10 is not recognized as a single time entity: 13 is tagged B-TIME, : is tagged O, and 10 is tagged B-TIME again. The same happens for names containing punctuation, such as E.A. Jones. I also tried the ner_ontonotes_bert configuration, but it gave the same results. Since I also want to use the model for languages other than English, that configuration is not an option anyway.
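One possible stopgap, assuming the tokens and tags come back in the format shown above, is to post-process the output: when a punctuation token tagged O sits between two tokens of the same entity type with no whitespace on either side, treat it as part of that entity and emit one span. This is a minimal sketch of such a heuristic; `token_offsets` and `merge_split_entities` are hypothetical helpers written for this issue, not DeepPavlov functions. It does not change what the model predicts, and it may merge spans that should stay separate.

```python
def token_offsets(text, tokens):
    """Return (start, end) character offsets of each token, scanning text left to right.

    Assumes every token appears verbatim in text (str.index raises ValueError otherwise).
    """
    offsets, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)
        offsets.append((start, start + len(tok)))
        pos = start + len(tok)
    return offsets


def merge_split_entities(text, tokens, tags):
    """Hypothetical heuristic: return (entity_type, surface_text) spans from BIO tags,
    folding an O-tagged punctuation token into the entity when it sits between two
    tokens of the same entity type with no whitespace on either side (e.g. 13:10)."""
    offsets = token_offsets(text, tokens)
    tags = list(tags)  # work on a copy so the caller's list is untouched
    for i in range(1, len(tokens) - 1):
        prev_tag, next_tag = tags[i - 1], tags[i + 1]
        glued = (offsets[i - 1][1] == offsets[i][0]
                 and offsets[i][1] == offsets[i + 1][0])
        if (tags[i] == "O" and not tokens[i].isalnum() and glued
                and prev_tag != "O" and next_tag != "O"
                and prev_tag[2:] == next_tag[2:]):
            # Relabel the punctuation and the following token as a continuation.
            tags[i] = "I-" + prev_tag[2:]
            tags[i + 1] = "I-" + prev_tag[2:]

    # Collect contiguous B-/I- runs into character spans.
    spans, current = [], None
    for (start, end), tag in zip(offsets, tags):
        if tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[2] = end  # extend the open span
        elif tag != "O":
            # A B- tag, or an I- tag with no matching open span, starts a new span.
            if current is not None:
                spans.append(current)
            current = [tag[2:], start, end]
        else:
            if current is not None:
                spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return [(ent_type, text[start:end]) for ent_type, start, end in spans]
```

With the model output from the snippet above, this would be called as `merge_split_entities(sentence, output[0][0], output[1][0])` and, for the tags shown, should yield a single TIME span for "13:10". Whether it also helps with names like E.A. Jones depends on how the model tags the individual tokens.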
I already opened an issue about this problem, but it was closed without a satisfactory resolution.
What can I do to solve this? Is it possible to fine-tune the model on such examples?
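On the fine-tuning question, a sketch of preparing extra training examples, assuming the config reads its data with DeepPavlov's `conll2003_reader` (one `token tag` pair per line, a blank line between sentences, files such as `train.txt` in the folder given by the reader's `data_path`); this layout is an assumption to check against the config and the dataset reader documentation. The main point is that labels must follow the model's own tokenization, so `13:10` becomes three tokens tagged `B-TIME I-TIME I-TIME`. `write_conll`, `extra_examples` and the folder name are hypothetical. Training on only a handful of such sentences would likely make the model forget other entities, so they would probably need to be mixed into the original training data.

```python
from pathlib import Path


def write_conll(sentences, path):
    """Write (tokens, tags) pairs as one 'token tag' line per token,
    with a blank line after each sentence (two-column CoNLL-style layout)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for tokens, tags in sentences:
            if len(tokens) != len(tags):
                raise ValueError(f"{len(tokens)} tokens but {len(tags)} tags")
            for token, tag in zip(tokens, tags):
                f.write(f"{token} {tag}\n")
            f.write("\n")


# Hypothetical extra examples, labeled with the same token split the model
# produced above: "13:10" is three tokens, all inside one TIME entity.
extra_examples = [
    (["Today", "at", "13", ":", "10", "we", "had", "a", "meeting"],
     ["O", "O", "B-TIME", "I-TIME", "I-TIME", "O", "O", "O", "O"]),
]
```

After writing e.g. `write_conll(extra_examples, "my_ner_data/train.txt")` (plus matching `valid.txt` and `test.txt`), the `data_path` of the dataset reader in a copy of the config would be pointed at `my_ner_data`, and training could be started with something like `python -m deeppavlov train my_config.json`.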