NER predictions for entities with punctuation #1642

DeepPavlov version: 1.1.1
Python version: 3.10
Operating system: Ubuntu

**Issue**:
I am using the `ner_ontonotes_bert_mult` model to predict entities in text. When an entity contains punctuation, the predictions are unexpected. Before the 1.0.0 release, I used the [DeepPavlov Docker image](https://hub.docker.com/r/deeppavlov/base-cpu), also with the `ner_ontonotes_bert_mult` config, and I did not encounter these issues with that older version of DeepPavlov.

**Content or a name of a configuration file**:
[ner_ontonotes_bert_mult](https://github.com/deeppavlov/DeepPavlov/blob/1.0.2/deeppavlov/configs/ner/ner_ontonotes_bert_mult.json)

**Command that led to the unexpected results**:
```python
from deeppavlov import build_model

# install=True installs the config's requirements, download=True fetches the pretrained files
deeppavlov_model = build_model(
    "ner_ontonotes_bert_mult",
    install=True,
    download=True)

sentence = 'Today at 13:10 we had a meeting'
output = deeppavlov_model([sentence])  # [batch of tokens, batch of tags]
print(output[0])
# [['Today', 'at', '13', ':', '10', 'we', 'had', 'a', 'meeting']]
print(output[1])
# [['O', 'O', 'B-TIME', 'O', 'B-TIME', 'O', 'O', 'O', 'O']]
```
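
To make explicit why this output counts as two entities rather than one, here is a minimal BIO decoder applied to the tokens and tags above. `bio_to_spans` is a hypothetical helper written for illustration, not a DeepPavlov function: a `B-X` tag opens a span, a matching `I-X` extends it, and `O` closes it.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans (hypothetical helper)."""
    spans = []
    current_type = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        # I-X continues the open span only if it has the same type X.
        if tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
            continue
        # Any other tag closes the span that is currently open.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if tag != "O":
            # B-X (or a stray I-X with no matching open span) starts a new span.
            current_type, current_tokens = tag[2:], [token]
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ['Today', 'at', '13', ':', '10', 'we', 'had', 'a', 'meeting']
tags = ['O', 'O', 'B-TIME', 'O', 'B-TIME', 'O', 'O', 'O', 'O']
print(bio_to_spans(tokens, tags))
# [('TIME', '13'), ('TIME', '10')]
```

Had the model tagged the three tokens `B-TIME I-TIME I-TIME`, the same decoder would return a single `('TIME', '13 : 10')` span.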

As you can see, `13:10` is not recognized as a single time entity: `13` is tagged `B-TIME`, `:` is tagged `O`, and `10` is tagged `B-TIME` again. The same happens for names containing punctuation, such as `E.A. Jones`. I also tried [the ner_ontonotes_bert configuration](https://github.com/deeppavlov/DeepPavlov/blob/master/deeppavlov/configs/ner/ner_ontonotes_bert.json), which gave the same results. In any case, it is not an option for me, because I also want to use the model for languages other than English.
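
Until the model itself handles this, one possible workaround is to post-process the tags. The sketch below is a hypothetical heuristic, not part of DeepPavlov: when a punctuation token tagged `O` sits between two tags of the same entity type, the right one being `B-X`, and the three tokens appear in the original sentence with no spaces between them (as in `13:10`), it relabels them as one `B-X I-X I-X` entity. The `JOINERS` set is an assumption to adjust per use case.

```python
from typing import List

# Hypothetical set of punctuation marks allowed to join two parts of one entity.
JOINERS = {":", ".", "-", "/"}


def merge_split_entities(sentence: str, tokens: List[str], tags: List[str]) -> List[str]:
    """Return a copy of `tags` where X, O(joiner), B-X becomes X, I-X, I-X.

    The rewrite is applied only if the three tokens occur in `sentence`
    without spaces between them, so "13 : 10" is left unchanged.
    """
    new_tags = list(tags)
    for i in range(1, len(tokens) - 1):
        left, mid, right = new_tags[i - 1], new_tags[i], new_tags[i + 1]
        if mid != "O" or tokens[i] not in JOINERS:
            continue
        if left == "O" or not right.startswith("B-") or left[2:] != right[2:]:
            continue
        # Require the joined form (e.g. "13:10") to appear in the raw text.
        if tokens[i - 1] + tokens[i] + tokens[i + 1] not in sentence:
            continue
        entity_type = left[2:]
        new_tags[i] = "I-" + entity_type
        new_tags[i + 1] = "I-" + entity_type
    return new_tags


tokens = ['Today', 'at', '13', ':', '10', 'we', 'had', 'a', 'meeting']
tags = ['O', 'O', 'B-TIME', 'O', 'B-TIME', 'O', 'O', 'O', 'O']
print(merge_split_entities('Today at 13:10 we had a meeting', tokens, tags))
# ['O', 'O', 'B-TIME', 'I-TIME', 'I-TIME', 'O', 'O', 'O', 'O']
```

Because the no-space check runs on the raw sentence, a name like `E.A. Jones`, where a space precedes `Jones`, would only be partially repaired by this rule.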

I already opened an [issue](https://github.com/deeppavlov/DeepPavlov/issues/1628) about this problem, but it was closed without a satisfying resolution.

What can I do to solve this? Is it possible to fine-tune the model on such examples?
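
If fine-tuning is the route, the first step would be labelled examples in which the punctuation inside an entity is tagged `I-X`. A minimal sketch, assuming the config's dataset reader is `conll2003_reader` (one whitespace-separated `token tag` pair per line, blank line between sentences) and that training sentences are tokenized the same way as the model output shown above. The directory name and the second sentence are hypothetical. The folder would then be set as the reader's `data_path` in a copy of the config before calling `deeppavlov.train_model` on it; the exact config keys should be checked against the config file.

```python
from pathlib import Path
from typing import List, Tuple

# Each sentence is a list of (token, tag) pairs; punctuation inside an entity gets I-X.
# The tokenization mirrors the model output above; the second sentence is hypothetical.
SENTENCES: List[List[Tuple[str, str]]] = [
    [("Today", "O"), ("at", "O"), ("13", "B-TIME"), (":", "I-TIME"), ("10", "I-TIME"),
     ("we", "O"), ("had", "O"), ("a", "O"), ("meeting", "O")],
    [("The", "O"), ("train", "O"), ("leaves", "O"), ("at", "O"),
     ("08", "B-TIME"), (":", "I-TIME"), ("45", "I-TIME")],
]


def write_conll(sentences: List[List[Tuple[str, str]]], path: Path) -> None:
    """Write one 'token tag' pair per line and a blank line after each sentence."""
    with path.open("w", encoding="utf-8") as f:
        for sentence in sentences:
            for token, tag in sentence:
                f.write(f"{token} {tag}\n")
            f.write("\n")


data_dir = Path("my_ner_data")  # hypothetical directory, later used as the reader's data_path
data_dir.mkdir(exist_ok=True)
write_conll(SENTENCES, data_dir / "train.txt")
# valid.txt and test.txt would be written the same way from held-out sentences.
```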
