datahandler.processor does not convert hundreds of samples to features #673

**Describe the bug**
The `question_answering_crossvalidation.py` example discards hundreds of samples instead of converting them to features. My best guess is a problem in the interaction between document striding and the tokenizer, possibly introduced by recent updates to the transformers library.

**Error message**
`ERROR - farm.data_handler.processor -   Unable to convert 577 samples to features. Their ids are : 119-0-0, 127-1-4, ...` and hundreds of warnings like the following:
```
WARNING - farm.data_handler.processor -   Answer using start/end indices is ' 7.5 days' while gold label text is ' 7.5 days'.
Example will not be converted for training/evaluation.
```
Although the gold label text matches the text at the start/end indices, the sample is not converted.

**Expected behavior**
All samples with gold label text matching text at start/end indices should be converted to features.
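
The expected check can be sketched as follows. This is a minimal illustration, not FARM's actual implementation: `span_matches_gold` and `describe_mismatch` are hypothetical helper names, and the context string is invented for the example. Printing both strings with `repr()` may help reveal whether the two texts in the warning really are identical or differ in characters that are invisible in the log.

```python
def span_matches_gold(context: str, start: int, end: int, gold_text: str) -> bool:
    """Return True when context[start:end] equals the gold answer text.

    Hypothetical helper: leading/trailing whitespace is ignored on both sides.
    """
    return context[start:end].strip() == gold_text.strip()


def describe_mismatch(span: str, gold: str) -> str:
    """Format both strings with repr() so invisible characters become visible."""
    return f"span={span!r} gold={gold!r}"


# Invented example context; the gold text mirrors the one in the warning above.
context = "The incubation period is 7.5 days on average."
gold = " 7.5 days"
start = context.index(gold)
end = start + len(gold)

# A sample like this one should be converted, because the span matches the label.
print(span_matches_gold(context, start, end, gold))
print(describe_mismatch(context[start:end], gold))
```

If a check along these lines passes for the rejected ids, the samples are presumably being dropped somewhere other than the label comparison itself.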

**To Reproduce**
Run `question_answering_crossvalidation.py` from the `examples` directory of the current master branch (commit 1f046cadf9e5090d93a9760593b9788e1fe012c1).
@Timoeller reproduced the behavior on a different machine.

**System:**
 - OS: Ubuntu
 - GPU/CPU: GPU 
 - FARM version: 0.6.0

