This repository was archived by the owner on Apr 8, 2025. It is now read-only.

datahandler.processor does not convert hundreds of samples to features #673

@julian-risch


Describe the bug
The question_answering_crossvalidation.py example discards hundreds of samples instead of converting them to features. My best guess is that the problem lies in how document striding interacts with the tokenizer. It may have been introduced with recent transformer updates.

Error message
ERROR - farm.data_handler.processor - Unable to convert 577 samples to features. Their ids are : 119-0-0, 127-1-4, ...

and hundreds of warnings like the following:

WARNING - farm.data_handler.processor -   Answer using start/end indices is ' 7.5 days' while gold label text is ' 7.5 days'.
Example will not be converted for training/evaluation.

Although the gold label text matches the text at the start/end indices, the sample is not converted.

Expected behavior
All samples with gold label text matching text at start/end indices should be converted to features.
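
To make the expected behavior concrete, here is a minimal sketch of the kind of check involved. It is not FARM's actual implementation: `answer_matches_gold` and `describe_mismatch` are hypothetical helpers, and the end index is assumed to be exclusive. The second helper prints both strings with `repr()`, which is one way to investigate whether two strings that look identical in the log (as in the warning above) differ in a hidden character. Whether that is the cause here is unknown.

```python
def answer_matches_gold(context: str, start: int, end: int, gold: str) -> bool:
    """Hypothetical check: does context[start:end] equal the gold answer text?

    Assumes `end` is exclusive; FARM's own convention may differ.
    """
    return context[start:end] == gold


def describe_mismatch(from_indices: str, gold: str) -> str:
    """Render both strings with repr() so hidden characters become visible."""
    return f"from indices: {from_indices!r} | gold: {gold!r}"


# Illustrative context, not taken from the dataset used in the example script.
context = "The incubation period is 7.5 days on average."
gold = " 7.5 days"
start = context.index(gold)
end = start + len(gold)

# Expected behavior: a sample like this one should be converted to features.
print(answer_matches_gold(context, start, end, gold))

# A non-breaking space (U+00A0) prints like a normal space but compares unequal.
gold_nbsp = "\u00a07.5 days"
print(answer_matches_gold(context, start, end, gold_nbsp))
print(describe_mismatch(context[start:end], gold_nbsp))
```

If the two `repr()` outputs are identical for a rejected sample, the strings really are equal and the rejection must come from elsewhere in the conversion.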

To Reproduce
Steps to reproduce the behavior
Run question_answering_crossvalidation.py from the examples of the current master branch (commit 1f046ca).
@Timoeller reproduced the behavior on a different machine.

System:

  • OS: Ubuntu
  • GPU/CPU: GPU
  • FARM version: 0.6.0

Labels: bug
