Catch empty datasets in Inferencer by Timoeller · Pull Request #605 · deepset-ai/FARM

Timoeller · 2020-10-27T14:26:09Z

fixes #454
The error only happens during multiprocessing.

When a multiprocessing chunk contains only invalid examples, the Inferencer runs into errors. This PR catches these errors and produces an error message. It does not solve the root cause.

Invalid examples can be:

empty/malformed documents for classification
empty/malformed documents for QA
QA annotations where the answer text and offsets do not match
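
As a rough illustration of the guard described above (not the actual FARM code): the sketch below processes chunks one after another instead of in worker processes, and every name in it (`is_valid`, `convert_chunk`, `infer_on_chunks`) is hypothetical. It assumes that preprocessing drops invalid examples, so a chunk made up only of invalid examples yields an empty dataset, which is then reported and skipped.

```python
import logging

logger = logging.getLogger(__name__)


def is_valid(example):
    """Hypothetical validity check: a document needs non-empty text."""
    text = example.get("text")
    return isinstance(text, str) and bool(text.strip())


def convert_chunk(chunk):
    """Stand-in for per-worker preprocessing that drops invalid examples.

    Returns a list of features. The list is empty when every example
    in the chunk was invalid.
    """
    return [ex["text"].strip().lower() for ex in chunk if is_valid(ex)]


def infer_on_chunks(chunks):
    """Run a dummy 'model' over each chunk, skipping empty datasets."""
    predictions = []
    for i, chunk in enumerate(chunks):
        dataset = convert_chunk(chunk)
        if not dataset:
            # The guard: report the problem instead of failing downstream.
            logger.error("Chunk %d contains no valid examples, skipping it.", i)
            continue
        # Dummy prediction: the length of each preprocessed text.
        predictions.extend(len(features) for features in dataset)
    return predictions


chunks = [
    [{"text": "Berlin is a city"}, {"text": ""}],
    [{"text": "   "}, {"text": None}],  # only invalid examples
    [{"text": "FARM"}],
]
result = infer_on_chunks(chunks)
```

The second chunk produces an empty dataset and is skipped with an error message, while the valid examples in the other chunks still get predictions. As in the PR, this only handles the symptom; it does not explain why a chunk ends up empty.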

tholor

This would fix the inferencer, but don't we have similar issues in other use cases (e.g. training deepset-ai/haystack#488)? Maybe a similar fix inside DataSilo could tackle this?

Timoeller · 2020-10-27T16:21:31Z

I thought so too, and ran normal training (using the DataSilo) with corrupted files. I could not get QA processing to fail with them.

I can try corrupting the files a bit more to see if we can reproduce DataSilo errors. Will report back here.

Timoeller · 2020-10-28T16:42:46Z

OK, I found some strange behaviour when the context is either empty or does not contain the answer. I opened a separate issue and assigned @bogdankostic to it.

I would prefer to merge this solution for the Inferencer now and work on the DataSilo in a separate PR.

tholor

OK, sounds good. Let's tackle that separately.

Catch empty datasets

9196119

Timoeller requested a review from tholor October 27, 2020 14:28

tholor reviewed Oct 27, 2020

tholor approved these changes Oct 28, 2020

Timoeller merged commit c21466c into master Oct 28, 2020

Timoeller merged 1 commit into master from remove_empty_chunks
