Data Handler dataset creation fails if features are not computed for some baskets #457

**Describe the bug**
The method `farm.data_handler.processor.Processor._create_dataset` fails if features could not be computed for some baskets.
The error message below comes from training SQuAD question answering. The context of the issue is the following:
 * the input SQuAD data contains some "errors" (a misalignment between the answer text and the text selected in the context);
 * because of this misalignment, feature computation is skipped for the affected samples in the `_featurize_samples` method (see the try-except): https://github.com/deepset-ai/FARM/blob/b8b59c4c8fb98b92f24f0e92f9e4f44db5549323/farm/data_handler/processor.py#L295
 * finally, in `_create_dataset`, `features_flat.extend` (https://github.com/deepset-ai/FARM/blob/b8b59c4c8fb98b92f24f0e92f9e4f44db5549323/farm/data_handler/processor.py#L308) is called on a sample whose features are `None`, which raises the error reported below.
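
The failure mode can be illustrated in isolation with a minimal sketch. The `Sample` and `Basket` classes and the `flatten_features` function here are hypothetical stand-ins, not FARM's actual classes; they only mimic the `features_flat.extend(sample.features)` pattern visible in the traceback.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    # Hypothetical stand-in: features stay None when featurization failed.
    features: Optional[List[dict]] = None


@dataclass
class Basket:
    # Hypothetical stand-in for a basket holding one or more samples.
    samples: List[Sample] = field(default_factory=list)


def flatten_features(baskets):
    """Collect all features in one flat list, with no check for None."""
    features_flat = []
    for basket in baskets:
        for sample in basket.samples:
            # Raises TypeError when sample.features is None.
            features_flat.extend(sample.features)
    return features_flat


baskets = [
    Basket([Sample(features=[{"input_ids": [1, 2, 3]}])]),
    Basket([Sample(features=None)]),  # featurization was skipped here
]

try:
    flatten_features(baskets)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

A single sample without features is enough to abort the whole chunk, which matches what happens inside the multiprocessing worker.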

**Error message**
```
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/fabio/src/git_repositories/FARM/farm/data_handler/data_silo.py", line 124, in _dataset_from_chunk
    dataset = processor.dataset_from_dicts(dicts=dicts, indices=indices)
  File "/home/fabio/src/git_repositories/FARM/farm/data_handler/processor.py", line 1144, in dataset_from_dicts
    dataset, tensor_names = self._create_dataset(keep_baskets=False)
  File "/home/fabio/src/git_repositories/FARM/farm/data_handler/processor.py", line 308, in _create_dataset
    features_flat.extend(sample.features)
TypeError: 'NoneType' object is not iterable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/fabio/src/git_repositories/DocumentFeaturesIdentificator/bin/farm_qa", line 10, in <module>
    farm_qa.main()
  File "/home/fabio/src/git_repositories/DocumentFeaturesIdentificator/bin/../documentfeaturesidentificator/farm_utils/farm_qa.py", line 211, in main
    options.func(options)
  File "/home/fabio/src/git_repositories/DocumentFeaturesIdentificator/bin/../documentfeaturesidentificator/farm_utils/farm_qa.py", line 107, in train_general_purpose
    train_mock(options, base_lm_model, out_model_basename, train_filename, dev_filename, test_filename)
  File "/home/fabio/src/git_repositories/DocumentFeaturesIdentificator/bin/../documentfeaturesidentificator/farm_utils/farm_qa.py", line 146, in train_mock
    question_answering(options_mock)
  File "/home/fabio/src/git_repositories/DocumentFeaturesIdentificator/bin/../documentfeaturesidentificator/farm_utils/question_answering.py", line 79, in question_answering
    data_silo = DataSilo(processor=processor, batch_size=batch_size, distributed=False)
  File "/home/fabio/src/git_repositories/FARM/farm/data_handler/data_silo.py", line 105, in __init__
    self._load_data()
  File "/home/fabio/src/git_repositories/FARM/farm/data_handler/data_silo.py", line 207, in _load_data
    self.data["train"], self.tensor_names = self._get_dataset(train_file)
  File "/home/fabio/src/git_repositories/FARM/farm/data_handler/data_silo.py", line 176, in _get_dataset
    for dataset, tensor_names in results:
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 865, in next
    raise value
TypeError: 'NoneType' object is not iterable
```

**Expected behavior**
Log the errors (perhaps with more information) and automatically exclude the baskets with errors from the dataset.
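
A rough sketch of what this behavior could look like, assuming baskets and samples expose an `id` and a `features` attribute. All names here (`Sample`, `Basket`, `flatten_features_skipping_failures`) are hypothetical and for illustration only; this is not a patch against FARM's code.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional

logger = logging.getLogger(__name__)


@dataclass
class Sample:
    # Hypothetical stand-in: features stay None when featurization failed.
    id: str
    features: Optional[List[dict]] = None


@dataclass
class Basket:
    # Hypothetical stand-in for a basket holding one or more samples.
    id: str
    samples: List[Sample] = field(default_factory=list)


def flatten_features_skipping_failures(baskets):
    """Flatten features, logging and skipping baskets with missing features."""
    features_flat = []
    skipped_ids = []
    for basket in baskets:
        failed = [s.id for s in basket.samples if s.features is None]
        if failed:
            # Exclude the whole basket and report which samples caused it.
            logger.warning(
                "Skipping basket %s: no features for samples %s",
                basket.id,
                failed,
            )
            skipped_ids.append(basket.id)
            continue
        for sample in basket.samples:
            features_flat.extend(sample.features)
    return features_flat, skipped_ids


baskets = [
    Basket("basket-0", [Sample("0-0", features=[{"input_ids": [1, 2, 3]}])]),
    Basket("basket-1", [Sample("1-0", features=None)]),
]

features_flat, skipped_ids = flatten_features_skipping_failures(baskets)
print(len(features_flat), skipped_ids)
```

Returning the skipped ids alongside the features would also let the caller report how many baskets were dropped in total.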

**To Reproduce**
Manually corrupt the SQuAD data (e.g. add a character to an answer text in `train-v2.0.json`), then run `examples/question_answering.py`.
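
The corruption step could be scripted along these lines. This is a sketch that assumes the standard SQuAD v2.0 JSON layout (`data` → `paragraphs` → `qas` → `answers`); the helper names and the extra character are my own choices, and the output path passed to `corrupt_file` is up to you.

```python
import copy
import json


def corrupt_first_answer(squad, extra_char="#"):
    """Return a copy of the data with one answer text misaligned, plus its id."""
    corrupted = copy.deepcopy(squad)
    for article in corrupted["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if qa.get("answers"):
                    # The answer text no longer matches the context span.
                    qa["answers"][0]["text"] += extra_char
                    return corrupted, qa["id"]
    raise ValueError("no answerable question found")


def corrupt_file(src_path, dst_path):
    """Read a SQuAD file, corrupt one answer, write the result elsewhere."""
    with open(src_path, encoding="utf-8") as f:
        squad = json.load(f)
    corrupted, qa_id = corrupt_first_answer(squad)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(corrupted, f)
    return qa_id


# Tiny in-memory example with the same structure as train-v2.0.json.
squad = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "Paris is the capital of France.",
            "qas": [{
                "id": "q1",
                "question": "What is the capital of France?",
                "is_impossible": False,
                "answers": [{"text": "Paris", "answer_start": 0}],
            }],
        }],
    }],
}

corrupted, qa_id = corrupt_first_answer(squad)
print(qa_id, corrupted["data"][0]["paragraphs"][0]["qas"][0]["answers"][0])
```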

**System:**
 - OS: Linux
 - GPU/CPU: CPU
 - FARM version: 0.4.6

