`examples/natural_questions.py` evaluation not computed for yes/no classes #529

**Describe the bug**
Running the `examples/natural_questions.py` script, I get the following evaluation output:

```
\\|//       \\|//      \\|//       \\|//     \\|//
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
***************************************************
***** EVALUATION | DEV SET | AFTER 2000 BATCHES *****
***************************************************
\\|//       \\|//      \\|//       \\|//     \\|//
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

09/07/2020 11:56:01 - INFO - farm.eval -   
 _________ question_answering _________
09/07/2020 11:56:01 - INFO - farm.eval -   loss: 1.1128250570618547
09/07/2020 11:56:01 - INFO - farm.eval -   task_name: question_answering
09/07/2020 11:56:01 - INFO - farm.eval -   EM: 0.632
09/07/2020 11:56:01 - INFO - farm.eval -   f1: 0.6685447088801921
09/07/2020 11:56:01 - INFO - farm.eval -   top_n_accuracy: 0.918
09/07/2020 11:56:01 - INFO - farm.eval -   report: 
 Not Implemented
09/07/2020 11:56:01 - INFO - farm.eval -   
 _________ text_classification _________
09/07/2020 11:56:01 - INFO - farm.eval -   loss: 0.3979932257842103
09/07/2020 11:56:01 - INFO - farm.eval -   task_name: text_classification
09/07/2020 11:56:01 - INFO - farm.eval -   f1_macro: 0.8045831308172903
09/07/2020 11:56:01 - INFO - farm.eval -   report: 
               precision    recall  f1-score   support

   no_answer     0.8355    0.8806    0.8575       871
        span     0.7878    0.7188    0.7517       537
         yes     0.0000    0.0000    0.0000         0
          no     0.0000    0.0000    0.0000         0

   micro avg     0.8189    0.8189    0.8189      1408
   macro avg     0.4058    0.3999    0.4023      1408
weighted avg     0.8173    0.8189    0.8171      1408
```
In the `text_classification` part, the `yes` and `no` classes are reported as 0 for every score, including support, even though I checked that the dev set does contain some `yes`/`no` examples.
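For reference, a row of zeros with `support` 0 is exactly what scikit-learn's `classification_report` produces when a label is passed in `labels` but never occurs in the gold labels (or the predictions). The sketch below assumes the FARM report is built with scikit-learn and receives the full label list (`no_answer`, `span`, `yes`, `no`); the toy `y_true`/`y_pred` values are made up. If that assumption holds, `support` 0 suggests that no `yes`/`no` gold labels reached the evaluation at all, so the problem may lie upstream of the metric (for example in how the dev set is preprocessed) rather than in the score computation.

```python
from sklearn.metrics import classification_report

# Assumed label list of the text_classification head, in the order
# printed in the report above.
label_list = ["no_answer", "span", "yes", "no"]

# Made-up gold labels and predictions containing no "yes"/"no" at all,
# mimicking what the report above suggests happened.
y_true = ["no_answer", "span", "no_answer", "span"]
y_pred = ["no_answer", "span", "span", "span"]

# labels= forces a row for every label, even ones that never occur.
# zero_division=0 yields the same 0.0 values as the default ("warn")
# but without emitting an UndefinedMetricWarning.
report = classification_report(
    y_true, y_pred, labels=label_list, output_dict=True, zero_division=0
)

for label in label_list:
    row = report[label]
    print(f"{label:>10}  f1={row['f1-score']:.4f}  support={row['support']}")
```

The empty rows also explain the low `macro avg` line in the log: it averages over all four classes, so (0.8575 + 0.7517 + 0 + 0) / 4 ≈ 0.4023, whereas the separately logged `f1_macro` of 0.8046 matches the mean over the two populated classes, (0.8575 + 0.7517) / 2 ≈ 0.8046.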

**Expected behavior**
Scores for the `yes`/`no` classes should be computed from the `yes`/`no` examples in the dev set, just as they are for `no_answer` and `span`.

**To Reproduce**
Run `examples/natural_questions.py`.
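To double-check that the dev set really contains `yes`/`no` examples, one can count the `yes_no_answer` annotations in the raw data. This is a minimal sketch assuming the dev file uses the original Natural Questions JSONL layout (each record has an `annotations` list whose entries carry a `yes_no_answer` field set to `YES`, `NO` or `NONE`); the helper `count_answer_types` is hypothetical, and the data used by the FARM example may be in a different layout.

```python
import json
from collections import Counter
from typing import Iterable


def count_answer_types(jsonl_lines: Iterable[str]) -> Counter:
    """Count yes/no annotations in Natural Questions JSONL records.

    Assumes the original NQ layout: each record has an "annotations" list
    whose entries contain "yes_no_answer" ("YES", "NO" or "NONE").
    Only the first annotation per record is counted (NQ dev records may
    carry several annotations).
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Records without annotations are counted as "NONE".
        annotations = record.get("annotations") or [{}]
        counts[annotations[0].get("yes_no_answer", "NONE")] += 1
    return counts
```

Usage (the file name is a placeholder): `with open("dev.jsonl") as f: print(count_answer_types(f))`. Non-zero `YES`/`NO` counts here, combined with `support` 0 in the report, would point at the conversion of dev examples into classification labels.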

**System:**
 - OS: Linux Ubuntu 20.04.1 LTS
 - GPU/CPU: both
 - FARM version: 0.4.7

