This repository was archived by the owner on Apr 8, 2025. It is now read-only.

examples/natural_questions.py evaluation not computed for yes/no classes  #529

@ftesser

Describe the bug
Running the examples/natural_questions.py script produces the following evaluation:

\\|//       \\|//      \\|//       \\|//     \\|//
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
***************************************************
***** EVALUATION | DEV SET | AFTER 2000 BATCHES *****
***************************************************
\\|//       \\|//      \\|//       \\|//     \\|//
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

09/07/2020 11:56:01 - INFO - farm.eval -   
 _________ question_answering _________
09/07/2020 11:56:01 - INFO - farm.eval -   loss: 1.1128250570618547
09/07/2020 11:56:01 - INFO - farm.eval -   task_name: question_answering
09/07/2020 11:56:01 - INFO - farm.eval -   EM: 0.632
09/07/2020 11:56:01 - INFO - farm.eval -   f1: 0.6685447088801921
09/07/2020 11:56:01 - INFO - farm.eval -   top_n_accuracy: 0.918
09/07/2020 11:56:01 - INFO - farm.eval -   report: 
 Not Implemented
09/07/2020 11:56:01 - INFO - farm.eval -   
 _________ text_classification _________
09/07/2020 11:56:01 - INFO - farm.eval -   loss: 0.3979932257842103
09/07/2020 11:56:01 - INFO - farm.eval -   task_name: text_classification
09/07/2020 11:56:01 - INFO - farm.eval -   f1_macro: 0.8045831308172903
09/07/2020 11:56:01 - INFO - farm.eval -   report: 
               precision    recall  f1-score   support

   no_answer     0.8355    0.8806    0.8575       871
        span     0.7878    0.7188    0.7517       537
         yes     0.0000    0.0000    0.0000         0
          no     0.0000    0.0000    0.0000         0

   micro avg     0.8189    0.8189    0.8189      1408
   macro avg     0.4058    0.3999    0.4023      1408
weighted avg     0.8173    0.8189    0.8171      1408

In the text_classification part, the yes and no classes are reported as 0 for every score, including support. However, I checked the dev set and it does contain some yes/no examples.
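The averages in the report can be checked by hand. The sketch below copies the rounded per-class F1 and support values from the log above and recomputes the macro average twice: once over all four declared classes and once over only the classes with non-zero support. The variable and function names are illustrative and are not part of FARM. The first result matches the `macro avg` row of the report, and the second roughly matches the separately logged `f1_macro`, which is consistent with (but does not prove) the yes and no gold labels never reaching the metric rather than being mispredicted.

```python
# Per-class values copied from the report above (rounded to 4 decimals).
f1_by_class = {"no_answer": 0.8575, "span": 0.7517, "yes": 0.0, "no": 0.0}
support_by_class = {"no_answer": 871, "span": 537, "yes": 0, "no": 0}


def macro_average(scores):
    """Unweighted mean of per-class scores."""
    values = list(scores)
    return sum(values) / len(values)


# Macro average over all four declared classes (the report's "macro avg" row).
macro_all = macro_average(f1_by_class.values())

# Macro average over only the classes that have gold examples (support > 0).
macro_present = macro_average(
    f1 for label, f1 in f1_by_class.items() if support_by_class[label] > 0
)

print(f"macro F1, all four classes:  {macro_all:.4f}")
print(f"macro F1, supported classes: {macro_present:.4f}")
print(f"total support:               {sum(support_by_class.values())}")
```

Because the inputs are the rounded figures from the log, the results agree with the logged values only to about four decimal places.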

Expected behavior
The evaluation should compute correct scores and support for the yes and no classes.

To Reproduce
Run examples/natural_questions.py.

System:

  • OS: Linux Ubuntu 20.04.1 LTS
  • GPU/CPU: both
  • FARM version: 0.4.7
