This repository was archived by the owner on Apr 8, 2025. It is now read-only.

QA Answers at the Beginning of the Document are Labeled as (0, 0) #558

@EhsanM4t1qbit


I suspect there is a bug in the function `generate_labels` (https://github.com/deepset-ai/FARM/blob/master/farm/data_handler/input_features.py#L577). The conditional statement there should be changed to `passage_len > start_idx >= 0`. In its current form, an answer that starts at the very beginning of the context (i.e. `start_idx = 0`) is labeled as (0, 0), which points at the first token of the whole sequence instead of at the answer. This might be related to #552.
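To make the off-by-one concrete, here is a hypothetical, simplified model of the check (this is not FARM's actual `generate_labels`; the function name `span_label`, the `include_zero` switch and `passage_len=40` are assumptions for illustration). It assumes the current check rejects `start_idx == 0` and that rejected spans fall back to (0, 0), which mirrors the behaviour reported here.

```python
# Hypothetical, simplified model of the span-label check discussed in this issue.
# NOT FARM's generate_labels: the name, signature and (0, 0) fallback are assumptions.

def span_label(start_idx, end_idx, passage_len, seq_2_start_t, include_zero=True):
    """Map a passage-relative answer span to full-sequence token indices.

    start_idx / end_idx: answer token positions relative to the passage start.
    seq_2_start_t: sequence index of the first passage token.
    include_zero: True uses the proposed check (passage_len > start_idx >= 0);
                  False uses an assumed check that rejects start_idx == 0.
    """
    if include_zero:
        in_passage = passage_len > start_idx >= 0
    else:
        in_passage = passage_len > start_idx > 0
    if not in_passage:
        # Fallback label pointing at the first token of the whole sequence.
        return (0, 0)
    # Shift passage-relative indices into full-sequence coordinates.
    return (start_idx + seq_2_start_t, end_idx + seq_2_start_t)


# seq_2_start_t = 8 comes from the example below; the answer span 0..2 is inferred
# from the fixed label (8, 10); passage_len = 40 is an arbitrary illustrative value.
print(span_label(0, 2, passage_len=40, seq_2_start_t=8, include_zero=False))  # (0, 0)
print(span_label(0, 2, passage_len=40, seq_2_start_t=8, include_zero=True))   # (8, 10)
```

Python's chained comparison `a > b >= 0` means `a > b and b >= 0`, so the only behavioural difference between the two checks is whether a start index of exactly 0 is accepted.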

```python
from farm.data_handler.data_silo import DataSilo
from farm.data_handler.processor import SquadProcessor

processor = SquadProcessor(...)
data_silo = DataSilo(processor=processor, batch_size=16, automatic_loading=False)
basic_texts = {
    "context": "endesa, s.a. financial statements for the year ended 31 december 2018 5 endesa, s.a. "
               "and subsidiaries consolidated financial statements for the year ended 31 december 2018 207",
    "qas": [{"question": "What is the company name?", "id": "0",
             "answers": [{"text": "endesa", "answer_start": 0}],
             "is_impossible": False}],
}

data_silo._load_data(train_dicts=[basic_texts])
print(data_silo.data['train'].datasets[0].tensors[6])  # labels
# tensor([[[ 0,  0],
#          [-1, -1],
#          [-1, -1],
#          [-1, -1],
#          [-1, -1],
#          [-1, -1]]])

print(data_silo.data['train'].datasets[0].tensors[-1])  # seq_2_start_t
# tensor([8])
```

After changing the conditional statement to `passage_len > start_idx >= 0`:

```python
print(data_silo.data['train'].datasets[0].tensors[6])  # labels
# tensor([[[ 8, 10],
#          [-1, -1],
#          [-1, -1],
#          [-1, -1],
#          [-1, -1],
#          [-1, -1]]])

print(data_silo.data['train'].datasets[0].tensors[-1])  # seq_2_start_t
# tensor([8])
```
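The labels are in full-sequence coordinates, so subtracting `seq_2_start_t` recovers the position inside the passage. The sketch below is a hypothetical helper (not part of FARM) that could serve as a regression check for this bug. It assumes the `(-1, -1)` rows are padding for unused answer slots and treats `(0, 0)` as "no span inside the passage".

```python
# Hypothetical helper, not part of FARM. Assumes (-1, -1) rows are padding for
# unused answer slots and (0, 0) is the fallback "no span" label.
FALLBACK = (0, 0)
PADDING = (-1, -1)


def to_passage_span(label, seq_2_start_t):
    """Convert a (start, end) label in sequence coordinates to passage token indices.

    Returns None for the (0, 0) fallback and for (-1, -1) padding rows.
    """
    start, end = label
    if (start, end) in (FALLBACK, PADDING):
        return None
    return (start - seq_2_start_t, end - seq_2_start_t)


# Label rows copied from the output after the fix; seq_2_start_t = 8.
labels_after_fix = [(8, 10), (-1, -1), (-1, -1), (-1, -1), (-1, -1), (-1, -1)]
spans = [to_passage_span(row, seq_2_start_t=8) for row in labels_after_fix]
print(spans[0])  # (0, 2): the answer starts at the first passage token
```

Before the fix, the same helper returns `None` for the first row, which is exactly the symptom described in this issue: an answerable question whose answer begins at `answer_start = 0` ends up with no span in the passage.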
  • FARM version: 0.4.8

Labels: bug

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions