This repository was archived by the owner on Apr 8, 2025. It is now read-only.

Bug in Tokenization, token_offsets not sorted #420

@bogdankostic

Describe the bug
While running QA inference, I got an AssertionError that appears to be caused by tokenization. Tracing the AssertionError back to its cause, I found the following chain of odd behavior:

  • AssertionError is raised because answer string is empty but answer offsets are not -1
  • Answer string is empty because start_character is larger than end_character
  • start_character is larger than end_character because token_offsets list is not sorted
    (see screenshot)

[Screenshot 2020-06-23 at 17:34:52]
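The chain above can be illustrated with a minimal sketch. This is not FARM's code: `span_text`, `first_unsorted_index`, the sample text, and the offset lists are all hypothetical, and the sketch assumes the answer string is obtained by slicing the text with character offsets looked up from token_offsets. It only shows why an out-of-order offsets list yields an empty answer string while the token indices still look valid.

```python
from typing import List, Optional


def first_unsorted_index(token_offsets: List[int]) -> Optional[int]:
    """Return the index of the first offset smaller than its predecessor, or None."""
    for i in range(1, len(token_offsets)):
        if token_offsets[i] < token_offsets[i - 1]:
            return i
    return None


def span_text(text: str, token_offsets: List[int], start_token: int, end_token: int) -> str:
    """Hypothetical helper: map a token span to character offsets and slice the text."""
    start_char = token_offsets[start_token]
    end_char = token_offsets[end_token]
    # If start_char > end_char, Python slicing silently returns "".
    return text[start_char:end_char]


text = "Berlin is the capital of Germany"
sorted_offsets = [0, 7, 10, 14, 22, 25]    # start character of each token
unsorted_offsets = [0, 7, 22, 14, 10, 25]  # same values, two entries swapped

print(repr(span_text(text, sorted_offsets, 2, 4)))    # non-empty span
print(repr(span_text(text, unsorted_offsets, 2, 4)))  # empty string
print(first_unsorted_index(unsorted_offsets))         # position where the order breaks
```

A check like `first_unsorted_index` could be run on the tokenizer output while debugging to locate where the ordering breaks, before the prediction head ever sees the offsets.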

Error message

Traceback (most recent call last):
  File "/home/ubuntu/.pycharm_helpers/pydev/pydevd.py", line 1438, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/ubuntu/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/ubuntu/pycharm/FARM/bogdan_scripts/qa_test.py", line 6, in <module>
    result = inferencer.inference_from_dicts(dicts, return_json=True)
  File "/home/ubuntu/pycharm/FARM/farm/infer.py", line 388, in inference_from_dicts
    return list(predictions)
  File "/home/ubuntu/pycharm/FARM/farm/infer.py", line 459, in _inference_with_multiprocessing
    dataset, tensor_names, baskets
  File "/home/ubuntu/pycharm/FARM/farm/infer.py", line 570, in _get_predictions_and_aggregate
    baskets=baskets)
  File "/home/ubuntu/pycharm/FARM/farm/modeling/adaptive_model.py", line 102, in formatted_preds
    preds = head.formatted_preds(logits=logits_for_head, **kwargs)
  File "/home/ubuntu/pycharm/FARM/farm/modeling/prediction_head.py", line 1226, in formatted_preds
    doc_preds = self.to_qa_preds(top_preds, no_ans_gaps, baskets)
  File "/home/ubuntu/pycharm/FARM/farm/modeling/prediction_head.py", line 1276, in to_qa_preds
    qa_answer.add_answer(pred_str)
  File "/home/ubuntu/pycharm/FARM/farm/modeling/predictions.py", line 107, in add_answer
    assert self.offset_answer_end == -1
AssertionError

To Reproduce
https://gist.github.com/bogdankostic/5ac3d246033486cd0fdeb3cdb50e1bde

System:

  • OS: Ubuntu 18.04.3 LTS
  • GPU/CPU: V100
  • FARM version: current master

Labels: bug, stale
