This repository was archived by the owner on Apr 8, 2025. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 248
This repository was archived by the owner on Apr 8, 2025. It is now read-only.
Bug in Tokenization, token_offsets not sorted #420
Copy link
Copy link
Closed
Labels
Description
Describe the bug
While doing inference on QA, I was getting an AssertionError which probably has to do with Tokenization. When I looked into the cause for the AssertionError, I encountered the following weird behavior:
- AssertionError is raised because answer string is empty but answer offsets are not -1
- Answer string is empty because start_character is smaller than end_character
- start_character is smaller than end_character because token_offsets list is not sorted
(see screenshot)
Error message
Traceback (most recent call last):
File "/home/ubuntu/.pycharm_helpers/pydev/pydevd.py", line 1438, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/ubuntu/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/ubuntu/pycharm/FARM/bogdan_scripts/qa_test.py", line 6, in <module>
result = inferencer.inference_from_dicts(dicts, return_json=True)
File "/home/ubuntu/pycharm/FARM/farm/infer.py", line 388, in inference_from_dicts
return list(predictions)
File "/home/ubuntu/pycharm/FARM/farm/infer.py", line 459, in _inference_with_multiprocessing
dataset, tensor_names, baskets
File "/home/ubuntu/pycharm/FARM/farm/infer.py", line 570, in _get_predictions_and_aggregate
baskets=baskets)
File "/home/ubuntu/pycharm/FARM/farm/modeling/adaptive_model.py", line 102, in formatted_preds
preds = head.formatted_preds(logits=logits_for_head, **kwargs)
File "/home/ubuntu/pycharm/FARM/farm/modeling/prediction_head.py", line 1226, in formatted_preds
doc_preds = self.to_qa_preds(top_preds, no_ans_gaps, baskets)
File "/home/ubuntu/pycharm/FARM/farm/modeling/prediction_head.py", line 1276, in to_qa_preds
qa_answer.add_answer(pred_str)
File "/home/ubuntu/pycharm/FARM/farm/modeling/predictions.py", line 107, in add_answer
assert self.offset_answer_end == -1
AssertionError
System:
- OS: Ubuntu 18.04.3 LTS
- GPU/CPU: V100
- FARM version: current master
Reactions are currently unavailable
