This repository was archived by the owner on Apr 8, 2025. It is now read-only.
Convert QACandidates with empty or whitespace answers to no_answers on doc level #756
Merged
julian-risch merged 2 commits into master on May 10, 2021
Timoeller (Contributor) reviewed on May 7, 2021 and left a comment:
Interesting insights about the white spaces.
Your fix to that seems good.
I made a comment about your use of the aggregation level, which I don't understand...
…preds on doc level
There are no_answer QACandidates with non-zero start/end indices that cause errors, as reported in #729.
Here are two examples. The first example contains multiple consecutive whitespace characters, "mass of _ _ will", because LaTeX commands were removed from the original text. The second example contains a whitespace character at the very end, "geographer. _".
This PR fixes the issue by converting every predicted answer that is an empty string, or that contains nothing but whitespace (including tabs), to a no_answer. Further, for all no_answers the start/end indices are set to zero and the aggregation level is set to "document".
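The conversion described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `Candidate` dataclass, its field names, the `"no_answer"` marker string, and the helper `to_no_answer_if_blank` are all assumptions made for the example and are not taken from the library.

```python
from dataclasses import dataclass

NO_ANSWER = "no_answer"  # assumed marker string for "no answer"


@dataclass
class Candidate:
    """Hypothetical stand-in for a predicted answer span (not the real QACandidate)."""
    answer: str
    start: int
    end: int
    aggregation_level: str = "passage"


def to_no_answer_if_blank(cand: Candidate) -> Candidate:
    """Normalize blank predictions and existing no_answers.

    str.strip() without arguments removes spaces, tabs and newlines,
    so an empty result means the answer held nothing but whitespace
    (or was already the empty string).
    """
    if cand.answer.strip() == "" or cand.answer == NO_ANSWER:
        cand.answer = NO_ANSWER
        # no_answers always get zero offsets and document-level aggregation
        cand.start = 0
        cand.end = 0
        cand.aggregation_level = "document"
    return cand


blank = to_no_answer_if_blank(Candidate(" \t ", start=17, end=20))
real = to_no_answer_if_blank(Candidate("Paris", start=3, end=8))
```

In this sketch a whitespace-only prediction ends up with zero offsets, while a regular answer such as `"Paris"` passes through unchanged.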
Limitations
The fix prevents the question answering models from predicting answers that consist only of whitespace characters or tabs.
Empty-string answers were already prevented before this change.
fixes #729