This repository was archived by the owner on Apr 8, 2025. It is now read-only.

Convert QACandidates with empty or whitespace answers to no_answers on doc level #756

Merged
julian-risch merged 2 commits into master from fix_qacandidates_wrong_indices
May 10, 2021
Member

@julian-risch julian-risch commented May 5, 2021

Some no_answer QACandidates have non-zero start/end indices, which causes errors as reported in #729.

Here are two examples (underscores mark the whitespace). The first example contains multiple consecutive whitespace characters ("mass of _ _ will") because LaTeX commands were removed from the original text. The second example contains a whitespace character at the very end ("geographer. _").

    QA_input = [
        {
            "questions": ["What has a magnitude of about 8.81 meters per second squared?"],
            "text": """What we now call gravity was not identified as a universal force until the work of Isaac Newton. Before Newton, the tendency for objects to fall towards the Earth was not understood to be related to the motions of celestial objects. Galileo was instrumental in describing the characteristics of falling objects by determining that the acceleration of every object in free-fall was constant and independent of the mass of the object. Today, this acceleration due to gravity towards the surface of the Earth is usually designated as  and has a magnitude of about 9.81 meters per second squared (this measurement is taken from sea level and may vary depending on location), and points toward the center of the Earth. This observation means that the force of gravity on an object at the Earth's surface is directly proportional to the object's mass. Thus an object that has a mass of  will experience a force:"""
        }]
    QA_input = [
        {
            "questions": [" When was Isiah Bowman not appointed to President Wilson\'s Inquiry?"],
            "text": """One key figure in the plans for what would come to be known as American Empire, was a geographer named Isiah Bowman. Bowman was the director of the American Geographical Society in 1914. Three years later in 1917, he was appointed to then President Woodrow Wilson's inquiry in 1917. The inquiry was the idea of President Wilson and the American delegation from the Paris Peace Conference. The point of this inquiry was to build a premise that would allow for U.S authorship of a 'new world' which was to be characterized by geographical order. As a result of his role in the inquiry, Isiah Bowman would come to be known as Wilson's geographer. """
        }]
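To see where a model could select a whitespace-only span in texts like these, the runs of whitespace can be located with a regular expression. This is only an illustrative sketch: `whitespace_only_spans` is a hypothetical helper, not part of FARM, and the two strings are shortened excerpts of the texts above.

```python
import re

def whitespace_only_spans(text):
    """Return (start, end) character spans that are runs of two or more
    whitespace characters, or whitespace at the very end of the text."""
    return [m.span() for m in re.finditer(r"\s{2,}|\s+$", text)]

# Shortened excerpts of the two example texts (assumption: same whitespace layout).
first = "designated as  and has a magnitude"
second = "Wilson's geographer. "

print(whitespace_only_spans(first))
print(whitespace_only_spans(second))
```

Any predicted answer whose start/end offsets fall entirely inside one of these spans would be a whitespace-only answer.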

This PR fixes the issue by converting every predicted answer that is an empty string or contains nothing but whitespace (including tabs) to a no_answer. In addition, for all no_answers the start/end indices are set to zero and the aggregation level is set to "document".
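The conversion rule can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in this PR: `Candidate`, its field names, and `to_no_answer_if_blank` are hypothetical stand-ins and do not mirror FARM's actual `QACandidate` class.

```python
from dataclasses import dataclass, replace

@dataclass
class Candidate:
    """Hypothetical stand-in for a predicted answer (not FARM's QACandidate)."""
    answer: str
    offset_start: int
    offset_end: int
    aggregation_level: str = "passage"
    answer_type: str = "span"

def to_no_answer_if_blank(c: Candidate) -> Candidate:
    # str.strip() without arguments removes spaces, tabs and newlines,
    # so an empty or whitespace-only answer becomes the empty string.
    if c.answer.strip() == "":
        c = replace(c, answer="no_answer", answer_type="no_answer")
    # Every no_answer gets zeroed indices and document-level aggregation.
    if c.answer_type == "no_answer":
        c = replace(c, offset_start=0, offset_end=0, aggregation_level="document")
    return c

blank = to_no_answer_if_blank(Candidate(answer=" \t", offset_start=13, offset_end=15))
kept = to_no_answer_if_blank(Candidate(answer="9.81", offset_start=40, offset_end=44))
print(blank)
print(kept)
```

Returning a copy via `dataclasses.replace` keeps the original candidate untouched, which makes the rule easy to test in isolation.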

Limitations
With this fix, the question answering models can no longer predict answers that consist only of whitespace or tabs.
Empty-string answers were already prevented before this change.

fixes #729

@julian-risch julian-risch requested a review from Timoeller May 5, 2021 13:06
@julian-risch julian-risch changed the title from "WIP: Convert QACandidates with empty or whitespace answers to no_answers on doc level" to "Convert QACandidates with empty or whitespace answers to no_answers on doc level" May 5, 2021
@julian-risch julian-risch marked this pull request as ready for review May 5, 2021 13:07
Contributor

@Timoeller Timoeller left a comment

Interesting insights about the white spaces.
Your fix to that seems good.

I made a comment about your use of aggregation level that I don't understand...

@Timoeller Timoeller self-requested a review May 10, 2021 14:47
Contributor

@Timoeller Timoeller left a comment

LG!

@julian-risch julian-risch merged commit b9fcd26 into master May 10, 2021
Development

Successfully merging this pull request may close these issues.

QACandidates with wrong indices for a no_answer prediction

2 participants