
XRay OCR 4 Polygon-Based Annotation

The XRayOCR4 Polygon-based Annotation Guidelines detail the process for annotating images with polygon bounding boxes around words, lines, and paragraphs. It includes step-by-step instructions for drawing polygons, transcription guidelines, and reject reasons for incorrect annotations. The document emphasizes the importance of accurate language selection and proper categorization of text elements during the annotation process.


XRayOCR4 Polygon-based Annotation Guidelines
Last Updated: October 30, 2024
Authors: Pooja Sethi, Jason Kim (BI), Alan Li, Shivangini Patil, Christian Arroyo
Adapted from FAIR guidelines: [Link] (by Jing Huang, Guan Pang)

Goal
Step 1: Word-level polygons
    How to draw polygons
    What to annotate
Step 2: Line-level polygons
    EXAMPLES
Step 3: Paragraph-level polygons
    EXAMPLES
Reject Reasons
Step 4: Transcription Guidelines
Examples of Annotations
Step 5: Language Annotation
FAQs

Goal
● Draw polygon bounding boxes around words, lines, and paragraphs in images and annotate
them.
● Here is a short step-by-step instruction:
1. Draw polygon bounding boxes around all single words, even if they are illegible.
2. Transcribe the words. Type a single period in the transcription box if you cannot
transcribe a word.
3. In the annotation field, choose “Single Word” from the drop-down menu.
4. Choose the language from the drop-down menu. If the language is unknown or not present
on the list, select “Other”.
5. Next, draw line bounding boxes.
6. Do not transcribe lines. Simply type </s> in the transcription boxes.
7. In the annotation field, choose “Line Of Text” from the drop-down menu.
8. Choose the language from the drop-down menu. If the language is unknown or not present
on the list, select “Other”.
9. Finally, draw paragraph bounding boxes in addition to the word-level and line-level
bounding boxes.
10. Do not transcribe paragraphs. Simply type </p> in the transcription boxes.
11. In the annotation field, choose “Block Of Text” from the drop-down menu.
12. Choose the language from the drop-down menu. If the language is unknown or not present
on the list, select “Other”.
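
The three annotation levels can be summarized as a small data sketch. This is only an illustration: the class and function names are hypothetical and do not describe the annotation tool's real export format. Only the drop-down labels, the single period, and the </s> and </p> markers come from the steps above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

UNREADABLE = "."  # typed when a word cannot be transcribed


@dataclass
class Annotation:
    points: List[Point]
    label: str           # "Single Word", "Line Of Text", or "Block Of Text"
    transcription: str
    language: str = "Other"


def make_annotation(points: List[Point], label: str,
                    text: Optional[str] = None,
                    language: str = "Other") -> Annotation:
    """Fill the transcription field according to the annotation level."""
    if label == "Single Word":
        # Words are transcribed; a single period marks an unreadable word.
        transcription = text if text else UNREADABLE
    elif label == "Line Of Text":
        # Lines are never transcribed.
        transcription = "</s>"
    elif label == "Block Of Text":
        # Paragraphs are normally not transcribed.
        transcription = "</p>"
    else:
        raise ValueError(f"unknown label: {label}")
    return Annotation(list(points), label, transcription, language)
```

For example, make_annotation(box, "Single Word") yields a record whose transcription is a single period, while any "Line Of Text" record carries </s> regardless of the text passed in.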

Once all polygons have been drawn, they will look as shown below:

The annotations should look as follows:

1.​ Single Word Level Annotation:

2.​ Line Level Annotation:

3.​ Paragraph Level Annotation:

Step 1: Word-level polygons
This is the first step in the annotation process. Polygons need to be drawn around words and
numbers. Make sure all words inside the polygon are annotated accurately and that both the
language and “Single Word” are marked correctly in the annotation box.

How to draw polygons

● Please draw polygons with 4 points where possible, and use more only if necessary (e.g. for
curved words).
● Start drawing a polygon from the top-left corner and proceed clockwise, so that the dotted
line (the edge between the first two clicks) is always above the word. To close the polygon,
right-click.
o Always keep the dotted line above the words:

●​ When annotating single words, select "Single Word" from the drop-down menu.
●​ Choose the language from the drop-down menu. If the language is unknown or not
present on the list, select “Other”. If you cannot transcribe the word, please type a single
period in the transcription box.
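
The drawing rules above (at least 4 points, clockwise order) can be checked mechanically. The sketch below is a minimal illustration, assuming points are stored as (x, y) pixel coordinates with y growing downward, as is usual for images. The function names are hypothetical and are not part of the annotation tool.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(points: List[Point]) -> float:
    """Shoelace formula. With y pointing down, a positive result
    means the points run clockwise as seen on screen."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def check_polygon(points: List[Point]) -> List[str]:
    """Return a list of problems; an empty list means the polygon passes."""
    problems = []
    if len(points) < 4:
        problems.append("fewer than 4 points")
    if signed_area(points) <= 0:
        problems.append("points are not in clockwise order")
    return problems
```

A box clicked as top-left, top-right, bottom-right, bottom-left passes both checks; the same box clicked in reverse order is reported as not clockwise.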

What to annotate
●​ Draw moderately tight boxes around all individual words even if they are illegible.
●​ Annotate even if the word is vertical/ diagonal/ curved.
●​ Annotate boxes around words independently of the language/ alphabet used. An
educated guess is good enough for languages where word separation is unclear.
●​ Annotate numbers (e.g. dates, telephone numbers).
●​ Annotate signatures.
● Include punctuation marks such as exclamation/question marks, periods, slashes,
parentheses, quotes, etc. together with the nearest word. This applies to numbers too.
●​ Emails/ URLs/ telephone numbers/ dates (e.g. 13/11/18) are almost always considered
one word. Sometimes dates can appear as different words, particularly if the month is
spelled out and there are clearly spaces. For example, 11/04/2017 would be one word,
but November 4, 2017, would be 3 words: “November”, “4”, and “2017.”
●​ If there's a space, it probably should be two words. This includes people's names (first
name and last name are different words) and units of measurement (numbers and units
are also different words).

●​ If the word is partly occluded / on the margin, select the visible part of the word. If it is
mostly occluded, just ignore it.

● If the word is really tiny, there is no need to annotate it.
●​ Sometimes text boxes overlap. In the extreme case, text can be completely inside more
text. In these cases, the box of the larger word can contain part / all of the smaller
words, but the smaller words still need their own bounding box.

●​ Text may appear in the background in an almost transparent manner. In these cases,
unless the background is very clearly visible, just annotate the foreground text.
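
As a rough illustration of the spacing rule above, splitting on whitespace reproduces most of the word boundaries described in this section. This is a sketch only: the function name is hypothetical, punctuation simply stays attached to its neighboring word, and cases that need human judgment (such as phone numbers written with spaces) are not handled.

```python
from typing import List


def split_into_words(text: str) -> List[str]:
    """Split on whitespace; punctuation stays attached to the nearest word."""
    return text.split()


# A numeric date has no spaces, so it stays one word.
print(split_into_words("11/04/2017"))
# A spelled-out date has spaces, so it becomes three words.
print(split_into_words("November 4, 2017"))
# Numbers and units separated by a space are different words.
print(split_into_words("5 kg"))
```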

Step 2: Line-level polygons


This second step must always come after word-level annotation. Make sure all words inside the
polygon are annotated by their own bounding boxes correctly.

A line is a sequence of words that starts at the edge of the image or after a space, tab, or new
line, and continues until it reaches the other edge of the image. A line is defined by where it
appears on the page: a sentence, or even a single word, can span more than one line.

When drawing polygons, think about how each line relates to the others. Some lines should be
grouped together, while others should be kept separate. For instance, if a page has two columns
of text, each column should get its own separate set of line polygons.

●​ Before drawing line-level polygons, make sure all word-level polygons and annotations
are done per the requirements.

●​ Use polygons with 4 points (ideally rectangles), but if not possible (e.g. for curved words
or strangely shaped sentences), feel free to add more points.
●​ Do not draw line polygons for single words - a line polygon is a sequence of at least
two words.
●​ Consider the following when grouping lines together:
■​ Text style (font, color, size);
■​ Context;
■​ Distance and format of lines from each other.
●​ Try to enclose all word-level annotations within the line-level annotation.
●​ Annotate line-level polygons as </s> instead of transcribing them.

●​ In the annotation field, please choose “Line Of Text” from the drop-down menu.
●​ Choose the language from the drop-down menu. If the language is unknown or not
present on the list, select “Other”.
● Please use the same order as for the word-level bounding boxes, i.e. start from the
top-left corner and proceed clockwise. The dotted lines must be above the lines.
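
The "at least two words" rule and the enclosure rule can be sketched as a simple check. This is a simplified illustration, not the tool's own validation: it compares axis-aligned bounding boxes rather than exact polygon shapes, and the function names and the pixel tolerance are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def bbox(points: List[Point]) -> Box:
    """Axis-aligned bounding box as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def encloses(outer: List[Point], inner: List[Point],
             tolerance: float = 2.0) -> bool:
    """True if inner's bounding box lies inside outer's, within a tolerance."""
    ox1, oy1, ox2, oy2 = bbox(outer)
    ix1, iy1, ix2, iy2 = bbox(inner)
    return (ox1 - tolerance <= ix1 and oy1 - tolerance <= iy1
            and ix2 <= ox2 + tolerance and iy2 <= oy2 + tolerance)


def is_valid_line(line: List[Point], words: List[List[Point]]) -> bool:
    """A line polygon should enclose at least two word polygons."""
    return sum(encloses(line, word) for word in words) >= 2
```

The bounding-box comparison keeps the sketch short; curved or rotated lines would need a true point-in-polygon test.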

EXAMPLES:
Disclaimer!
The images below are illustrative to show line-level polygons only, and they do not contain
other types of polygons. Submitting jobs that only contain line-level annotations will be
considered incorrect.

Example 1:

NOTE:
- FREAKONOMICS was not labeled with a text line since it is just one word on the line.
- LEVITT is not labeled as a text line since it is one word and belongs to “STEVEN B. LEVITT”.
- DUBNER is not labeled as a text line since it is one word and belongs to “STEPHEN J.
DUBNER”.

Example 2:

NOTE:
-​ Samosas was not labeled with a text line since it is just one word on the line.
-​ $10.99 was not labeled with a text line since it is just one word on the line.
-​ vegetarian was not labeled with a text line since it is just one word on the line.
-​ $11.99 was not labeled with a text line since it is just one word on the line.

Example 3:

NOTE:

- 13:45 was not labeled with a text line since it is just one word on the line. It should be
annotated only at the single-word level. There is no need to draw a paragraph polygon in this
case.
- The lines containing “Flug nach uber …” are grouped together since they all carry
information about the flight. This grouping is admittedly somewhat subjective.

Example 4:

NOTE:
-​ Note the two columns here should be recognized as separate lines.
-​ “STEAK” was not annotated as a line since it is just one word.

Example 5:

NOTE:
- In this case, only the first line, which reads “You are the…”, needs to be annotated at the
line level. Every other line contains just one word.

Step 3: Paragraph-level polygons


This third and final polygon-drawing step must always come after the word-level and line-level
annotations. Make sure all words inside the polygon are annotated with their own bounding boxes
and that </s> is used as instructed for line-level annotations (Step 2 above). Then draw
paragraph-level polygons where needed, following the instructions below:

● One or more sentences can be considered one paragraph.


●​ Do not draw paragraph polygons for single words - a paragraph polygon must
contain at least two words.
● If a single line could be treated both as a line and as a paragraph because it forms
one logical group of text, the line annotation is enough. Do not mark it again as a
paragraph.
●​ Consider the following when grouping sentences together:
○​ Text style (font, color, size).
○​ Context (Is the sentence or phrase standalone? Can it be understood
without the previous or next sentences?).
○​ Distance and format of sentences from each other.
○​ Determine whether the current sentence needs context from other
surrounding sentences to make sense. If not, then it should be labeled as
a separate paragraph box.

*Note that the above are not hard and fast rules and must be considered together. For
example, logos tend to use different font sizes but should be grouped together because of
context.*
●​ Annotate paragraph-level polygons as </p> instead of transcribing them.

●​ In the annotation field, please choose “Block Of Text” from the drop-down menu.
●​ Choose the language from the drop-down menu. If the language is unknown or not
present on the list, select “Other”.
●​ Use polygons with 4 points (ideally rectangles), but if not possible (e.g. for curved words
or strangely shaped sentences), feel free to add more points.
● Please use the same order as for the word-level bounding boxes, i.e. start from the
top-left corner and proceed clockwise. The dotted lines must be above the paragraphs.
● Please do not label overlaid/overlapping text, as it is difficult to define and the layout
is unclear.
Exception for Transcribing Paragraph Polygons: If the text order is unclear (for example, if a
large font is followed by a small font, or it can’t be read simply from left to right or top to
bottom), please type the text inside the paragraph transcription box.

Please type “UP TO 35% OFF + EXTRA 10% OFF” in the transcription box and mark the annotation as
“Block Of Text”

Please type “FROM €256* PER MONTH *Terms and conditions apply” in the transcription box and mark
the annotation as “Block Of Text”

EXAMPLES:
Disclaimer!
The images below are illustrative to show single-word and paragraph-level annotations, and
they do not contain line-level polygons. Submitting jobs that do not comply with the guidelines
will be considered incorrect.

Example 1:
●​ In addition to single-word bounding boxes, please proceed to create bounding boxes for
Stylized Paragraphs.
●​ Please note that this example will result in 2 different Stylized Bounding Boxes.

Example 1 Stylized Paragraph Bounding Boxes:

Example 2:
●​ In addition to single-word bounding boxes, please
proceed to create bounding boxes for Stylized
Paragraphs.
●​ Please note that this example will result in 2
different Stylized Bounding Boxes.

Example 2 Stylized Paragraph Bounding Boxes:

Example 3:
●​ In addition to single-word bounding boxes, please proceed to create bounding boxes for
Stylized Paragraphs.
●​ Please note that this example will result in 1 Stylized Bounding Box.

Example 3 Paragraph Bounding Box:

Example 4:
●​ In addition to single-word bounding boxes, please proceed to create a bounding box for
a Stylized Paragraph.

Example 4 Annotation Box:

Example 5:
●​ In addition to single-word bounding boxes, please proceed to create bounding boxes for
Stylized Paragraphs.
●​ Please note that this example will result in 12 different Stylized Paragraph Bounding
Boxes.

Example 5 Bounding Boxes:

Reject Reasons
Here is a list of all possible reasons for rejection. Please spend no more than half a minute
on each rejected image. The list of reasons might be different for each specific task.
1.​ Image not Loaded: Please refresh the browser, as it might help you load your image. If
refreshing doesn’t help, reject that image with the “Image not Loaded” reason.
2.​ Incorrect Language: The image doesn’t contain any text in the target language. If an
image contains a mix of languages and one of them is your target language, please do
not reject it, but annotate all the words and lines/paragraphs. Please do not reject for
the “No Text” reason if there are words in other languages!
3.​ No Text: No form of text in an image.
4.​ Duplicate: If you notice any obvious duplicated images, use this reason to reject them.
This rejection reason does not apply in the Audit queue.
5.​ Text in Logo Area Only: Image that only contains text in the target language in the
station or program logo area.
6. Too Much Text: If an image contains more than 100 words, or its annotation would take you
more than 30 minutes, reject it for this reason.
7. Complex Math Equation: If a math equation is relatively simple, e.g. x+y=z, it should be
annotated normally. If an image contains a more complicated equation, e.g. one with
superscripts/subscripts, reject it for this reason.

8.​ Offensive Content: If an image contains offensive content such as nudity or
pornography, please reject it.
9. PII: Use this rejection reason for images that contain Personally Identifiable
Information (PII), e.g. driver's licenses, visas, ID cards, passports, social security numbers,
or other identifying numbers or codes. Information that is shared publicly, or that concerns
public figures (e.g. politicians or actors) and places (e.g. hair salons or restaurants), does
not fall under this rejection reason and must be annotated.
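
The size limit in reason 6 is the only purely numeric rule in this list, so it can be written as a one-line check. The sketch below is illustrative: the function and constant names are hypothetical, and the time estimate is the annotator's own judgment rather than something a tool can measure.

```python
MAX_WORDS = 100     # more than 100 words means reject
MAX_MINUTES = 30    # more than 30 minutes of annotation means reject


def exceeds_text_limit(word_count: int, estimated_minutes: float) -> bool:
    """True if the image should be rejected because it has too much text."""
    return word_count > MAX_WORDS or estimated_minutes > MAX_MINUTES
```

An image with exactly 100 words that takes 30 minutes or less is still in scope under this reading, since both limits are stated as "more than".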

Step 4: Transcription Guidelines


●​ If an image does not contain any words in the target language, reject it (the reject
reason to be used should be: Incorrect Language). But if an image contains a mix of
languages and one of them is your target language, please annotate all words.
●​ If you are unsure how to separate single words in an unknown language, please check
the spaces. If there is a space, it is most probably a new word, so a new polygon is
needed.
●​ If you can transcribe words in non-target languages, please do so. But if you do not know
the languages and cannot transcribe them, it is enough to type a single period in the
transcription boxes.

A Bulgarian word in the English queue:

● Some languages can be written in more than one script, e.g. Hindi is normally written in
the Devanagari script but can also be romanized and written in Latin characters. In such
cases, annotate all the words as they are shown in the image, transcribe them, and mark the
language appropriately. If an image in a Hindi job contains romanized Hindi words, please
transcribe them in the Latin alphabet and select “Other” as their language.
● If a word cannot be read or identified (because of poor image quality, illegible
handwriting, tiny letters, ambiguous symbols, or general uncertainty about the correct
transcription), type a single period in the transcription box. The single period means
“Ignore transcription”.
● Always transcribe numbers. Exceptions are rare, e.g. when an image is in a language and
script that you do not know and a number appears inside a word written in that script.
Please ignore the numbers in such cases.
●​ Please do not fix any grammar and spelling issues or typos shown in images. But also
make sure not to introduce your own grammar or spelling issues or typos in the
annotation box.
● Transcribe all characters in the word that are included in the box, including brackets,
slashes, colons, dots, @, etc. If a backslash (\) appears, transcribe it as \backslash (i.e., a
backslash followed by the word “backslash”). Do not add a space before or after \backslash.
o e.g. \a\b is transcribed as \backslasha\backslashb
●​ Recreate all upper and lower cases as they are shown in images. If a word is shown fully
in uppercase letters in an image, transcribe it with all uppercase letters.
●​ Respect all original accents, language-specific symbols, and diacritics, e.g. use é or ñ in
Spanish when appropriate (instead of e or n), use the dotless ı in Turkish, ě in Czech, etc.
● Please transcribe all special symbols such as Copyright (©), Registered Trademark (®),
Trademark (™), the Euro symbol (€), pound (£), etc. If they are not present on your keyboard,
search for them on the internet and copy them into your transcriptions. Using substitutes,
e.g. brackets as in (R) or (TM), is not acceptable.
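
Two of the rules above are mechanical enough to sketch in code: the backslash convention and the ban on bracketed substitutes for special symbols. The helper names below are hypothetical. The second helper only flags candidates for review, since an image could genuinely show the characters "(R)".

```python
from typing import List


def encode_backslashes(text: str) -> str:
    """Replace every backslash with a backslash followed by the word 'backslash'."""
    return text.replace("\\", "\\backslash")


SUBSTITUTES = ("(R)", "(TM)")  # bracketed stand-ins named in the guidelines


def find_symbol_substitutes(text: str) -> List[str]:
    """Return the bracketed substitutes found in a transcription, for review."""
    upper = text.upper()
    return [s for s in SUBSTITUTES if s in upper]


print(encode_backslashes(r"\a\b"))          # \backslasha\backslashb
print(find_symbol_substitutes("Acme(TM)"))  # ['(TM)']
```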

Vertical text: If you come across vertical text, annotate it and select “Vertical Text”
from the drop-down menu.

Please use this tag for single words, and then additionally draw line and paragraph polygons (if
needed) for vertical text.

Examples of Annotations
1.​ A simple example: single-word polygons are inside a paragraph polygon. The dotted lines
are above words.

Polygons for individual words are marked as “Single Word” and transcribed. The first letters are
in uppercase:

The paragraph polygon is marked as a “Block Of Text” with “</p>” in the transcription box:

2.​ Upside-Down Text. Again, make sure that the dotted line is along the top side of the
word itself and that the order of points is clockwise.

3.​ Mixed languages in the English queue. All the words are annotated, but only the English
words are transcribed since the other language was unknown to the annotator. The
dotted lines are above words, and polygons for single words are inside the paragraph
polygon. “Protein” is annotated once, as there is no need to bound isolated words twice:

4.​ New polygons created for single words - words separated with spaces. The dotted lines
are above words:

5. Tiny text (“Coca-Cola”) should not be annotated. Please reject such images with the “No
Text” reason.

6.​ Perspective words. Note the dotted lines between the first two clicks.

7.​ Curved Text. For curved text, there will be more than 4 points. In general, please try to
make the number of points above the word and below the word equal. An
easy/recommended way to do so is to imagine/click a point at every corner
of each character in the word and connect them together in a clockwise order (so that
the line between the first two clicks is on the top side of the first character).
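
One way to picture the "equal points above and below" advice is to build the polygon from two matched point lists. This is a sketch under assumptions: the function name is hypothetical, both lists are given left to right along the word, and coordinates are image pixels with y growing downward.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def curved_polygon(top: List[Point], bottom: List[Point]) -> List[Point]:
    """Join matched top and bottom points into one clockwise polygon.

    The top points are walked left to right, then the bottom points
    right to left, so the first edge runs along the top of the word.
    """
    if len(top) != len(bottom):
        raise ValueError("use the same number of points above and below")
    if len(top) < 2:
        raise ValueError("need at least 2 points on each side")
    return list(top) + list(reversed(bottom))


top = [(0, 0), (5, -1), (10, 0)]
bottom = [(0, 4), (5, 3), (10, 4)]
print(curved_polygon(top, bottom))
```

With three points on each side the result has six points, starting at the top-left of the first character and ending at its bottom-left.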

8. All URLs and names should be annotated. Please note that the transcription box does not
show the full transcription of the URLs/codes, but they are there. In this example, the
Chinese text is not transcribed, assuming it is an English queue. In a Chinese queue, it
would need to be transcribed as well.

Incorrect Annotations:
● ‘INK’ is labeled incorrectly in the image below. The point to the right of ‘K’ should not be
there. Always use 4 points whenever possible, and do not add more points only to enclose
shapes within a character. Do not forget to add paragraph polygons.

● ‘S’ is labeled incorrectly in the image below. Never use fewer than 4 points to annotate
anything.

Step 5: Language Annotation


After you finish transcribing a word, please select its language in the drop-down menu. If the
language is unknown, select “Other”.
Please also mark romanized Hindi words as “Other” in the Hindi queue.

FAQs

Question: Should it be transcribed as VOTE, i.e. with O instead of the image?
Answer: Yes, transcribe as “VOTE”.

Question: Should the dash sign be put into a separate polygon or added to the preceding one
(Share)?
Answer: Because there is a space on both sides of the hyphen, transcribe it as a separate word.

Question: Should ASKA’S and TICO be transcribed as incomplete words?
Answer: Please skip the ‘A’ in ASKA’S, as it is not visible. Please transcribe “TICO” as “TICO”.
If you don’t see a letter, don’t guess it. All words and letters must be visible and legible.

Question: Which parts of the text should be annotated, as some are hardly legible?
Answer: Annotate anything that’s clearly visible.

Question: Is handwriting out of the scope of this project?
Answer: All visible words/letters are in scope. If the handwriting is clearly legible, please
annotate it. In the case of this picture, reject as “Too Much Text”.

Question: Is it one or two words? If two, should the dot be part of the first word?
Answer: This seems to be a URL. Please annotate it as one word.

Question: Is it one or two words?
Answer: There is a space between “00” and “PM”, so two words, two polygons.

Question: Should the quotation marks be included with the first and last words?
Answer: Since there is no space, please add the opening quotation mark to the first word, and
the closing quotation mark to the last word.

Question: Should this image be rejected as “Too Much Text”?
Answer: Yes, please reject it for this reason.

Question: Should the whitespaces be removed when transcribing A S S O C I A T I O N?
Answer: Yes, transcribe ASSOCIATION as one word.

Question: Should the whole phone number be selected as a single word even if it includes
spaces?
Answer: Yes, phone numbers should always be one word.

Question: Should the punctuation signs be part of the word/polygon?
Answer: Yes, punctuation should be part of the word.
Example: LIE,’

Question: Should it be transcribed as VOTE! even if V is in the form of an image?
Answer: Yes, transcribe as VOTE!

Question: How to select and transcribe the text if part of a word (a name in this case) is
covered in the mid-section?
Answer: If the text is occluded, and it’s not possible to infer what the whole word should be
based on context, please annotate this as two separate words: “T” and “NGSON”.

Question: Should numbers and text on the chairs be selected/transcribed, as it is rather
handwriting?
Answer: Yes, please annotate the clearly visible numbers and text on the chairs.

Question: Is it one, two, or three words, and where to include slashes?
Answer: Since there are no spaces, it should be annotated as one word: “GNM/BSC/MSC”.

Question: Should the plus sign be annotated as a separate word?
Answer: Yes, since there is a clear space, please annotate it as a separate word.

Question: Should pipes be annotated as separate words?
Answer: Yes, pipes should be separate words/polygons here.

Question: Should signatures be annotated too?
Answer: Yes, if it’s possible to read them. If not, put a period in the transcription fields.
Please remember: one word, one polygon.

Question: Should the plus sign be annotated here?
Answer: Please annotate the plus sign (it’s ok to ignore the surrounding heart).

Question: How to transcribe text that includes characters not available on the keyboard?
Answer: One word, “AC/DC”, would be ideal since this refers to the band name. But in general,
please search for the signs on the internet and copy them to the transcription boxes.

Question: How to split and annotate this?
Answer: Since it’s a URL, please annotate it as one word.

Question: Should the dollar signs be annotated?
Answer: Yes, please always annotate all visible letters/words/signs.

Question: Should the ampersand be annotated as a separate word?
Answer: Yes, since there are spaces before and after the sign, please annotate it as a separate
word.

Question: Is the hash sign supposed to be annotated?
Answer: Yes, please annotate it as a single word.
