Possible duplicates in the Arabic wordnet #49
Closed
Labels: data (Something is wrong in the data)
Description
I'm making a separate issue because the reason for the duplicates seems sufficiently different.
The Arabic wordnet doesn't just have arb:lemma for its entries in the TSV file, but also arb:lemma:root and arb:lemma:brokenplural. Often the arb:lemma:root form is identical to the regular lemma. The script to check for duplicates only looks at the offset+pos and the lemma, not the second column, so these look like duplicates:
05169813-n arb:lemma شأن
05169813-n arb:lemma:root شأن
05169813-n arb:lemma:brokenplural شؤون

We are currently not doing anything with roots and broken plurals when converting to WN-LMF, but maybe we want to keep the redundant lemmas here in case we want to add alternative <Form> elements or something later.
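To illustrate why these rows get flagged, here is a minimal sketch of a duplicate check. It is not the actual script from this repository: the function name `find_duplicates`, the `use_relation` flag, and the assumption that the columns are tab-separated are all hypothetical. It only shows how keying on offset+pos and lemma alone reports the `arb:lemma:root` row as a duplicate, while including the second column in the key does not.

```python
from collections import defaultdict

# The three rows from this issue, assumed to be tab-separated in the TSV file.
SAMPLE = (
    "05169813-n\tarb:lemma\tشأن\n"
    "05169813-n\tarb:lemma:root\tشأن\n"
    "05169813-n\tarb:lemma:brokenplural\tشؤون\n"
)


def find_duplicates(lines, use_relation=False):
    """Group rows by key and return the keys that occur more than once.

    Hypothetical helper: with use_relation=False the key is
    (offset+pos, lemma); with use_relation=True the second column
    (e.g. arb:lemma:root) is part of the key as well.
    """
    seen = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # not a lemma row
        synset, relation, lemma = parts[0], parts[1], parts[2]
        key = (synset, relation, lemma) if use_relation else (synset, lemma)
        seen[key].append(relation)
    return {key: rels for key, rels in seen.items() if len(rels) > 1}


naive = find_duplicates(SAMPLE.splitlines())
strict = find_duplicates(SAMPLE.splitlines(), use_relation=True)
```

With the sample rows, `naive` contains one entry for the lemma and root pair, and `strict` is empty. Whether the check should be changed this way, or the redundant rows filtered at conversion time instead, is a separate decision.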