Possible duplicates in the Arabic wordnet #49

@goodmami

I'm making a separate issue because the reason for the duplicates seems sufficiently different.

The Arabic wordnet's TSV file has not only arb:lemma entries but also arb:lemma:root and arb:lemma:brokenplural entries. The arb:lemma:root form is often identical to the regular lemma. The script that checks for duplicates compares only the offset+pos and the lemma, ignoring the second column (the entry type), so lines like these look like duplicates:

 05169813-n	arb:lemma	شأن
-05169813-n	arb:lemma:root	شأن
 05169813-n	arb:lemma:brokenplural	شؤون
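
To illustrate the behavior described above, here is a minimal sketch of a duplicate check, not the project's actual script. The names `parse_tsv` and `find_duplicates` and the `use_type` flag are hypothetical, and the sketch assumes three tab-separated columns (offset+pos, type, lemma). With `use_type=False` it mimics a check that ignores the second column; with `use_type=True` the type is part of the key.

```python
from collections import defaultdict


def parse_tsv(text):
    """Parse tab-separated rows into (synset, kind, lemma) tuples.

    Assumes three columns per data row; blank lines, comment lines
    starting with '#', and rows with fewer than three columns are skipped.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        rows.append((parts[0], parts[1], parts[2]))
    return rows


def find_duplicates(rows, use_type=False):
    """Map each repeated key to the 1-based row numbers where it occurs.

    use_type=False keys on (synset, lemma) only (hypothetical "loose" check).
    use_type=True keys on (synset, kind, lemma).
    """
    seen = defaultdict(list)
    for rowno, (synset, kind, lemma) in enumerate(rows, start=1):
        key = (synset, kind, lemma) if use_type else (synset, lemma)
        seen[key].append(rowno)
    return {key: nums for key, nums in seen.items() if len(nums) > 1}


SAMPLE = (
    "05169813-n\tarb:lemma\tشأن\n"
    "05169813-n\tarb:lemma:root\tشأن\n"
    "05169813-n\tarb:lemma:brokenplural\tشؤون\n"
)

rows = parse_tsv(SAMPLE)
loose = find_duplicates(rows)                  # flags rows 1 and 2
strict = find_duplicates(rows, use_type=True)  # flags nothing
```

Keying on the type as well would stop the root entry from being reported, at the cost of no longer noticing that the two rows carry the same written form.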

We are currently not doing anything with roots and brokenplurals when converting to WN-LMF, but maybe we should keep the redundant lemmas here in case we later want to add alternative <Form> elements or something similar.

Labels: data (Something is wrong in the data)
