Inconsistent lemma row types for Arabic wordnet #46

@goodmami

In the Arabic wordnet (wn-data-arb.tab), the lemma rows are typed as either arb:lemma:brokenplural or arb:lemma:root instead of plain arb:lemma. A small number of rows, however, use arb:lemma:brokenPlural (note the uppercase P):

grep ':lemma:' wns/arb/wn-data-arb.tab | cut -f2 | sort | uniq -c
   2770 arb:lemma:brokenplural
    180 arb:lemma:brokenPlural
  14683 arb:lemma:root

I think we should normalize brokenPlural to brokenplural.
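
A minimal sketch of that normalization, not a tested patch for this repository: it assumes the row type sits in the second tab-separated column (as the `cut -f2` above suggests) and that lines starting with `#` are header comments to leave untouched. The function name `normalize_row` and the sample rows are made up for illustration.

```python
OLD_TYPE = "arb:lemma:brokenPlural"
NEW_TYPE = "arb:lemma:brokenplural"


def normalize_row(line: str) -> str:
    """Return the line with the row type in column 2 lowercased to NEW_TYPE.

    Only an exact match on OLD_TYPE in the second tab-separated field is
    rewritten; comment lines and all other rows pass through unchanged.
    """
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line ending
    if body.startswith("#"):
        return line
    fields = body.split("\t")
    if len(fields) > 1 and fields[1] == OLD_TYPE:
        fields[1] = NEW_TYPE
    return "\t".join(fields) + ending


if __name__ == "__main__":
    # Hypothetical rows, not taken from the real data file.
    sample = [
        "# header line\n",
        "00000001-n\tarb:lemma:brokenPlural\tplaceholder\n",
        "00000002-n\tarb:lemma:root\tplaceholder\n",
    ]
    for row in sample:
        print(normalize_row(row), end="")
```

Matching the whole field rather than doing a global string replace avoids touching a lemma that happens to contain the same text.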

There is a separate issue where the Arabic wordnet file without diacritics (wn-nodia-arb.tab) only has lemma for that column, not even arb:lemma. I'm not sure what to do with this file.

Labels: data (Something is wrong in the data)