Unexpected identifiers in OMW 1.4 and nltk_data #24

@ekaf

The new OMW version 1.4 uses the "-s" (satellite adjective) type suffix in the identifiers of many synsets. Counts per file:

4011 slk/wn-data-slk.tab
2649 slk/wn-data-lit.tab
1 nld/wn-data-nld.tab
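
Counts like these could be reproduced with a short script. The following is only a sketch: it assumes the usual OMW tab layout (a header line starting with `#`, then rows of `synset-id<TAB>lang:lemma<TAB>lemma`), and the function name `count_suffix_rows` and the sample rows are made up for illustration. It reports both the number of rows and the number of distinct identifiers, since the counts above could be either.

```python
def count_suffix_rows(lines, suffix="-s"):
    """Return (rows, distinct_ids) for identifiers ending with `suffix`.

    `lines` is any iterable of strings in the assumed OMW tab layout.
    """
    rows = 0
    distinct = set()
    for line in lines:
        # Skip the header/comment line and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        synset_id = line.split("\t", 1)[0].strip()
        if synset_id.endswith(suffix):
            rows += 1
            distinct.add(synset_id)
    return rows, len(distinct)


if __name__ == "__main__":
    # Hypothetical rows, not taken from any real OMW file.
    sample = [
        "# hypothetical header\tslk\turl\tlicense",
        "00000001-s\tslk:lemma\tfoo",
        "00000001-s\tslk:lemma\tbar",
        "00000002-a\tslk:lemma\tbaz",
    ]
    print(count_suffix_rows(sample))
```

For a real file, the same function can be fed an open file handle, for example `count_suffix_rows(open("slk/wn-data-slk.tab", encoding="utf-8"))`.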

Also, across the whole of OMW 1.4, the following identifiers are not found in PWN 3.0:

01498548-a
01505508-a
02002046-a
02917945-a
03202339-n
14869976-n
14869977-n
15168570-n
15171146-n
15171147-n
15171739-n
15171858-n
15172882-n
15173065-n
15176162-n
15177867-n
15178842-n
15300653-n

The latter problem also occurs with the current nltk_data/corpora/omw package (OMW version unknown), where the following identifiers are not found in PWN 3.0:

Bad offset: cmn 14869976-n ['污点']
Bad offset: cmn 14869977-n ['小斑']
Bad offset: cmn 15168570-n ['规定的睡觉时间']
Bad offset: cmn 15171146-n ['节日']
Bad offset: cmn 15171147-n ['纪念日']
Bad offset: cmn 15171739-n ['竞技状态不佳的日子']
Bad offset: cmn 15171858-n ['存取时间']
Bad offset: cmn 15172882-n ['选举日']
Bad offset: cmn 15173065-n ['教会年']
Bad offset: cmn 15176162-n ['雾月']
Bad offset: cmn 15177867-n ['希伯来历']
Bad offset: cmn 15178842-n ['回历']
Bad offset: hrv 00003093-b ['jedva', 'teško']
Bad offset: hrv 00004967-b ['jednostavno', 'potpuno', 'sasvim', 'stvarno']
Bad offset: hrv 01498548-a ['amoralan', 'nemoralan']
Bad offset: hrv 01505508-a ['mnogo_više', 'puno_više']
Bad offset: hrv 02002046-a ['izuzev', 'izuzevši', 'izuzimajući', 'osim']
Bad offset: hrv 02917945-a ['mahunast']
Bad offset: hrv 03202339-n ['modne_potrepštine']
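
A check of this kind can be sketched as follows. This is not the script that produced the report above, just a minimal illustration under assumptions: rows follow the `synset-id<TAB>lang:lemma<TAB>lemma` layout, and `valid_ids` is a set of `offset-pos` strings for PWN 3.0 that the caller has to build (for example from NLTK's WordNet reader). The names `find_bad_offsets` and `format_report` are hypothetical.

```python
from collections import defaultdict


def find_bad_offsets(lines, valid_ids):
    """Map (lang, synset_id) -> lemmas for identifiers missing from valid_ids."""
    bad = defaultdict(list)
    for line in lines:
        # Skip the header/comment line and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Only lemma rows are considered; definitions and examples are skipped.
        if len(fields) < 3 or not fields[1].endswith(":lemma"):
            continue
        synset_id, relation, lemma = fields[0], fields[1], fields[2]
        if synset_id not in valid_ids:
            lang = relation.split(":", 1)[0]
            bad[(lang, synset_id)].append(lemma)
    return dict(bad)


def format_report(bad):
    """Render one 'Bad offset' line per unknown identifier, sorted by key."""
    return [
        f"Bad offset: {lang} {synset_id} {lemmas}"
        for (lang, synset_id), lemmas in sorted(bad.items())
    ]


if __name__ == "__main__":
    # Hypothetical data: one known identifier and one unknown identifier.
    valid_ids = {"00000001-n"}
    sample = [
        "# hypothetical header\tcmn\turl\tlicense",
        "00000001-n\tcmn:lemma\tfoo",
        "99999999-n\tcmn:lemma\tbar",
    ]
    for row in format_report(find_bad_offsets(sample, valid_ids)):
        print(row)
```

One thing to watch when building `valid_ids`: PWN 3.0 marks satellite adjectives with the type `s`, so a comparison against files that write them as `-a` (or the other way round) needs the two normalized to the same letter first.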