Unexpected identifiers in OMW 1.4 and nltk_data #24
Closed
The new OMW version 1.4 uses the "-s" type (adjective satellite, in WordNet's part-of-speech codes) in the identifiers of many synsets. Counts per file:
4011 slk/wn-data-slk.tab
2649 slk/wn-data-lit.tab
1 nld/wn-data-nld.tab
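A count like the one above could be reproduced with a short script. This is a minimal sketch, not the method actually used for this report: it assumes the usual OMW tab layout (a `#` header line, then `synset-id<TAB>lang:lemma<TAB>lemma`), it counts lines rather than distinct identifiers, and the function name `count_satellite_ids` and the sample rows are hypothetical.

```python
def count_satellite_ids(lines):
    """Count data lines whose synset identifier ends in '-s'."""
    count = 0
    for line in lines:
        # Skip the '#' header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # The synset identifier is the first tab-separated field.
        synset_id = line.split("\t", 1)[0].strip()
        if synset_id.endswith("-s"):
            count += 1
    return count


# Hypothetical sample rows in the assumed tab layout (not real OMW data).
sample = [
    "# Example Wordnet\txxx\thttp://example.org/\tCC BY",
    "00001740-n\txxx:lemma\tentity",
    "00001740-s\txxx:lemma\table",
    "00002098-s\txxx:lemma\tunable",
]
print(count_satellite_ids(sample))
```

To run it on a real file, pass an open file handle, for example `count_satellite_ids(open("slk/wn-data-slk.tab", encoding="utf-8"))`.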
Also, across the whole of OMW 1.4, the following identifiers are not found in PWN 3.0:
01498548-a
01505508-a
02002046-a
02917945-a
03202339-n
14869976-n
14869977-n
15168570-n
15171146-n
15171147-n
15171739-n
15171858-n
15172882-n
15173065-n
15176162-n
15177867-n
15178842-n
15300653-n
The latter problem also occurs with the current nltk_data/corpora/omw package (OMW version unknown), where the following identifiers are not found in PWN 3.0:
Bad offset: cmn 14869976-n ['污点']
Bad offset: cmn 14869977-n ['小斑']
Bad offset: cmn 15168570-n ['规定的睡觉时间']
Bad offset: cmn 15171146-n ['节日']
Bad offset: cmn 15171147-n ['纪念日']
Bad offset: cmn 15171739-n ['竞技状态不佳的日子']
Bad offset: cmn 15171858-n ['存取时间']
Bad offset: cmn 15172882-n ['选举日']
Bad offset: cmn 15173065-n ['教会年']
Bad offset: cmn 15176162-n ['雾月']
Bad offset: cmn 15177867-n ['希伯来历']
Bad offset: cmn 15178842-n ['回历']
Bad offset: hrv 00003093-b ['jedva', 'teško']
Bad offset: hrv 00004967-b ['jednostavno', 'potpuno', 'sasvim', 'stvarno']
Bad offset: hrv 01498548-a ['amoralan', 'nemoralan']
Bad offset: hrv 01505508-a ['mnogo_više', 'puno_više']
Bad offset: hrv 02002046-a ['izuzev', 'izuzevši', 'izuzimajući', 'osim']
Bad offset: hrv 02917945-a ['mahunast']
Bad offset: hrv 03202339-n ['modne_potrepštine']
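A check of this kind can be sketched without NLTK by reading the PWN 3.0 data files directly. The sketch below is an illustration under stated assumptions, not the script that produced the log above: it assumes the standard `data.noun` / `data.adj` line layout (offset, lexicographer file number, `ss_type`, ...), with licence header lines starting with two spaces, and it assumes OMW tab files write adjective satellites with `-a`, so `s` is folded into `a`. The helper names `pwn_ids` and `bad_offsets` and the shortened sample lines are hypothetical.

```python
from collections import defaultdict


def pwn_ids(data_lines):
    """Build the set of 'offset-pos' identifiers from PWN data.* lines."""
    ids = set()
    for line in data_lines:
        # Licence header lines in the data files start with two spaces.
        if line.startswith("  ") or not line.strip():
            continue
        fields = line.split()
        offset, ss_type = fields[0], fields[2]
        # Assumption: satellites ('s') appear as '-a' in OMW identifiers.
        pos = "a" if ss_type == "s" else ss_type
        ids.add(f"{offset}-{pos}")
    return ids


def bad_offsets(tab_lines, valid_ids):
    """Map each identifier missing from valid_ids to its lemmas."""
    missing = defaultdict(list)
    for line in tab_lines:
        if line.startswith("#") or not line.strip():
            continue
        parts = line.rstrip("\n").split("\t")
        # Keep only lemma rows (assumed relation field 'lang:lemma').
        if len(parts) < 3 or not parts[1].endswith(":lemma"):
            continue
        if parts[0] not in valid_ids:
            missing[parts[0]].append(parts[2])
    return dict(missing)


# Shortened, illustrative data lines (real entries carry pointers and glosses).
sample_data = [
    "  1 This software and database is being provided to you ...",
    "00001740 03 n 01 entity 0 000 | illustrative gloss",
    "00002098 00 s 01 unable 0 000 | illustrative gloss",
]
sample_tab = [
    "# Example Wordnet\tcmn\thttp://example.org/\tCC BY",
    "00001740-n\tcmn:lemma\t实体",
    "14869976-n\tcmn:lemma\t污点",
]

valid = pwn_ids(sample_data)
for synset_id, lemmas in bad_offsets(sample_tab, valid).items():
    print(f"Bad offset: cmn {synset_id} {lemmas}")
```

Folding `s` into `a` is a design choice for the comparison only; dropping that line makes the check strict and would flag every `-a` identifier that corresponds to a satellite synset.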