Update the MCR wordnets to the 2016 version #27

Merged
fcbond merged 6 commits into omwn:main from ekaf:mcr-2016 on Apr 16, 2023

ekaf commented May 7, 2022

Fixes #25: updates the mcr2tab.py script to handle the MCR definitions and examples.
MCR examples are linked to specific lemma ranks, which OMW doesn't support yet, so this proposal links them at the synset level, as in PWN.
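
The synset-level linking described above can be sketched roughly as follows. This is not the code in mcr2tab.py: the `(synset_id, lemma_rank, text)` input shape, the function name `synset_level_rows`, and the sample ID and ranks are assumptions made for illustration, and the `exe` label is assumed to be the example counterpart of the `def` label used in the tab files.

```python
from collections import defaultdict


def synset_level_rows(records, lang="spa", label="exe"):
    """Collapse lemma-level examples into synset-level tab rows.

    `records` is a hypothetical iterable of (synset_id, lemma_rank, text)
    tuples; the real MCR input layout may differ.
    """
    by_synset = defaultdict(list)
    for synset_id, _lemma_rank, text in records:
        # Ignore the lemma rank and skip exact duplicates within a synset.
        if text not in by_synset[synset_id]:
            by_synset[synset_id].append(text)

    rows = []
    for synset_id, texts in by_synset.items():
        # Number the examples of each synset from 0, in input order.
        for index, text in enumerate(texts):
            rows.append(f"{synset_id}\t{lang}:{label}\t{index}\t{text}")
    return rows


# Placeholder synset ID and made-up lemma ranks, for illustration only.
sample = [
    ("00000001-n", 2, "construyó una modesta vivienda cerca del estanque"),
    ("00000001-n", 1, "ellos recaudaron fondos para proporcionar casa a los sin techo"),
]
for row in synset_level_rows(sample):
    print(row)
```

Both examples end up attached to the same synset, whichever lemma they originally belonged to.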

I applied the new mcr2tab.py to the much larger wordnets from MCR-2016 and tested the resulting tab files with NLTK's standard wordnet module:

from nltk.corpus import wordnet as wn
print(wn.synsets("entity")[0].definition(lang="spa"))

['aquello que se percibe o se sabe o se infiere que tiene su existencia propia distinta (viva o no viva)']

ekaf commented May 9, 2022

As in PWN, an example may use a synonym rather than the expected lemma:

for ss in wn.synsets("casa", lang="spa"):
    examples = ss.examples(lang="spa")
    if examples:
        print(f'{ss}:\n\t{ss.lemmas(lang="spa")}\n\t{examples}\n')
Synset('building.n.01'):
	[Lemma('building.n.01.casa'), Lemma('building.n.01.construcción'), Lemma('building.n.01.edificación'), Lemma('building.n.01.edificio'), Lemma('building.n.01.edificios'), Lemma('building.n.01.inmueble')]
	['había un edificio de tres pisos en la esquina', 'se trataba de un edificio imponente']

Synset('dwelling.n.01'):
	[Lemma('dwelling.n.01.casa'), Lemma('dwelling.n.01.domicilio'), Lemma('dwelling.n.01.habitación'), Lemma('dwelling.n.01.habitáculo'), Lemma('dwelling.n.01.hogar'), Lemma('dwelling.n.01.morada'), Lemma('dwelling.n.01.piso'), Lemma('dwelling.n.01.residencia'), Lemma('dwelling.n.01.vasa'), Lemma('dwelling.n.01.vivienda')]
	['construyó una modesta vivienda cerca del estanque', 'ellos recaudaron fondos para proporcionar casa a los sin techo']

Synset('house.n.01'):
	[Lemma('house.n.01.casa')]
	['él tiene una casa en Cape Cod']

Synset('firm.n.01'):
	[Lemma('firm.n.01.casa'), Lemma('firm.n.01.compañía'), Lemma('firm.n.01.empresa'), Lemma('firm.n.01.empresas'), Lemma('firm.n.01.firma')]
	['el trabajó en una empresa de bolsa']

Synset('home.n.03'):
	[Lemma('home.n.03.casa'), Lemma('home.n.03.hogar'), Lemma('home.n.03.país')]
	['los aranceles canadienses permitieron a las empresas madereras de los Estados Unidos aumentar los precios en el país']

Synset('home.n.01'):
	[Lemma('home.n.01.casa')]
	['entregar el paquete en mi casa', 'no tiene una casa a donde ir']
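
One way to see which lemma each example actually uses is a simple whole-word check, sketched below. The helper `lemmas_mentioned` is hypothetical and not part of NLTK or mcr2tab.py. It only handles single-word lemmas and exact word forms, so inflected or multiword lemmas would be missed.

```python
import re


def lemmas_mentioned(lemmas, example):
    """Return the lemmas that occur as whole words in the example."""
    words = set(re.findall(r"\w+", example.lower()))
    return [lemma for lemma in lemmas if lemma.lower() in words]


# Lemma names and examples copied from the dwelling.n.01 output above.
dwelling_lemmas = [
    "casa", "domicilio", "habitación", "habitáculo", "hogar",
    "morada", "piso", "residencia", "vasa", "vivienda",
]
dwelling_examples = [
    "construyó una modesta vivienda cerca del estanque",
    "ellos recaudaron fondos para proporcionar casa a los sin techo",
]
for example in dwelling_examples:
    print(lemmas_mentioned(dwelling_lemmas, example), example)
```

For the first example the check reports `vivienda` rather than `casa`, which is the synonym case mentioned above.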

fcbond commented Feb 23, 2023

Hi,

This looks great, thank you. I notice a couple of problems with the format of the definitions (they are already present in MCR), which I think we should fix.

  • remove underscores:

02881906-n spa:def 0 nudo que ni se_suelta ni se_aprieta
02881906-n spa:def 0 nudo que ni se suelta ni se aprieta

  • remove spaces around punctuation as appropriate

spa-30-00040325-a a 0 - - 0 ( dícese de ,_por_ejemplo, los volcanes ) en_erupción o con_posibilidad de erupcionar
spa-30-00040325-a a 0 - - 0 (dícese de, por ejemplo, los volcanes) en erupción o con posibilidad de erupcionar

I think that this would make it easier for people to use, ...

What do you think, @ekaf?

ekaf commented Feb 23, 2023

Thanks @fcbond! I was not aware of these problems, and intend to fix them very soon.

ekaf commented Feb 24, 2023

The latest update fixes the problems mentioned by @fcbond, and removes a spurious space at the end of many definitions.
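
A minimal sketch of this kind of cleanup, using only Python's `re` module, might look like the following. It is not the code in mcr2tab.py: the function name `clean_definition` and the exact punctuation rules are assumptions, chosen to reproduce the two examples given earlier in this thread.

```python
import re


def clean_definition(text):
    """Normalize an MCR definition string (illustrative rules only)."""
    # MCR joins multiword expressions with underscores; use spaces instead.
    text = text.replace("_", " ")
    # Drop spaces just inside parentheses.
    text = re.sub(r"\(\s+", "(", text)
    text = re.sub(r"\s+\)", ")", text)
    # Drop spaces before common punctuation marks.
    text = re.sub(r"\s+([,;:.!?])", r"\1", text)
    # Collapse repeated whitespace and strip leading/trailing spaces.
    return re.sub(r"\s+", " ", text).strip()


print(clean_definition("nudo que ni se_suelta ni se_aprieta "))
print(clean_definition(
    "( dícese de ,_por_ejemplo, los volcanes ) en_erupción o con_posibilidad de erupcionar"
))
```

The rules are deliberately conservative. Cases such as quotation marks or a space before an ellipsis would need their own handling.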

fcbond left a comment

This looks good, I am ready to merge.

Thanks!

fcbond merged commit e38fb86 into omwn:main on Apr 16, 2023
ekaf deleted the mcr-2016 branch on April 17, 2023 08:23

Successfully merging this pull request may close these issues.

Add Spanish definitions and examples
