Update wndb2lmf to build Pre-3.0 WordNets #42
Merged
Conversation
Earlier versions of the Princeton WordNet did not include verb.Framestext, and the frame texts never changed across versions, so it's easier to hard-code them than to load them from a file. The only potential issue I see is that this content is copied from the copyrighted WordNet documentation and there might not be enough attribution. I do link back to the documentation, so hopefully we're good there.
build_senseidx.py will create exact replicas of the index.sense files for WordNet 1.7 and later. For WordNet 1.6, you can get close with the --use-adjposition option, but the counts for any sense key with an adjposition marker, (a) or (p), after the head word of satellite adjective sense keys would need to be reset to 0. WordNet 1.5 was not distributed with an index.sense file.
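One way to check whether a rebuilt file really is an exact replica is to compare it with the distributed index.sense entry by entry. The sketch below is not code from this PR. It assumes the four-field line layout documented for index.sense (sense_key, synset_offset, sense_number, tag_cnt); the function names are hypothetical and any lines fed to it in examples are made up.

```python
from typing import Iterable, List, NamedTuple, Tuple


class SenseIndexEntry(NamedTuple):
    """One line of an index.sense file (assumed four-field layout)."""
    sense_key: str
    synset_offset: str
    sense_number: int
    tag_cnt: int


def parse_sense_index_line(line: str) -> SenseIndexEntry:
    """Split a whitespace-separated index.sense line into its fields."""
    sense_key, offset, sense_number, tag_cnt = line.split()
    return SenseIndexEntry(sense_key, offset, int(sense_number), int(tag_cnt))


def diff_sense_indexes(
    original_lines: Iterable[str],
    rebuilt_lines: Iterable[str],
) -> Tuple[List[SenseIndexEntry], List[SenseIndexEntry]]:
    """Return (entries only in the original, entries only in the rebuild)."""
    original = {parse_sense_index_line(l) for l in original_lines if l.strip()}
    rebuilt = {parse_sense_index_line(l) for l in rebuilt_lines if l.strip()}
    return sorted(original - rebuilt), sorted(rebuilt - original)
```

Both returned lists are empty when the rebuilt file matches the original. For WordNet 1.6, a difference limited to tag_cnt values on the affected satellite adjective keys would correspond to the counts described above.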
- word = "Original_word" as in the WNDB data files
- respaced = "Original word" with spaces instead of _
- lemma = "original_word" as in the WNDB index files
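The three forms can be derived from a single data-file word. This is a minimal sketch of that mapping, not the code in wndb2lmf.py; the WordForms type and word_forms function are hypothetical names, and it assumes any adjposition marker has already been stripped from the word.

```python
from typing import NamedTuple


class WordForms(NamedTuple):
    word: str      # as written in the WNDB data files
    respaced: str  # underscores replaced with spaces
    lemma: str     # lowercased, as in the WNDB index files


def word_forms(word: str) -> WordForms:
    """Derive the respaced and lemma forms from a WNDB data-file word."""
    return WordForms(
        word=word,
        respaced=word.replace("_", " "),
        lemma=word.lower(),
    )
```

For example, word_forms("Original_word") gives the three values listed above.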
The frames were being sent to Wn's LMF in the 1.0 format and weren't being written in the 'subcat' attribute on senses. This is now fixed.
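As a rough illustration of the difference, the sketch below declares frames once on the lexicon and references them from a sense through a space-separated 'subcat' attribute. It is not the code from this PR: the "frame-N" id scheme and the function names are hypothetical, the frame table is only a three-entry excerpt, and the element and attribute names follow my reading of the WN-LMF 1.1 schema.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Excerpt of a hard-coded frame table (frame number -> template text).
VERB_FRAMES = {
    1: "Something ----s",
    2: "Somebody ----s",
    8: "Somebody ----s something",
}


def frame_id(number: int) -> str:
    """Hypothetical id scheme for frames; the real ids may differ."""
    return f"frame-{number}"


def add_frames(lexicon: ET.Element) -> None:
    """Declare each frame once as a SyntacticBehaviour on the lexicon."""
    for number, text in sorted(VERB_FRAMES.items()):
        ET.SubElement(
            lexicon,
            "SyntacticBehaviour",
            id=frame_id(number),
            subcategorizationFrame=text,
        )


def set_subcat(sense: ET.Element, frame_numbers: Iterable[int]) -> None:
    """Reference frames from a sense via a space-separated subcat attribute."""
    sense.set("subcat", " ".join(frame_id(n) for n in sorted(frame_numbers)))
```

Keeping the frame text on the lexicon and only ids on the senses avoids repeating the same template string for every verb sense that uses it.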
Also try to make it more robust for WN1.5
Not worth the trouble.
Part of #38
Jan 18, 2025
Collaborator
Author
@fcbond Sorry for the large PR. All the lexicons (including the new ones) now pass validation:
$ ./validate.sh 1.5
build/omw-1.5/omw-arb/omw-arb.xml - valid
build/omw-1.5/omw-bg/omw-bg.xml - valid
build/omw-1.5/omw-ca/omw-ca.xml - valid
build/omw-1.5/omw-cmn/omw-cmn.xml - valid
build/omw-1.5/omw-da/omw-da.xml - valid
build/omw-1.5/omw-el/omw-el.xml - valid
build/omw-1.5/omw-en15/omw-en15.xml - valid
build/omw-1.5/omw-en16/omw-en16.xml - valid
build/omw-1.5/omw-en171/omw-en171.xml - valid
build/omw-1.5/omw-en17/omw-en17.xml - valid
build/omw-1.5/omw-en20/omw-en20.xml - valid
build/omw-1.5/omw-en21/omw-en21.xml - valid
build/omw-1.5/omw-en30/omw-en30.xml - valid
build/omw-1.5/omw-en31/omw-en31.xml - valid
build/omw-1.5/omw-es/omw-es.xml - valid
build/omw-1.5/omw-eu/omw-eu.xml - valid
build/omw-1.5/omw-fi/omw-fi.xml - valid
build/omw-1.5/omw-fr/omw-fr.xml - valid
build/omw-1.5/omw-gl/omw-gl.xml - valid
build/omw-1.5/omw-he/omw-he.xml - valid
build/omw-1.5/omw-hr/omw-hr.xml - valid
build/omw-1.5/omw-id/omw-id.xml - valid
build/omw-1.5/omw-is/omw-is.xml - valid
build/omw-1.5/omw-it/omw-it.xml - valid
build/omw-1.5/omw-iwn/omw-iwn.xml - valid
build/omw-1.5/omw-ja/omw-ja.xml - valid
build/omw-1.5/omw-lt/omw-lt.xml - valid
build/omw-1.5/omw-nb/omw-nb.xml - valid
build/omw-1.5/omw-nl/omw-nl.xml - valid
build/omw-1.5/omw-nn/omw-nn.xml - valid
build/omw-1.5/omw-pl/omw-pl.xml - valid
build/omw-1.5/omw-pt/omw-pt.xml - valid
build/omw-1.5/omw-ro/omw-ro.xml - valid
build/omw-1.5/omw-sk/omw-sk.xml - valid
build/omw-1.5/omw-sl/omw-sl.xml - valid
build/omw-1.5/omw-sq/omw-sq.xml - valid
build/omw-1.5/omw-sv/omw-sv.xml - valid
build/omw-1.5/omw-th/omw-th.xml - valid
build/omw-1.5/omw-zsm/omw-zsm.xml - valid
Feel free to review the whole thing if you have time, but otherwise please just pay attention to the changes to |
Collaborator
Author
I forgot to have the non-English lexicons require Also note that I now have the |
Collaborator
Author
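@fcbond merging so we can move ahead. If you have tsv2lmf.py changes please do see what has changed here (similarly if you have changes to wndb2lmf.py). Let me know if you need help with any conflicts.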
Contributor
Thanks. I have some changes to tsv2lmf.py, but have not had a chance to
give them a final check yet, sorry!
On Fri, 24 Jan 2025 at 00:35, Michael Wayne Goodman wrote:
@fcbond <https://github.com/fcbond> merging so we can move ahead. If you
have tsv2lmf.py changes please do see what has changed here (similarly if
you have changes to wndb2lmf.py). Let me know if you need help with any
conflicts.
See #38
Summary of changes:
- wndb.py module
- build_senseidx.py to rebuild an index.sense file from WordNet data/index/cntlist files
- build.sh to build all versions of WordNet
- index.sense files as appropriate for each WordNet (as discussed here)
- omw-en* lexicon (summary of changes, include original README)
- wns/en30/, wns/en31/, and wns/pwn/
- <Requires> element on non-English lexicons to point to omw-en30:1.5
- index.toml