Release 2.0 by goodmami · Pull Request #61 · omwn/omw-data

goodmami · 2025-05-22T23:06:17Z

This branch is for final changes before producing the release.

Go back to using omw-en instead of omw-en30 for WordNet 3.0. I no longer think the benefit of clarity and consistency outweighs the disruption caused by the change.

goodmami · 2025-05-27T00:44:01Z

I should have marked this as a draft as I didn't intend to merge it just yet. I've restored the branch so I can commit some more to it, but there's no need to revert the merge.

ekaf · 2025-05-27T08:11:07Z

> Go back to using omw-en instead of omw-en30 for WordNet 3.0

Or recognizing both omw-en and omw-en30 as WordNet 3.0?

goodmami · 2025-05-27T16:58:42Z

> Or recognizing both omw-en and omw-en30 as WordNet 3.0?

That may be easier in the consumer application (e.g., Wn), where I can probably just create a second index entry pointing to the same file. The index.toml file in this repository is used mainly to support building the WN-LMF data from .tab files, so I would need to create an alias mechanism or something similar. The alternative, which I don't prefer, is to actually create two near-identical files that differ only in the lexicon's ID (omw-en30 versus omw-en) and in every identifier derived from it (e.g., omw-en30-woodworker-n in one and omw-en-woodworker-n in the other).
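To make the alias idea concrete, here is a minimal sketch of how a second index entry could point at an existing one. The `alias-of` key, the `INDEX` contents, and the `resolve` helper are all hypothetical; they do not reflect the actual schema of index.toml or anything implemented in Wn.

```python
# Hypothetical in-memory index. The "alias-of" key is an assumption made
# for illustration; it is not part of the real index.toml in omw-data.
INDEX = {
    "omw-en": {"label": "OMW English Wordnet based on WordNet-3.0"},
    "omw-en30": {"alias-of": "omw-en"},
}


def resolve(index, lexicon_id):
    """Follow alias-of links until reaching an entry that holds real data."""
    seen = set()
    while "alias-of" in index[lexicon_id]:
        if lexicon_id in seen:
            raise ValueError(f"alias cycle at {lexicon_id!r}")
        seen.add(lexicon_id)
        lexicon_id = index[lexicon_id]["alias-of"]
    return lexicon_id, index[lexicon_id]


canonical_id, entry = resolve(INDEX, "omw-en30")
print(canonical_id, "->", entry["label"])
```

With this approach both names load the same data, but the identifiers inside the file (e.g., omw-en-woodworker-n) would still carry only the canonical ID.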

ekaf · 2025-05-28T04:38:30Z

It is better when the name tells you what's in the data. A name like omw-en has poor information value, compared to omw-pwn30 or omw-oewn31.

Concerning the use of aliases, the situation is indeed different for a data distribution like OMW-data than for a downstream library like Wn. For example, a recent NLTK PR (#3378) makes it easy for users to install any wordnet and call it whatever they want. A versatile approach like that can be nice in an application, but less so in a data distribution.

goodmami · 2025-05-28T17:22:09Z

> It is better when the name tells you what's in the data. A name like omw-en has poor information value, compared to omw-pwn30 or omw-oewn31.

I'm not too concerned about the "information value" of the identifier. The XML has a label attribute that is more descriptive:

<LexicalResource xmlns:dc="https://globalwordnet.github.io/schemas/dc/">
  <Lexicon id="omw-en"
           label="OMW English Wordnet based on WordNet-3.0"
           language="en"
           email="[email protected]"
           license="https://wordnet.princeton.edu/license-and-commercial-use"
           version="2.0"
           url="https://github.com/omwn/omw-data"
           citation="Christiane Fellbaum (1998, ed.) *WordNet: An Electronic Lexical Database*. MIT Press.">
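As a rough illustration of reading that label programmatically, the sketch below parses a trimmed, self-closed copy of the header with the standard library. The `lexicon_labels` helper is a hypothetical name used only for this example, and the snippet omits several attributes so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the header above, closed so that it is well-formed.
SNIPPET = """
<LexicalResource xmlns:dc="https://globalwordnet.github.io/schemas/dc/">
  <Lexicon id="omw-en"
           label="OMW English Wordnet based on WordNet-3.0"
           language="en"
           version="2.0" />
</LexicalResource>
"""


def lexicon_labels(xml_text):
    """Map each 'id:version' specifier to the lexicon's descriptive label."""
    root = ET.fromstring(xml_text.strip())
    return {
        f"{lex.get('id')}:{lex.get('version')}": lex.get("label")
        for lex in root.iter("Lexicon")
    }


for spec, label in lexicon_labels(SNIPPET).items():
    print(spec, "->", label)
```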

Also, we had to change the lexicon so that it no longer mentions PWN or the Princeton WordNet, because those names refer only to the original WNDB files (which the NLTK reads) and not to the WN-LMF derivatives (which Wn reads).

I'm more concerned with confusion when using the identifiers with the OMW version, e.g.:

>>> import wn
>>> en = wn.Wordnet("omw-en:2.0")

The 2.0 above is the version of the OMW, but it loads the data derived from WordNet 3.0. To get data from WordNet 2.0, you'd do:

>>> en20 = wn.Wordnet("omw-en20:2.0")
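The source of the confusion can be seen by splitting the two specifiers apart. The `split_specifier` helper below is a hypothetical stand-in written for this comment, not Wn's actual parser; it only assumes the `id:version` shape used in the examples above.

```python
from typing import Optional, Tuple


def split_specifier(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'id:version' into its parts; version is None if absent."""
    lexicon_id, sep, version = spec.partition(":")
    return lexicon_id, (version if sep else None)


for spec in ("omw-en:2.0", "omw-en20:2.0"):
    lexicon_id, version = split_specifier(spec)
    # The version field is the OMW release in both cases; the source
    # WordNet version appears only in the ID suffix, if at all.
    print(f"id={lexicon_id!r} omw_version={version!r}")
```

In both cases the part after the colon is the OMW release. Only the second ID says anything about which Princeton release the data came from.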

ekaf · 2025-05-29T05:42:11Z

You just showed that omw-en20 is a more informative id than omw-en. In my opinion, the same applies to omw-en30. That's information value, preventing confusion.

goodmami · 2025-05-29T16:09:51Z

... My point is not that the identifier has no information value, it's that there are other attributes with more and clearer information about the source data.

To be clear, we've never released the lexicon derived from WordNet 3.0 as omw-en30. I changed it in the repository here and then changed it back before making a release because, even though it is more explicit (helping resolve confusion about omw-en:2.0 as described above), I decided it would be more confusing to change years of precedent. Probably for the same reason, the NLTK doesn't allow this:

>>> from nltk.corpus import wordnet30
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    from nltk.corpus import wordnet30
ImportError: cannot import name 'wordnet30' from 'nltk.corpus' (...). Did you mean: 'wordnet'?

but does allow this:

>>> from nltk.corpus import wordnet
>>> wordnet.get_version()
'3.0'

ekaf · 2025-05-30T04:31:27Z

You're completely right @goodmami, nltk has the same issue, and well-established habits are unlikely to change.
My point is that while it was clearer in the past what "the English Wordnet" meant, there is now a greater need for future-proofing by using more explicit identifiers.

goodmami · 2025-05-30T04:42:09Z

> My point is that while it was clearer in the past what "the English Wordnet" meant, there is now a greater need of future-proofing by using more explicit identifiers.

That's fair. I don't think we'll make this change in the data for 2.0, but if you think it's something we should consider for a future release, please raise a new issue so we can track it. The comments here will become harder to find when we're done with the PR.

ekaf · 2025-05-30T17:33:58Z

Thanks, there is no need for a new issue. As you wrote earlier: "I no longer think the benefit of clarity and consistency outweighs the disruption caused by the change."

goodmami · 2025-06-02T19:35:05Z

> I should have marked this as a draft as I didn't intend to merge it just yet. I've restored the branch so I can commit some more to it, but there's no need to revert the merge.

Pushing more commits to the merged branch may cause some confusion, so I pushed a new branch release-2.0-pt2 and created a draft pull request #62.

Go back to omw-en instead of omw-en30

ed6cd84

goodmami mentioned this pull request May 22, 2025

Create a new release with some improvements (2.0) #31

Closed

fcbond merged commit d515b98 into main May 26, 2025

fcbond deleted the release-2.0 branch May 26, 2025 15:06

goodmami restored the release-2.0 branch May 27, 2025 00:43

goodmami deleted the release-2.0 branch June 2, 2025 19:32

