Replace copy of license with an SPDX identifier. #171

Merged
jaraco merged 1 commit into main from spdx-license
May 3, 2025

@jaraco
Owner

@jaraco jaraco commented Mar 21, 2025

Keeping a separate copy of a well-known license is just extra maintenance burden. With the introduction of SPDX identifiers in PyPI metadata, the most straightforward way to indicate the license for the project is through the license metadata field.
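For context, the SPDX expression travels in the `License-Expression` header of a distribution's core metadata. The following is a minimal sketch of reading that field, assuming a hand-written sample `METADATA` document; the package name and values are hypothetical and only illustrate the shape of the data.

```python
from email.parser import Parser

# Hypothetical METADATA content for illustration only; real files are
# generated by the build backend.
SAMPLE_METADATA = """\
Metadata-Version: 2.4
Name: example-package
Version: 1.0.0
License-Expression: MIT
"""


def declared_license(metadata_text):
    """Return the SPDX expression declared in core metadata, or None."""
    message = Parser().parsestr(metadata_text)
    return message.get("License-Expression")


print(declared_license(SAMPLE_METADATA))  # prints MIT
```

Core metadata uses an email-header style format, which is why the standard library's `email.parser` is enough for this sketch.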

Comment on lines -8 to -9
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
Contributor

Doesn't removal violate this requirement?

Owner Author

In my opinion, no. The permission notice is still included by way of the SPDX identifier. It merely reduces the redundancy.

Contributor

SPDX is a reference to a license in an external DB, not to a notice within a license. And the requirement is to include this notice in all copies of the project.

IANAL, but I'm like 99% sure downstreams would not be able to distribute said software if it doesn't have a license file.

It's probably a good idea to ask @hroncok @befeleme @mgorny if the respective distros have policies that would cause problems.

Also, GitHub will probably stop being able to detect said licenses. It uses https://licensee.github.io/licensee/ to perform detection. You can run it as a CLI tool in a container to see what it'd return.

Contributor

@mgorny mgorny Mar 23, 2025

IANAL but the way I understand it, the notice works "downwards". Basically, if the project sources include one, then all redistributions of these sources must include one as well. So if the project sources no longer include the notice, then it simply means redistributions don't have to include it either.

That said, if the project has had third party contributions, then the situation might be different. In particular, since all contributions were made under the license in question and under the assumption that such a notice is present, then the notice effectively holds an obligation for the primary author from other authors. Therefore, unilaterally removing the notice without agreement from other contributors could be perceived as a license violation.

TLDR; In Fedora, this wouldn't make us too happy, but there are ways forward.

Fedora's licensing guidelines strongly state the preference of including the license file into the distribution, if the license itself states it must be distributed with copies of the software. We must include the file, if present, and ask upstream developers to include the file if it isn't there. If that doesn't render the result, we can either include a text of that well known license (ideally after confirming with upstream this is the correct text), or decide not to package such project. I believe MIT is an example of a license that requires going through the process.
Source: https://docs.fedoraproject.org/en-US/packaging-guidelines/LicensingGuidelines/#_license_text

@sanjay900

sanjay900 commented Apr 6, 2025

At the moment Home Assistant is unhappy: the deprecated classifier has been removed and the SPDX identifier isn't there, and it expects one or the other when validating that all libraries are OSI approved.

Currently that's causing pipelines to fail whenever a pull request that updates dependencies is raised, since jaraco.abode is pulling in jaraco.itertools==6.4.2.

We could not detect an OSI-approved license for jaraco.itertools@6.4.2: None -- None -- []

https://github.com/home-assistant/core/actions/runs/14297140403/job/40065782550?pr=142436
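The kind of check described above can be sketched as follows. This is not Home Assistant's actual code; the function name, the short allow-list, and the handling of only single identifiers (no compound SPDX expressions) are all assumptions made for illustration.

```python
# Hypothetical allow-list; a real check would consult the full OSI list.
OSI_APPROVED_SPDX = {"MIT", "Apache-2.0", "BSD-3-Clause"}
OSI_CLASSIFIER_PREFIX = "License :: OSI Approved ::"


def has_osi_license(license_expression, classifiers):
    """Accept either an SPDX identifier or an OSI trove classifier."""
    if license_expression in OSI_APPROVED_SPDX:
        return True
    return any(c.startswith(OSI_CLASSIFIER_PREFIX) for c in classifiers)


# Neither form is declared, similar to the "None -- None -- []" report.
print(has_osi_license(None, []))  # prints False
# SPDX identifier only.
print(has_osi_license("MIT", []))  # prints True
# Legacy classifier only.
print(has_osi_license(None, ["License :: OSI Approved :: MIT License"]))  # prints True
```

A package that declares neither form fails such a check regardless of what license it is actually under.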

@jaraco
Owner Author

jaraco commented May 3, 2025

In my opinion, there should be one preferred way to declare the license. I want to avoid having multiple, redundant, duplicative forms of license declaration. Such duplication can lead to inconsistencies (if one license indicator strays from the other). Having the license indicator on exactly one line in the codebase also makes it easier to avoid subtle tweaks. The elimination of the LICENSE file also has the benefit of reducing the number of files lying around in the root of the project (see the essential layout for a hint of why that's a problem).

> Also, GitHub will probably stop being able to detect said licenses. It uses https://licensee.github.io/licensee/ to perform detection. You can run it as a CLI tool in a container to see what it'd return.

The implication is that every project should provide a tool-specific license declaration for every possible downstream license rendering tool. In other words, because GitHub reads license files and PyPI reads SPDX identifiers (and tool X reads Classifiers, and tool Y reads some other format, ...), maintainers of each project should declare the license in every conceivable format. I reject that approach.

Instead, I propose that each project should declare the license in exactly one canonical form. I'm happy to adjust to whatever form fits best (although my preference is for it to be lightweight and minimally duplicative, such as a reference or hyperlink over a copy/paste).

It's my understanding that currently in the Python ecosystem, the best canonical form for declaring the license is with the SPDX identifiers. That's what's recognized by PyPI and it also has the nice property that it's a one-line declaration following a well-established schema for popular licenses (having the lightweight property). Hopefully GitHub will respect the Python community's work here and honor the identifier (although I'm not sure it even can in general, if the metadata isn't static, which may prove a problem for systems other than this skeleton).

> IANAL, but I'm like 99% sure downstreams would not be able to distribute said software if it doesn't have a license file.

I'm 99% sure that the project merely needs to have a license, and the responses from our integrators seem to confirm that understanding. A requirement that it must be a license file, or that it must contain the complete copy of the license, sounds to me like a legacy concern, similar to how "not a lawyers" in the '90s and '00s insisted that signatures needed to be on paper.

In my opinion, until a legal opinion sets a precedent that a license must be materialized in the code repository (or distribution artifacts) and cannot reference a well-established database on the Internet (and all the redundancies that brings), I'm inclined to think that doing more is unnecessary extra maintenance.

Therefore, to unblock the broken PyPI metadata in all skeleton projects, I'm going to proceed with this change. I'm certain that this change will introduce some disruption and consternation, but I'd like to work through those issues rather than imagine what those issues might be.

Thanks everybody for the thoughtful and insightful comments.

@jaraco jaraco merged commit 9a81db3 into main May 3, 2025
6 of 45 checks passed
@jaraco jaraco deleted the spdx-license branch May 3, 2025 07:57
@hroncok
Contributor

hroncok commented May 3, 2025

So, basically, would you rather we ask for the license file for each project we package?

@rgommers

rgommers commented May 3, 2025

Looking at the discussion above, I think all distro packagers are on the same page: the license file must be included.

> In my opinion, no. The permission notice is still included by way of the SPDX identifier. It merely reduces the redundancy.

This seems pretty clearly incorrect. If 4-5 people all tell you the same thing, it seems like a very strange decision to just go ahead and remove the license file anyway @jaraco.

Removing trove classifiers seems inconsequential (it's PyPI-specific and not relevant for legal compliance), but removing license files cannot be right.

@jaraco
Owner Author

jaraco commented May 4, 2025

@mtelka added in pypa/setuptools#4981

> Recently the license file got removed from git and so it is no longer in the sdist package too. I think this is wrong because the MIT license explicitly states this:
>
> > The above copyright notice and this permission notice shall be included in
> > all copies or substantial portions of the Software.
>
> and so I believe the full license wording should be included in the sdist package.

In addition to that license notice statement being patently incorrect due to the missing copyright, it's also incorrect due to the missing permission notice. It seems to me the wording indicates that the copyright and permission must be included in all substantial portions of the software, but it's in fact not included in any substantial portions of the software except the sdist. When it's expanded on disk, it's no longer even in the same directory as the software, but in a separate folder with no direct link to the metadata folder containing the license. Some have argued, and I think the text backs up, that the copyright and license text must be included in every file.

The problem is that the paradigm for indicating the license for each file in the package has evolved over time. It used to be a lot more common to include licenses in every file, but people realized that was unsustainable and unnecessary, and that it's reasonable enough to indicate the license by including it alongside the package, attached to the package, even though that makes it trivially easy to detach it (simply remove the file).

To ameliorate these concerns, thankfully, we have VCS systems to track copyrights, attribution, and licenses for those copyrighted works.

I'm mainly seeking a way to unambiguously license a body of work (the package) with minimal boilerplate and overhead. I struggle to see how any reasonable human or machine could not easily detect the license terms for these well-known licenses. If it would help to include the license in the system (here in the skeleton or in coherent-oss/system for example), I'd be open to providing that as a hyperlink, but again, I'd like to avoid doing that duplicatively and introducing potential for mistakes.

This feels mostly like bikeshedding to me. Can someone explain to me why it's so important to include the body of the text other than "a lot of us feel that way?"

@rgommers

rgommers commented May 5, 2025

> In addition to that license notice statement being patently incorrect due to the missing copyright

So then please fix it, don't just delete it. You are creating a ton of work for distro packagers (I got here because the conda-forge feedstock for setuptools broke; it checks for licensing info being present - many packagers will be in that situation). Even if you are correct that it's fine legally to just delete a license file (you're not, but let's assume): you are creating way more work for other maintainers than you're saving for yourself. And even for yourself, you're making things harder rather than easier - because you have to deal with discussions like this one.

> This feels mostly like bikeshedding to me.

It isn't, or you wouldn't get so many complaints so quickly. Seriously, what do you want or need here? Even more people to tell you that you are wrong? A legal opinion directly from a commercial Linux distro that can afford to have legal experts on staff to deal with this sort of question? Or ...?

@mtelka

mtelka commented May 5, 2025

@jaraco what about exploring some other way to avoid the license info duplication? E.g. keep the license file, and so the License-File field in the metadata, and use some tool(s) to automatically detect (and create) the License-Expression field there?
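A deliberately naive sketch of that direction (license file in, identifier out) is shown below. The function name and the phrase-matching heuristic are hypothetical; real detectors follow the SPDX matching guidelines and handle far more variation than two fixed phrases.

```python
import re

# Distinctive phrases of the MIT text. This is a hypothetical heuristic,
# NOT an implementation of the SPDX matching guidelines.
GRANT = "permission is hereby granted, free of charge"
NOTICE = "the above copyright notice and this permission notice shall be included"


def guess_license_expression(license_text):
    """Return "MIT" if both distinctive phrases are present, else None."""
    # Collapse line breaks and runs of spaces so wrapped text still matches.
    normalized = re.sub(r"\s+", " ", license_text).lower()
    if GRANT in normalized and NOTICE in normalized:
        return "MIT"
    return None
```

A build tool could call such a function on the kept license file and populate License-Expression from the result, which keeps the file as the single source of truth.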

@rgommers

rgommers commented May 5, 2025

Also from the Python Packaging User Guide: https://packaging.python.org/en/latest/tutorials/packaging-projects/#creating-a-license

"It’s important for every package uploaded to the Python Package Index to include a license. This tells users who install your package the terms under which they can use your package."

If you disagree with what the packaging guide says, you should probably discuss that with the community and maintainers of the guide. setuptools should set a good example here.

@richardfontana

> I'm mainly seeking a way to unambiguously license a body of work (the package) with minimal boilerplate and overhead. I struggle to see how any reasonable human or machine could not easily detect the license terms for these well-known licenses.

The possibly-only problem here is that SPDX identifiers do not unambiguously refer to a single license; they are not designed to do so. The SPDX-legal project attempts to make sure that the range of different license texts associated with a given identifier are not "substantively different" in a legal sense, but that's kind of a fuzzy inquiry and somewhat prone to error and disagreement (admittedly maybe of little consequence).

For example, SPDX MIT refers to an open-ended set of licenses that "match" the template defined in this file, as supplemented by these guidelines.

IMO, it might be slightly better for you to provide a link to an actual license text (for example, to https://opensource.org/license/mit ) if you are trying to avoid inclusion of license text files.

clrpackages pushed a commit to clearlinux-pkgs/pypi-setuptools that referenced this pull request May 6, 2025
…version 80.3.1

Jason R. Coombs (25):
      Remove reference in test_windows_wrappers to easy_install.
      Moved some fixtures out of test_easy_install.
      Moved scripts functionality into its own module.
      Reference utility functions from _shutil.
      Remove easy_install and package_index.
      Revert "Merge pull request pypa/distutils#332 from pypa/debt/unify-shebang"
      Remove support for special executable under a Python build.
      In build_editable, ensure that 'executable' is hard-coded to #!python for portability.
      Add news fragment.
      Fix import.
      Cast is unnecessary, apparently
      Bump version: 80.1.0 → 80.2.0
      Replace copy of license with an SPDX identifier. (jaraco/skeleton#171)
      Add news fragment.
      Update tests in setuptools/dist not to rely on Setuptools having a license file.
      Rely on path.Path for directory context.
      Add news fragment.
      Bump version: 80.2.0 → 80.3.0
      Moved pbr setup into a fixture.
      Add a failing integration test. Ref #4976
      Restore ScriptWriter and sys_executable properties.
      Render the attributes dynamically.
      Add the deprecation warning to attribute access.
      Add news fragment.
      Bump version: 80.3.0 → 80.3.1
@jaraco
Owner Author

jaraco commented May 6, 2025

See also #1, where I've been pushing for one unambiguous way to declare a license, but was bullied into carrying both the license text and the classifier.

> @jaraco what about exploring some other way to avoid the license info duplication? E.g. keep the license file, and so the License-File field in the metadata, and use some tool(s) to automatically detect (and create) the License-Expression field there?

I've been thinking of something like this. I was actually thinking of going the other direction, and resolving the SPDX identifier to the license text, using something like:

import requests

url = f"https://raw.githubusercontent.com/spdx/license-list-data/main/json/licenses/{license_id}.json"
text = requests.get(url).json()['licenseText']

I'd just need to update setuptools (or a plugin) to inject the license file using that routine. Then, the sdist would contain a license and it would be reflected from the declared identifier.
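A slightly fuller sketch of that idea, using only the standard library, might look like the following. The helper names and the `LICENSE` filename are assumptions, the URL pattern is copied from the snippet above without being verified, and this is not the code that setuptools or coherent.build actually uses.

```python
import json
import pathlib
import urllib.request

# URL pattern copied from the comment above; not verified here.
URL_TEMPLATE = (
    "https://raw.githubusercontent.com/spdx/license-list-data/"
    "main/json/licenses/{license_id}.json"
)


def fetch_license_text(license_id):
    """Download the SPDX record and return its 'licenseText' field."""
    url = URL_TEMPLATE.format(license_id=license_id)
    with urllib.request.urlopen(url) as response:
        return json.load(response)["licenseText"]


def write_license_file(target_dir, license_id, fetch=fetch_license_text):
    """Materialize a LICENSE file from the declared identifier."""
    path = pathlib.Path(target_dir) / "LICENSE"
    path.write_text(fetch(license_id), encoding="utf-8")
    return path
```

Passing the fetcher as a parameter lets a build run offline (for example from a cached copy of the license list) and makes the file-writing step testable without network access.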

I'll probably attempt this approach first in coherent.build, which relies entirely on the classifier (and soon the SPDX identifier).

If I can prove the concept there, I can extend it to setuptools-based projects here in skeleton.

@jaraco
Owner Author

jaraco commented May 6, 2025

I created #174 to track the issue in skeleton.

@jaraco
Owner Author

jaraco commented May 10, 2025

> Currently that's causing pipelines to fail whenever a pull request that updates dependencies is raised, since jaraco.abode is pulling in jaraco.itertools==6.4.2.
>
> We could not detect an OSI-approved license for jaraco.itertools@6.4.2: None -- None -- []

Following #174, I've rebuilt jaraco.itertools 6.4.3 using the fix. Please confirm this change has the intended effect.

@rgommers

The LICENSE file is back in the sdist, which is good. The copyright statement is:

Copyright (c) 2025 <copyright holders>

which seems odd, but it's much less of a problem for tooling at least than no license file at all. I can't speak to whether all distros will be happy with this unorthodox approach (copyright statement as well as the issue not being fixed when pulling a release tag rather than an sdist from PyPI), but at least it should fix the issues I was encountering with conda-forge packaging.
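The leftover `<copyright holders>` placeholder suggests that only part of the template was filled in. A minimal sketch of filling both fields is below; the function name is hypothetical, the `<year>` placeholder is an assumption about the template text (only `<copyright holders>` appears in the output quoted above), and "Example Author" is made up.

```python
import datetime


def fill_copyright(license_text, holders, year=None):
    """Replace the template placeholders in a license text."""
    year = year or datetime.date.today().year
    return (
        license_text
        .replace("<year>", str(year))
        .replace("<copyright holders>", holders)
    )


# Assumed template line; "<year>" is a guess at the unfilled form.
template = "Copyright (c) <year> <copyright holders>"
print(fill_copyright(template, "Example Author", year=2025))
# prints Copyright (c) 2025 Example Author
```

The holder name would have to come from somewhere in the project metadata, such as the authors field; that choice is left open here.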

Please re-include a license file in the next setuptools release as well.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request May 14, 2025
v80.4.0

Features

- Simplified the error reporting in editable installs.


v80.3.1

Bugfixes

- Restored select attributes in easy_install for temporary pbr compatibility.


v80.3.0

Features

- Removed easy_install and package_index modules.
- Restored license declaration in package metadata. See jaraco/skeleton#171.


v80.2.0

Features

- Restored support for install_scripts --executable (and classic behavior for the executable for those invocations). Instead, build_editable provides the portable form of the executables for downstream installers to rewrite.


v80.1.0

Features

- Added a deadline of Oct 31 to the setup.py install deprecation.


Bugfixes

- With ``setup.py install --prefix=...``, fall back to distutils install rather than failing. Note that running ``setup.py install`` is deprecated.


v80.0.1

Bugfixes

- Fixed index_url logic in develop compatibility shim.


v80.0.0

Bugfixes

- Update test to honor new behavior in importlib_metadata 8.7.

Deprecations and Removals

- Removed support for the easy_install command including the sandbox module.
- Develop command no longer uses easy_install, but instead defers execution to pip (which then will re-invoke Setuptools via PEP 517 to build the editable wheel). Most of the options to develop are dropped. This is the final warning before the command is dropped completely in a few months. Use-cases relying on 'setup.py develop' should pin to older Setuptools version or migrate to modern build tooling.
@jlovejoy

I'm admittedly a bit too late to this party, but as the SPDX legal team co-lead, I thought I should chime in. A few general thoughts/suggestions:

  1. if you have questions about how to use SPDX license ids or the SPDX License List and its repo, I would highly recommend you join that community (we are very friendly) and ask your question there :)
  2. If I understand correctly, the issue here seems to be about not providing the text of the license in a file for the given project repo. If I have that right, I'm not sure I understand why providing one text file in a repo is seen as such an inconvenience? In any case, as per many of the license terms: yes, you should absolutely do this. SPDX license ids are not meant to be a replacement for the full text, but an easier way to refer to a license shorthand in other places like... source files. See https://spdx.dev/learn/handling-license-info/
  3. As to @richardfontana's point about the SPDX matching guidelines, this is not so much an issue if you a) provide a full text of the license; and b) if you use the SPDX License List repo as a source of the text, make sure to pull from the appropriate place (in which case, see point no. 1)
  4. a copyright notice is a different and distinct thing from a license notice
  5. I really appreciate all the discussion and interest here! We are a very small team at SPDX-legal, supporting a project that is used by many. Always good to see people debating stuff and using the SPDX license ids, and always happy to have more join our small and friendly community!
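To illustrate point 2, the shorthand use in source files is usually a comment carrying an `SPDX-License-Identifier:` tag. The following is a minimal sketch of reading such a tag; the function name and sample text are hypothetical, and the pattern only covers simple single-line comments.

```python
import re

# Matches a tag such as "# SPDX-License-Identifier: MIT" in a comment.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*(.+)")


def source_license(source_text):
    """Return the SPDX expression tagged in a source file, or None."""
    match = SPDX_TAG.search(source_text)
    return match.group(1).strip() if match else None


sample = "# SPDX-License-Identifier: MIT\nprint('hello')\n"
print(source_license(sample))  # prints MIT
```

As the comment above stresses, such a tag is a pointer to the license and sits alongside the full license text rather than replacing it.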

@AlanCoding

Would this work? Several-character change. Accomplishes the original goal of @jaraco which I agree with, but follows proper process, which is the objection here.

diff --git a/pyproject.toml b/pyproject.toml
index e916f46..0096b6d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -21,7 +21,7 @@ classifiers = [
        "Programming Language :: Python :: 3 :: Only",
 ]
 requires-python = ">=3.9"
-license = "MIT"
+license = { text = "MIT-0" }
 dependencies = [
 ]
 dynamic = ["version"]

@richardfontana

Would this work? Several-character change. Accomplishes the original goal of @jaraco which I agree with, but follows proper process, which is the objection here.

diff --git a/pyproject.toml b/pyproject.toml
index e916f46..0096b6d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -21,7 +21,7 @@ classifiers = [
        "Programming Language :: Python :: 3 :: Only",
 ]
 requires-python = ">=3.9"
-license = "MIT"
+license = { text = "MIT-0" }
 dependencies = [
 ]
 dynamic = ["version"]

MIT-0 still is associated with an actual license text. The difference between MIT and MIT-0 has to do with obligations of licensees - licensees can remove the license text, but this doesn't affect the policy question of whether the original developer should include the license text in the first place.

I guess the other practical difference is that there are already a number of different license texts out in the wild that "match" SPDX MIT, while I am unaware of there being multiple matching license texts for MIT-0, which is much less widely used.

SeanMooney pushed a commit to SeanMooney/oslo-pkg-resources that referenced this pull request Feb 11, 2026