Python: autogenerate PyPI source data (2)#15007

Closed
FRidh wants to merge 3 commits intoNixOS:masterfrom
FRidh:buildpypi

Conversation


@FRidh FRidh commented Apr 26, 2016

Alternative to #14927. Here we use a separate function.

Getting rid of the urls is possible since #15001.

cc @domenkozar

(I would also like to get rid of buildPythonApplication again and instead introduce an option like kind ? "library" with values [ "application" "library" ], or else application ? false, since I don't foresee any relevant types other than applications and libraries.)

@FRidh FRidh added the 6.topic: python Python is a high-level, general-purpose programming language. label Apr 26, 2016
{
  "toolz": {
    "latest_version": "0.7.4",
    "meta": {
@garbas (Member) commented:

In the past I found that metadata (the license, in my case) changed from version to version. It would be better to copy all metadata per version.

I'm not 100% sure about this, but if I find the package with this weird behaviour I'll link it here.


@FRidh FRidh Apr 28, 2016

Thanks @garbas.
As far as I can see, the JSON API doesn't give such metadata for each version. It can definitely change in setup.py (the license field, or a trove classifier), but I am not using that data.

@garbas replied:

+1, I thought I'd just mention it in case the JSON API makes this possible.

@FRidh FRidh added 0.kind: enhancement Add something new or improve an existing system. 9.needs: reporter feedback This issue needs the person who filed it to respond 2.status: work-in-progress labels Apr 28, 2016

FRidh commented May 11, 2016

Note that I plan to merge this sometime this weekend. At that point I will also convert all the packages I maintain to use this.


zimbatm commented May 11, 2016

Try to convert the entrypoints package. I think we need some sort of auto-detection of the format.


FRidh commented May 11, 2016

@zimbatm the script only keeps URLs to source archives, not wheels. I could also keep the URLs to wheels, but that would increase the size of the JSON again. Or I keep URLs to wheels only when URLs to source archives are not included. That's an easy change to make in the script and in buildPyPIPackage.

Autodetection is then straightforward: first check whether the JSON has a source archive; if not, fall back to a wheel; if neither is present, fail.
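The fallback order described above can be sketched as follows. This is only an illustration, not the actual script from the PR; the `packagetype` values (`sdist`, `bdist_wheel`) follow the PyPI JSON API's release-file entries, and the example data is made up.

```python
def pick_url(release_files):
    """Pick one download URL from a list of PyPI release file entries,
    preferring a source archive over a wheel, failing if neither exists."""
    for packagetype in ("sdist", "bdist_wheel"):
        for f in release_files:
            if f.get("packagetype") == packagetype:
                return f["url"]
    raise ValueError("no source archive or wheel available")

# Hypothetical release data: both a wheel and an sdist are present,
# so the sdist URL is chosen.
files = [
    {"packagetype": "bdist_wheel", "url": "https://example.invalid/toolz-0.7.4-py2.py3-none-any.whl"},
    {"packagetype": "sdist", "url": "https://example.invalid/toolz-0.7.4.tar.gz"},
]
print(pick_url(files))  # prints the sdist URL
```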


buildPythonApplication = args: buildPythonPackage ({namePrefix="";} // args );

pypi-sources = builtins.fromJSON (builtins.readFile ../development/python-modules/pypi-sources.json);

@zimbatm zimbatm May 11, 2016

you can use the shortcut pypi-sources = lib.importJSON ../development/python-modules/pypi-sources.json;


zimbatm commented May 11, 2016

@FRidh yeah, that could work. It's better than detecting by the URL file extension.


zimbatm commented May 11, 2016

How hard would it be to convert all of python-packages into a JSON file?

If it's possible to take the PyPI metadata and convert it into <package>-<version>.json files, maybe with a prefix folder to avoid the listing limit, we could dump that into a repo (e.g. nixos/python-packages). Then nixpkgs can source that repo for all the metadata and stay lean.

If the python-packages src hash doesn't change and the Python packages are pre-built, users wouldn't have to load it either, would they?

If that works out okay I think we could do the same for ruby and npm packages as well.
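One hypothetical layout for the per-package JSON files proposed above, sketched here only to make the idea concrete: the two-letter prefix directory is an assumption, chosen merely to keep any single directory below GitHub's listing limit; the PR does not specify a scheme.

```python
from pathlib import PurePosixPath

def metadata_path(name, version):
    """Hypothetical repo path for a <package>-<version>.json metadata file,
    sharded into a prefix folder by the first two letters of the name."""
    prefix = name[:2].lower()
    return PurePosixPath(prefix) / f"{name}-{version}.json"

print(metadata_path("toolz", "0.7.4"))  # to/toolz-0.7.4.json
```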


FRidh commented May 11, 2016

How hard would it be to convert all of python-packages into a JSON file?

Not at all, it's just a matter of adding the desired packages to the input file for the script.

If it's possible to take the pypi metadata and convert it into <package>-<version>.json files, maybe with a prefix folder to avoid listing limit and dump that into a repo (eg: nixos/python-packages). Then nixpkgs can source that repo for all the metadata and stay lean.

I would also prefer to have a separate repo with autogenerated data as I wrote on the mailing list. We could do this for PyPI, Haskell, GitHub, ... separate expressions from data.

If the python-packages src hash doesn't change and the python packages are pre-built the users wouldn't have to load it either isn't it ?

If the packages are pre-built, then users won't have to download it. If they aren't, they will have to. In that case, we have to think about whether the HEAD of the repo should contain the data for all versions, or just the last couple, so that when an archive is created (automated release/tag) the file will be much smaller.
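Trimming the HEAD of such a repo to the last few versions per package, as suggested above, could look roughly like this. This is only a sketch: the naive numeric tuple sort stands in for real version ordering, which a production script would do with proper PEP 440 parsing.

```python
def keep_latest(versions, n=3):
    """Keep only the n newest versions, using a naive numeric sort
    (illustration only; real code should parse versions per PEP 440)."""
    key = lambda v: tuple(int(p) for p in v.split(".") if p.isdigit())
    return sorted(versions, key=key)[-n:]

print(keep_latest(["0.7.4", "0.6.0", "0.8.2", "0.7.0"], n=2))
# ['0.7.4', '0.8.2']
```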


zimbatm commented May 11, 2016

GitHub

you're crazy :)

Alright, I'll try to do the same with rubygems since I'm more fluent in Ruby. I will start by exporting everything and see how much data it amounts to. Having all the versions is going to be most useful for users who use Nix in their projects, since most projects lag a bit behind in terms of dependency versions. That way we can also avoid the discussions where we need N versions of a package because of version boundaries. Also, maybe gzipping the JSON file would be enough.


FRidh commented May 11, 2016

GitHub
you're crazy :)

We have a lot of packages in here that get their archives from GitHub, so why not? :-) I don't mean building all of GitHub ;-)

I will start by exporting everything and see how much data it would amount to.

Once I extracted the name, versions, descriptions, and license for all packages on PyPI, the resulting JSON was 156 MB (#11587 (comment)). That's something we don't want. Of course, there's a lot of rubbish on PyPI that would never find its way into Nixpkgs anyway.

I think we have something like 1000 Python packages, so I guess it would be somewhere between 500 KB and 1 MB, mostly depending on whether you include the longDescription. It's not that much, and it also wouldn't grow as fast as, say, Haskell's.


zimbatm commented May 15, 2016

I've been playing with converting the rubygems to Nix. Converting all 800k of them to JSON metadata would take 3.2 GB. That's just the sha256, description, license, and homepage. Maybe it's enough to store just the sha256 base32 hashes, in which case it would take only 40 MB of non-compressible data. What do you think? My playground project is at https://github.com/zimbatm/rubygems2nix
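The 40 MB figure above checks out as a back-of-the-envelope estimate: a Nix base32 sha256 hash is 52 characters long, so the hashes alone for 800k gems come to roughly that size (gem names and separators add a bit more).

```python
# Rough size check of storing only base32 sha256 hashes for 800k gems.
gems = 800_000
base32_sha256_len = 52  # length of a Nix base32-encoded sha256 hash
total_bytes = gems * base32_sha256_len
print(total_bytes / 1e6, "MB")  # 41.6 MB
```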


FRidh commented May 15, 2016

@zimbatm that's a lot! Let's discuss this further in #15480.


FRidh commented Jun 6, 2016

Alternative in #16005, where an external repository with metadata is used.


FRidh commented Jul 28, 2016

Closing in favor of #16005.

@FRidh FRidh closed this Jul 28, 2016