Python: autogenerate PyPI source data (2)#15007
Conversation
{
  "toolz": {
    "latest_version": "0.7.4",
    "meta": {
In the past I found that metadata (the license, in my case) changed from version to version. It would be better to have all metadata copied per version.
I'm not 100% sure about this, but if I find the package with this weird behaviour I'll link it here...
thanks @garbas
As far as I can see, the JSON API doesn't provide such metadata for each version. It can definitely change in setup.py (the license field, or a trove classifier), but I am not using that data.
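To illustrate the point: the PyPI JSON API returns metadata only once per project, not per release. The sample dict below is hand-written to mirror the shape of such a response (the URLs are placeholders, not real files), it is not output of the PR's script:

```python
# Shape of a PyPI JSON API response (e.g. /pypi/toolz/json), hand-written
# here for illustration; URLs are placeholders.
sample = {
    "info": {  # metadata exists only once, for the latest release
        "name": "toolz",
        "version": "0.7.4",
        "license": "BSD",
    },
    "releases": {  # per version there are only file entries, no metadata
        "0.7.3": [{"url": "https://example.org/toolz-0.7.3.tar.gz", "packagetype": "sdist"}],
        "0.7.4": [{"url": "https://example.org/toolz-0.7.4.tar.gz", "packagetype": "sdist"}],
    },
}
```

So a per-version license cannot be read from this API, even though setup.py may have changed it between releases.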
+1, I just thought I'd mention it in case the JSON API makes this possible.
Note that I plan to merge this sometime this weekend. At that point I will also convert all the packages I maintain to use this.

Try to convert the
@zimbatm the script only keeps urls to source archives, not wheels. I can also keep the urls to wheels, but that would increase the size of the JSON again. Or, I keep urls to wheels only when urls to source archives are not included. That's an easy change to make in the script, and autodetection is then straightforward: first check whether the JSON has a source archive; if not, a wheel; if neither, fail.
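That fallback could be sketched like this (function and variable names are mine, not the script's; the file entries mimic the PyPI JSON API's `packagetype` field):

```python
# Prefer a source archive, fall back to a wheel, otherwise fail.
def pick_url(files):
    """files: list of dicts with 'url' and 'packagetype' keys."""
    for wanted in ("sdist", "bdist_wheel"):
        for f in files:
            if f["packagetype"] == wanted:
                return f["url"]
    raise ValueError("no source archive or wheel available")

files = [
    {"url": "https://example.org/pkg-1.0-py3-none-any.whl", "packagetype": "bdist_wheel"},
    {"url": "https://example.org/pkg-1.0.tar.gz", "packagetype": "sdist"},
]
print(pick_url(files))  # the sdist wins even though the wheel is listed first
```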
buildPythonApplication = args: buildPythonPackage ({ namePrefix = ""; } // args);
pypi-sources = builtins.fromJSON (builtins.readFile ../development/python-modules/pypi-sources.json);
You can use the shortcut: pypi-sources = lib.importJSON ../development/python-modules/pypi-sources.json;
@FRidh yeah that could work. It's better than detecting by the url file extension.
How hard would it be to convert all of python-packages into a json file? If it's possible to take the pypi metadata and convert it into
If the python-packages src hash doesn't change and the python packages are pre-built, users wouldn't have to download it either, would they? If that works out okay I think we could do the same for ruby and npm packages as well.
Not at all, it's just a matter of adding the desired packages to the input file for the script.
I would also prefer to have a separate repo with autogenerated data, as I wrote on the mailing list. We could do this for PyPI, Haskell, GitHub, ..., separating expressions from data.
If the packages are pre-built, then users won't have to download it. If they aren't, they will have to. In that case, we have to think about whether the HEAD of the repo should contain the data for all versions, or just the last couple, so that when an archive is created (automated release/tag) the file will be much smaller.
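Trimming the data for an archive could look roughly like this. This is a hypothetical sketch, not part of the PR: the JSON layout (a "versions" map per package) and the naive numeric version sort are my assumptions, and the sort would need to be replaced by a real version comparison for non-numeric versions:

```python
# Hypothetical: keep only the newest `keep` versions per package before
# creating a release archive. Assumes purely numeric dotted versions.
data = {
    "toolz": {
        "latest_version": "0.7.4",
        "versions": {"0.7.2": "sha-a", "0.7.3": "sha-b", "0.7.4": "sha-c"},
    }
}

def prune(data, keep=2):
    out = {}
    for name, pkg in data.items():
        ordered = sorted(pkg["versions"],
                         key=lambda v: tuple(int(x) for x in v.split(".")))
        out[name] = dict(pkg, versions={v: pkg["versions"][v]
                                        for v in ordered[-keep:]})
    return out

print(prune(data))  # only 0.7.3 and 0.7.4 survive
```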
You're crazy :) Alright, I'll try to do the same with rubygems since I'm more fluent in ruby. I will start by exporting everything and see how much data it would amount to. Having all the versions is going to be the most useful for users who use nix in their projects, since most projects lag a bit behind in terms of dependency versions. And then we can also avoid the discussions where we need N versions of a package because of version boundaries. Also, maybe gzipping the json file could be enough.
We have a lot of packages in here that get their archives from GitHub, so why not? :-) I don't mean building all of GitHub ;-)
Once I extracted, for all packages on PyPI, the name, versions, descriptions and license, and the resulting JSON was 156 MB (#11587 (comment)). That's something we don't want. Of course, there's a lot of rubbish on PyPI that would never find its way into Nixpkgs anyway. I think we have something like 1000 Python packages, so I guess it would be something between 500 KB and 1 MB, mostly depending on whether you would include the
I've been playing with converting the rubygems to nix. Converting all 800k of them to json metadata would take 3.2GB, and that's just for the sha256, description, license and homepage. Maybe it's enough to just store the sha256 base32 hashes, in which case it would only take 40MB of non-compressible data. What do you think? My playground project is at https://github.com/zimbatm/rubygems2nix
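As a back-of-envelope check of the 40MB figure: a sha256 digest in Nix's base32 encoding is 52 characters (256 bits at 5 bits per character, rounded up), so the hashes alone for 800k gems come to roughly 40MB before any keys or JSON punctuation:

```python
# Back-of-envelope: raw size of 800k sha256 hashes in Nix base32.
gems = 800_000
hash_chars = -(-256 // 5)            # 256 bits / 5 bits per char, rounded up = 52
total_mb = gems * hash_chars / 1e6   # ~41.6 MB of hash characters alone
print(hash_chars, total_mb)
```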
An alternative is in #16005, where an external repository with metadata is used.
Closing in favor of #16005. |
Alternative to #14927. Here we use a separate function.
Getting rid of the urls is possible since #15001.
cc @domenkozar
(I also would like to get rid of buildPythonApplication again and instead introduce an option like kind ? "library" with options [ "application" "library" ], or else application ? false, since I don't foresee any other types than applications and libraries that are relevant.)