Skip to content

Conversation

@astrofrog
Copy link
Member

(This is an alternative to #3166)

With this PR, which starts from one of @jwoillez's branches, ERFA files are generated on-the-fly, but only if any of the input files or scripts have been modified since the files were generated.

This does not require any changes to astropy-helpers and makes use of the current infrastructure.

I also made sure that sdist does not include the template files, and this also allows us to determine whether we are in release mode.

One change that could be made would be to just move the contents of cython_generate.py inside setup_package.py but I'll let @eteq decide on this.

cc @jwoillez @embray @mdboom @eteq @mhvk @taldcroft

@astrofrog
Copy link
Member Author

The reason Travis is failing is because python setup.py egg_info crashes because jinja2 is not installed. What is the best way around this - should I check if egg_info is in sys.argv and skip the generating if so?

@mdboom
Copy link
Contributor

mdboom commented Dec 2, 2014

@astrofrog wrote:

I also made sure that sdist does not include the template files, and this also allows us to determine whether we are in release mode.

Why not include the template files in sdist? We include .pyx files already. It should still be possible to do development from a tarball (even if a git checkout is the sanest way to go). I think the important thing is that the generated files are in the tarball, but I would explicitly include the template files also.

@astrofrog
Copy link
Member Author

@mdboom - ok, sounds good, I added it back (note that there was a typo in that before, so it wasn't including the py template)

@astrofrog
Copy link
Member Author

@mdboom - what do you think is the right approach re: egg_info?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @mdboom that the template files should be in the release (if only to ensure the poor Debian packagers don't have to put it back ;-). If so, this if statement can be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will remove that :)

@mhvk
Copy link
Contributor

mhvk commented Dec 2, 2014

@astrofrog - looks good! And I think it passes @eteq's condiment test....

@mdboom
Copy link
Contributor

mdboom commented Dec 2, 2014

As for the egg_info problem... Checking sys.argv seems messy to me. Maybe this is an argument for a new hook to generate these files (and I realise I'm backtracking on earlier statements). That hook would only be run if not in egg_info mode. get_package_data etc. would still always be run so that the generated files would still end up in the egg_info SOURCES.txt file. But maybe that's too much change for the sake of elegance. Maybe @embray has some thoughts...

@eteq eteq mentioned this pull request Dec 2, 2014
@embray
Copy link
Member

embray commented Dec 2, 2014

get_package_data doesn't need to actually generate the file--just report that it exists, or will. Maybe even create a stub. I wouldn't do anything that involves generating source code until the build stage.

@embray
Copy link
Member

embray commented Dec 2, 2014

I will add that what is currently written to SOURCES.txt is not even accurate anyways. That's something to be fixed, but it's not really a showstopper either.

@embray
Copy link
Member

embray commented Dec 2, 2014

Actually, this is all about generating some C sources anyways right? That shouldn't have anything to do with get_package_data. get_package_data is for non-Python files that should nonetheless be installed alongside the Python package. That doesn't apply to C sources.

@astrofrog
Copy link
Member Author

@mdboom - #3166 implements this as a separate step (thought I have to backport changes from this PR to there) but @embray was opposed to having the separate step since it means modifying setup.py. I'm happy to go either way, and don't have a strong opinion.

One idea would be to go with what this PR implements, but create a custom type of Extension say ERFAExtension which gets returned by get_extensions instead of the current one in astropy.erfa and make it that the ERFA sources are generated in that extension at build time (instead of when get_extensions is called. Would that work? I can try?

@astrofrog
Copy link
Member Author

I'm going to go ahead and try and implement ERFAExtension and we can always remove that commit later if we decide we don't like it.

@astrofrog
Copy link
Member Author

@mdboom @embray - check out the ERFAExtension class I just added. When we run the cythonization, we do:

        for extension in self.extensions:
            if 'numpy' in extension.include_dirs:

so my approach is to make include_dirs into a property which when accessed will auto-generate the files if needed.

This works correctly with build, and should also do the right thing for sdist, for a release (not if we run it on a developer version because we don't run the cythonization for that, but then we never use sdist out of releases, so not an issue).

So just to clarify, it's a hack, but it's a hack we have control over because we are the ones who decide to access include_dirs in setup_helpers.py.

@astrofrog
Copy link
Member Author

Also, @eteq and @mhvk, let me know what you think of the latest approach

@astrofrog
Copy link
Member Author

(debugging Travis but still open to comments in mean time)

@eteq
Copy link
Member

eteq commented Dec 2, 2014

@astrofrog - I like this approach. I just issued a PR against your branch (astrofrog#50) that adds an option to suppress the output from cython_generator and turns it on here. (This is based on a commit I added in #3173 ).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This same thing is defined further down inside the line that starts with erfafns=, so you can reuse it there (and you may also want to follow that and add ERFAPKGDIR as the starting point of the ERFA_SRC path - I'm not sure if that's necessary, but it might be safer?).

add option to silence cython_generator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you're doing this right now for testing, but in the end you might leave these in as distuitls.log.debug calls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and it wouldn't hurt to have an INFO call to say that we're generating the ERFA wrappers if we do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, definitely.

@astrofrog
Copy link
Member Author

@eteq - does this work locally for you if you run build? (just trying to understand if the Travis issue is Travis-specific)

@eteq
Copy link
Member

eteq commented Dec 3, 2014

@astrofrog - Nope, I get:

% python setup.py build
... normal stuff about copying ...
cythoning astropy/erfa/erfa.pyx to astropy/erfa/erfa.c
error: [Errno 2] No such file or directory: '/Users/erik/src/astropy/astropy/erfa/erfa.pyx'

It doesn't seem to ever reach the generating code at all...

@eteq
Copy link
Member

eteq commented Dec 3, 2014

@astrofrog - I think I may see the problem... distutils.extension.Extension is an old-style class in py2 (doesn't inherit from object), so when it does self.include_dirs = include_dirs in the initializer, it overwrites the property... the solution might be to override the __init__ in ERFAExtension to put the property back into place after calling the Extension.__init__?

@eteq
Copy link
Member

eteq commented Dec 3, 2014

See http://nbviewer.ipython.org/gist/eteq/3134913ecac69b93912f for a demonstration of the problem. AFAICT old-style classes work only with property getters, not setters. I'm guessing you can do something in the __init__, but it will probably require some further hackery...

@astrofrog
Copy link
Member Author

Ah, I think I didn't see that because I was using Python 3 locally. In any case, I think this is fixable, I can just make use of __getattribute__ instead of properties. ... and that also applies only to new-style classes. Digging further.

@astrofrog
Copy link
Member Author

Ok, so I got one step further and the extension stuff works. BUT - erfa.py doesn't get copied because extensions get built after build_py gets run.

At this point I don't think we can do anything without modifying astropy-helpers. Maybe something like #3166 ain't so bad after all.

cc @eteq @mdboom @embray

@astrofrog
Copy link
Member Author

Note that if we didn't have the separate .py and .pyx files, this would work. Ironic! :-/

@astrofrog
Copy link
Member Author

@embray @mdboom @eteq - this is turning into something too hacky, and if we are going to have to modify the helpers anyway, i think the approach in #3166 is better. I updated that approach and now there are no changes in setup.py - can we move the discussion back there?

@embray
Copy link
Member

embray commented Dec 3, 2014

Can somebody explain to me, like I'm 5, why this is desirable at all? How often do these templates need to be updated? Is this something we can just include in the source?

Also, on closer inspection, I don't think the generated .py file is needed at all (the Cython/C code, sure). But most of the functions in the Python module could instead be generated on-the-fly as needed. It would be nice if we just included a dump of the parsed erfa.h as like a JSON file or something, and that would provide all the information needed to return new Python functions if/when needed. If that sounds appealing I can submit a PR. It won't require Jinja or anything like that.

@astrofrog
Copy link
Member Author

@embray - the generated files are ~1Mb. Every time we modify something in the template (a) we have to remember to re-generate the files, and (b) we increase the size of the repository quickly and (c) the diffs in PRs are too large.

The approach here is too complicated but in #3166 is pretty simple. I'm not sure that the solution you are suggesting is any simpler, and may also introduce performance penalties, right?

@mhvk
Copy link
Contributor

mhvk commented Dec 3, 2014

I like the idea of generating the python parts of the wrapper on the fly -- cython_generator.py is already very fast now, so it would appear it would notl add much overhead. It should be even better with @jwoillez's factorisation in #3176. But it does seem to depend on whether this is easy without jinja2 -- having a new intermediary json file does not seem appealing.

@astrofrog
Copy link
Member Author

I'm going to close this specific issue because this was a dead-end. Any subsequent conversation should be continued on #3166 which is still up for consideration.

@astrofrog astrofrog closed this Dec 3, 2014
@embray
Copy link
Member

embray commented Dec 3, 2014

@mhvk We're already building this huge Python file, though most of the code in it could be vastly simplified such that a single function, plus a little introspection, could implement the wrapper for all the ERFA functions. The information needed for the introspection has to come from somewhere, which is why I suggested a JSON file. Though it could just as easily be a Python module with a dict in it--I just suggested JSON because it could be used for other purposes as well--it's just metadata about the ERFA API, basically.

Generating the Python functions on the fly, as needed, would have a small impact. As I said, for the most part they can be generated by wrapping a single function. Once a particular function is loaded it doesn't have to be re-generated either, so it's a one-time cost.

The generated Cython file on the other hand comes in at 133 kB, and how often does it really need to be regenerated? If we switch to just using C directly instead of Cython, as I've heard suggested, it will probably be a little larger, but only a little. So I don't see the big deal.

@astrofrog
Copy link
Member Author

@embray - I can see the appeal with the JSON approach, so it's just a question of whether someone has the time to develop this 😄. I agree that with the current Cython/Python split in the wrappers the Cython/C part is not really a big deal. At the end of the day, if we can find a way to do without having to generate or auto-generate that doesn't take too much space and doesn't have a big performance impact, I'm all for it.

@eteq
Copy link
Member

eteq commented Dec 3, 2014

Sounds fine - I'll go ahead and close this then, although if we end up thinking this is a better way after all it can always be re-opened.

@embray
Copy link
Member

embray commented Dec 4, 2014

I'm working on it (my suggestion).

@eteq
Copy link
Member

eteq commented Dec 4, 2014

@embray - in case this rolls into the changes you're working on right now, I am firmly against the idea of keeping the C checked in - that was always intended as a temporary measure. The reality is that people forget to do this when they update the templates, so they're out-of-sync with the generated source for a lot of the commit history, and that makes for headaches.

Although I have no objections to some sort of on-the-fly intospection/generation on the python side.

@astrofrog astrofrog deleted the generate-erfa branch July 5, 2016 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants