-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Auto-generated ERFA wrappers #2992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@eteq - thanks for working on this! Just out of curiosity, how long does the erfa.pyx source take to be converted to C and also compiled? Does it slow down the build process significantly?
This could be a lot of extra work - an alternative is to simply have verbose comments before an ERFA call to explain what we are calling and why. In any case, I agree this can be done later. +1 to having the vectorized ERFA interface not be part of the public API for now. |
|
In liberfa/erfa-wrapping-experiments#6, @mdboom asked something that should probably be discussed here:
Not necessarily, now that you mention it. I think the main reason originally was that it makes it much easier to tell cython what to link to (just |
|
@astrofrog - The process of converting erfa to
Yeah, that's a good point. Not sure how well we'd enforce it long-term, but it is a good short-term solution. |
|
@eteq - right, I guess my thinking is, let's see first which functions we end up using, and once that's done then we can more easily figure out where to concentrate efforts for providing functions with better names (no point in doing it for all ERFA functions). |
|
@astrofrog - ok, yeah, that sounds good. Any opinion on whether we should try to automate the |
|
@eteq - I think it'd be better to add the pyx source to the repository, it's less that can break, and it allows us (if needed) to treat special cases as needed. Regarding @mdboom's suggestion of not having a single C file (and I imagine by extension, not a single pyx file), I'm +0.5 on the suggestion since I'm not a fan of 17,000 line files. |
|
@astrofrog - it would be just one I think actually that just one For the |
|
@eteq - ok, that makes sense. In that case, I agree with the separate C file/single pyx file approach. Fewer steps is better. |
|
Yes -- we definitely want only a single .pyx. Each .pyx has all the overhead of initializing a Python extension module (both in file size and import time). Whereas the separate .c files has no real overhead in the final product. |
dbafaec to
16b336f
Compare
|
Alright, I just updated this to use the multi-file version of ERFA instead of the single-file, as well as a fix that at least allows everything to import fine with erfa_time removed. I pushed this up such that we should get two travis builds. The first one is with From local testing, I think most of the problems in I'll try to get the |
|
Hmm, Travis ran both tests on the version that doesn't have |
|
@eteq, all - Sorry for the silence on this.
This is the part that I wasn't too happy about. But as you suggest, I wouldn't wait for this to be fixed (at least by me) to move forward. This is even more critical for the matrix operations available in ERFA/SOFA. The documentation at http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_AdvancedNew seems to indicate that the broadcasting rules can be tweaked. Ideally, we would want for a C function with signature |
|
The record aspect seems not that bad, as it separates the dimenions of the number of items from what is contained within them. And with this, broadcasting already works as I would expect it! Of course, a change is only a But first some debugging... I found at least one error due to wrong casting. As this would be for |
|
On what to include in tar balls: I'm somewhat in favour of just including the actual source, not an intermediate product, i.e., go all the way from the |
|
@mhvk - so just to be clear, you are saying we keep only the initial code in the repository, and only the final wrappers in the release tar balls? Nothing should get generated for users who download stable releases (like for Cython, where we include only the final .c files) |
|
yes, sorry for being confusing, I would put only the initial code in the repository, not intermediate products such as the |
88ba61a to
80fd741
Compare
|
@mhvk - In theory I like your suggestion of not having the If we don't do that, we have to tell anyone building from source to cd into the relevant directory, run the generator script, and then go back, which will likely be a big pain. |
|
Now, for the changes I just pushed up: there's quite a bit there. What I've added now is:
With these changes, I think this is ready to go (modulo the next paragraph). @mhvk - At first I thought the same, that maybe leaving these as structured arrays is a good thing. But after playing with it a bit, I started running into oddities, like it not being possible to do Also, this PR currently retains |
|
@eteq - just to make sure I understand, does the fact the build without |
|
@eteq - OK on leaving On the record arrays: recasting them, isn't Fortran order even more problematic if you decide to use the final dimenion? In a record array, the values in the record are stored right next to each other, so I would have thought that for Fortran order you'd need to use the first dimenion(s) to ensure you have the same view of the memory. For matrices, it may be even less obvious what to do, since the matrix itself is presumably always stored in C order... |
|
With record arrays recast, Currently, is a validation done of the python wrappers? (e.g., run a version of the erfa test file) But perhaps this suggests starting to use the wrapped erfa routines should indeed be separated from getting the package to compile properly. |
|
@eteq - sorry, with the many files it seemed hard, but just looking at the commit doing record->regular array (eteq@25acb27), I see you do some tests on fortran order. Just to be sure, maybe add a test with matrix input/output? Though I'm still not completely sure the record array approach isn't better. Do you have an explicit example of what goes wrong with it? |
|
@mhvk - good idea on the matrix input/output - will add something for that and push it up soon (along with some fixes that will hopefully get the tests passing).
I am not running the full erfa test suite mostly because translating it to python would be a painful act: They aren't as consistent as the docs are, so it's probably not easy to automate the translation, so it would need to be done by hand for every function. There are some "spot checks" where I compare a few functions here and there to what the erfa tests say the answer should be (and they are passing). Will add some for
I agree with you in theory, but in practice a bunch of awkward things jumped out once I tried to actually use it this way:
It's not obvious, but the problem is that arrvec's dtype here is int, but we cast it to a double. The current scheme automatically takes care of that by using numpy's casting rules, but
It may be there's some clever trick to get around all of that, but this just seemed easier. |
|
@eteq - yes, (fastest dimension has to match the total length of the field) And one can get the name via Some more tricks at http://wiki.scipy.org/Cookbook/Recarray But as you say, it may not be worth it to stick to the record arrays; as long as it is always the same set of indices that iterate over input and output, it should be fine (and I think I convinced myself there is no fortran issue here...). |
(also remove a non-blank empty line)
but still fails (now xfail) due to different error/warning behavior
00b8d98 to
4b03513
Compare
|
Rebased on master and it like the tests have passed, so this should be set to merge! Once this is in I can add an issue about cleaning up |
Auto-generated ERFA wrappers
|
@eteq - sorry for the delay, I've been at a meeting. I just merged this. 👏 |
|
Fantastic, thanks @taldcroft ! |
|
🎉 |
|
This is very cool, guys. Good work. I'll be using this stuff immediately. |
|
Now I can just: So... 😄 |
This isn't necessarily ready to be merged yet, but it's most of the way there, at least for what I think we want for 1.0.
This PR adapts the approach (and much of the code) from @jwoillez in liberfa/erfa-wrapping-experiments#6, which uses jinja2 to generate a Cython wrapper around ERFA. The resulting Cython file is pretty straightforward to use: typically it's just
outarr = func(arr1,arr2)or similar, using numpy arrays where relevant.There's a fair amount of discussion in liberfa/erfa-wrapping-experiments#6 about ways to improve this more, but in the name of moving forward with coordinates work for 1.0, I suggest we table those for now (or at least integrate this soon and allow the two efforts to move in parallel). So I think it's ok for us to use this but not consider it part of the public API yet (or with big warnings that say it's likely to change), and work towards a full wrapper for public consumption in future versions.
A few things that need to be dealt with before this is even mergeable:
erfa_timeneeds to work. Right now it's just a re-naming scheme to maperfa_timenames to the baseerfanames, which seems to get about 2/3 of the tests to pass, but I don't really understand many of the failures. Perhaps @taldcroft or @mhvk can provide some guidance there? (Hopefully once travis runs you'll see the failures/errors come up there.) If it seems like it's going to be a lot of trouble to get this working, I could rollback the last commit and just leaveerfa_time.pyxin place for now...check_errorsthat's in the currenterfa_time.pyx. That could be implemented fairly easily in the template that generates the functions, but to clone the funcationality of the currentcheck_errors, there needs to either be some clever parsing of the ERFA documentation strings, or some way to manually specify what the error codes mean.And a few items from liberfa/erfa-wrapping-experiments#6 that I think could be separate issues to be addressed later:
astropy/erfa/__init__.py, but it might be better to do something fancier.erfa.pyxactually checked in as source code. In theory, we could instead have it be auto-generated as part of the build process, just like Cython does in generating.cfile from.pyxfiles. Of course, that might lead to awkwardness given that we now need to go fromerfa.pyx.templ->erfa.pyx->erfa.cin the build process... I'm not sure how hard that will be, but ERFA updates are uncommon enough that I think we could just say "run thecython_generator.pyscript when there's a new ERFA" and keeperfa.pyxconstantly in source.cc @astrofrog @mhvk @taldcroft @mdboom @embray @jwoillez