Skip to content

Conversation

@mdboom
Copy link
Contributor

@mdboom mdboom commented Dec 1, 2014

This work is much less necessary now that #3141 is merged, but I still thought I'd share it. The measurable differences are fairly small, so I think it ultimately comes down to maintainability and elegance. I, personally, don't care much for Cython-based wrappers as it can be very difficult to optimize and debug due to the extra level of indirection, but I know there are those that prefer it to C.

This is basically a direct port of the @jwoillez' Cython wrappers to pure C. It shaves 3 seconds off of the compile time vs. current master (after the Cython/Python splitting of the ERFA wrappers). from astropy import erfa uses about 750k less memory in this version.

As for speed, there is a speed advantage to this, particularly for small arrays (with larger arrays the time of ERFA itself dominates):

c_vs_cython

@astrofrog
Copy link
Member

@mdboom - which function did you test this for? It would be interesting to test this for a function where the ERFA execution is expected to be fast, such as aper (as discussed in #3141, see second example plot there).

@mdboom
Copy link
Contributor Author

mdboom commented Dec 1, 2014

This is atco13 (the first one in the #3141 thread). I'll do again with aper.

@astrofrog
Copy link
Member

@mdboom - thanks! I suspect the biggest differences we will see (if any) are going to be for the fast ERFA functions.

@eteq
Copy link
Member

eteq commented Dec 1, 2014

This is very much subjective, but my preference is for Cython over C if there's not a signficiant performance difference. My reasoning is that contributors who want to work on this and don't know C very well can read and edit Cython quite easily (it looks like python). But using the raw C API can be challenging and therefore error-prone for the uninitiated.

Of course the definition of "significant" matters a lot here...

@mhvk
Copy link
Contributor

mhvk commented Dec 2, 2014

hmm, 750kb of memory usage is not insignificant. That said, I agree with @eteq that it helps the uninitiated like me if most of the code is in python. I don't want to ask for needless effort, but would it be easier if this followed @jwoillez's split, and had python parts for ensuring that all the arrays could be broadcast together (and undo any broadcasting of the results) and C only for the actual iteration? It does seem that that is a very natural division, and in that case it would be much less likely the C code template would ever need to be edited.

@mdboom
Copy link
Contributor Author

mdboom commented Dec 2, 2014

The difference is more pronounced with aper, but still in the realm of "perhaps not significant enough to justify other tradeoffs":

c_vs_cython

Like I said, for me it's more a matter of preference -- I almost never find Cython to be easier in the long run, because it hides so much of what is really going on. It works well when it works, but when something goes wrong it's a major pain without proper debugging support. My experience definitely doesn't back up @eteq's assertion that the C API is more challenging than Cython when doing this kind of low-level C wrapping work (writing algorithms from scratch, such as astropy's convolution routines is another matter). But that's obviously fairly subjective.

@mhvk: Your suggestion to separate out along the lines of @jwoillez's split certainly makes sense. This PR is a hack I began before that split was made. I suspect the timing would be pretty close to master, particularly based on the two benchmarks above. The inner loop in both cases is pretty optimal: it's the one-time overhead of setting everything up where there is a speed difference, and by not doing that in C, you'd lose that advantage here. (All that based on a hunch, of course).

So, I think it comes down to:

  1. Is the performance improvement significant enough that we care?

  2. Are there further performance gains to this approach that aren't accessible in a more Python/Cython approach? I suspect there are, but have no evidence at this time.

@jwoillez
Copy link
Member

jwoillez commented Dec 2, 2014

@mdboom - I like the way you factored out the setup part of each wrapper (broadcasts, shape checks, iterator configurations...). This might explain part of the reduced memory footprint. In any case this is certainly something to keep, whatever the direction is chosen.

I know that I would not have been able to come up with a pure C version of the wrapper without some significant time overhead. Yet, after looking at what the C code generated by cython looks like, I feel like converting the cython leftover to pure C might not be a bad idea.

@mdboom
Copy link
Contributor Author

mdboom commented Dec 2, 2014

Well, likewise, it would have taken a lot longer for me to write the C
implementation without the Cython implementation as a guide.

If we stick with the Cython approach currently on master, I agree that
there might be some additional gains there to putting some parts of the
generated Python functions into non-generated functions as I've done here.
It would make erfa.py a lot smaller which should decrease the memory
footprint.

On Tue Dec 02 2014 at 2:23:51 AM Julien Woillez [email protected]
wrote:

@mdboom https://github.com/mdboom - I like the way you factored out the
setup part of each wrapper (broadcasts, shape checks, iterator
configurations...). This might explain part of the reduced memory
footprint. In any case this is certainly something to keep, whatever the
direction is chosen.

I know that I would not have been able to come up with a pure C version of
the wrapper without some significant time overhead. Yet, after looking at
what the C code generated by cython looks like, I feel like converting the
cython leftover to pure C might not be a bad idea.


Reply to this email directly or view it on GitHub
#3164 (comment).

@eteq
Copy link
Member

eteq commented Dec 2, 2014

@mdboom - I think it's mostly a question of familiarity. You're certainly right that this is easier to follow than the Cython-generated C for someone already comfortable with the python and numpy C APIs. But the side I'm coming from is that most of the user base are likely to know python well but are not going to be as experienced with the C API (that includes me). Having them able to provide changes to this is good, and for them Cython is simpler to understand and change.

Regardless, I agree with you that performance/compile time/memory is a more important concern.

Do you think if this were split into python/C chunks like what is now in master would also singificantly shrink down erfa.c? Right now the generated erfa.c is 3MB, which is quite a bit larger than the combined erfa.pyx and erfa.py in the current master (although maybe that's a moot point if #3170 or similar is merged).

Also, I think it might make sense to hold off on this a bit until some of the other ERFA PRs are in - we certainly don't want to lead down @mdboom down a wild goose chase of rebasing if we don't actually end up using this...

@mdboom
Copy link
Contributor Author

mdboom commented Dec 3, 2014

The generated C file here is roughly the same size (mainly because it includes the docstrings in a verbose but MSVC-compatible format, not because of greater code complexity). It is an improvement in runtime, though.

I don't really care much about the size of the generated code matters much except insofar as it affects compile-time speed and memory usage.

@mhvk
Copy link
Contributor

mhvk commented Dec 3, 2014

I think I agree with @eteq that python is easier to read, especially the input/output/broadcasting stuff, which becomes nearly unreadable in C -- indeed, it is this saner handling of I/O that attracted me to python in the first place (from fortran, but C is no better). However, the inner loops are easy to understand in both C and python, so I think for those we can just decide based on the benefits. Furthermore, those are unlikely ever to need changing.

Hence, my preference would be to keep the setup in python, as in the current implementation. For the inner loop, I would slightly prefer a pure C option, as it is cleaner and uses less memory, but it is not a big deal. As @mdboom said, this may not give any speed-up anymore, though I do think that the python setup could still be optimized a bit further (which would obviously benefit both approaches).

@mdboom
Copy link
Contributor Author

mdboom commented Dec 3, 2014

Ok. I'll probably wait until the additional improvements to the ERFA wrappers die down and then rewrite this to only to the inner loops -- should be quite straightforward.

@hamogu
Copy link
Member

hamogu commented Dec 4, 2014

Just a remark about C vs. Cython:
In general I like Cython better, because (as other people said above) it is easier to write and read for the python programmer. However, in this case I think C is a more natural choice, because is matches the language ERFA is written in, thus reducing the stack to two languages (C and Python) instead of three (C, Cython, Python).

  • ERFA is a stable library, so it is very unlikely that major work needs to be done on the ERFA wrappers, thus, it's sufficient is only a subset of developers is confident enough to edit the wrappers.
  • If somebody wants to understand what exactly is going on he/she will have to know C anyway, since more of what is going on is inside the ERFA C code, not in the wrappers.
  • There are enough developers with C experience to edit and review the wrappers if needed, after all @mdboom did not suggest to convert to COBOL or Assambler.

@eteq
Copy link
Member

eteq commented Dec 4, 2014

When I look more closely at the current master, I can see the point that if only the inner loop is in C, then dropping the Cython probably makes sense, as the inner loop is much more straightforward.

There are enough developers with C experience to edit and review the wrappers if needed, after all @mdboom did not suggest to convert to COBOL or Assambler.

I kind of want to add the COBOL suggestion as an issue for "Future" now 😉

@perrygreenfield
Copy link
Member

On Dec 4, 2014, at 4:29 PM, Erik Tollerud wrote:

There are enough developers with C experience to edit and review the wrappers if needed, after all @mdboom did not suggest to convert to COBOL or Assambler.

I kind of want to add the COBOL suggestion as an issue for "Future" now

Could I suggest INTERCAL or something of that ilk? No point in half measures...=

@embray
Copy link
Member

embray commented Dec 5, 2014

I'm reminded of this bizarre message on the python-dev mailing list last month. It's bad they reached out in the wrong direction because I'm sure if they asked in the right place they'd have learned a lot, but alas the typical antipathy of development mailinglists to off-topic messages.

So on that note, I think we should we should write this in "the lowest level of computer programming" and ship Turing machines to all Astropy users.

@mdboom
Copy link
Contributor Author

mdboom commented Feb 19, 2015

Closed in favor of #3521.

@mdboom mdboom closed this Feb 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants