Skip to content

Conversation

@embray
Copy link
Member

@embray embray commented Aug 13, 2015

Alternative to #3930--speed up Column.__getitem__ using Cython base classes. This eliminates most of the overhead associated with calling pure-Python __getitem__, and speeds up the body of the method as well.

This is a more general solution than the one in #3930--I think this could be useful for #3915. Checking whether the indices need to be updated adds only a trivial amount of overhead, so we can perform that check in the actual __getitem__ without incurring additional overhead. So unless we still want to be able to use table[slice] without incurring index adjustment overhead, the get_item methods won't be necessary.

@embray
Copy link
Member Author

embray commented Aug 13, 2015

The one thing I don't like about this is the duplicate code in _ColumnGetitemShim and _MaskedColumnGetitemShim. I really tried to find a workaround to that, but anything I could come up with added noticeable overhead.

@embray embray added the Affects-dev PRs and issues that do not impact an existing Astropy release label Aug 14, 2015
embray added a commit to embray/astropy that referenced this pull request Aug 20, 2015
Column.__getitem__ speedup branch of astropy#4075, and made adjustments to take
advantage of it in the indices code.  This included getting rid of the
`def get_item(...)` hacks, which I think are no longer necessary--the
base Column.__getitem__ is now almost exactly as fast as the stock
implementation in the basic case of single item scalar access.

In the process I also refactored the Cython code a bit so that it could
be less repetitive.

For now, tests that failed due to the change in behavior (indicies now
*are* copied by default when slicing) are commented out; but all other
tests pass.

It should also that the 'copy_on_getitem' index mode is now a no-op
since that's the default behavior.  We could still change this so that
it's not the default behavior, if we prefer, or add a no_copy_on_getitem
mode instead.
@taldcroft
Copy link
Member

This looks good to me, and hurts my brain less, as long as it's fast!

@embray
Copy link
Member Author

embray commented Aug 21, 2015

I'll try to get some performance results up on this PR. From my rough testing though this adds < 1% overhead compared to the base __getitem__.

@embray
Copy link
Member Author

embray commented Aug 27, 2015

Added a regression test proving that this PR also fixes #4098--also rebased this on top of #4099 so that the test I added can pass.

@embray embray added this to the v1.1.0 milestone Aug 27, 2015
@embray
Copy link
Member Author

embray commented Aug 27, 2015

Would be good to merge this sooner rather than later since it fixes #4098. However, I'm not sure what impact that will have on #3915. I don't want to introduce too many merge conflicts (though I don't think any such conflict will be too difficult to overcome...)

… base classes. This eliminates most of the overhead associated with calling pure-Python __getitem__, and speeds up the body of the method as well.
…er we need access to in this code is 'nd')--this allows us to avoid a global variable lookup of 'ndarray' when calling 'ndarray.tp_as_mapping.mp_subscript'. Now, in the common case of ndim==1, Column (but not MaskedColumn) adds virtually no overhead compared to the built-in __getitem__.
Column.__getitem__ speedup branch of astropy#4075, and made adjustments to take
advantage of it in the indices code.  This included getting rid of the
`def get_item(...)` hacks, which I think are no longer necessary--the
base Column.__getitem__ is now almost exactly as fast as the stock
implementation in the basic case of single item scalar access.

In the process I also refactored the Cython code a bit so that it could
be less repetitive.

For now, tests that failed due to the change in behavior (indicies now
*are* copied by default when slicing) are commented out; but all other
tests pass.

It should also that the 'copy_on_getitem' index mode is now a no-op
since that's the default behavior.  We could still change this so that
it's not the default behavior, if we prefer, or add a no_copy_on_getitem
mode instead.
@embray
Copy link
Member Author

embray commented Sep 30, 2015

Updated to incorporate #3915 / #4202.

As I noted previously, this makes the 'copy_on_getitem' index mode is now a no-op since that's the default behavior. We could still change this so that it's not the default behavior, if we prefer, or add a no_copy_on_getitem mode instead. @taldcroft ? @mdmueller ?

@embray
Copy link
Member Author

embray commented Sep 30, 2015

(To clarify, we could still make the default behavior be to not copy the indices on a slice. I wasn't sure which way is preferable, and if the current non-copy default was just chosen to not slow down __getitem__ of columns with no index.)

@taldcroft
Copy link
Member

@embray - the current non-copy default was more chosen to not slow down __getitem__ of columns with an index. The idea was that normally if you are slicing an individual column that happens to be an index in the table, you don't usually want the index to come along for the ride. This re-indexing can be pretty slow.

@embray
Copy link
Member Author

embray commented Sep 30, 2015

Okay, I wondered about that. In that case I'll restore the original default. This PR should still speed things up in either case since it does away with the penalty of having a __getitem__ in Python.

@taldcroft
Copy link
Member

Sounds good, thanks.

…evel classes / functions. Put module docstring before imports.
…is *not* the default (and re-enabled the tests that I previously disabled before I was sure what we were going to do here). This required going back to using a __class__ switcheroo to enable the index copying inside the index mode context manager. Originally I just wanted to add a flag to enable or disable it, but since that flag would have to be accessible from Python that just added too much slowdown to the general case. So this ends up adopting a middle ground between my original version of this code and @mdmueller's version.
@embray
Copy link
Member Author

embray commented Oct 1, 2015

Okay, I restored the original default behavior. In order to get this working right without sacrificing performance I went back on some of @mdmueller's original implementation, or at least something that looks like it (but still without any need for the get_item methods). But this still keeps the performance benefits of my original code as well.

@embray
Copy link
Member Author

embray commented Oct 2, 2015

Apparently I suck at testing on Python 2...

@taldcroft
Copy link
Member

@embray - I don't have time this weekend for detailed review on this, but I'm trusting that the table testing is sufficiently complete. Have you done some simple %timeit performance testing comparing this to master?

@mhvk
Copy link
Contributor

mhvk commented Oct 3, 2015

@embray - this looks good, and cleaner than before (though I cannot judge the cython very well). What seems to be missing, though, are some comments that explain why we are doing this at all. The original shim class had relevant comments; maybe put those either in the pyx file or below the Column class definition (to explain why it is subclassing the shim). Otherwise, I worry that a few years from now we have no clue anymore why this is being done...

@mhvk
Copy link
Contributor

mhvk commented Oct 3, 2015

p.s. I even wonder if we should have a section in the developer docs with "tips & tricks", where we summarize these types of speed-ups and their rationale. Indeed, my shape-changing mixin would be another good addition...

@embray
Copy link
Member Author

embray commented Oct 5, 2015

I'll add some better explanatory comments before merging.

@taldcroft Concerning timing this adds a tiny slowdown compared to the stock ndarray.__getitem__, solely due to the dimension check which is adding like a handful of instructions. So we're dabbling in difference of a nanosecond or two if that--really less than a timer can measure accurately anyways.

@taldcroft
Copy link
Member

@embray - that's definitely in the noise, so no problem! Thanks for working on this.

@astrofrog
Copy link
Member

@embray - could you add the comments in the next hour or so? It would be nice to get this in 1.1 if possible.

@astrofrog
Copy link
Member

I'm going to branch v1.1, but we can consider merging this in after since it's just a performance fix.

@embray
Copy link
Member Author

embray commented Oct 12, 2015

Oh, this is still open... let me add those aforementioned comments and then merge.

@taldcroft
Copy link
Member

👍

embray added a commit that referenced this pull request Oct 12, 2015
Speed up Column.__getitem__ in general
@embray embray merged commit 5a95d0d into astropy:master Oct 12, 2015
@embray embray deleted the table/getitem branch October 12, 2015 21:41
embray added a commit that referenced this pull request Oct 15, 2015
Speed up Column.__getitem__ in general
dhomeier pushed a commit to dhomeier/astropy that referenced this pull request Dec 17, 2015
Column.__getitem__ speedup branch of astropy#4075, and made adjustments to take
advantage of it in the indices code.  This included getting rid of the
`def get_item(...)` hacks, which I think are no longer necessary--the
base Column.__getitem__ is now almost exactly as fast as the stock
implementation in the basic case of single item scalar access.

In the process I also refactored the Cython code a bit so that it could
be less repetitive.

For now, tests that failed due to the change in behavior (indicies now
*are* copied by default when slicing) are commented out; but all other
tests pass.

It should also that the 'copy_on_getitem' index mode is now a no-op
since that's the default behavior.  We could still change this so that
it's not the default behavior, if we prefer, or add a no_copy_on_getitem
mode instead.
dhomeier pushed a commit to dhomeier/astropy that referenced this pull request Jun 12, 2016
Column.__getitem__ speedup branch of astropy#4075, and made adjustments to take
advantage of it in the indices code.  This included getting rid of the
`def get_item(...)` hacks, which I think are no longer necessary--the
base Column.__getitem__ is now almost exactly as fast as the stock
implementation in the basic case of single item scalar access.

In the process I also refactored the Cython code a bit so that it could
be less repetitive.

For now, tests that failed due to the change in behavior (indicies now
*are* copied by default when slicing) are commented out; but all other
tests pass.

It should also that the 'copy_on_getitem' index mode is now a no-op
since that's the default behavior.  We could still change this so that
it's not the default behavior, if we prefer, or add a no_copy_on_getitem
mode instead.
@dhomeier dhomeier mentioned this pull request Oct 30, 2023
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Affects-dev PRs and issues that do not impact an existing Astropy release Performance table

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants