-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Speed up single-element Column.__getitem__ for 1-D columns. #3930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Looks good! Though while I like the generality of generating possibly more types of classes on demand, I wonder if for code clarity it wouldn't be simpler to just define the two cases we need explicitly. I guess one issue is that the names would clash, correct? ... ... ... Hmm, thinking more, your approach also has the advantage that any subclasses automatically do the right thing. So, would seem all fine. |
|
Right, it had to be done this way to work for any BaseColumn subclasses that come along as well (for example, it works transparently for the One disadvantage is that it could affect subclasses that for whatever reason do not need this workaround, or implement a different |
|
Actually, nevermind, doesn't need any tweak. Since @classmethod
def _get_nd_proxy_class(cls):
return clsto make it effectively a no-op, if they don't need this work-around. |
…bclasses on an as-needed basis when using those columns to store multi-dimensional data. The new classes have the slower __getitem__ implementation, while the stock classes go straight through ndarray.__getitem__ and retain its performance.
3f3af0a to
4dea3de
Compare
|
This looks cool, I learned something new! Just one question and one request. First, is this MRO at all a problem (that it includes Column twice)? Second, can you just name this |
No, both of those "Columns" are actually different classes. The first is the new class that gets generated with the bases
It really is a mixin class though. It's composed with other, more concrete classes through multiple-inheritance to add/override one piece of functionality. Does it matter so long as it's an internal implementation detail most people will never see? |
|
Seems like a nice solution. Don't hold the PR up for this, but I wonder if it would benefit from more generality if |
I agree that this is a technically accurate name in the broader context, but for anyone that starts looking at Table code it will be confusing because of the ubiquitous and conflicting usage of |
|
@taldcroft Okay, this PR is confusing enough as it is I don't want to make it worse. What about |
I thought of that, and then didn't do it. Not sure why though--now that I think about it again I like it. |
…ct with other use of the 'Mixin' classes for columns.
…handle checking the data itself
|
Looks good to me @embray, thanks! |
|
👍 from me! |
Speed up single-element Column.__getitem__ for 1-D columns.
… base classes. This eliminates most of the overhead associated with calling pure-Python __getitem__, and speeds up the body of the method as well.
… base classes. This eliminates most of the overhead associated with calling pure-Python __getitem__, and speeds up the body of the method as well.
… base classes. This eliminates most of the overhead associated with calling pure-Python __getitem__, and speeds up the body of the method as well.
… base classes. This eliminates most of the overhead associated with calling pure-Python __getitem__, and speeds up the body of the method as well.
… base classes. This eliminates most of the overhead associated with calling pure-Python __getitem__, and speeds up the body of the method as well.
… base classes. This eliminates most of the overhead associated with calling pure-Python __getitem__, and speeds up the body of the method as well.
As an alternative to #3929, create copies of
BaseColumnand subclasses on an as-needed basis when using those columns to store multi-dimensional data. The new classes have the slower__getitem__implementation, while the stock classes go straight throughndarray.__getitem__and retain its performance (the shimmed classes are also cached, so this not have much performance impact on Column creation either).On my laptop, before this patch (on current master)
After this patch
Note, multi-dimensional columns still have to use the slower
__getitem__, but exhibit the correct behavior:I think this is probably an acceptable balance, given that the multi-dimensional case is far less common. All the tests passed locally.