-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Supporting ndarray subclasses in table columns #1839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@mhvk - This is looking promising! I think we can effectively separate the problems of supporting ndarray subclasses like You might also consider closing #1583 as being superceded by this PR. |
|
@taldcroft - yes, the separation works out quite well; I'm still somewhat amazed that by just using I renamed the PR in the spirit of your suggestion and closed #1583. |
|
p.s. I would have loved to have a more general Relatedly, currently, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it because the comparison should not be by the raw ndarray data for, e.g., Quantity columns. But I can instead try to rewrite these to use the content property if it exists..
Aside: this would be a very good example of why PR should not be considered complete without a test case that ensures the issue does not recur! More mundanely, mentioning those issue numbers in the comments would have been helpful... Would you have time to do a quick PR to add tests, and I'll rebase and make sure I pass them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both those PRs had tests.
|
Right now this doesn't really play with masking, which is perfectly fine. So what about raising an exception for any attempt to have a generalized ndarray column in a masked table? |
|
This will need rebasing now that #1842 has been merged (it makes Travis work again) |
astropy/table/column.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overriding __getitem__ here is pretty worrisome, especially with this implementation that requires going through the content property which does a dict update. I see about a factor of 20 performance hit for:
c = Column([1, 2])
timeit c[1]
Also, without having carefully thought through the code here, I'm suspicious that this will end up changing the API behavior somewhere since it's not just falling through to the base __getitem__.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, factor 20 is definitely too much; I'll see if I can rewrite it so it spends most time only when really needed.
(I'm afraid some of this are the effects of putting out the PR late at night...)
|
@taldcroft - thanks for all the detailed comments. Most I can just implement; I'll think a bit more about how to ensure I'll also see if I can add some checks for |
|
I've looked at this more today and I think it's going to work. This is pretty impressive stuff. Some of this is somewhat intricate in terms of the logic and use of numpy subclassing techniques, so it wouldn't hurt to add comments to help future readers (both of us included) to understand the flow. |
|
One question we're going to have to answer is how to introduce this API change. Right now if you set a table column with a |
|
OK, thanks! I'll add comments, documentation, and tests... (sure way to find some problems too :-) Small aside on those comparison functions: while comparing to how we dealt with it in I'll try to catch this in |
|
The great thing about this approach is that we can expect success for introducing These are looming in my mind for the non- |
|
On the API change: you're right, I think this requires thought and at the very least a deprecation warning. Similarly, right now, if you give a plain array and a unit, the result is not a And finally, we have to decide what happens if you set the unit. This is not allowed for a I guess this already brings up the larger question touched upon earlier, whether one should have columns in which the unit is only some otherwise meaning piece of metadata (I feel no; that is what |
|
Agreed that the non-ndarray columns are interesting; ideally, they behave as close as possible to columns. But e.g. on grouping by |
|
@mhvk - what is the status of this? I see it got pushed to 1.0 at some point... Is that definite, or is this actually nearly ready? Or is there another PR that subsumes this or otherwise makes Table and Quantity work better together? |
|
@eteq - this is being held up mostly by getting things to work with masked columns as well; I've been thinking to try to separate that out better, but have not spent much time recently. I think it is alright to have this for the next major release (with what seems to quite a few other improvements to |
75754e2 to
1b14e0f
Compare
1b14e0f to
ac89546
Compare
|
@mhvk - this is fantastic! Exactly what I want to see! I think just a few standard tests that they behave like BTW, is this still a stepping stone to something like #1833 ? I'm particularly interested in trying to stuff Also, looks like it needs another rebase, probably due to #2963 ...? |
|
@eteq - thanks! I'll try to add further tests soonish. I'd definitely still want something along the lines of #1833, #1835 or #2790, but I think this is independent of all those. My real hesitation is that it will not work with |
|
@mhvk - I've been thinking about tables and revisiting the previous efforts on these lines. I'll be looking at stuff this week. |
I made another attempt at getting
Column's to holdQuantityand otherndarraysubclasses, one which I think is mostly complementary to @taldcroft's #1833, as I on purpose did not (yet) includeTimehere. It is also simpler than my other attempt (#1835), a bit more in the spirit of #1583, in that the essence is to create a newColumnsubclass whose other subclass isQuantity(or whatever; note that I did not adjustMaskedColumnyet, since we do not really have otherMaskedArraysubclasses yet). The new piece is that new subclasses are defined automatically (withQuantityinput ->QuantityColumnclass, etc.)With this PR: