-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Table: suggestion to create new column upon assignment to non-existing one #726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is an interesting idea and I agree there is something nice in making it so easy to make new columns. I think the implementation will be straightforward. The main concern I have is about the possibility for allowing errors by effectively removing the current validation that the column exists when assigning to that column. For example if you make a typo, or have a program error that generates the wrong column name, then Table will happily create that new column instead of raising an exception and indicating that something unexpected is happening. @astrofrog @eteq or others - do you have any thoughts? I'm not totally sure where to draw the line between syntactic convenience and preventing foot shooting. |
|
Hmm... I see both sides @taldcroft. What about a solution with the following behavior: That is, check for @mhvk - do you think this would meet the use case you had in mind? |
|
Hi Erik, Tom, Thanks for your quick replies. My use case is one where I have a table Anyway, I'd prefer contents of a column and creating one to be as As for the solutions with explicitly adding Column(...), that would be One smaller comment on what Erik wrote:
Would one actually want the last one to be a rename? Looking at such an
All the best, Marten Prof. M. H. van Kerkwijk |
|
I guess when I think about it more, it's not clear to me what anyone could mean by And after re-reading your post, @taldcroft, I see what I said before is not fully addressing what you're worried about, as you were thinking of typos when you mean to use an already-existing column. I do see the concern there too, but when I think about it, adding this behavior seems consistent with other things in python. E.g., a |
|
I need to think about this, and am not very keen at first glance, even though it is appealing from a convenience point of view. What would: do if copy the data, or is it a reference? One could limit the cases to: where |
|
@astrofrog - here's what I was thinking: raises I would think would create a new column named 'b'. I just noticed I didn't respond to @mhvk's comment on this - I mis-spoke before and didn't mean "rename" - I mean what I just said here - it would a new column named 'b', that has the same contants as 'a'. There is an issue here that occurs to me: if you do That was what I was discussing above: at first I was thinking only allow |
|
To address some points in the previous couple of comments: This would definitely copy the data. This is required in any case because in the underlying ndarray structured array cannot have one column that is a view of another array. That always happens at the point of adding a column, which creates a brand new Not a problem wrt to the name because it would always ignore the name of the RHS column. In this syntax the name of the new column is uniquely taken from the key item ( Note that If you think about something in (This is sort of pseudo-code). I think it's actually pretty clean and let's you put anything on the right hand side |
|
Anyway, I'm going to take a quick stab at this and see if any problems come up. Hopefully there will be a PR and code soon which should make it easier to decide on the behavior. |
|
@taldcroft - awesome, thanks! |
|
Sounds good, @taldcroft One quick comment on the I was also thinking you can suppress the warning for But maybe you were going to do this anyway - I'll look forward to the PR! |
|
OK, see attached code. So far there is a net addition of like 4 lines of code. See also the ipynb: I think I really like this! |
|
Nice! Ok, now starts the hard work of trying to find examples that would confuse users, and if we can't find any, then I think it should go in! (kind of like trying to disprove a scientific hypothesis ;-) ) |
|
@eteq - having warnings like that will produce unwanted warnings in the case of something like In the Don't Repeat Yourself spirit, for this syntax the RHS column name is irrelevant and should not be cause for even a warning. For the most part users won't need to bother with This does highlight a good use case for anonymous (unnamed) columns. The unfortunate thing is that this would be best handled by a serious API change in |
|
@taldcroft - I wonder whether we made a mistake with the original API - maybe it should be: and then all this wouldn't be an issue here - that is, a column really only has a name as part of a table (though |
|
@astrofrog - I've definitely found myself doing things like: That is, it's useful that columns know their own names, independent of living in a Table. So I think what the original API did is still valuable... it's just that we want to make sure when they're in a table they are subordinate to the Table for their names. @taldcroft - I see your point with the Perhaps a way around this is to also silence the warning if the Column you're adding has a name that's already in the Table? Then |
|
Oh, and @taldcroft - if we want to change the order to |
|
I totally missed you had code @taldcroft - I agree this is very nice! (modulo my comments above, which aren't about the major functionality). Also, nice use of http://nbviewer.ipython.org - I didn't realize you can just toss gists up and have it work. That's super-useful! |
|
@astrofrog - I think the @eteq - Despite a desire to fix this now, I'm a little hesitant about such a big change so late in the release. This would hit around 15 or 20 files in astropy and maybe 200 lines, and will completely break any calls to |
|
With the API you are suggesting in this PR, I think most users will use it as i.e. there's no point using the or something like that. So I think having to use |
|
@taldcroft re: name/value swap. Ok that sounds fine. I agree it's not a big deal. What did you think about my suggestion about warning only if the table name is not already present? I can put a PR against your branch here to implement what I'm thinking, if you like. @astrofrog - that makes sense. I think at least I personally would still end up using |
|
Hi All, Great to see it implemented. (And nice to see how to do this properly;
For what it's worth, as a new user I found it somewhat confusing that t = Table(, names=colnames) while c = Column(name, ) It does seem a bit more logical for both to have the same order, and for All the best, Marten |
|
@mhvk - sigh, I agree 100% on the disconnect with the order of args for See also #731 for a path to fix this by 0.3. |
astropy/table/table.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this make masked arrays non-masked? Is there a reason for doing this here rather than just passing value to data in NewColumn since the re-casting to a Numpy masked or non-masked array should already happen there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A masked array will fail the isinstance() test and won't get to this line.
I originally wrote basically what you suggested, but it turns out that Column ignores the length kwarg if you pass in data, so it was creating a wrong length column if you passed a scalar. Instead of modifying the Column.__new__ and MaskedColumn.__new__, I decided to just ask numpy for the dtype of non-ndarray inputs in advance and then make an initially-empty Column.
BTW, version 2 was something like (with no if not isinstance(..))
value = np.array(value, subok=True, copy=False)
That would have also preserved masked arrays and not copied values, but I thought it was better to not touch value at all if already an ndarray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would np.asarray make sense here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdboom - I agree np.asarray would be the preferred method here. Just for my own understanding, is there a situation where there would be any difference between asarray() and array() in the case where the input value is not an ndarray subclass?
|
I've tagged this with the 0.3 milestone for now, but if @eteq and @taldcroft both think it can be included in 0.2, we can do that too (but it might require substantial improvements to the docs, so maybe 0.3 is easier time-wise). |
|
I think 0.3 is fine. |
|
yeah, I'd say 0.3 makes the most sense |
|
On argument order for |
|
So here's a challenge that I couldn't figure out (was going to try stackoverflow, but I suppose I could start here). Is there a way to know within a function how the arguments were provided? Can I distinguish between: I know with |
astropy/table/table.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to add
shape=value.shape[1:]
That way even assignment of a multidimensional array will do the right thing -- yes, I'm using that; my Gemini/GMOS tables have columns holding the three chips and two stars on the slit, i.e., shape=(512,2,3)
|
Hi All, I've been using the new auto-assignment in my Gemini/GMOS reduction scripts now, and realised I'd like it even better if it did multidimensional assignments; all that is needed is to add "shape=value.shape[1:]" to the column creation in the new version (I added a line note at 1178 above, but am not sure that information gets propagated). Thanks! Marten |
|
@mhvk - excellent idea. This is done in 316b79d and my local tests show that this does the trick. I still need to write a full set of tests. BTW when you write inline comments they do show up as notifications to those who are participating in the thread. |
astropy/table/table.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further use shows that the change from a "try" to an "if" statement breaks assignment to rows by row number, i.e.,
mytable[0] = (somedata)
throws ValueError.
maybe go back to try/except? I guess one could also expand the if statement, but this seems less clean:
if item in self.colnames or isinstance(item, int)
-- Marten
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, I am disappointed that there is no test case of assignment like mytable[0] = (somedata) since this problem should have been caught in test. I'll do that and try to work something out based on your suggestions.
|
@mhvk - I have the fixed the issue of setting rows and added some tests. It would be great if you can look over the code and tests, and try to break this latest version. In the end I just made sure that the item is a string. Using This issue of assigning a row to a row is separate from this PR, and is now tracked in #770. |
|
@mhvk @eteq @astrofrog - I think this is basically ready. I noticed it wasn't propagating column metadata in the case of setting from an existing Column, so that is fixed now. I just need to update the docs. BTW, pandas allows and encourages this syntax, and on reflection it's really the right way to go. I think we were being overly conservative initially about possible problems. |
Previous version broke setting rows with t[1] = (values) because it tried to create a new column 1. Instead make sure the item is a string and is therefore referring to a column name.
|
I think this is ready for final review (assuming it passes Travis tests). I rebased against master to get the docs all straight. |
|
If anyone would like more time to review this then let me know, otherwise I'll plan to merge tomorrow. |
Use setitem syntax to create a new Table column
|
@taldcroft - Sorry I didn't get back to you on this - been a bit swamped... but it looks great to me, anyway, so fortunately there was no need :) |
Hi,
I've started using the Table class and really like it, but one annoyance is that one has to know whether a column exists before assigning to it, as, when it does not, an exception is thrown. Would it be an idea to try to create the column automatically (i.e., have behaviour more closely similar to dict),
i.e.,
mytable['new'] = mytable['old']
or
mytable['new'] = 'test'
would create a new column. A very simple (and likely imperfect) implementation could replace the current setitem with: