Skip to content

Conversation

@astrofrog
Copy link
Member

This is an attempt at a fix for #8642, but I need @taldcroft's input.

Basically replace_column doesn't work properly for TimeSeries because the TimeSeries initializer requires some kind of times to be passed to it. But in any case, it seems like we don't really need to use self.__class__ in replace_column unless I'm missing something?

I don't fully understand why this is needed in the first place and why we can't simply modify the table column dictionary in-place, so @taldcroft if you could provide some background, that would be really useful, and I can try and add some comments in the code to explain.

If this is not an acceptable solution, I'll just overload replace_column in TimeSeries instead.

@astrofrog astrofrog added table Bug Affects-dev PRs and issues that do not impact an existing Astropy release no-changelog-entry-needed timeseries labels May 14, 2019
@astrofrog astrofrog added this to the v3.2 milestone May 14, 2019
@astrofrog astrofrog requested a review from taldcroft May 14, 2019 16:24
@taldcroft
Copy link
Member

@astrofrog - some quick comments:

  • In general, I think there is a fair bit of code left over from the days when the table data was stored internally as a numpy structured array. That's one driver for the shenanigans that you see here. There might be opportunity for clean-up here and in some places, however see the next point:
  • By going through this generalized init code, the appropriate (for the class) data coercion is done:
    • Making arrays, lists, etc into Column as needed
    • Turning Quantity into Column + units for Table.
  • Your solution of using a hardwired Table will (at least temporarily) turn a Quantity into a Column with units. It should come back to Quantity but we probably don't want that round-trip happening in general.

This gets back to a comment I made on the original TimeSeries PR that I think it should be able to be created without time inputs. IIRC the Table code assumes in some places that a subclass can be created with a standard data input. (This is one such place).

I think @mhvk had made reference to some comp-sci principle that a subclass should pass all the tests of the parent class. I'm not sure I'd go all the way on that, but there is something to be said for a TimeSeries object acting mostly like a normal QTable from the init API perspective.

@mhvk
Copy link
Contributor

mhvk commented May 14, 2019

Liskov substitution principle - I'm also not sure whether it is always practical to hold, but found it very helpful in thinking about classes.

t = self.__class__([col], names=[name])
# We deliberately use ``Table`` here since it should be sufficient
# and it is just meant as a temporary container for the column
t = Table([col], names=[name])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the goal here is to ensure the input can be turned into a column. A fairly safe replacement would seem to be

if not isinstance(getattr(col, 'info', None), BaseColumnInfo):
    col = Column(col, name=name)
cols = OrderedDict(self.columns)
cols[name] = col
self._init_from_cols(cols.values())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a test case (using current master) that should be included. I don't think the above would pass this:

In [11]: t = simple_table()
In [12]: b = np.ma.MaskedArray([4, 5, 6])
In [13]: b[1] = np.ma.masked
In [14]: t.replace_column('b', b)
INFO: Upgrading Table to masked Table. Use Table.filled() to convert to unmasked table. [astropy.table.table]

In [15]: t
Out[15]: 
<Table masked=True length=3>
  a     b    c  
int64 int64 str1
----- ----- ----
    1     4    c
    2    --    d
    3     6    e

I've been thinking about a better solution but nothing great has come to mind. I believe that the implementation currently in this PR will work, I'm just a little unhappy about the case where a Quantity gets converted to Column and then back to Quantity because of Table. I thought about using QTable there instead, but that is worse because it can end up sending Quantity objects into Table._init_from_cols, which is not expecting that and will leave them as Quantity.

So maybe a TimeSeries method overload is best, where you use QTable. Or better yet fix TimeSeries so it works with this existing method. 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a combination of the two - use the Table only for objects that don't have BaseColumnInfo? I.e., something like

if not isinstance(getattr(col, 'info', None), BaseColumnInfo):
    col = Table([col], names=[name])[name]
...

@saimn
Copy link
Contributor

saimn commented May 15, 2019

Btw, there is also the related issue from #8077 with replace_column, which also prevents to save the ts table from the timeseries docs example, because of the index on the time column.

@bsipocz bsipocz modified the milestones: v3.2, v3.2.1, v3.2.2 Jun 9, 2019
@pllim pllim removed Affects-dev PRs and issues that do not impact an existing Astropy release no-changelog-entry-needed labels Jun 19, 2019
@pllim
Copy link
Member

pllim commented Jun 19, 2019

I don't think this only "affects dev" anymore, so I removed the label. This now needs a change log under bug fix.

@taldcroft
Copy link
Member

BTW there will be an easier solution with #8789.

@bsipocz
Copy link
Member

bsipocz commented Jun 22, 2019

@astrofrog - Now that #8789 is being merged, could you please revise this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants