Skip to content

Conversation

@taldcroft
Copy link
Member

This is essentially the same as #1833, #1835, #1839 but will be done in the framework of #2790 as a follow-on once that is merged.

The starting point here will be fed00dc, which removed the proof-of-concept changes showing that mixin columns would be straightforward in the new column-based Table structure. In this issue (and eventual PR) we can define the right way to support mixin columns like Quantity, Time, and SkyCoord. The original proof-of-concept used a some simple class / instance variables, but @mhvk has proposed using an abstract base class.

Closes #1833
Closes #1835
Closes #1839

@taldcroft
Copy link
Member Author

I started going on a prototype implementation based on the previous one that was pulled from #2790:
taldcroft#11

@mhvk
Copy link
Contributor

mhvk commented Oct 23, 2014

Following on #3049, I thought for optimal duck typing it would make sense to have a default way to insert new items. But I realize I actually couldn't think of a good method, so all the more reason to have the insert method (where, as in either implementation suggested for Quantity in #3049, we ensure we are compatible with what list.insert would do).

@taldcroft
Copy link
Member Author

Apart from the obvious of not realizing np.insert could do all the work in #3049, I was thinking about a generic requirement that a mixin column would need an insert method that adheres to the list.insert API. If the mixin doesn't have that then an attempt to add_row would fail. Fortunately the np.insert API is a compatible superset of list.insert, so we're good.

I'll add insert to BaseColumn and then we can get a Table.insert_row for free.

@mhvk
Copy link
Contributor

mhvk commented Oct 23, 2014

I like where this is going (though for consistency with add_column, I would add an index to add_row -- and perhaps have a general insert method which can do either).

But I'm not sure we should require the insert method -- if I have some random thing with multiple items, it would be a bit sad not to be able to use it in a Table just because it cannot be lengthened (which is not the most common action).

@mhvk
Copy link
Contributor

mhvk commented Oct 23, 2014

Actually, as a fall-back option, one could try whether the class initialises with a list of its own items, i.e.,

def _backup_insert_method(..., column, index, value):
    collist = [colitem for colitem in column]
    return column.__class__(colllist.insert(index, value))

This works for current Quantity and Time, but not SkyCoord. Still, arguably too error-prone.

@taldcroft
Copy link
Member Author

@mhvk - only add_row would fail if one of the columns didn't have an insert method. So the general plan is to support functionality as possible, dictated by the capabilities of the table columns.

@taldcroft
Copy link
Member Author

@mhvk - the concern with your generic backup method is that there may be other attributes that are required to get back to the same object. E.g. for Time you need the scale. I don't think it's possible to know how to do that in a generic way, but maybe ???

@mhvk
Copy link
Contributor

mhvk commented Oct 23, 2014

Agreed with the idea to support functionailty as possible; we'll just have to have a clear description of what's needed -- I think the absolute minimum is __getitem__ and __len__, with copy perhaps required as well..

@taldcroft
Copy link
Member Author

BTW, on your idea of a generic insert, I see that if the value is actually a scalar object of the right type then this can (sometimes) work. I had been thinking of being able to add a row value in a lighter way like

t.add_row((1, '2001-01-02T03:04:05.6', 3*u.m))

That is to say anything that would initialize the corresponding class correctly assuming all the other attributes (e.g. scale, format) are set appropriately. Perhaps that's too ambitious or ambiguous. But in any case I think that requiring an insert method is the only way to really be sure because then the class is responsible for knowing all the details and defining the allowed inputs.

@taldcroft
Copy link
Member Author

About copy, I wonder if we can just define a helper function like:

def _col_copy(col):
    return col.copy() if hasattr(col, 'copy') else deepcopy(col)

@taldcroft
Copy link
Member Author

On duck typing, if mixin column object has at least one element what about doing a little deeper testing by trying the following:

col[:1]  # supports slicing
col[0]  # item index
assert col[0:][0] == col[0]  # slice indexing
assert col[[0]][0] == col[0]  # fancy indexing
assert col[np.array([0])][0] == col[0]  # fancy indexing

Even if we don't do this test, I think these are valid interface requirements for a mixin column to fully support the Table API.

@mhvk
Copy link
Contributor

mhvk commented Oct 23, 2014

Yes, seem good requirements for a mixin column! And the helper function for copy seems a good idea as well.

@taldcroft taldcroft changed the title Support Table mixin columns using #2790 WIP: support Table mixin columns Dec 9, 2014
@taldcroft
Copy link
Member Author

@mhvk @astrofrog - I have been slowly tweaking the mixin column support in the background and it is finally ready to appear as a rebased WIP pull request on master.

In spite of my recent email to the contrary, I found that it wasn't so hard to support high-level operations for mixins. The only real limitation is that they can't have masked elements. So for now Time and SkyCoord are still in the mix along with Quantity. What I do want to take from de-scoping is not declaring a public API / protocol for mixin columns. In that regard you'll see that a simple class attribute _astropy_mixin_column = True is now used, with no guarantee that anything works for types other than Quantity (and hopefully Time, Coordinate). For 1.1 we can work out a public API that we are all happy with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I would add

elif isinstance(col, (Time, SkyCoord)):
    is_mixin = True

With that, we do not have to touch Quantity, Time, and SkyCoord at all, which I think is better as we will still experiment with what is the best way to indicate a column can be used as a mixin. At the same time, by keeping the else: clause we can have easy experiments with other types of objects by jut setting object._astropy_mixin_column = True.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but Time still needs to be touched for the _table_format attribute. Unfortunately there is a conflict with the existing Time.format vs. the column format. With three years hindsight, I wish I had chosen representation instead of format in Time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. Of course, this would need local imports of Time and SkyCoord -- I think that is fine since by this time is_mixin will generally have been set already. So, maybe:

else:
    is_mixin = getattr(col, '_astropy_mixin_column', False)
    if not is_mixin:
        from ..time import Time
        from ..coordinates import SkyCoord
        is_mixin = isinstance(col, (Time, SkyCoord))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to look in the MRO of col for a class __name__ that matches 'Time' or 'SkyCoord'? Is that robust?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record: fine with postponing this to future updates.

@taldcroft
Copy link
Member Author

I noticed that joins can't be keyed on mixin columns. Does that include Quantities?

@eteq - good point. Looking more closely I realized the real restriction is that key columns should be an ndarray subclass. So Quantity is now allowed as a key column.

@taldcroft
Copy link
Member Author

@eteq - about the representation of mixin columns like SkyCoord, this is indeed a tricky problem that needs attention for 1.1. As @mhvk says we need a more powerful infrastructure to handle this.

There is a bit of a slippery slope here, because once you can write out all the RA Dec (and distance) values, then you realize that the equinox and obstime might be a good idea, and for that matter the location.. which is to say a coordinate can have a lot of nested metadata that is critical to round-tripping. The ECSV format is the only hope here, and even that requires some dedicated serialization machinery for SkyCoord and Time and EarthLocation. But perfect is the enemy of good, and just getting the numbers there in a decent format would be good. 😄

@taldcroft
Copy link
Member Author

@eteq - thanks for the review, I think I've addressed all your concerns.

@astrofrog
Copy link
Member

@taldcroft - thanks! is this good to merge or are there any other changes you want to make? (if so, feel free to do so)

@taldcroft
Copy link
Member Author

I think this is ready and there are no other changes I want to make right now.

@astrofrog
Copy link
Member

In my view, the issue with SkyCoord is that __str__ should not give a repr, but a nicely formatted version, similar to Time:

In [7]: t = Time.now()

In [8]: str(t)
Out[8]: '2015-01-20 12:46:46.735742'

In [9]: repr(t)
Out[9]: "<Time object: scale='utc' format='datetime' value=2015-01-20 12:46:46.735742>"

This is for a separate issue though. But having the current repr for SkyCoord is also going to cause issue for any comma-separated format (since it includes commas). I think str should give a result closer to to_string.

@astrofrog
Copy link
Member

@taldcroft - what do you think of #3315?

@astrofrog
Copy link
Member

All right, let's do this!

astrofrog added a commit that referenced this pull request Jan 20, 2015
@astrofrog astrofrog merged commit fdacbe0 into astropy:master Jan 20, 2015
@astrofrog
Copy link
Member

Thanks for all your work on this @taldcroft!

@astrofrog astrofrog mentioned this pull request Jan 20, 2015
10 tasks
@mhvk
Copy link
Contributor

mhvk commented Jan 20, 2015

Yay!!!

@taldcroft
Copy link
Member Author

Woohoo!

@embray
Copy link
Member

embray commented Jan 20, 2015

I haven't been really following the development of this feature so I might be confused. But at some point it might be nice to have a way for classes to register a function or something to return their column attributes, rather than having to define an _astropy_column_attrs or be composed with a mixin class or anything.

@Cadair
Copy link
Member

Cadair commented Jan 20, 2015

🎆

@mhvk
Copy link
Contributor

mhvk commented Jan 20, 2015

@embray - I'm still hoping that classes can also just register with a ColumnABC to tell Table that they have everything that is needed to look like a column (indeed, I'd still hope at least to give the option to just ducktype it). For the astropy classes, the idea would be to allow a more elegant column attributes class that would allow, e.g., to change the format of __str__ when part of a column (so that, e.g., Quantity subclasses do not print the unit every row).

@embray
Copy link
Member

embray commented Jan 20, 2015

Ah, the ColumnABC idea (not 100% crazy about the name but it's not wrong either) makes sense.

Though I was also thinking that it might one day be desirable to store other objects as columns without having to make a special class or wrapper for it first. But we can cross that bridge if/when the need arises.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Affects-dev PRs and issues that do not impact an existing Astropy release coordinates Effort-high Package-expert table time units

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants