@mhvk (Contributor) commented Nov 27, 2013

Triggered by #1833, I thought I would also post the implementation @wkerzendorf and I have been working on for making generalized columns, which can hold any type of array. It is by no means finished, but as our approach is quite different, I thought it would be useful to put it out there.

In the Column class, adding proper support for Quantity and its subclasses is remarkably simple: one just needs to keep track of the few attributes that are needed beyond the __class__ itself (_unit for Quantity, plus _wrap_angle for Longitude), and these can simply be taken from __dict__.
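A minimal NumPy-only sketch of that bookkeeping, under stated assumptions: the column keeps a plain ndarray view of the data, the original class, and a copy of the instance __dict__, and rebuilds the subclass on access. UnitArray and WrappedAngle are hypothetical stand-ins for Quantity and Longitude (not the astropy classes), and GeneralColumn is a hypothetical name, not the code in this branch.

```python
import numpy as np


class UnitArray(np.ndarray):
    """Hypothetical stand-in for Quantity: an ndarray carrying a _unit attribute."""

    def __new__(cls, data, unit):
        obj = np.asarray(data, dtype=float).view(cls)
        obj._unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Copy _unit onto views and slices; None for a fresh view of a plain array.
        self._unit = getattr(obj, '_unit', None)


class WrappedAngle(UnitArray):
    """Hypothetical stand-in for Longitude: adds a _wrap_angle attribute."""

    def __new__(cls, data, unit, wrap_angle=360.0):
        obj = super().__new__(cls, data, unit)
        obj._wrap_angle = wrap_angle
        return obj

    def __array_finalize__(self, obj):
        super().__array_finalize__(obj)
        self._wrap_angle = getattr(obj, '_wrap_angle', 360.0)


class GeneralColumn:
    """Hypothetical column: plain data plus what is needed to rebuild the subclass."""

    def __init__(self, name, data):
        data = np.asanyarray(data)  # keeps subclasses, converts lists
        self.name = name
        self._cls = type(data)
        # Extra state (_unit, _wrap_angle, ...) lives in the instance __dict__;
        # a plain ndarray has no __dict__, so fall back to an empty one.
        self._attrs = dict(getattr(data, '__dict__', {}))
        self._data = np.asarray(data)  # base-class ndarray sharing the same memory

    @property
    def value(self):
        out = self._data.view(self._cls)
        if self._attrs:
            out.__dict__.update(self._attrs)
        return out


lon = WrappedAngle([10.0, 11.0], 'deg', wrap_angle=180.0)
col = GeneralColumn('l', lon)
v = col.value  # a WrappedAngle again, with _unit and _wrap_angle restored
```

Because value is a view, nothing is copied: edits made through it show up in the stored data.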

For things other than ndarray subclasses, the main concept is that anything can become a column as long as it can turn itself into a properly sequenced array. So far we have tried this only for Time: jd1 and jd2 are turned into a two-item array (to be made structured), and all other relevant attributes (e.g., lon, lat) are stored in a dictionary. Attributes that are themselves arrays (e.g., delta_ut1_utc, or indeed lon and lat) are stored in the combined array as well. The class can then provide a Time view of this by letting jd1, etc., point to the appropriate parts of the combined array.
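The packing step can be sketched with a NumPy structured array, going straight to the structured form mentioned above. TwoPartTime, to_column_data and from_column_data are hypothetical names, and delta stands in for an array-valued attribute such as delta_ut1_utc; this illustrates the idea, not the branch's actual Time handling.

```python
import numpy as np


class TwoPartTime:
    """Hypothetical stand-in for Time: two-double Julian dates plus metadata."""

    def __init__(self, jd1, jd2, scale='utc', delta=None):
        self.jd1 = np.asarray(jd1, dtype=float)
        self.jd2 = np.asarray(jd2, dtype=float)
        self.scale = scale
        # Optional per-element correction, standing in for delta_ut1_utc.
        self.delta = None if delta is None else np.asarray(delta, dtype=float)


def to_column_data(t):
    """Pack array-valued parts into one structured array, the rest into a dict."""
    fields = {'jd1': t.jd1, 'jd2': t.jd2}
    meta = {'scale': t.scale}
    if t.delta is not None and t.delta.shape == t.jd1.shape:
        fields['delta'] = t.delta  # array attribute: stored alongside jd1/jd2
    else:
        meta['delta'] = t.delta    # None or scalar: kept as plain metadata
    data = np.empty(t.jd1.shape, dtype=[(name, float) for name in fields])
    for name, values in fields.items():
        data[name] = values
    return data, meta


def from_column_data(data, meta):
    """Rebuild a TwoPartTime whose array attributes are views into ``data``."""
    t = TwoPartTime.__new__(TwoPartTime)  # bypass __init__; set attributes below
    for name in data.dtype.names:
        setattr(t, name, data[name])      # a field of a structured array is a view
    for name, value in meta.items():
        setattr(t, name, value)
    return t
```

Since the rebuilt object's jd1, jd2 and delta are field views, writing to them updates the column's storage directly.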

Note that in this implementation, Column is no longer a subclass of ndarray itself; it seemed essentially impossible to keep it one and still have things like np.sum(table['q']) return a Quantity when the column holds a Quantity. One advantage is that this holds the promise of simply storing a MaskedArray in a Column, thus obviating the need for MaskedColumn, or indeed for masking a whole table when only a single column needs it.
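A minimal sketch of why a non-ndarray column can behave this way, assuming simple attribute forwarding via __getattr__ (WrappedColumn is a hypothetical name, and the forwarding is an assumption, not necessarily what this branch does). When np.sum receives an object that is not an ndarray, it calls that object's own sum method if one exists, so a column holding a MaskedArray returns the masked sum, just as one holding a Quantity would return a Quantity.

```python
import numpy as np


class WrappedColumn:
    """Hypothetical non-ndarray column that forwards to whatever array it holds."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # any array-like: ndarray, MaskedArray, subclass, ...

    def __getattr__(self, attr):
        # Only called when normal lookup fails; private names are not forwarded,
        # which avoids infinite recursion before self.data exists.
        if attr.startswith('_'):
            raise AttributeError(attr)
        return getattr(self.data, attr)


masked = np.ma.MaskedArray([1.0, 2.0, 3.0], mask=[False, True, False])
col = WrappedColumn('m', masked)
total = np.sum(col)  # dispatches to masked.sum(), skipping the masked 2.0
```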

Anyway, maybe best to give an example of what's possible now (ignoring that many test cases are now broken...):

In [1]: from astropy.time import Time, TimeDelta; from astropy.table import Table, Column; from astropy.coordinates import Angle, Longitude, Latitude; import numpy as np; import astropy.units as u

In [2]: t = Time(['2012-01-01', '2013-01-01'], scale='utc')

In [3]: l = Longitude(np.arange(10.,12.), 'deg')

In [4]: q = np.arange(2.)*u.m

In [5]: b = Table([q,t,l], names=['q','t','l'])

In [6]: b['t'], b['q'], b['l']
Out[6]: 
(<Column name='t' unit=None format=None description=None>
<Time object: scale='utc' format='iso' value=['2012-01-01 00:00:00.000' '2013-01-01 00:00:00.000']>,
 <Column name='q' unit='m' format=None description=None>
<Quantity [ 0., 1.] m>,
 <Column name='l' unit='deg' format=None description=None>
<Longitude [ 10., 11.] deg>)

In [7]: b['q'].to(u.cm)
Out[7]: <Quantity [   0., 100.] cm>

In [8]: b['t'].tdb
Out[8]: <Time object: scale='tdb' format='iso' value=['2012-01-01 00:01:06.184' '2013-01-01 00:01:07.184']>

In [9]: b['t'] + 1.*u.min
Out[9]: <Time object: scale='utc' format='iso' value=['2012-01-01 00:01:00.000' '2013-01-01 00:01:00.000']>

In [10]: b['q'] += 1.*u.cm

In [11]: b['q']
Out[11]: 
<Column name='q' unit='m' format=None description=None>
<Quantity [ 0.01, 1.01] m>
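Python evaluates b['q'] += x as tmp = b['q'], then tmp = tmp.__iadd__(x), then b['q'] = tmp. A wrapper column can therefore keep in-place semantics by forwarding __iadd__ to the stored array. In this hedged sketch a plain dict stands in for the Table and InPlaceColumn is a hypothetical name.

```python
import numpy as np


class InPlaceColumn:
    """Hypothetical wrapper column supporting augmented assignment in place."""

    def __init__(self, name, data):
        self.name = name
        self.data = data

    def __iadd__(self, other):
        # Delegate to the stored array's own in-place add, so the same buffer is
        # modified and the stored object's rules apply (for a Quantity, unit
        # conversion of the added value).
        self.data += other
        return self


table = {'q': InPlaceColumn('q', np.array([0.0, 1.0]))}
buffer = table['q'].data
# `table['q'] += x` calls __iadd__ and then stores the returned object back.
table['q'] += 0.01
```

Returning self means the table's __setitem__ receives the very same column object, so nothing is replaced.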

@taldcroft (Member)

This is a repeat from #1833, but I'll put it in this thread for the record.

Basically, my view is that any fundamental change to Table should satisfy the following:

  1. All current tests pass
  2. The behavior of every allowed numpy operation on table columns is unchanged from current. Having deep numpy compatibility is crucial. We decided early on to inherit from ndarray and I think this has proved a good choice.
  3. Full support for all numpy array data types.
  4. Full support for masking (again with no API changes). Even though numpy.ma has its problems, it basically works quite well and there are no competing alternatives. We do not want to roll our own.

Even though Quantity is really great in a lot of contexts, I don't think it is ideal for a base column class. As we learned from Coordinates, there can be substantial overheads that aren't apparent in simple testing. But more importantly it is sufficiently different from ndarray that it would almost certainly create a lot of API breakage.
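Criterion 2 can be made concrete with a small comparison. In this hedged sketch both classes are hypothetical minimal columns: one subclasses ndarray, as the current Column does, and one merely holds an array, as in the proposal. The subclass passes isinstance checks and keeps its metadata through slicing, while the wrapper needs every operator re-implemented and comes back from NumPy functions as a plain ndarray.

```python
import numpy as np


class SubclassColumn(np.ndarray):
    """Hypothetical minimal column that subclasses ndarray."""

    def __new__(cls, data, name=None):
        obj = np.asarray(data).view(cls)
        obj.name = name
        return obj

    def __array_finalize__(self, obj):
        # Propagate the name to views and slices.
        self.name = getattr(obj, 'name', None)


class WrapperColumn:
    """Hypothetical column that only holds an array."""

    def __init__(self, data, name=None):
        self.data = np.asarray(data)
        self.name = name

    def __array__(self, dtype=None, copy=None):
        # Lets NumPy functions convert the column to a plain ndarray.
        arr = np.asarray(self.data, dtype=dtype)
        return arr.copy() if copy else arr


sub = SubclassColumn([1.0, 2.0, 3.0], name='a')
wrap = WrapperColumn([1.0, 2.0, 3.0], name='a')
result = np.add(wrap, 1)  # works via __array__, but the result is a bare ndarray
```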

@embray (Member) commented Dec 2, 2013

Agreed that Quantity should not be the base for all table columns--there are types of data one would store in a table that do not fit the model of physical quantities (string columns being the most obvious, but one could find other examples). That said, it should be possible to have a quantity column.

@mhvk (Contributor, Author) commented Jan 13, 2015

This will be closed by #3011; adding labels just so it doesn't show up as no-label any more.

@mhvk added the table label Jan 13, 2015
@mhvk deleted the table-generalized-columns-not-ndarray branch June 5, 2015 18:15