Inconsistency in result of comparing unmasked Table column to value #1446

astrofrog · 2013-10-05T17:35:07Z

If I compare a numerical column in a Table to a value (e.g. t['col0'] == 1 where col0 is numbers) the result is a Column object. If I compare a string column in a Table to a value (e.g. t['col0'] == 'a' where col0 is strings) the result is a numpy ndarray.

This specific problem--inconsistency between the returned type based on the data type of the column--does not occur in masked Tables, though there is different bug in that case reported in a separate issue.

The code below reproduces the problem in Anaconda 1.7 and Astropy 0.2.4

from astropy.table import Table

numbers = [1, 2, 3]
strings = ['a', 'b', 'c']

t_numbers = Table(data=[numbers])
comparison_numbers = (t_numbers['col0'] == 1)
type(comparison_numbers)  # this is a Column object
print comparison_numbers

t_strings = Table(data=[strings])
comparison_strings = (t_strings['col0'] == 'a')
type(comparison_strings)  # this is a numpy ndarray object
print comparison_strings

taldcroft · 2013-09-13T11:18:21Z

That's pretty odd, thanks for reporting this. I think the desired output in every case is a plain boolean numpy array. I'm not sure if this code path ends up going through __array_wrap__ or not.

astrofrog · 2013-09-13T18:54:25Z

@taldcroft - I think that, as you suggested before, we should look into adapting part of what we learned for Quantity to the Column class. I could try and look into it either tomorrow or later next week.

taldcroft · 2013-09-13T18:56:23Z

@astrofrog - That would be good. I'm sure your numpy-fu now vastly exceeds our collective knowledge when we did Column.

astrofrog · 2013-10-05T17:38:32Z

@mwcraig @taldcroft - I need your feedback here. Currently, the attached fix means that all the examples shown higher up in this issue return a boolean array. Now we need to think a bit more about what the expected behavior should be:

column == scalar: return numpy boolean array showing element-wise equality with scalar
column1 == column2: return numpy boolean array element-wise? or return a single True/False if the whole column including meta-data match? (Quantity does the first)
column == array: return element-wise equality?

In all cases, I don't think we should ever return a column object, right? Thinking more into the future, how do we deal with cases where the values are the same but the units are different?

mhvk · 2013-10-05T21:50:31Z

@astrofrog - my sense would be to behave just like Quantity, i.e., treat the columns as quantity arrays for the comparison (including the units). Eventually, Table[column] should return a quantity, certainly if a unit is present (but probably dimensionless_unscaled if not).

mwcraig · 2013-10-06T02:31:55Z

@astrofrog - Coming from using atpy I'm used to columns acting like an array in the sense that column1 == column2 should return a boolean element-wise rather than a single True/False; I'd expect to have to use .all() for that. I'd also think column == array should return element-wise equality.

Though I'd expect to be able to create a column from the result it is hard for me to see how to automatically generate a sensible name/description if the result is a column.

As far as units go, with the caveat that I haven't used Unit or Quantity much yet, I'd expect that:

if I'm comparing two things with units then the units will be taken into account, like I think it is with a Quantity.
if I compare two things without units (either because they don't have units or because I've neglected to include units) then units won't matter.
If I compare something with units to something without units then I'm asking astropy to do something nonsensical and it should raise an exception.

astrofrog · 2013-10-06T07:03:53Z

Ok, so now what about the following scenario: comparing two masked columns where the mask is the same and the unmasked values are the same, but where the masked values are not. I think it should return True in this case?

Note that some tests are failing because they were testing e.g. [--, 5, --] == [4, 5, 6] and expecting this to be all True because the values 'behind' the mask are 4 and 6. @taldcroft, do you agree I should change this so this example returns [False, True, False] instead?

mhvk · 2013-10-06T07:24:30Z

@astrofrog - maybe for the masked case, the array returned should be masked as well, i.e., you should return [--, True, --]? (I just checked -- numpy does this too).

taldcroft · 2013-10-06T12:36:44Z

@astrofrog - I agree with @mwcraig that all of the comparisons should return element by element boolean values.

I'm a little bit leery of raising an exception when comparing a table column that has units with a plain array. I count myself among the 99% of astronomers that are used to working with arrays with a unit label (or more likely just knowing within the code what units your array value has), rather than our new Quantities. Even though the latter has clear advantages, I don't think adoption within the broad community will be that fast.

Until a table column is a Quantity (which is not entirely trivial to make happen) we should not enforce all the Quantity-type behaviors.

taldcroft · 2013-10-06T12:38:45Z

@astrofrog - about masking comparisons, your second comment on this matches my expectation. Remember, the value behind the mask should be treated as meaningless. If a value is masked then you don't know the real value so a comparison is likewise "don't know".

astrofrog · 2013-10-06T12:42:43Z

@taldcroft @mhvk - thanks for the input! I'll try and make sure this all works like you mentioned.

astrofrog · 2013-10-06T21:41:39Z

@mhvk @taldcroft - ok, I think it should all be behaving properly now. Note that we don't need to return a column, an array is sufficient, since it is very easy to create a new column:

In [1]: from astropy.table import Table

In [2]: t = Table()

In [3]: t['a'] = [1,2,3]

In [4]: t['b'] = t['a'] == 3

In [5]: print(t)
 a    b  
--- -----
  1 False
  2 False
  3  True

Also, masking behaves as expected:

In [6]: t = Table(masked=True)

In [7]: t['a'] = [1,2,3]

In [8]: t['a'].mask = [0,1,0]

In [9]: t['a'] == 1
Out[9]: 
masked_array(data = [True -- False],
             mask = [False  True False],
       fill_value = True)


In [10]: t['a'] == t['a']
Out[10]: 
masked_array(data = [True -- True],
             mask = [False  True False],
       fill_value = True)

mhvk · 2013-10-07T11:30:00Z

@astrofrog - this looks good. I guess ones columns start to hold quantities, there data attribute would presumably hold the quantity, so this would remain correct. (@taldcroft - I always treated units as labels as well, but it has been amazing how powerful quantities are now that they are subclasses from array; I expect that with Tables it will be similarly good!)

mwcraig · 2013-10-07T17:20:30Z

@taldcroft -- I should have been clear that I'm definitely in the 99% too (or maybe worse, I still don't even have units as labels yet :)).

Perhaps a better way to say what I meant is that if a table column has a Quantity in it (and I don't know that this is even possible, I haven't used Quantity at all yet) then comparisons of that column to a non-Quantity should be handled in whatever way comparison of a Quantity to a non-Quantity is handle when not part of a Table.

taldcroft · 2013-10-07T18:56:31Z

I played around this for a bit and this looks good. It reproduces some numpy behavior that I find a little surprising, but I think that reproducing the numpy behavior is fine.

In [14]: 1 == t['a']
Out[14]: array([ True, False, False], dtype=bool)

In [15]: np.ones(4) == t['a']
Out[15]: False  # To me this should be an exception, but so be it.

In [16]: np.ones(3) == t['a']
Out[16]: array([ True, False, False], dtype=bool)

@astrofrog - Even though the previous behavior was not desired nor consistent, we should still mark this as an API change in the CHANGES.rst document. Once that is done you can go ahead and merge.

…rs for equality.

astrofrog · 2013-10-08T14:52:14Z

@taldcroft - I added an entry to CHANGES.rst and made a couple of improvements in the test, so let's see what Travis says. Does CHANGES.rst look ok?

taldcroft · 2013-10-08T14:56:28Z

CHANGES.rst looks fine. Feel free to merge assuming Travis passes.

Inconsistency in result of comparing unmasked Table column to value

mwcraig mentioned this pull request Sep 13, 2013

Error when displaying the result of comparison of masked Table column to a value #1447

Closed

ghost assigned astrofrog Sep 13, 2013

Implement __eq__ for Column objects so we can guarantee the behavior

5c3f788

Fixed typo

f3e6787

Preserve mask when comparing masked arrays/columns, and fix column test.

c1bca3e

astrofrog added 2 commits October 8, 2013 16:44

Added entry to CHANGES.rst

6b43a44

__array_priority__ isn't needed, and properly test both argument orde…

2385d56

…rs for equality.

astrofrog added a commit that referenced this pull request Oct 8, 2013

Merge pull request #1446 from astrofrog/table-column-eq

15fe277

Inconsistency in result of comparing unmasked Table column to value

astrofrog merged commit 15fe277 into astropy:master Oct 8, 2013

taldcroft mentioned this pull request Nov 29, 2013

Supporting ndarray subclasses in table columns #1839

Closed

mhvk mentioned this pull request Sep 19, 2014

Ensure Column-ness is stripped for boolean ufuncs; faster copy #2963

Merged

astrofrog deleted the table-column-eq branch July 5, 2016 15:53

Uh oh!

Inconsistency in result of comparing unmasked Table column to value #1446

Inconsistency in result of comparing unmasked Table column to value #1446

Conversation

astrofrog commented Oct 5, 2013

Uh oh!

taldcroft commented Sep 13, 2013

Uh oh!

astrofrog commented Sep 13, 2013

Uh oh!

taldcroft commented Sep 13, 2013

Uh oh!

astrofrog commented Oct 5, 2013

Uh oh!

mhvk commented Oct 5, 2013

Uh oh!

mwcraig commented Oct 6, 2013

Uh oh!

astrofrog commented Oct 6, 2013

Uh oh!

mhvk commented Oct 6, 2013

Uh oh!

taldcroft commented Oct 6, 2013

Uh oh!

taldcroft commented Oct 6, 2013

Uh oh!

astrofrog commented Oct 6, 2013

Uh oh!

astrofrog commented Oct 6, 2013

Uh oh!

mhvk commented Oct 7, 2013

Uh oh!

mwcraig commented Oct 7, 2013

Uh oh!

taldcroft commented Oct 7, 2013

Uh oh!

astrofrog commented Oct 8, 2013

Uh oh!

taldcroft commented Oct 8, 2013

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants