Skip to content

Conversation

@astrofrog
Copy link
Member

If I compare a numerical column in a Table to a value (e.g. t['col0'] == 1 where col0 is numbers) the result is a Column object. If I compare a string column in a Table to a value (e.g. t['col0'] == 'a' where col0 is strings) the result is a numpy ndarray.

This specific problem--inconsistency between the returned type based on the data type of the column--does not occur in masked Tables, though there is different bug in that case reported in a separate issue.

The code below reproduces the problem in Anaconda 1.7 and Astropy 0.2.4

from astropy.table import Table

numbers = [1, 2, 3]
strings = ['a', 'b', 'c']

t_numbers = Table(data=[numbers])
comparison_numbers = (t_numbers['col0'] == 1)
type(comparison_numbers)  # this is a Column object
print comparison_numbers

t_strings = Table(data=[strings])
comparison_strings = (t_strings['col0'] == 'a')
type(comparison_strings)  # this is a numpy ndarray object
print comparison_strings

@taldcroft
Copy link
Member

That's pretty odd, thanks for reporting this. I think the desired output in every case is a plain boolean numpy array. I'm not sure if this code path ends up going through __array_wrap__ or not.

@astrofrog
Copy link
Member

@taldcroft - I think that, as you suggested before, we should look into adapting part of what we learned for Quantity to the Column class. I could try and look into it either tomorrow or later next week.

@taldcroft
Copy link
Member

@astrofrog - That would be good. I'm sure your numpy-fu now vastly exceeds our collective knowledge when we did Column.

@ghost ghost assigned astrofrog Sep 13, 2013
@astrofrog
Copy link
Member

@mwcraig @taldcroft - I need your feedback here. Currently, the attached fix means that all the examples shown higher up in this issue return a boolean array. Now we need to think a bit more about what the expected behavior should be:

  • column == scalar: return numpy boolean array showing element-wise equality with scalar
  • column1 == column2: return numpy boolean array element-wise? or return a single True/False if the whole column including meta-data match? (Quantity does the first)
  • column == array: return element-wise equality?

In all cases, I don't think we should ever return a column object, right? Thinking more into the future, how do we deal with cases where the values are the same but the units are different?

@mhvk
Copy link
Contributor

mhvk commented Oct 5, 2013

@astrofrog - my sense would be to behave just like Quantity, i.e., treat the columns as quantity arrays for the comparison (including the units). Eventually, Table[column] should return a quantity, certainly if a unit is present (but probably dimensionless_unscaled if not).

@mwcraig
Copy link
Member Author

mwcraig commented Oct 6, 2013

@astrofrog - Coming from using atpy I'm used to columns acting like an array in the sense that column1 == column2 should return a boolean element-wise rather than a single True/False; I'd expect to have to use .all() for that. I'd also think column == array should return element-wise equality.

Though I'd expect to be able to create a column from the result it is hard for me to see how to automatically generate a sensible name/description if the result is a column.

As far as units go, with the caveat that I haven't used Unit or Quantity much yet, I'd expect that:

  • if I'm comparing two things with units then the units will be taken into account, like I think it is with a Quantity.
  • if I compare two things without units (either because they don't have units or because I've neglected to include units) then units won't matter.
  • If I compare something with units to something without units then I'm asking astropy to do something nonsensical and it should raise an exception.

@astrofrog
Copy link
Member

Ok, so now what about the following scenario: comparing two masked columns where the mask is the same and the unmasked values are the same, but where the masked values are not. I think it should return True in this case?

Note that some tests are failing because they were testing e.g. [--, 5, --] == [4, 5, 6] and expecting this to be all True because the values 'behind' the mask are 4 and 6. @taldcroft, do you agree I should change this so this example returns [False, True, False] instead?

@mhvk
Copy link
Contributor

mhvk commented Oct 6, 2013

@astrofrog - maybe for the masked case, the array returned should be masked as well, i.e., you should return [--, True, --]? (I just checked -- numpy does this too).

@taldcroft
Copy link
Member

@astrofrog - I agree with @mwcraig that all of the comparisons should return element by element boolean values.

I'm a little bit leery of raising an exception when comparing a table column that has units with a plain array. I count myself among the 99% of astronomers that are used to working with arrays with a unit label (or more likely just knowing within the code what units your array value has), rather than our new Quantities. Even though the latter has clear advantages, I don't think adoption within the broad community will be that fast.

Until a table column is a Quantity (which is not entirely trivial to make happen) we should not enforce all the Quantity-type behaviors.

@taldcroft
Copy link
Member

@astrofrog - about masking comparisons, your second comment on this matches my expectation. Remember, the value behind the mask should be treated as meaningless. If a value is masked then you don't know the real value so a comparison is likewise "don't know".

@astrofrog
Copy link
Member

@taldcroft @mhvk - thanks for the input! I'll try and make sure this all works like you mentioned.

@astrofrog
Copy link
Member

@mhvk @taldcroft - ok, I think it should all be behaving properly now. Note that we don't need to return a column, an array is sufficient, since it is very easy to create a new column:

In [1]: from astropy.table import Table

In [2]: t = Table()

In [3]: t['a'] = [1,2,3]

In [4]: t['b'] = t['a'] == 3

In [5]: print(t)
 a    b  
--- -----
  1 False
  2 False
  3  True

Also, masking behaves as expected:

In [6]: t = Table(masked=True)

In [7]: t['a'] = [1,2,3]

In [8]: t['a'].mask = [0,1,0]

In [9]: t['a'] == 1
Out[9]: 
masked_array(data = [True -- False],
             mask = [False  True False],
       fill_value = True)


In [10]: t['a'] == t['a']
Out[10]: 
masked_array(data = [True -- True],
             mask = [False  True False],
       fill_value = True)

@mhvk
Copy link
Contributor

mhvk commented Oct 7, 2013

@astrofrog - this looks good. I guess ones columns start to hold quantities, there data attribute would presumably hold the quantity, so this would remain correct. (@taldcroft - I always treated units as labels as well, but it has been amazing how powerful quantities are now that they are subclasses from array; I expect that with Tables it will be similarly good!)

@mwcraig
Copy link
Member Author

mwcraig commented Oct 7, 2013

@taldcroft -- I should have been clear that I'm definitely in the 99% too (or maybe worse, I still don't even have units as labels yet :)).

Perhaps a better way to say what I meant is that if a table column has a Quantity in it (and I don't know that this is even possible, I haven't used Quantity at all yet) then comparisons of that column to a non-Quantity should be handled in whatever way comparison of a Quantity to a non-Quantity is handle when not part of a Table.

@taldcroft
Copy link
Member

I played around this for a bit and this looks good. It reproduces some numpy behavior that I find a little surprising, but I think that reproducing the numpy behavior is fine.

In [14]: 1 == t['a']
Out[14]: array([ True, False, False], dtype=bool)

In [15]: np.ones(4) == t['a']
Out[15]: False  # To me this should be an exception, but so be it.

In [16]: np.ones(3) == t['a']
Out[16]: array([ True, False, False], dtype=bool)

@astrofrog - Even though the previous behavior was not desired nor consistent, we should still mark this as an API change in the CHANGES.rst document. Once that is done you can go ahead and merge.

@astrofrog
Copy link
Member

@taldcroft - I added an entry to CHANGES.rst and made a couple of improvements in the test, so let's see what Travis says. Does CHANGES.rst look ok?

@taldcroft
Copy link
Member

CHANGES.rst looks fine. Feel free to merge assuming Travis passes.

astrofrog added a commit that referenced this pull request Oct 8, 2013
Inconsistency in result of comparing unmasked Table column to value
@astrofrog astrofrog merged commit 15fe277 into astropy:master Oct 8, 2013
@astrofrog astrofrog deleted the table-column-eq branch July 5, 2016 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants