Skip to content

Conversation

@mdboom
Copy link
Contributor

@mdboom mdboom commented Jan 30, 2013

This comes out of a discussion in #669.

Right now, the following behavior is in astropy:

>>> from astropy import units as u
>>> u1 = (u.km / u.m).decompose()
>>> u2 = (u.m / u.m).decompose()
>>> u1
Unit(dimensionless)
>>> u2
Unit(dimensionless)
>>> u1 == u.dimensionless
False
>>> u2 == u.dimensionless
True
>>> u1.is_dimensionless()
True
>>> u2.is_dimensionless()
True

This is counter-intuitive, because two things that say they are "dimensionless" give different behavior when compared to u.dimensionless.

This is "correct" behavior, because two units with different scale factors are different, but the fact that they both appear to be the same is problematic.

cc @mdboom

@mdboom
Copy link
Contributor

mdboom commented Jan 29, 2013

Would updating the __repr__ to display the scale be enough, do you think?

The (u.km / u.m) != u.dimensionless is correct behavior, IMHO, but perhaps u.dimensionless should be renamed? Note that (u.km / u.m).is_equivalent(u.dimensionless) is True, which is also correct. Perhaps just making the __repr__ of u.dimensionless be Unit(1 dimensionless) will be sufficient to alleviate confusion.

@embray
Copy link
Member

embray commented Jan 29, 2013

I agree that (u.km / u.m) != u.dimensionless is correct. But (u.km / u.m).decompose() != u.dimensionless is what confused me. The reason for it is because they both have different scales.

I think displaying the scale in the __repr__ would help.

@eteq
Copy link
Member Author

eteq commented Jan 29, 2013

Definitely adding the scale to __repr__ makes sense. I would say it's even fine to do Unit(dimensionless) if the scale is 1, but display the scale for all other cases.

Going that route, then leaving dimensionless with that name is fine. I think the natural assumption is that "dimensionless" also means "without scale"... in fact, I think the naturalness of that assumption is what causes the confusion about this in the first place!

I think it might make sense to add a paragraph or two in the docs about this, though. There's already a section called "The dimensionless unit", so stating there that dimensionless also means without scale would help, as would and including a few examples like the one I have at the top of this issue.

@astrofrog
Copy link
Member

@mdboom - this is what confused me:

In [2]: (u.km / u.m).is_dimensionless()
Out[2]: True  # this is confusing
In [3]: (u.km / u.m) == u.dimensionless
Out[3]: False

to me, both are syntactically the same, so as a user, I don't understand why they are different.

@mdboom
Copy link
Contributor

mdboom commented Jan 30, 2013

@astrofrog: Your confusion suggest to me that one or the other should probably be renamed. But to what? u.dimensionless_and_unscaled? (Not a really suggestion, given how verbose it is).

@embray
Copy link
Member

embray commented Jan 30, 2013

What about is_unity?

@mdboom
Copy link
Contributor

mdboom commented Jan 30, 2013

I would expect is_unity to check for both unscaled and dimensionless units, whereas the current behavior of is_dimensionless is only about the units.

@astrofrog
Copy link
Member

What about having a variable u.none instead of u.dimensionless? (faster to type than u.dimensionless_and_unscaled.

@eteq
Copy link
Member Author

eteq commented Jan 30, 2013

I don't like none because it's too close to None.

This is a slightly more substantial suggestion, but seems less ambiguous to me:

  • leave dimensionless exactly as is
  • remove the is_dimensionless method
  • change UnitBase.physical_type to give 'dimensionless' instead of the current result, which is 'unknown' (this would be good to do regardless)
  • replace all uses of <something>.is_dimensionless() with <something>.physical_type == 'dimensionless'

That's really what we mean with is_dimensionless, right? That the type of the unit be dimensionless, regarless of the scale? Then the dimensionless object is dimensionless and scale free, but the @astrofrog's syntactic ambiguity with is_dimensionless goes away.

@mdboom
Copy link
Contributor

mdboom commented Jan 31, 2013

@eteq: That's not a bad suggestion, but I wonder if it's wise to remove the fast path for the is_dimensionless check, as shown by @astrofrog's benchmarks in #669? I think we still need something to specifically serve that purpose, under some name or other.

I do agree that physical_type should be updated to do the right thing with dimensionless units in any event.

@astrofrog
Copy link
Member

We could have a check is_unity() which checks that bases==[] and scale==1? (since that means it's the same as Unit(1). Then is_dimensionless() can stay the same?

@astrofrog
Copy link
Member

This looks a bit confusing:

In [2]: (u.km / u.m).decompose()
Out[2]: Unit(1000.0 dimensionless)

I wonder whether for dimensionless, the output should be

In [2]: (u.km / u.m).decompose()
Out[2]: Unit(dimensionless with scale 1000.0)

while for any other units, what you have is fine, e.g.

In [3]: (u.km).decompose()
Out[3]: Unit("1.000000e+03 m")

@mdboom
Copy link
Contributor

mdboom commented Jan 31, 2013

@astrofrog: That makes sense -- to put the scale at the end. I'll make that change -- and I guess you'll want to merge or cherry-pick the new commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example will need to be updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to, before the last commit. I thought it was more helpful to print it than not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant - shouldn't it say

Unit(dimensionless with a scale of 1000.0)

?

@astrofrog
Copy link
Member

Apart from my small comments above, this will be ready to go. I've re-tagged as 0.2 since it's needed fo #669 which we also want in 0.2.

@eteq
Copy link
Member Author

eteq commented Feb 1, 2013

I think we still want to get rid of is_dimensionless. I say this mainly because without doing this, it doesn't address the original problem that @astrofrog pointed out; that is_dimensionless and == u.dimensionless ( or possibly is u.dimensionless) don't mean the same thing.

I have a fairly comprehensive set of timings below (taken from this PR's branch). The only case where is_dimensionless is faster that .physical_type == 'dimensionless' is for an irreducible (or prefix) unit. That might be an important enough case that we need to work around it somehow, though. Perhaps the physical type should be cached?

In [42]: kmm = u.km/u.m

In [43]: mm = u.m/u.m



In [44]: kmm.is_dimensionless()
Out[44]: True

In [45]: timeit kmm.is_dimensionless()
10000 loops, best of 3: 47 us per loop

In [63]: kmm.physical_type == 'dimensionless'
Out[63]: True

In [48]: timeit kmm.physical_type == 'dimensionless'
10000 loops, best of 3: 53 us per loop

In [49]: kmm == u.dimensionless
Out[49]: False

In [50]: timeit kmm == u.dimensionless
1000 loops, best of 3: 164 us per loop

In [51]: kmm is u.dimensionless
Out[51]: False

In [52]: timeit kmm is u.dimensionless
10000000 loops, best of 3: 168 ns per loop



In [53]: mm.is_dimensionless()
Out[53]: True

In [54]: timeit mm.is_dimensionless()
100000 loops, best of 3: 9.14 us per loop

In [57]: mm.physical_type == 'dimensionless'
Out[57]: True

In [58]: timeit mm.physical_type == 'dimensionless'
100000 loops, best of 3: 16.5 us per loop

In [59]: mm == u.dimensionless
Out[59]: True

In [60]: timeit mm == u.dimensionless
10000 loops, best of 3: 157 us per loop

In [61]: mm is u.dimensionless
Out[61]: False

In [62]: timeit mm is u.dimensionless
1000000 loops, best of 3: 205 ns per loop

In [84]: timeit u.km.physical_type=='dimensionless'
10000 loops, best of 3: 45.6 us per loop

In [85]: timeit u.km.is_dimensionless()
1000000 loops, best of 3: 566 ns per loop

@eteq
Copy link
Member Author

eteq commented Feb 1, 2013

I did some further tests (which I won't copy here to not fill up the comments even further) that show that mm.physical_type == 'dimensionless' and mm.decompose().scale==1 and similar are within a factor of 2 (sometimes slower, sometimes faster) as is_unity (except for the irreducible/prefix case, which I assume is due to the physical_type thing I noted above).

So I'm not sure we even need is_unity; the above seems more explicit (which is good here, I think).

And even if you disagree and want to keep is_unity, I think the name needs to be changed. This class is one that I think users should be able to always just look at the code and know what's happening. The term "unity" is somewhat ambiguous, and I don't think I would grasp the fact that it has two pieces (dimensionlessness and no scale) if I hadn't been reading this PR.

@astrofrog
Copy link
Member

@eteq - if we are going to use

mm.physical_type == 'dimensionless' and mm.decompose().scale==1

instead of

is_unity

then we could just as well use

mm.bases and mm.scale == 1

as it's more concise. But it'd be nice to have a convenience method, so I think we should still keep is_unity, though I'm fine with trying to find a better name. I'll think about it.

@astrofrog
Copy link
Member

Looking into it more, I think the current implementation of is_unity is not exactly what I had in mind yet. For example, if I have (silly) units of mm * m / micron / km then in #669 I don't want to treat that as being dimensionless, but is_unity will return True - I really only want the cases where the unit is Unit(1) but without having to instantiate Unit(1). So I think the name is_unity is fine, but I think I'd rather have it test that bases is empty and scale is one. @eteq - do you agree the name would be ok with such an implementation?

In order to get #669 moving, I'll simply define a convenience function _is_unity() there for now, so that we don't have to rush this discussion for 0.2.0 if we don't reach a consensus.

@astrofrog
Copy link
Member

I'm going to +1 @eteq's suggestion from earlier to remove is_dimensionless. I think all we really need is:

some_unit.physical_type == 'dimensionless'

which is testing that the overall type is not length, mass, etc. What I also wanted, but will not insist on a convenience method that is equivalent to

some_unit.bases == [] and some_unit.scale == 1.

and that is what I would call is_unity. However, I would also be happy with is_empty, or is_one for example. And if there's no consensus, we don't have to have a user convenience method for this.

@astrofrog
Copy link
Member

Since #669 doesn't depend on this PR anymore, shall we re-tag it as 0.2.1 or 0.3? Or do we want to fix it for 0.2?

@mdboom
Copy link
Contributor

mdboom commented Feb 1, 2013

@eteq: Those benchmarks are helpful, but keep in mind many are doing different things. u.is_dimensionless() and u.physical_type == 'dimensionless' are essentially equivalent, but the others are going to give different results, particularly because they aren't working with decomposed units. (And x is u.dimensionless is not useful here).

So I think we need to first decide what we're trying to achieve, then we can decide what to call it, and then optimize if necessary.

@astrofrog: I don't understand your rationale for not wanting to decompose the unit first before checking whether it is unity. I think it's very unlikely that the user would create a unity quantity directly (which is the only case under which what you suggest would work). It's much more likely that the user would do something like:

Quantity(5, u.m) / Quantity(2, u.m)

and this should be allowed to implicitly convert to a float in the context of what we're discussing for #669.

I think some_unit.bases == [] and some_unit.scale == 1 is just wrong for what we want there -- I know that's what I originally suggested -- but the unit really should be decomposed first. And I think we should have some sort of convenience function for this purpose (even if it's private) as this PR illustrates how difficult getting it right can be.

@mdboom
Copy link
Contributor

mdboom commented Feb 1, 2013

Also, I think I'm fine with removing is_dimensionless and steering people to using physical_type == 'dimensionless' in the docs.

@astrofrog
Copy link
Member

@mdboom - regarding #669, I've replied directly there. I think removing is_dimensionless is the right thing to do.

Regarding is_unity, I really can't think of a better name - that is exactly what it's checking - that the unit is basically 1. I agree is_dimensionless was ambiguous, but I don't think users will expect m / km for example to pass is_unity. It's got a sufficiently unusual name that they can look at the docs (whereas is_dimensionless was more dangerous in that respect).

@astrofrog
Copy link
Member

@iguananaut - do you have any opinions about is_unity?

@astrofrog
Copy link
Member

@mdboom - by the way, I think my definition of unity did cater for cases such as:

Quantity(5, u.m) / Quantity(2, u.m)

because:

In [9]: (u.m / u.m).bases
Out[9]: []

In [10]: (u.m / u.m).scale
Out[10]: 1

But I've updated #669 to bring is_unity in line with this PR.

@mdboom
Copy link
Contributor

mdboom commented Feb 1, 2013

Yes, but here's the problem (and come to think of it, we should probably add a test for this):

>>> x = (u.Unit(1000.0 * u.m) / u.km)
>>> x.bases
[Unit("m"), Unit("km")]
>>> x.scale
1000.0
>>> y = x.decompose()
>>> y.bases
[]
>>> y.scale
1.0

@astrofrog
Copy link
Member

Ah, I see - ok, so now I see that this makes sense!

@astrofrog
Copy link
Member

Well, I think u.dimensionless would be better as u.unity - @eteq - do you also oppose that name, since it would no longer be a method, so users would likely need to read the docs to find out about it? (u.<tab> isn't very helpful)

@eteq
Copy link
Member Author

eteq commented Feb 4, 2013

@astrofrog - In my previous posts I was actually referring to the dimensionless object, not the (current) is_unity() method. In the inline comment I was suggesting ditching is_unity (or whatever it's named), or making it private on the grounds that it appeared to me (and @astrofrog) that in the current version, u.m / u.m == u.dimensionless seemed identical to is_unity(u.m / u.m). So that's why I was only talking about the attribute. Sorry that wasn't very clear.

With that in mind, yes, I think u.unity is still ambiguous. In my head, unity is basically another word for "1" (as long as we're not in group theory...), so u.unit is the same as u.Unit(1). And that's exactly what the whole reason was for creating the dimensionless object; Unit(1) had different interpretations to different people.

I see @mdboom's point on scale_and_dimensionless, though. I'd be ok with unscaled_and_dimensionless. Or dimensionless_unscaled/unscaled_dimensionless (slightly shorter, still gets the point across).

It doesn't really bother me that much that this is a long name. If it starts with dimensionless..., u.dim<tab> gets me what I want in IPython, and when writing code meant for others' consumption it's still less work than the comment I would inevitably have to write to explain what I mean by either dimensionless being unscaled or the fact that Unit(1) means dimensionless.

Sorry to be a pain, but I think this carries quite a bit of potential for confusion because most other aspects of unit are so easy to use... And this is just an inherently complicated case, it seems.

@astrofrog
Copy link
Member

I agree we can do u.dimensionless_and_unscaled. It is long, but hopefully doesn't need to be used much, and if it does, it will be unambiguous. I think in this case, clarity trumps conciseness. If in future we think of how to say it more concisely, we can always add a shorter alias. @mdboom - what do you think?

@mdboom
Copy link
Contributor

mdboom commented Feb 4, 2013

I think we're reaching consensus here. is_unity() goes away. u.dimensionless is changed to dimensionless_unscaled (the and isn't strictly necessary to understand it, as @eteq points out).

@astrofrog
Copy link
Member

Ok, that sounds good to me. Then, just for information, should one do my_unit.decompose() == u.dimensionless_unscaled to do what is_unity currently does?

@eteq
Copy link
Member Author

eteq commented Feb 4, 2013

👍 to @mdboom's plan here.

@mdboom
Copy link
Contributor

mdboom commented Feb 4, 2013

@astrofrog: Yes.

@astrofrog
Copy link
Member

Great - let's go with that!

@eteq
Copy link
Member Author

eteq commented Feb 5, 2013

The travis build passed without complaint. Looks good to me!

Should we get rid of the _is_unity from #669, or leave it in? (either here or a later PR)

@astrofrog
Copy link
Member

@eteq - I think we can just fix it in a separate PR in future - it's a private method, so no rush.

@astrofrog
Copy link
Member

Looks good to me too!

mdboom added a commit that referenced this pull request Feb 5, 2013
Dimensionless unit looks the same regardless of scale
@mdboom mdboom merged commit 4a9044e into astropy:master Feb 5, 2013
@embray
Copy link
Member

embray commented Feb 6, 2013

I've completely lost track of this at this point--but this should be backported for 0.2 final right?

@mdboom
Copy link
Contributor

mdboom commented Feb 6, 2013

I would think so. It's pretty low risk.

@astrofrog
Copy link
Member

I agree, it should be backported.

mdboom added a commit that referenced this pull request Feb 8, 2013
Dimensionless unit looks the same regardless of scale
keflavich pushed a commit to keflavich/astropy that referenced this pull request Oct 9, 2013
Dimensionless unit looks the same regardless of scale
@mdboom mdboom deleted the unit/dimensionless_repr branch May 21, 2014 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants