Skip to content

Conversation

@mhvk
Copy link
Contributor

@mhvk mhvk commented Jun 15, 2015

Something came up in working on models with units, that I thought might be useful. Though if it's deemed too "weird" I could live without it (or perhaps this could be implemented just as a hack to use within the modeling package).

There are a couple of special cases in quantity arithmetic--namely the scalars 0.0 and inf, in that they may take "arbitrary" units (see here). This allows expressions like the following:

In [3]: q = 1 * u.m

In [4]: q + 0
Out[4]: <Quantity 1.0 m>

to be clear, this is only allowed at all with zero. q + 1 will raise an exception since you cannot add a quantity in meters to a dimensionless quantity (by default scalars get units of dimensionless_unscaled, except in the special case of zero and inf). In fact, q + 0 is different from q + (0 * u.dimensionless_unscaled). The latter is making it explicit that the zero we're adding must be a dimensionless quantity, and can't be added to meters. This is a good design choice.

However, it makes things a little inconvenient in how I'm trying to implement quantities in models. What I would like to be able to do is when quantities are being used to evaluate a model, it is convenient to automatically convert all inputs to the model to Quantity objects (so that everything has a .unit). In the spirit of how most Quantity arithmetic works, inputs without explicit quantities are automatically given units of dimensionless_unscaled. However, I can't do this if the input is zero (or inf) because that actually gives different results than if the model were evaluated with the scalar 0.0, with its ability to take arbitrary units.

Maybe that's just something that should be rethought outright, though I think it would be confusing if models have different evaluation behavior from other ufuncs with respect to how they treat 0 when combining it with dimensionful quantities.

So I think it might be convenient to be able to create a Quantity whose value is either 0 or inf, and has a special unit called "arbitrary" that just implies the existing behavior of the bare 0/inf, as distinct from dimensionless_unscaled. This would simplify a lot of code so I don't have to keep treating them as special cases.

@embray embray added the units label May 22, 2015
@pllim
Copy link
Member

pllim commented May 22, 2015

Since this is specific to how you wish to handle models, does it make more sense to make it a hidden internal unit within modeling, instead of putting in unit? I cannot see how such an arbitrary unit is useful to a regular astropy user.

@embray
Copy link
Member Author

embray commented May 22, 2015

@pllim That's sort of what I'm thinking--wherever it gets defined I don't think it should be a normally user-visible entity.

I'm wondering if this is a worthwhile approach at all though, or if I just need to rethink some things. I think it would be convenient though, as a way to encode this distinct behavior.

@mdboom
Copy link
Contributor

mdboom commented May 26, 2015

Cc: @mhvk

@pllim
Copy link
Member

pllim commented May 26, 2015

@embray , like you mentioned somewhere else, using this arbitrary unit instead of normal Numpy arrays will slow things down. Definitely something to think about.

@mhvk
Copy link
Contributor

mhvk commented May 26, 2015

I need to think a bit more about this, but like it in principle as a more formal way than the current _can_have_arbitrary_unit private function in quantity.py.

p.s Since this would be for quantity-fed models, it should not affect speed much I think (and Quantity will soon speed up, once __numpy_ufunc__ is properly available).

@embray
Copy link
Member Author

embray commented May 26, 2015

like you mentioned somewhere else, using this arbitrary unit instead of normal Numpy arrays will slow things down

@pllim I'm not sure what you're referencing. That's not the case. This is just needed in a few minimal contexts to wrap bare arrays with a "unit" indicating the behavior that the result will inherit units from any operands that do not have the "arbitrary" unit.

What does slow things down is that in order to determine whether or not the "arbitrary" unit is in use means checking the values of each input. If an input is a large array that means we need to scan through the entire array. I'm not sure if there's a way around that though, except to disallow this behavior in models. I'm still split on whether or not to do that.

@pllim
Copy link
Member

pllim commented May 26, 2015

@embray , perhaps I misunderstood your comment in the other discussion (can't find it now). I apologize. Regardless, I am also not sure which is the best approach.

@embray
Copy link
Member Author

embray commented May 27, 2015

If it helps, I'll see if I can come up with a PR for this so we can see what impact it has.

@mhvk
Copy link
Contributor

mhvk commented May 27, 2015

@embray - that would be great! I thought a little more about it and it seemed most logical to make a new unit class (perhaps subclassing IrreducibleUnit like UnrecognizedUnit, or maybe UnrecognizedUnit itself, so it has precedence over it in binary operations).

The main thing would be to think about the operators. I think logically any multiplication/division with another unit returns arbitrary unit (since it will have involved multiplying or dividing by 0 or infinity). An arbitrary unit would also seem always equal to any other unit, i.e., __eq__(self, other) returns isinstance(other, UnitBase). This then defines all other comparison operators as well, i.e., __le__ and __ge__ return True (or raise TypeError if other is not a unit). Similarly, _get_converter always returns a pass-through function.

@mhvk
Copy link
Contributor

mhvk commented May 27, 2015

p.s. I could also try an initial PR, though it would likely not be for a few days.

@embray
Copy link
Member Author

embray commented May 27, 2015

That sounds more or less along the lines of what I was going to do, though I appreciate you typing it out since this gives a sort of rough checklist to follow.

@mhvk
Copy link
Contributor

mhvk commented Jun 15, 2015

@embray - inspired by your work on getting Quantity support in modeling, I made an attempt at defining an arbitrary unit. Would this be what you are looking for? @mdboom - please have a look too, especially at (1) whether the way I ensure there is only one such unit makes sense; and (2) whether the implementation in CompositeUnit makes sense.

From the added documentation:

  >>> u.m == u.arbitrary
  True
  >>> u.m / u.s * u.arbitrary
  Unit("arbitrary")
  >>> u.m.to(u.arbitrary, 1.0)
  1.0

@pllim
Copy link
Member

pllim commented Jun 15, 2015

How about making it a little "hidden" like this? After all, this makes no sense for most astropy users.

u._arbitrary

@embray
Copy link
Member Author

embray commented Jun 15, 2015

Thanks for this--this looks mostly like what I had in mind. The following example is confusing to me though:

  >>> u.m / u.s * u.arbitrary
  Unit("arbitrary")

I would have expected

  >>> u.m / u.s * u.arbitrary
  Unit("m/s")

But maybe there's a good reason it has to be like your example?

@embray
Copy link
Member Author

embray commented Jun 15, 2015

To clarify my last example, in terms of Quantities I would expect something like

>>> Quantity(2.0, 'm/s') * Quantity(0.0, 'arbitrary')
Quantity(0.0, 'm/s')

I also agree with @pllim that we should make this for internal use only at the moment, though I have no problem with mentioning it in the changelog.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these print statements supposed to stay in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darn, no...

@mhvk
Copy link
Contributor

mhvk commented Jun 15, 2015

@embray - I think it has to be as I have it: if the dimension of anything is arbitrary, then multiplying/dividing it with another would leave it arbitrary.

For Quantity, we're mostly worried about adding/subtracting: there, one expects and gets:

>>> Quantity(2.0, 'm/s') + Quantity(0.0, 'arbitrary')
Quantity(2.0, 'm/s')

For multiplication, I think one does want what I wrote, because I would expect the following:

>>> Quantity(2.0, 'm/s') + Quantity(0.0, 'arbitrary') * Quantity(10., 'kg')
Quantity(2.0, 'm/s')

Here, of course, I'm really thinking that the 0 should propagate: the kg unit becomes irrelevant.

@mhvk mhvk force-pushed the unit-arbitrary branch from 5d2df8d to f34a20b Compare June 15, 2015 16:23
@mhvk
Copy link
Contributor

mhvk commented Jun 15, 2015

@pllim - I thought about making it hidden, but ended up feeling is was better to just have it documented. @embray mostly wants it for quantities with all 0, inf, nan, but in principle one could use it more generally. While I think this would mostly just throw away all the benefits of units, I didn't see a clear reason to preclude it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arbirary -> arbitrary

@embray
Copy link
Member Author

embray commented Jun 15, 2015

>>> Quantity(2.0, 'm/s') + Quantity(0.0, 'arbitrary') * Quantity(10., 'kg')
Quantity(2.0, 'm/s')

Here, of course, I'm really thinking that the 0 should propagate: the kg unit becomes irrelevant.

I'm not sure about that. Currently,

>>> 0.0 * (1.0 * u.kg)
<Quantity 0.0 kg>

In other words, the value is zero but it takes on the units of whatever Quantity zero was multiplied with. Then I think that adding "zero kilograms" to any number of meters should be an error, and it is:

>>> (0.0 * (1.0 * u.kg)) + (1.0 * u.m)
UnitConversionError: Can only apply 'add' function to quantities with compatible dimensions

I think only zero in "arbitrary" units should add with 1 meter. But when zero is multiplied with a quantity the result is not in arbitrary units, but in the units of whatever zero was multiplied with (as though it were dimensionless).

@embray
Copy link
Member Author

embray commented Jun 15, 2015

In other words, what I expected is already the current functionality, which I think is correct. The only thing adding this "arbitrary" unit changes is it makes it explicit when a value being passed around will have this behavior, so that I don't have to repeatedly check if a value is all 0/nan/inf.

@mhvk
Copy link
Contributor

mhvk commented Jun 15, 2015

@embray - the difference in your example is that you are, effectively, doing

>>> (0.0 * u.dimensionless_unscaled) * (1.0 * u.kg)
<Quantity 0.0 kg>

This is not the same as doing something with arbitrary units.

Of course, the multiplication issue is also not the one we're trying to solve -- it works already fine without taking any special care of 0. What is needed to solve is addition, etc., and assignment -- which is why _can_have_arbitrary_units is used only in quantity_helper.helper_twoarg* (addition, comparison, arctan2) and Quantity._to_own_unit (for setter).

@embray
Copy link
Member Author

embray commented Jun 15, 2015

There's no difference between

Quantity(0.0, u.dimensionless_unscaled) * Quantity(1.0, u.km)

and

Quantity(0.0, u.arbitrary) * Quantity(1.0, u.km)

with respect to multiplication. That's how things already work. What makes the "arbitrary" unit different from dimensionless is, as you say, its behavior with respect to addition.

But 0.0 * Quantity(1.0, u.km) returns a quantity in kilometers, so why should that then be allowed to be added to?

@embray
Copy link
Member Author

embray commented Jun 15, 2015

Let me reiterate: To me the only purpose of this arbitrary unit would be to encode the existing functionality. You're proposing something different and, IMO, broken.

Update: Perhaps to drive that point home we would even disallow creation of Quantities with "arbitrary" as the unit unless _can_have_arbitrary_unit is true for the value.

@mhvk mhvk force-pushed the unit-arbitrary branch from f34a20b to c16591a Compare June 15, 2015 18:06
@mhvk
Copy link
Contributor

mhvk commented Jun 15, 2015

@embray - 0 * Quantity(1.0, u.km) should indeed have dimension km and thus not be allowed to be added to anything but lengths, but why should that determine what 0 * u.arbitrary * Quantity(1.0, u.km) should do?

Further back to basics, I think it is obvious that 10 * (0. * u.arbitrary) should not become dimensionless, since that would break commutativity, as it would be different from (10 * 0) * u.abitrary. Now we could let (10.*u.dimensionless_unscaled) * (0. *u.arbitrary) become dimensionless, though this is harder than my PR, since it is somewhat baked into units that any normal number is treated as if it were dimensionless.

But mostly I guess I don't quite see why we should generally enforce that something with arbitrary units can only hold 0, inf, or nan; that is not the usual definition at all. I do of course see that you want it in models, but I thought the main point was that you would just run _can_have_arbitrary_units once, and if it is true, you assign the parameter to a quantity with arbitrary units.

@embray
Copy link
Member Author

embray commented Jun 15, 2015

The problem is that the way it's treated, zero does not have default units of dimensionless_unscaled, because that's not how it behaves in addition. It does behave the same in multiplication. But because of the different behavior in addition it is a special case. I'm saying that 0.0 is equivalent to Quantity(0.0, u.arbitrary) when used in expressions. It is never the same as saying zero is dimensionless_unscaled. Quantity(0.0, u.dimensionless_unscaled) is possible, and has different behavior from the literal 0.0.

@embray
Copy link
Member Author

embray commented Jun 15, 2015

As for your point about commutativity (actually associativity) I don't think it applies, because "arbitrary" is not a real unit and will not be used in such expressions. Or in other words, the algebra of units does not include an "arbitrary" value, so units including "arbitrary" does not play by the same rules. The only reason I proposed for it to exist is so that if I have a value that is meant to be treated as a Quantity I can always treat it as a Quantity, and not have to do things like isinstance(val, Quantity) or _can_have_arbitrary_unit(val).

@mhvk
Copy link
Contributor

mhvk commented Jun 15, 2015

OK, let's for the moment ignore the fact that there may be use for a general arbitrary unit (which I think there is) and focus on what you actually need!

I think you want to do parameter = Quantity(input, <something>) and not have to worry about whether it can be arbitrary in your code beyond having to possibly add something to the constructor. Is this correct?

One option would be to have no something and generally have Quantity(0.) produce the same as 0. * u.arbitrary instead of 0. * u.dimensionless_unscaled (as it does now; for non-arbitrary, it would remain dimensionless); and then, in operations it would behave as it does now, convert to other's unit if needed, otherwise be dimensionless. Doing this by default would be, I think, more magic than I'd like, and is also very costly, as it would imply testing any non-quantity input (and _can_have_arbitrary_units does not short-circuit its test...).

But an easy alternative would be to use <something> to tell Quantity that in interactions it should be treated as if it were a number still. Eg., one could imagine using u.Quantity(numbers, u._treat_as_number), which would do no extra work up front, but just call _can_have_arbitrary_units when needed. Indeed, is what you had in mind for the behaviour of u.arbitrary in the first place?

My logic had been a bit different: I assumed that one would need this behaviour only in situations where one knew that addition is going to be done, not multiplication, and that one would then explicitly use u.Quantity(a, u.arbitrary if _can_have_arbitrary_unit(a) else u.dimensionless_unscaled). I.e., I assumed the test would be done up front (of course, this could also be done with a flag, e.g., u.Quantity(a, 'maybe_arbitrary'), but again the idea would be that this would force an immediate test, so that the quantity's unit would either be dimensionless_unscaled or arbitrary).

@embray
Copy link
Member Author

embray commented Jun 15, 2015

But an easy alternative would be to use to tell Quantity that in interactions it should be treated as if it were a number still. Eg., one could imagine using u.Quantity(numbers, u._treat_as_number), which would do no extra work up front, but just call _can_have_arbitrary_units when needed. Indeed, is what you had in mind for the behaviour of u.arbitrary in the first place?

Yes. The only reason I referred to it as a "unit" is that it would go into the .unit attribute for the Quantity as a way to distinguish it from dimensionless_unscaled. But one would never use it manually in an arithmetic expression, which is why I also agreed with @pllim on using an underscore name.

I appreciate your effort to try to make this work more broadly but I don't think that really makes any sense in self-consistent system of units. And yes, I like the idea that Quantity(0.0) would get the "arbitrary" unit by default (as opposed to dimensionless_unscaled). I hadn't thought of it that way previously but I think that makes it clearer, and even easier to use!

@embray
Copy link
Member Author

embray commented Jun 15, 2015

My logic had been a bit different: I assumed that one would need this behaviour only in situations where one knew that addition is going to be done, not multiplication

Ah, I see how that led to confusion. No, my use case was quite the opposite--for the purposes of modeling the point is that the code has no idea what operations are going to be done on these values. I just wanted that the operations on an input of 0.0 would be consistent with the results one would get if they wrote out the expression for that model explicitly and inserted 0.0 into the relevant operands. It would just be vastly simpler if I could treat the value as a Quantity end to end, but there was no way to do that for zero. For other dimensionless values they could be automatically converted to Quantity(val, u.dimensionless_unscaled), but for zero (and inf/nan) just converting to a dimensionless_unscaled Quantity had different meaning.

@mhvk
Copy link
Contributor

mhvk commented Jun 16, 2015

@embray - OK, good to have it clarified. I think your use case requires quite a different PR than this one, and I'll think about how to implement a version of u._treat_as_number.

p.s. I'm not in favour of doing this by default -- really, this was a hack that makes a few numpy functions work which do comparisons with 0 internally, and there have been annoyed reports as is that 0 gets special treatment (e.g., #2959). But we can discuss once there is a PR whether this should be public or private.

@embray
Copy link
Member Author

embray commented Jun 16, 2015

I don't know that what it requires is completely different--most of this looks like what I had in mind, although there are probably some details I haven't considered. Anyways, I really appreciate you looking into this!

@pllim
Copy link
Member

pllim commented May 9, 2017

This PR is pretty old, there are conflicts, and at the end of the discussions suggested this is not the desired approach. So, closing.

@pllim pllim closed this May 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants