Skip to content

Conversation

@isuruf
Copy link
Member

@isuruf isuruf commented Mar 28, 2017

Partial fix for #4596

Here's a small benchmark from the issue

from sympy import *
from sympy.core.cache import *
knl = sympify("hankel_1(0, k*sqrt((a0 + b0*tau)**2 + (a1 + b1*tau)**2))")

def func(dist, simplify):
    with(distribute(dist)):
        ops=diff(knl, "tau", 5, simplify=simplify)
        print(ops.count_ops())

clear_cache()
%time func(True, True)
clear_cache()
%time func(True, False)
clear_cache()
%time func(False, True)
clear_cache()
%time func(False, False)

with the output,

417
CPU times: user 744 ms, sys: 0 ns, total: 744 ms
Wall time: 742 ms
3133
CPU times: user 276 ms, sys: 0 ns, total: 276 ms
Wall time: 277 ms
417
CPU times: user 340 ms, sys: 0 ns, total: 340 ms
Wall time: 339 ms
488
CPU times: user 156 ms, sys: 0 ns, total: 156 ms
Wall time: 154 ms

For this code, when add*number is not distributed with the global option, op count is 488, whereas if it was distributed, op count is 3133. Simplify with both options give the same op count of 417.

Here's some code from the docs where it says, "Because there can be a significant amount of simplification that can be done when multiple differentiations are performed, results will be automatically simplified ..."

>>> from sympy import sqrt, diff
>>> from sympy.abc import x
>>> e = sqrt((x + 1)**2 + x)
>>> diff(e, x, 5, simplify=False).count_ops()
136
>>> diff(e, x, 5).count_ops()
30

Here's with no automatic distribution,

>>> from sympy import *
>>> from sympy.abc import x
>>> e = sqrt((x + 1)**2 + x)
>>> with distribute(False):
...     print(diff(e, x, 5, simplify=False).count_ops())
... 
37
>>> with distribute(False):
...     print(diff(e, x, 5).count_ops())
... 
33

Note that here the increase in the opcount for simplify=True is probably because a bug with simplify.

Thanks to @skirpichev, for figuring out where changes were necessary.

@asmeurer
Copy link
Member

Is this intended to be a stopgap? I think we should aim to completely remove it (no options).

Copy link
Member

@certik certik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes seem simple enough that I think this can be merged now.

I agree that we should simply not distribute at all, this was a mistake to introduce. But in the meantime this PR should help.

@certik
Copy link
Member

certik commented Mar 29, 2017

Note: In SymEngine we do not distribute automatically, which is the correct way.

>>> from symengine import *
>>> var("x y")
(x, y)
>>> 2*(x+y)
2*(x + y)
>>> -(x+y)
-(x + y)

@asmeurer
Copy link
Member

If it's a stopgap we should not add public API for it, as we'd then have to maintain it even after we change the default.

@certik
Copy link
Member

certik commented Mar 29, 2017

How would we make this a private API?

@asmeurer
Copy link
Member

asmeurer commented Mar 29, 2017

I am talking about the distributed context manager, which has been added to __init__.py here.

@certik
Copy link
Member

certik commented Mar 29, 2017 via email

@isuruf
Copy link
Member Author

isuruf commented Mar 29, 2017

I've added this option because this is a major change. There are hundreds of doctests and unit tests failing when I change the default. For downstream projects, this may be the case also.
I'll remove the context manager, stop automatic distribution and fix the tests if that's okay. Also should we distribute -(x+y) or not? There wasn't any agreement in the issue.

@certik
Copy link
Member

certik commented Mar 29, 2017

I wouldn't distribute -1 either.

We can make the default not to distribute, and then provide this decorator for downstream projects so that they can keep the old behavior as a temporary workaround until they implement proper fixes.

@isuruf
Copy link
Member Author

isuruf commented Mar 29, 2017

Wow, there are 410 tests failing when I switch the default. https://travis-ci.org/isuruf/sympy/jobs/216195057
And some of them are non-trivial.

In [3]: refine((-1)**((-1)**x/2 - S.Half), Q.integer(x))   # expected (-1)**x
Out[3]: (-1)**((-1)**x/2 + 3/2)         

@inducer
Copy link
Contributor

inducer commented Mar 29, 2017

Thanks @isuruf for working on this!

@certik
Copy link
Member

certik commented Mar 29, 2017

@asmeurer how do you want to proceed? Do you want to merge this (after making the decorator not a public API), and then work on the failures.

Or perhaps we can do the switch, provide a decorator for old behavior, and then use that decorator as a temporary workaround for failing tests, that way everything should pass, and then we work on it, one by one.

This is a big change, so I would like some direction from you which way it should be done.

@asmeurer
Copy link
Member

asmeurer commented Mar 29, 2017

If the context manager is necessary to do the change piecemeal then that's fine. I'm only worried about supporting both distribution and nondistribution long after the change is completely done. In general, we should avoid having global options, especially in the core, as they provide different code paths that remain mostly untested. And I don't think we want the whole test suite to pass with both global_distribute = True and global_distribute = False, in other words, the context manager is guaranteed to not break any code that goes under it. That would mean that all code has to assume that both cases are possible. (By the way, I don't much like the global_evaluate flag either, for the same reason. I'd much rather see something like the recently added UnevaluatedExpr gain traction)

But maybe this is not a huge deal. We can

  • Add the context manager. I would keep it out of __init__.py. That isn't to assume that people like @inducer won't use it outside of the library, but at least it won't gain common usage.
  • Use it to make the change across SymPy in a piecemeal fashion.
  • Once we can change the default, remove the decorator across SymPy. Or at some point we may be able to change the default and only use the context manager in places where it needs to be enabled, instead of the other way around.
  • Once the context manager is no longer used at all, deprecate it and the global option.

The deprecation is unnecessary if this all happens before a release, but that seems unlikely.

@asmeurer
Copy link
Member

And I agree not to distribute anything, even -(x - y). This probably makes fixing SymPy even harder, so if it becomes easier to change SymPy without this at first, that's fine, but we should aim to not distribute anything. From the people I've heard that don't like the automatic distribution, it applies to all distribution, even -1.

@skirpichev
Copy link
Contributor

skirpichev commented Mar 30, 2017

Here's a small benchmark from the issue

How does this justify removing usage of distributive law in flatten? Looks like a very special case.

@isuruf, what will be your next step? Are you planing to eventually fix all tests and switch to distributive=False? If so, I doubt that this context manager will help in such slow migration: supporting codebase where part of code runs in context distributive(False) and another part not - will be nightmare.

Disclaimer: my lazy attempts to fix referenced issue were motivated by another reasons, not performance. For instance, to allow default Mul.flatten be generic enough to handle more common cases of algebras, where distributivity doesn't holds. E.g. interval arithmetic.

@isuruf
Copy link
Member Author

isuruf commented Mar 31, 2017

How does this justify removing usage of distributive law in flatten? Looks like a very special case.

Take a simple example like (x**2 + x)**100 and differentiate 20 times without simplifying. It takes 1 minute to run it and has 356558 ops. With simplify, it takes 3 minutes. Without distributing Add over Number, computation takes 0.1 seconds and has 94 ops.

There might be other reasons not to do this though. We are not distributing a * (x + y), so why special case Number?

@isuruf, what will be your next step? Are you planing to eventually fix all tests and switch to distributive=False? If so, I doubt that this context manager will help in such slow migration: supporting codebase where part of code runs in context distributive(False) and another part not - will be nightmare.

True. I added the context manager to provide the user with an option. Now I realized that SymPy has the assumption that the expression is distributed all over the place and very complicated examples fail when you switch.

As to my next step, I'm not sure. I didn't expect examples like #12440 (comment) to fail.

@asmeurer
Copy link
Member

That's why automatic evaluation is bad. You end up with code all over the place that implicitly relies on it, and stops working or even gives wrong results if you try to remove it. I had the same thing years ago when I removed the exp(x)*exp(y) -> exp(x + y) behavior. As I remember many of the failures from that were nontrivial to fix.

For this issue, it's actually pretty easy in SymPy right now to get an expression that isn't distributed. Lots of functions like factor use evaluate=False to keep distribution from happening, and you can also prevent it with different multiplication order (compare 2*x*(x + y) and 2*(x + y)*x). So if any of the test failures are wrong answers, I'd be really worried (for the refine one above, the answer isn't as simplified, but still mathematically correct, since a factor of 2 has been added in the exponent).

Now I realized that SymPy has the assumption that the expression is distributed all over the place and very complicated examples fail when you switch.

So do you still think the context manager is useful? Or is it too hard to isolate code where it can safely be turned off?

@isuruf
Copy link
Member Author

isuruf commented Mar 31, 2017

So do you still think the context manager is useful?

I still think it will be useful until we switch. SymPy users who only does stuff like arithmetic, differentiation, matrix operations and codegen will find it useful.

Or is it too hard to isolate code where it can safely be turned off?

For the examples I mentioned, it's easy to isolate where the problem is. (Pow._eval_derivative). I don't want to do evaluate = False if we go that route since I do want x * x to be simplified to x**2. Maybe a distribute=False option to Mul.

@asmeurer
Copy link
Member

I think there may already be some helper functions in the core for creating undistributed products.

@skirpichev
Copy link
Contributor

Probably, you want distribute -1. Huge amount of broken tests arising from this.

isuruf added 3 commits August 29, 2017 16:26
It assumed that the Expr is commutative, but now it is explicitly
added.
@isuruf isuruf changed the title [WIP] Global option to stop automatic distribution Global option to stop automatic distribution Aug 29, 2017
@isuruf
Copy link
Member Author

isuruf commented Aug 30, 2017

I've removed distribute contextmanager from the public API and fixed the tests

@asmeurer
Copy link
Member

Can you add a note to the docstring that it will likely go away once automatic distribution is removed entirely. Other than that I am +1

@isuruf
Copy link
Member Author

isuruf commented Aug 31, 2017

Fixed

@asmeurer asmeurer merged commit d510e60 into sympy:master Aug 31, 2017
@certik
Copy link
Member

certik commented Aug 31, 2017

Thanks @isuruf !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants