Restrict available functions for mathematica parsing #13544

quasi-human · 2017-10-27T17:59:24Z

Problem:
So far, there is no input check phase, hence all the functions in
the input are just let go with lowercase.
For example, spell-missed 'Arcsin[1/2]' is parsed to 'pi/6'.

Improved:
I set up a barricade, and only registered functions in dictionary
are able to be parsed. It is a large restriction, but it would be
safer. There are two good reasons for supporting this.

(1) We can exclude invalid functions mentioned above.

(2) Not all functions with the same name between Mathematica and
sympy do also the same work.

Example)
Log[2, 4] = 2 (Mathematica)
log(4, 2) = 2 (sympy)

Base positions are different!

So, we really need to check a function one by one about how it
works in Mathematica and sympy. Then parse it not only the name,
but also the movement.

Above the reasons, we should restrict available function in parsing.

Future task:
(1) Increase the number of available functions.

(2) In the present, list and matrix is not taken into account. My code
throws a ValueError if it finds '{' or '}'. This point should be
improved.

asmeurer · 2017-10-27T21:33:14Z

I don't know if strict restriction is the best option. How about a default, which just takes the lowercase of the Mathematica name? That way, it will work for most functions, even if it isn't known about beforehand in this file.

(From #13533) Problem: So far, there is no input check phase, hence all the functions in the input are just let go with lowercase. For example, spell-missed 'Arcsin[1/2]' is parsed to 'pi/6'. Improved: I set up a barricade, and only registered functions in dictionary are able to be parsed. It is a large restriction, but it would be safer. There are two good reasons for supporting this. (1) We can exclude invalid functions mentioned above. (2) Not all functions with the same name between Mathematica and sympy do also the same work. Example) Log[2, 4] = 2 (Mathematica) log(4, 2) = 2 (sympy) Base positions are different! So, we really need to check a function one by one about how it works in Mathematica and sympy. Then parse it not only the name, but also the movement. Above the reasons, we should restrict available function in parsing. Future task: (1) Increase the number of available functions. (2) In the present, list and matrix is not taken into account. My code throws a ValueError if it finds '{' or '}'. This point should be improved.

I removed the restrictions and any functions can be passed with taking the lowercase. For example: 'ArcSin[1/2]' is parsed to 'pi/6' 'Arcsin[1/2]' is parsed to 'arcsin(1/2)' 'Function[1/2]' is parsed to 'function(1/2)'

quasi-human · 2017-10-28T02:07:12Z

I modified the code and removed the restrictions.

I also have considered which is the best option. As you mentioned, I think most of Mathematica functions work correctly in SymPy with just taking the lowercase. It is convenient that we can parse more numbers of functions. However, I am worried that one day a user would perhaps get an unpredictable result because of an unchecked function which is apparently parsed well but actually gives a wrong calculation in SymPy, coming from a similar origin in Log[2, 4] != log(2,4) problem.

smichr · 2017-10-28T14:09:37Z

sympy/parsing/mathematica.py

+
+    F_swap = {
+        # when you say log(2,4) in Mathematica, base is 2.
+        ('Log', 2): 'log',


Would there be any advantage to introducing a logb wrapper function in SymPy with the syntax logb(b, a) = log(a, b)? Then Mathematic Log(x, y) could be passed directly to logb(x, y)

It would be a nice idea if only this logarithmic function needs an argument swap. But, I have no idea how many functions there are which need such a treatment. I do not know if it is the best choice to introduce a wrapper function in SymPy every time when facing this kind of matters.

I think that would just be confusing. Mathematica clearly chose its order because when you write log_b(x) the base is written before the argument. SymPy uses its convention because the base is an optional argument (defaulting to E), and in Python optional arguments go after required arguments.

There are other SymPy functions where the second argument is optional, or where the argument order is effectively arbitrary, and I wouldn't be surprised if Mathematica chose a different convention for some of them as well.

In my opinion, a parsing function should be written without affecting any other SymPy files.

Upabjojr · 2017-10-28T17:04:50Z

sympy/parsing/mathematica.py

+        ('Mod', 2): 'Mod',
+        ('Sqrt', 1): 'sqrt',
+        ('Exp', 1): 'exp',
+        ('Log', 1): 'log',


Log may have one or two parameters.

In F_swap dictionary, ('Log', 2): 'log' is defined.

Upabjojr · 2017-10-28T17:06:06Z

sympy/parsing/mathematica.py

-class _Parse(object):
-    '''An instance of this class converts basic Mathematica expression to
-    Python.'''
+class parse(object):


I usually prefer to have unique class names in the project, to avoid name collisions. But it's OK.

So, 'mathematcia_parse' could be better from the view point of name collisions.
But in this case, I also think it's OK.

Upabjojr · 2017-10-28T17:06:54Z

sympy/parsing/mathematica.py

-        (']', ')'),
-    )
+    # trigonometric, e.t.c.
+    for arc, tri, h in product(('', 'Arc'), (


Nice trick!

Upabjojr · 2017-10-28T17:17:35Z

I think we need a two-step parser in the future, to parse stuff like Times@{x,y,z}

asmeurer · 2017-10-28T20:55:49Z

@quasi-human you may have a point. I just discovered that Mathematica's Root is like our RootOf, that is, Root(f, n) represents the nth root of f = 0. Our root(x, n) represents x**(1/n). So a naive translation would be mathematically incorrect.

So a whitelist is probably fine, so long as it's easy for users to pass in known definitions to the parser function.

quasi-human · 2017-10-29T07:43:24Z

@Upabjojr Yes. Parsing a function would be broken down to two steps.
(1) Rename the function name properly.
(2) Check how the function works, and handle the original arguments properly.

quasi-human · 2017-10-29T08:01:14Z

@asmeurer Do you mean, we should add a method which enables users to add their own translation dictionary to the 'parse' class just in case they want to parse a function which is out of the whitelist?

asmeurer · 2017-10-29T09:26:22Z

Yes, I would add a flag to the function.

smichr · 2017-10-29T15:08:05Z

in Python optional arguments go after required arguments

A nice application of rule-breaking is the range function where, when a single argument is given it is the end of the range but when two are given the first becomes the start. The same thing could have be done with log; it could now be done with logb but I don't have strong feelings about it either way.

def logb(b, a=None):
    if a is not None:
        a, b = b, a
    return log(a, b) if b is not None else log(a)

asmeurer · 2017-10-29T22:07:01Z

Either way, I think it would be confusing to have two distinct log functions.

Actually, what I'd like to see is for log(x, b) to remain unevaluated. Then we could easily add shortcuts like log10 and log2.

But that's neither here nor there for this PR, obviously.

Upabjojr · 2017-10-29T22:46:10Z

Has anyone ever tried https://github.com/rljacobson/FoxySheep ?

quasi-human · 2017-10-30T08:07:34Z

@asmeurer I added 'add_translation' method to 'parse' class by which users can add their own translation dictionary. But I am not sure whether users would feel easy to use this or not. I hope you to check after passing the code quality tests.

    # Example 1
    In [1]: user_dic
    Out[1]: {'Log3[x]': 'log(x, 3)'}

    In [2]: parse.add_translation(user_dic)

    In [3]: mathematica('Log3[81]')
    Out[3]: 4

    # Example 2
    In [1]: user_dic
    Out[1]: {'Func[a]': 'sqrt(a)*exp(1/a)**2'}

    In [2]: parse.add_translation(user_dic)

    In [3]: mathematica('Func[3]')
    Out[3]: sqrt(3)*exp(2/3)

    # Example 3
    In [1]: user_dic
    Out[1]: {'Func[x, y, *z]': 'Max(*z)*(x**2 + y**2)'}

    In [2]: parse.add_translation(user_dic)

    In [3]: mathematica('Func[2,3,1,2,3,4,5,6,7,8,9,10]')
    Out[3]: 130

variable-length argument needs '*' character

Added a method by which users can add translation dictionary and parse functions out of the whitelist in Mathematica parsing or their own functions.

asmeurer · 2017-10-30T19:00:56Z

I would add a flag to the mathematica function itself, then use that to instantiate a parse instance with those extra arguments. That way, users don't have to use the internal parse object (which shouldn't be public API), and there is no modification of global state.

quasi-human · 2017-10-31T08:40:24Z

I introduced an extra argument to the mathematica function. Then users do not have to use parse function.
In [1]: mathematica('Log3[9]', {'Log3[x]':'log(x,3)'})
Out[1]: 2

asmeurer · 2017-11-01T01:47:48Z

sympy/parsing/mathematica.py

Why did you make this a class?

It is because I wanted the mathematica class to keep the parse instance.
The parse instance has a cache for the latest users' additional_translations dictionary. The additional_translations dictionary needs to be transformed internally for make parsing easier, and this cache saves time from transforming it every time when the mathematica is called.

asmeurer · 2017-11-01T01:47:58Z

sympy/parsing/mathematica.py

Should be additional_translations (plural)

I understand. If you or someone have a better idea about the name of additional_translations, please change it.

asmeurer · 2017-11-01T01:50:29Z

sympy/parsing/mathematica.py

Can this line ever be run? __new__ never returns an instance of this class.

In this code, I wanted mathematica class to behave like a function, instead of creating an instance.

asmeurer · 2017-11-01T01:52:41Z

I would just do something like

def mathematica(s, additional_translations=None):
    parser = MathematicaParser(additional_translations)
    return parser.parse(s)

Changes:

mathematica should be a function, not class.
Rename Parser to MathematicaParser
The default keyword value should be None. Don't use mutable defaults for keyword arguments!
Every call to mathematica creates a new parser instance.
The additional translations are passed in to the parser at instantiation time.

I introduced an extra argument as the 2nd one to the mathematica parsing function. Through this argument, users can add thier own translation dictionary to mathematica function.

quasi-human · 2017-11-01T11:59:47Z

Thank you for reviewing. I modified the code.

Changed the mathematica class to a function.
Renamed _Parse to MathematicaParser.
Set None as a default keyword value.
Made every call creates a new parser instance.
Let additional_translations passed to the parser at instantiation time.

Changed mathematica class to a function. Renamed _Parse to MathematicaParser. Set None as a default keyword value. Made every call creates a new parser instance. Let additional translations passed to the parser at instantiation time.

asmeurer · 2017-11-02T22:50:53Z

Any objections here? I'm no expert on Mathematica syntax. The Python style looks better now.

joha2 · 2017-11-21T07:31:58Z

I just read the comments on this pull request and just want to ask: Is the mathematica parser just a `function translator' or does it parse full Mathematica code? I programmed Mathematica for a long time and if you perform a FullForm on your expressions, you see that every Mathematica expression is of the form head[arg1, ..., argN] and nested.

Perhaps this helps for parsing full expressions if this has not been already done. This syntax scheme has several advantages, e.g. fast building a parsing tree, or quite fast searching for subexpressions. It has also a more higher coherence compared to Python or Sympy which I both love as very good open source alternatives. So keep on doing your very good job guys and I hope my comment was still helpful.

Best wishes
Johannes

asmeurer · 2017-11-21T19:10:47Z

The goal here is one of practicality. Calling FullForm shouldn't be a requirement, because in some cases you may simply have an expression written manually (or for instance, a formula copied from WolframAlpha). Obviously if you do call it first, though, it should be easy to do a full parse. Translating Wolfram Language in generality isn't useful because it has a lot of programmatic constructs which don't translate to SymPy expressions.

I think I'm going to merge this. I haven't heard any objections. If someone discovers a bug, they can open a new issue or pull request for it.

quasi-human changed the title ~~Restrict available functions for mathematica pursing~~ Restrict available functions for mathematica parsing Oct 28, 2017

quasi-human added 2 commits October 28, 2017 03:29

Remove restrictions of parsed function

349977c

I removed the restrictions and any functions can be passed with taking the lowercase. For example: 'ArcSin[1/2]' is parsed to 'pi/6' 'Arcsin[1/2]' is parsed to 'arcsin(1/2)' 'Function[1/2]' is parsed to 'function(1/2)'

Refactoring and add docstirngs in some functions.

7ba8d50

smichr reviewed Oct 28, 2017

View reviewed changes

Upabjojr reviewed Oct 28, 2017

View reviewed changes

Enbale users to parse thier own functions

0e0b79b

Added a method by which users can add translation dictionary and parse functions out of the whitelist in Mathematica parsing or their own functions.

asmeurer reviewed Nov 1, 2017

View reviewed changes

Add an arugment to mathematica parsing function

ceacdd0

I introduced an extra argument as the 2nd one to the mathematica parsing function. Through this argument, users can add thier own translation dictionary to mathematica function.

Modified mathematica class to function, etc.

598872e

Changed mathematica class to a function. Renamed _Parse to MathematicaParser. Set None as a default keyword value. Made every call creates a new parser instance. Let additional translations passed to the parser at instantiation time.

asmeurer merged commit 5cdce47 into sympy:master Nov 21, 2017

Uh oh!

Restrict available functions for mathematica parsing #13544

Restrict available functions for mathematica parsing #13544

Uh oh!

Conversation

quasi-human commented Oct 27, 2017

Uh oh!

asmeurer commented Oct 27, 2017

Uh oh!

quasi-human commented Oct 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Upabjojr commented Oct 28, 2017

Uh oh!

asmeurer commented Oct 28, 2017

Uh oh!

quasi-human commented Oct 29, 2017

Uh oh!

quasi-human commented Oct 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

asmeurer commented Oct 29, 2017

Uh oh!

smichr commented Oct 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

asmeurer commented Oct 29, 2017

Uh oh!

Upabjojr commented Oct 29, 2017

Uh oh!

quasi-human commented Oct 30, 2017

Uh oh!

asmeurer commented Oct 30, 2017

Uh oh!

quasi-human commented Oct 31, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asmeurer commented Nov 1, 2017

Uh oh!

quasi-human commented Nov 1, 2017

Uh oh!

asmeurer commented Nov 2, 2017

Uh oh!

joha2 commented Nov 21, 2017

Uh oh!

asmeurer commented Nov 21, 2017

Uh oh!

Reviewers

Assignees

quasi-human commented Oct 28, 2017 •

edited

Loading

quasi-human commented Oct 29, 2017 •

edited

Loading

smichr commented Oct 29, 2017 •

edited

Loading