-
-
Notifications
You must be signed in to change notification settings - Fork 5k
Restrict available functions for mathematica parsing #13544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restrict available functions for mathematica parsing #13544
Conversation
|
I don't know if strict restriction is the best option. How about a default, which just takes the lowercase of the Mathematica name? That way, it will work for most functions, even if it isn't known about beforehand in this file. |
(From #13533) Problem: So far, there is no input check phase, hence all the functions in the input are just let go with lowercase. For example, spell-missed 'Arcsin[1/2]' is parsed to 'pi/6'. Improved: I set up a barricade, and only registered functions in dictionary are able to be parsed. It is a large restriction, but it would be safer. There are two good reasons for supporting this. (1) We can exclude invalid functions mentioned above. (2) Not all functions with the same name between Mathematica and sympy do also the same work. Example) Log[2, 4] = 2 (Mathematica) log(4, 2) = 2 (sympy) Base positions are different! So, we really need to check a function one by one about how it works in Mathematica and sympy. Then parse it not only the name, but also the movement. Above the reasons, we should restrict available function in parsing. Future task: (1) Increase the number of available functions. (2) In the present, list and matrix is not taken into account. My code throws a ValueError if it finds '{' or '}'. This point should be improved.
I removed the restrictions and any functions can be passed with taking the lowercase. For example: 'ArcSin[1/2]' is parsed to 'pi/6' 'Arcsin[1/2]' is parsed to 'arcsin(1/2)' 'Function[1/2]' is parsed to 'function(1/2)'
|
I modified the code and removed the restrictions. I also have considered which is the best option. As you mentioned, I think most of Mathematica functions work correctly in SymPy with just taking the lowercase. It is convenient that we can parse more numbers of functions. However, I am worried that one day a user would perhaps get an unpredictable result because of an unchecked function which is apparently parsed well but actually gives a wrong calculation in SymPy, coming from a similar origin in Log[2, 4] != log(2,4) problem. |
sympy/parsing/mathematica.py
Outdated
|
|
||
| F_swap = { | ||
| # when you say log(2,4) in Mathematica, base is 2. | ||
| ('Log', 2): 'log', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be any advantage to introducing a logb wrapper function in SymPy with the syntax logb(b, a) = log(a, b)? Then Mathematic Log(x, y) could be passed directly to logb(x, y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a nice idea if only this logarithmic function needs an argument swap. But, I have no idea how many functions there are which need such a treatment. I do not know if it is the best choice to introduce a wrapper function in SymPy every time when facing this kind of matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would just be confusing. Mathematica clearly chose its order because when you write log_b(x) the base is written before the argument. SymPy uses its convention because the base is an optional argument (defaulting to E), and in Python optional arguments go after required arguments.
There are other SymPy functions where the second argument is optional, or where the argument order is effectively arbitrary, and I wouldn't be surprised if Mathematica chose a different convention for some of them as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, a parsing function should be written without affecting any other SymPy files.
sympy/parsing/mathematica.py
Outdated
| ('Mod', 2): 'Mod', | ||
| ('Sqrt', 1): 'sqrt', | ||
| ('Exp', 1): 'exp', | ||
| ('Log', 1): 'log', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log may have one or two parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In F_swap dictionary, ('Log', 2): 'log' is defined.
sympy/parsing/mathematica.py
Outdated
| class _Parse(object): | ||
| '''An instance of this class converts basic Mathematica expression to | ||
| Python.''' | ||
| class parse(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually prefer to have unique class names in the project, to avoid name collisions. But it's OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, 'mathematcia_parse' could be better from the view point of name collisions.
But in this case, I also think it's OK.
| (']', ')'), | ||
| ) | ||
| # trigonometric, e.t.c. | ||
| for arc, tri, h in product(('', 'Arc'), ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice trick!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
I think we need a two-step parser in the future, to parse stuff like |
|
@quasi-human you may have a point. I just discovered that Mathematica's So a whitelist is probably fine, so long as it's easy for users to pass in known definitions to the parser function. |
|
@Upabjojr Yes. Parsing a function would be broken down to two steps. |
|
@asmeurer Do you mean, we should add a method which enables users to add their own translation dictionary to the 'parse' class just in case they want to parse a function which is out of the whitelist? |
|
Yes, I would add a flag to the function. |
A nice application of rule-breaking is the def logb(b, a=None):
if a is not None:
a, b = b, a
return log(a, b) if b is not None else log(a) |
|
Either way, I think it would be confusing to have two distinct log functions. Actually, what I'd like to see is for log(x, b) to remain unevaluated. Then we could easily add shortcuts like log10 and log2. But that's neither here nor there for this PR, obviously. |
|
@asmeurer I added 'add_translation' method to 'parse' class by which users can add their own translation dictionary. But I am not sure whether users would feel easy to use this or not. I hope you to check after passing the code quality tests. variable-length argument needs '*' character |
Added a method by which users can add translation dictionary and parse functions out of the whitelist in Mathematica parsing or their own functions.
|
I would add a flag to the |
|
I introduced an extra argument to the |
sympy/parsing/mathematica.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you make this a class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because I wanted the mathematica class to keep the parse instance.
The parse instance has a cache for the latest users' additional_translations dictionary. The additional_translations dictionary needs to be transformed internally for make parsing easier, and this cache saves time from transforming it every time when the mathematica is called.
sympy/parsing/mathematica.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be additional_translations (plural)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. If you or someone have a better idea about the name of additional_translations, please change it.
sympy/parsing/mathematica.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this line ever be run? __new__ never returns an instance of this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this code, I wanted mathematica class to behave like a function, instead of creating an instance.
|
I would just do something like def mathematica(s, additional_translations=None):
parser = MathematicaParser(additional_translations)
return parser.parse(s)Changes:
|
I introduced an extra argument as the 2nd one to the mathematica parsing function. Through this argument, users can add thier own translation dictionary to mathematica function.
|
Thank you for reviewing. I modified the code.
|
Changed mathematica class to a function. Renamed _Parse to MathematicaParser. Set None as a default keyword value. Made every call creates a new parser instance. Let additional translations passed to the parser at instantiation time.
|
Any objections here? I'm no expert on Mathematica syntax. The Python style looks better now. |
|
I just read the comments on this pull request and just want to ask: Is the mathematica parser just a `function translator' or does it parse full Mathematica code? I programmed Mathematica for a long time and if you perform a FullForm on your expressions, you see that every Mathematica expression is of the form head[arg1, ..., argN] and nested. Perhaps this helps for parsing full expressions if this has not been already done. This syntax scheme has several advantages, e.g. fast building a parsing tree, or quite fast searching for subexpressions. It has also a more higher coherence compared to Python or Sympy which I both love as very good open source alternatives. So keep on doing your very good job guys and I hope my comment was still helpful. Best wishes |
|
The goal here is one of practicality. Calling FullForm shouldn't be a requirement, because in some cases you may simply have an expression written manually (or for instance, a formula copied from WolframAlpha). Obviously if you do call it first, though, it should be easy to do a full parse. Translating Wolfram Language in generality isn't useful because it has a lot of programmatic constructs which don't translate to SymPy expressions. I think I'm going to merge this. I haven't heard any objections. If someone discovers a bug, they can open a new issue or pull request for it. |
(From #13533)
Problem:
So far, there is no input check phase, hence all the functions in
the input are just let go with lowercase.
For example, spell-missed 'Arcsin[1/2]' is parsed to 'pi/6'.
Improved:
I set up a barricade, and only registered functions in dictionary
are able to be parsed. It is a large restriction, but it would be
safer. There are two good reasons for supporting this.
(1) We can exclude invalid functions mentioned above.
(2) Not all functions with the same name between Mathematica and
sympy do also the same work.
Example)
Log[2, 4] = 2 (Mathematica)
log(4, 2) = 2 (sympy)
Base positions are different!
So, we really need to check a function one by one about how it
works in Mathematica and sympy. Then parse it not only the name,
but also the movement.
Above the reasons, we should restrict available function in parsing.
Future task:
(1) Increase the number of available functions.
(2) In the present, list and matrix is not taken into account. My code
throws a ValueError if it finds '{' or '}'. This point should be
improved.