Skip to content

Conversation

@quasi-human
Copy link
Contributor

(From #13533)

Problem:
So far, there is no input check phase, hence all the functions in
the input are just let go with lowercase.
For example, spell-missed 'Arcsin[1/2]' is parsed to 'pi/6'.

Improved:
I set up a barricade, and only registered functions in dictionary
are able to be parsed. It is a large restriction, but it would be
safer. There are two good reasons for supporting this.

(1) We can exclude invalid functions mentioned above.

(2) Not all functions with the same name between Mathematica and
sympy do also the same work.

Example)
Log[2, 4] = 2 (Mathematica)
log(4, 2) = 2 (sympy)

Base positions are different!

So, we really need to check a function one by one about how it
works in Mathematica and sympy. Then parse it not only the name,
but also the movement.

Above the reasons, we should restrict available function in parsing.

Future task:
(1) Increase the number of available functions.

(2) In the present, list and matrix is not taken into account. My code
throws a ValueError if it finds '{' or '}'. This point should be
improved.

@asmeurer
Copy link
Member

I don't know if strict restriction is the best option. How about a default, which just takes the lowercase of the Mathematica name? That way, it will work for most functions, even if it isn't known about beforehand in this file.

@quasi-human quasi-human changed the title Restrict available functions for mathematica pursing Restrict available functions for mathematica parsing Oct 28, 2017
(From #13533)

Problem:
So far, there is no input check phase, hence all the functions in
the input are just let go with lowercase.
For example, spell-missed 'Arcsin[1/2]' is parsed to 'pi/6'.

Improved:
I set up a barricade, and only registered functions in dictionary
are able to be parsed. It is a large restriction, but it would be
safer. There are two good reasons for supporting this.

(1) We can exclude invalid functions mentioned above.

(2) Not all functions with the same name between Mathematica and
    sympy do also the same work.

    Example) Log[2, 4] = 2 (Mathematica)
	     log(4, 2) = 2 (sympy)

    Base positions are different!

    So, we really need to check a function one by one about how it
    works in Mathematica and sympy. Then parse it not only the name,
    but also the movement.

Above the reasons, we should restrict available function in parsing.

Future task:
(1) Increase the number of available functions.

(2) In the present, list and matrix is not taken into account. My code
    throws a ValueError if it finds '{' or '}'. This point should be
    improved.
I removed the restrictions and any functions can be passed with taking
the lowercase.
For example:
'ArcSin[1/2]' is parsed to 'pi/6'
'Arcsin[1/2]' is parsed to 'arcsin(1/2)'
'Function[1/2]' is parsed to 'function(1/2)'
@quasi-human
Copy link
Contributor Author

quasi-human commented Oct 28, 2017

I modified the code and removed the restrictions.

I also have considered which is the best option. As you mentioned, I think most of Mathematica functions work correctly in SymPy with just taking the lowercase. It is convenient that we can parse more numbers of functions. However, I am worried that one day a user would perhaps get an unpredictable result because of an unchecked function which is apparently parsed well but actually gives a wrong calculation in SymPy, coming from a similar origin in Log[2, 4] != log(2,4) problem.


F_swap = {
# when you say log(2,4) in Mathematica, base is 2.
('Log', 2): 'log',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be any advantage to introducing a logb wrapper function in SymPy with the syntax logb(b, a) = log(a, b)? Then Mathematic Log(x, y) could be passed directly to logb(x, y)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a nice idea if only this logarithmic function needs an argument swap. But, I have no idea how many functions there are which need such a treatment. I do not know if it is the best choice to introduce a wrapper function in SymPy every time when facing this kind of matters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would just be confusing. Mathematica clearly chose its order because when you write log_b(x) the base is written before the argument. SymPy uses its convention because the base is an optional argument (defaulting to E), and in Python optional arguments go after required arguments.

There are other SymPy functions where the second argument is optional, or where the argument order is effectively arbitrary, and I wouldn't be surprised if Mathematica chose a different convention for some of them as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, a parsing function should be written without affecting any other SymPy files.

('Mod', 2): 'Mod',
('Sqrt', 1): 'sqrt',
('Exp', 1): 'exp',
('Log', 1): 'log',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log may have one or two parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In F_swap dictionary, ('Log', 2): 'log' is defined.

class _Parse(object):
'''An instance of this class converts basic Mathematica expression to
Python.'''
class parse(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually prefer to have unique class names in the project, to avoid name collisions. But it's OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, 'mathematcia_parse' could be better from the view point of name collisions.
But in this case, I also think it's OK.

(']', ')'),
)
# trigonometric, e.t.c.
for arc, tri, h in product(('', 'Arc'), (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice trick!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Upabjojr
Copy link
Contributor

I think we need a two-step parser in the future, to parse stuff like Times@{x,y,z}

@asmeurer
Copy link
Member

@quasi-human you may have a point. I just discovered that Mathematica's Root is like our RootOf, that is, Root(f, n) represents the nth root of f = 0. Our root(x, n) represents x**(1/n). So a naive translation would be mathematically incorrect.

So a whitelist is probably fine, so long as it's easy for users to pass in known definitions to the parser function.

@quasi-human
Copy link
Contributor Author

@Upabjojr Yes. Parsing a function would be broken down to two steps.
(1) Rename the function name properly.
(2) Check how the function works, and handle the original arguments properly.

@quasi-human
Copy link
Contributor Author

quasi-human commented Oct 29, 2017

@asmeurer Do you mean, we should add a method which enables users to add their own translation dictionary to the 'parse' class just in case they want to parse a function which is out of the whitelist?

@asmeurer
Copy link
Member

Yes, I would add a flag to the function.

@smichr
Copy link
Member

smichr commented Oct 29, 2017

in Python optional arguments go after required arguments

A nice application of rule-breaking is the range function where, when a single argument is given it is the end of the range but when two are given the first becomes the start. The same thing could have be done with log; it could now be done with logb but I don't have strong feelings about it either way.

def logb(b, a=None):
    if a is not None:
        a, b = b, a
    return log(a, b) if b is not None else log(a)

@asmeurer
Copy link
Member

Either way, I think it would be confusing to have two distinct log functions.

Actually, what I'd like to see is for log(x, b) to remain unevaluated. Then we could easily add shortcuts like log10 and log2.

But that's neither here nor there for this PR, obviously.

@Upabjojr
Copy link
Contributor

@quasi-human
Copy link
Contributor Author

@asmeurer I added 'add_translation' method to 'parse' class by which users can add their own translation dictionary. But I am not sure whether users would feel easy to use this or not. I hope you to check after passing the code quality tests.

    # Example 1
    In [1]: user_dic
    Out[1]: {'Log3[x]': 'log(x, 3)'}

    In [2]: parse.add_translation(user_dic)

    In [3]: mathematica('Log3[81]')
    Out[3]: 4

    # Example 2
    In [1]: user_dic
    Out[1]: {'Func[a]': 'sqrt(a)*exp(1/a)**2'}

    In [2]: parse.add_translation(user_dic)

    In [3]: mathematica('Func[3]')
    Out[3]: sqrt(3)*exp(2/3)

    # Example 3
    In [1]: user_dic
    Out[1]: {'Func[x, y, *z]': 'Max(*z)*(x**2 + y**2)'}

    In [2]: parse.add_translation(user_dic)

    In [3]: mathematica('Func[2,3,1,2,3,4,5,6,7,8,9,10]')
    Out[3]: 130

variable-length argument needs '*' character

Added a method by which users can add translation dictionary and
parse functions out of the whitelist in Mathematica parsing or
their own functions.
@asmeurer
Copy link
Member

I would add a flag to the mathematica function itself, then use that to instantiate a parse instance with those extra arguments. That way, users don't have to use the internal parse object (which shouldn't be public API), and there is no modification of global state.

@quasi-human
Copy link
Contributor Author

I introduced an extra argument to the mathematica function. Then users do not have to use parse function.
In [1]: mathematica('Log3[9]', {'Log3[x]':'log(x,3)'})
Out[1]: 2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you make this a class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because I wanted the mathematica class to keep the parse instance.
The parse instance has a cache for the latest users' additional_translations dictionary. The additional_translations dictionary needs to be transformed internally for make parsing easier, and this cache saves time from transforming it every time when the mathematica is called.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be additional_translations (plural)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. If you or someone have a better idea about the name of additional_translations, please change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this line ever be run? __new__ never returns an instance of this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this code, I wanted mathematica class to behave like a function, instead of creating an instance.

@asmeurer
Copy link
Member

asmeurer commented Nov 1, 2017

I would just do something like

def mathematica(s, additional_translations=None):
    parser = MathematicaParser(additional_translations)
    return parser.parse(s)

Changes:

  • mathematica should be a function, not class.
  • Rename Parser to MathematicaParser
  • The default keyword value should be None. Don't use mutable defaults for keyword arguments!
  • Every call to mathematica creates a new parser instance.
  • The additional translations are passed in to the parser at instantiation time.

I introduced an extra argument as the 2nd one to the mathematica
parsing function. Through this argument, users can add thier own
translation dictionary to mathematica function.
@quasi-human
Copy link
Contributor Author

Thank you for reviewing. I modified the code.

  • Changed the mathematica class to a function.
  • Renamed _Parse to MathematicaParser.
  • Set None as a default keyword value.
  • Made every call creates a new parser instance.
  • Let additional_translations passed to the parser at instantiation time.

Changed mathematica class to a function.
Renamed _Parse to MathematicaParser.
Set None as a default keyword value.
Made every call creates a new parser instance.
Let additional translations passed to the parser at instantiation time.
@asmeurer
Copy link
Member

asmeurer commented Nov 2, 2017

Any objections here? I'm no expert on Mathematica syntax. The Python style looks better now.

@joha2
Copy link
Contributor

joha2 commented Nov 21, 2017

I just read the comments on this pull request and just want to ask: Is the mathematica parser just a `function translator' or does it parse full Mathematica code? I programmed Mathematica for a long time and if you perform a FullForm on your expressions, you see that every Mathematica expression is of the form head[arg1, ..., argN] and nested.

Perhaps this helps for parsing full expressions if this has not been already done. This syntax scheme has several advantages, e.g. fast building a parsing tree, or quite fast searching for subexpressions. It has also a more higher coherence compared to Python or Sympy which I both love as very good open source alternatives. So keep on doing your very good job guys and I hope my comment was still helpful.

Best wishes
Johannes

@asmeurer
Copy link
Member

The goal here is one of practicality. Calling FullForm shouldn't be a requirement, because in some cases you may simply have an expression written manually (or for instance, a formula copied from WolframAlpha). Obviously if you do call it first, though, it should be easy to do a full parse. Translating Wolfram Language in generality isn't useful because it has a lot of programmatic constructs which don't translate to SymPy expressions.

I think I'm going to merge this. I haven't heard any objections. If someone discovers a bug, they can open a new issue or pull request for it.

@asmeurer asmeurer merged commit 5cdce47 into sympy:master Nov 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants