Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[java] About operator nodes #1661

Closed
oowekyala opened this issue Feb 18, 2019 · 3 comments
Closed

[java] About operator nodes #1661

oowekyala opened this issue Feb 18, 2019 · 3 comments
Labels
in:ast About the AST structure or API, the parsing step
Milestone

Comments

@oowekyala
Copy link
Member

oowekyala commented Feb 18, 2019

This relates to expression grammar changes. #1019

  • We currently use a special node for the operator of an assignment expression (AssignmentOperator). This is not needed because an assignment expression may only have one assignment operator so we could as well set the image of the parent. The Expression node is currently used to represent assignment expressions, but it also encloses all other types of expressions because the JJTree descriptor doesn't mention (>1) as a condition. That adds an unnecessary layer of nesting. I think it would be better to have Expression #void, and a new node AssignmentExpression with no child operator node, instead the operator is on the image. This is consistent with how we don't push e.g. additive expression nodes if there is no additive operation mentioned -> it makes no sense to push ASTExpression if there is no assignment operator.
  • So AssignmentOperator is unnecessary. On the other hand, Additive and multiplicative expressions throw away all but the last of their operators:
    void AdditiveExpression() #AdditiveExpression(>1):
    {}
    {
    MultiplicativeExpression() ( LOOKAHEAD(2) ( "+" {jjtThis.setImage("+");} | "-" {jjtThis.setImage("-");} ) MultiplicativeExpression() )*
    }
    void MultiplicativeExpression() #MultiplicativeExpression(>1):
    {}
    {
    UnaryExpression() ( LOOKAHEAD(2) ( "*" {jjtThis.setImage("*");} | "/" {jjtThis.setImage("/");} | "%" {jjtThis.setImage("%");}) UnaryExpression() )*
    }

    The way this is written, an expression such as a + b + c - 1 would be parsed as an ASTAdditiveExpression with an image equal to "-", throwing away the fact we saw "+" twice. The same happens with MultiplicativeExpression.

I see three possible options:

  1. Add a node for each operator, while continuing to parse these expressions flatly. This is very likely going to get unwieldy, bloating the AST, and the structure of the expression is lost and unpractical to recover.
    With this option, we'd have
AdditiveExpression
+ a (...)
+ AdditiveOperator "+"
+ b (...)
+ AdditiveOperator "+"
+ c (...)
+ AdditiveOperator "-"
+ 1 (...)

Matching additive expressions with eg AdditiveExpression[@Operator="+"] wouldn't be possible anymore.

  1. Left-recursive parsing. This would increase the depth of the AST very much, and especially for eg long string concatenation expressions. With this option, we'd have
AdditiveExpression "-"
+ AdditiveExpression "+"
  + AdditiveExpression "+"
    + a (...)
    + b (...)
  + c (...)
+ 1 (...)
  1. "Reduced" left-recursive parsing. With this option, we only push new nodes when the operator is different:
AdditiveExpression "-"
+ AdditiveExpression "+"
  + a (...)
  + b (...)
  + c (...)
+ 1 (...)

IMO that's a good compromise. String concatenations, even big, would be flat, because they use only "+". The increased nesting is in fact consistent with how nesting of expressions of different precedences is done (eg a + b * c has two levels -> why not a + b - c?).

@oowekyala oowekyala added the in:ast About the AST structure or API, the parsing step label Feb 18, 2019
@oowekyala oowekyala added this to the 7.0.0 milestone Feb 18, 2019
@oowekyala
Copy link
Member Author

The loss of information also happens for ShiftExpression and EqualityExpression. For RelationalExpression it doesn't, since you can't chain them (in fact according to the JLS, a < b < c parses, but never typechecks, which for us is the same as "doesn't parse").

@oowekyala
Copy link
Member Author

Closed via #1759

@oowekyala
Copy link
Member Author

This poses some problems...

The main problem is that the "+" operator is overloaded: it means either string concatenation or numeric addition. So 1 + 2 + "" and "" + 1 + 2 are not the same expression (the first evaluates to "3", the second to "12"), and should be parsed differently. This would be useful for example to write a rule to forbid writing expressions like the first without parentheses.

We should probably add a constant to BinaryOp, to have separate operators, and rewrite the AdditiveExpression using type resolution. E.g. ideally

1+2+"" should parse as

 + AdditiveExpression[@Type=String.class, @Operator=CONCAT]
   + AdditiveExpression[@Type=int, @Operator=ADD]
      + NumericLiteral
      + NumericLiteral
   + StringLiteral

and ""+1+2 should parse as

 + AdditiveExpression[@Type=String.class, @Operator=CONCAT]
   + NumericLiteral
   + NumericLiteral
   + StringLiteral

The problem with that, is that this depends on classpath configuration, since the AST structure depends on the type of the operands.

Another problem is that flattening deletes some intermediary type resolution results. Binary numeric promotion is applied to the operands (for shift statements, unary promotion), which may give a different result for every pair of operands. So if we want to preserve every intermediary type, which is the point of #497, we have to parse all expressions entirely left-recursively. That would make the structure more predictable, more JLS-like (also, more easily documented). It also would make it independent from type resolution.

Currently type resolution does the promotion wrong too.

  • In 1+2l+3d, binary promotion should work in the following way:
    • (1+2l) is considered. 1 is enlarged to 1l. The type of the expression is long
    • (1+2l) + 3d is considered. (1+2l) is enlarged to double. The type of the expression is double
  • Our current type resolution considers only the first two operands, and types the whole expression as long, though it should be double.
  • Similarly, when we have 1+2+"", typeres ignores the third argument and types the whole as int even though it should be String

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in:ast About the AST structure or API, the parsing step
Projects
None yet
Development

No branches or pull requests

2 participants