Skip to content

Conversation

@mihaibudiu
Copy link
Contributor

This PR is probably easiest to review commit by commit. The 4 commits correspond to various Calcite program representations.

@mihaibudiu
Copy link
Contributor Author

The CI failures are not fundamental, I will try to fix them today

@mihaibudiu mihaibudiu force-pushed the issue6372 branch 3 times, most recently from 26a1aa6 to d4ec7aa Compare July 25, 2024 20:58
@caicancai
Copy link
Member

caicancai commented Jul 27, 2024

I am very interested in this PR and I may need to spend some time reviewing it.

@mihaibudiu
Copy link
Contributor Author

Go ahead

double innerRowCount = left * right * selectivity;
switch (join.getJoinType()) {
case ASOF:
case LEFT_ASOF:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ASOF similar to inner join or left join?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this implementation we have both ASOF and LEFT ASOF.
ASOF itself is an INNER join.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that LEFT_ASOF returns left. Would it be more accurate for asof to return (left * selectivity)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with this part of the code, I see that other places use 1 - selectivity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that for a regular inner join, selectivity means the probability of the condition being true in the result, and 1 - selectivity is because for an outer join we need to add the unmatched rows on the null side.
For asof join, we should consider how many matches there are against the left table, and we also could to consider the matchCondition, which is more complicated. Perhaps this optimization can be discussed in another issue.

import java.util.List;
import java.util.Set;

/** Implementation of {@link LogicalAsofJoin} in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Implementation of {@link Join} not LogicalAsofJoin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, this doesn't implement a generic join, it only implements an ASOF join.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should consider introducing a new AsofJoin class? Both LogicalAsofJoin and EnumerableAsofJoin can extends this class

condition,
matchCondition);
} else {
if (type == JoinType.ASOF && type != JoinType.LEFT_ASOF) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition is type == JoinType.ASOF || type == JoinType.LEFT_ASOF ?

/** Implementation of {@link LogicalAsofJoin} in
* {@link EnumerableConvention enumerable calling convention}. */
public class EnumerableAsofJoin extends Join implements EnumerableRel {
// This implementation is based on {@link EnumerableHashJoin}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is based on {@link EnumerableHashJoin},
Can this class directly extends EnumerableHashJoin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there would be much code reuse.

@Override public EnumerableAsofJoin copy(RelTraitSet traitSet, RexNode condition,
RelNode left, RelNode right, JoinRelType joinType,
boolean semiJoinDone) {
// This method does not know about the matchCondition, so it should not be called
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to support the copy of changing matchCondition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly this method cannot be right, so it's better to catch attempts to call it.

// match comparator
.append(matchPredicate)
// timestamp comparator
.append(timestampComparator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the comparison limited to timestamp only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"timestamp" is just a name for a pair of columns from each side that are being compared.
It has nothing to do with the TIMESTAMP data type.
The MATCH_CONDITION is restricted to one comparison between a pair of columns from either side. That's enforced by the validator, and part of the spec.


/**
* An ASOF JOIN operation combines rows from two tables based on comparable timestamp values.
* For each row in the left table, the join finds a single row in the right table that has the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any confusion in this description? From the description, it may be understood as left join.

if (joinCondition != null) {
switch (join.getConditionType()) {
case ON:
writer.keyword("ON");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this restriction need to be processed in sql Parser.jj?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is, right now only ON is supported in the parser.
I plan to add USING later.

* @param right Right input
* @param hints Hints
* @param condition Join condition
* @param matchCondition ASOF join match condition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing joinType

ImmutableList<RelDataTypeField> systemFieldList) {
super(cluster, traitSet, hints, left, right, condition, ImmutableSet.of(), joinType);
this.systemFieldList = requireNonNull(systemFieldList, "systemFieldList");
this.matchCondition = matchCondition;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limit requireNonNull(matchCondition)?

case FULL:
return JoinType.FULL;
case ASOF:
return JoinType.ASOF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing LEFT_ASOF case?

@caicancai
Copy link
Member

I will review this PR this weekend, sorry for the delay

@mihaibudiu
Copy link
Contributor Author

@YiwenWu thank you for the review, I have pushed a commit with fixes for problems you found.

import java.util.List;
import java.util.Set;

/** Implementation of {@link LogicalAsofJoin} in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should consider introducing a new AsofJoin class? Both LogicalAsofJoin and EnumerableAsofJoin can extends this class

+ join.getRight().getRowType().getFieldCount();
final RexNode conditionExpr = join.getCondition();
final RexNode matchConditionExpr = (join instanceof LogicalAsofJoin)
? ((LogicalAsofJoin) join).getMatchCondition()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directly limit ((LogicalAsofJoin) join).getMatchCondition() always not null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you suggest an Objects.requireNonNull? I am not sure what you would gain from that.

joinType = JoinType()
e2 = TableRef1(ExprContext.ACCEPT_QUERY_OR_JOIN)
(
[ <MATCH_CONDITION> matchCondition = Expression(ExprContext.ACCEPT_SUB_QUERY) ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1). Is it consistent with snowflake, also uses parentheses?
(2). Whether MATCH_CONDITION support multiple conditions?

image

Copy link
Contributor Author

@mihaibudiu mihaibudiu Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) This grammar does not require parentheses, but accepts them, I think it's more general this way. We can add them if desired.

(2) The grammar accepts any expression, but the validator will reject complex expressions, accepting only expressions that contain a single comparison. This would be very hard to do using the grammar

Copy link
Contributor

@YiwenWu YiwenWu Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1). This is a minor issue. The current implementation is based on Snowflake syntax, so keeping it consistent can help more users avoid confusion and learning new syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation accepts the Snowflake syntax, so users familiar with that syntax don't have to learn anything new. I think in general it's ok to be put as few constraints as possible on the programs you accept (the Jon Postel principle: https://en.wikipedia.org/wiki/Robustness_principle).

Also, please note that the Snowflake construct is deemed experimental in their documentation, so I don't think we should strive too much to adhere to it, maybe they will change their syntax too. (I hope they will, and, who knows, maybe we can even influence them?)

@mihaibudiu
Copy link
Contributor Author

I have added a base AsofJoin class as suggested

Copy link
Contributor

@YiwenWu YiwenWu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

matchCondition = expand(matchCondition, joinScope);
join.setOperand(6, matchCondition);
validateWhereOrOn(joinScope, matchCondition, "MATCH_CONDITION");
ConjunctionOfEqualities conj = new ConjunctionOfEqualities();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to replace RelOptUtil.conjunctions() and match equal here.
Maybe the bool filed type affects like bool_column3 = (bool_column1 and bool_column2). WDYT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, this is a rare case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"and' is not the same as "=" for booleans, since "and" returns "false" for "false = false".
But maybe I misunderstand your question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that the checks here refer to snowflake's specification, i.e. "ON clause for ASOF JOIN must contain conjunctions of equality conditions only. Each side of an equality condition must only refer to either the left table or the right table. (S.STATE = P.STATE) OR (S.LOCATION = P.LOCATION) is invalid."
We expect that condition needs to satisfy
l.c1 = r.c1 and l.c2 = r.c2
But if you just check if sqlcall is equal / and, may a case like the following would also pass, if the types of c1 and c2 would both be boolean
(l.c1 and r.c1 ) and (l.c2 and r.c2 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will add a test case to make sure this doesn't happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I can't reuse RelOptUtil because that works on RexNode, and here I only have SqlNode objects.
I will improve the ConjunctionOfEqualities visitor.

@mihaibudiu
Copy link
Contributor Author

Thank you @suibianwanwank, I hope I have fixed the problems you highlighted.

@mihaibudiu
Copy link
Contributor Author

Does anyone else plan to review this PR or can I merge it?
More work may be needed, but this implementation is fairly functional and useful.
The only thing I am aware that can be improved is handing USING joins, which can be done in a separate PR - this one is big enough as it is.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Aug 13, 2024
@mihaibudiu
Copy link
Contributor Author

I have tagged this with "will merge soon", please untag if you plan to review or disagree.

Copy link
Contributor

@suibianwanwank suibianwanwank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
if (kind == SqlKind.AND) {
List<SqlNode> operands = call.getOperandList();
for (SqlNode operand : operands) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there might still be an issue here. If the condition is a.id = b.id AND a.id2 = b.id2 AND a.id3 = b.id3, this might not pass as expected because the conditions in SqlNode are not in the same operandList.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Calcite AND can have N operands and I believe that there is a pass prior to this which will make sure that a sequence of AND operators is flattened into a single operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It look like that is only true for RexNodes, so you are right, I will have to handle this case too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed this case and added a test. Great observation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good✅

@sonarqubecloud
Copy link

@mihaibudiu mihaibudiu merged commit fb37dc3 into apache:main Aug 14, 2024
@mihaibudiu mihaibudiu deleted the issue6372 branch August 14, 2024 19:44
e2 = TableRef1(ExprContext.ACCEPT_QUERY_OR_JOIN)
(
[ <MATCH_CONDITION> matchCondition = Expression(ExprContext.ACCEPT_SUB_QUERY) ]
<ON> { on = JoinConditionType.ON.symbol(getPos()); }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihaibudiu this means that the ON clause is mandatory for ASOF joins as well, is that intentional? Snowflake's ASOF JOIN only requires the MATCH_CONDITION, with ON / USING being optional - https://docs.snowflake.com/en/sql-reference/constructs/asof-join#syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that it is mandatory with this implementation
And you can't just use 'true' either, since it requires an equijoin

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you can't just use 'true' either, since it requires an equijoin

Actually, this does seem to work because the ConjunctionOfEqualities class in SqlValidatorImpl doesn't override the literal visit method of SqlShuttle so ConjunctionOfEqualities.illegal retains its default value of false for a join condition like ON true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LGTM-will-merge-soon Overall PR looks OK. Only minor things left.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants