Skip to content

Conversation

@gortiz
Copy link
Contributor

@gortiz gortiz commented Mar 1, 2023

This PR adds 3 visitors: DataTypeVisitor, SingleValueVisitor and MultiValueVisitor.

The visitor pattern is well-known and heavily used in old versions of Java where switch statement cannot be applied to classes. With the inclusion of pattern matching and switch expressions in modern Java versions the visitor pattern is not that important.

In Apache Pinot it can be useful on a lot of different object domains. In this PR I added to:

  • FieldSpec.DataType. As explained in the javadoc, given that DataType is an enum we can apply a switch and we actually do it in several places, but the switch has two advantages over switch: it is typesafe and it can be composed.
  • Some predicates: Here I have added the visitors in order to offer internal typesafe visibility of the predicates. The idea is to be able to extract the matching value of the predicate (for example, the actual literal value we are comparing with in a Eq predicate) without having to know anything about the actual predicate apart.

The later is quite useful in a proprietary plugin we are developing in StarTree.

@gortiz
Copy link
Contributor Author

gortiz commented Mar 1, 2023

@walterddr: I'm nominating you as reviewer because I know you like visitors ;)

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we dont have a single concrete impl of the visitors in this PR so this PR is merely just a interface addition?

do we have a concrete usage of this in a follow up PR?

@gortiz
Copy link
Contributor Author

gortiz commented Mar 3, 2023

we dont have a single concrete impl of the visitors in this PR so this PR is merely just a interface addition?

Yes, just the interface in this PR.

do we have a concrete usage of this in a follow up PR?

I'm working on a proprietary extension and I found this pattern would help me in the implementation. I didn't dedicate time trying to look for possible usages in Apache Pinot, but I'm sure we can use these interfaces in code we already wrote. I'm using them to be able to access the matchingValue attribute in a safe way without having to create a new method on the class to export it and without doing boxing (although I need to create a new Visitor, so the performance change shouldn't be significant). Alternatively we could just add a getMatchingValue (and similar to IN, NEQ and NIN).

@gortiz
Copy link
Contributor Author

gortiz commented Mar 13, 2023

I've added some tests to show how to use the visitors and actually test that methods are been called. Also, I've changed the OpRawPredicateEvaluator (where Op is Eq, In, etc) classes visibility in order to be able to actually have the accept method accessible from outside.

@codecov-commenter
Copy link

codecov-commenter commented Mar 13, 2023

Codecov Report

Merging #10361 (9228e68) into master (3bad67d) will increase coverage by 0.01%.
The diff coverage is 17.85%.

@@             Coverage Diff              @@
##             master   #10361      +/-   ##
============================================
+ Coverage     70.32%   70.34%   +0.01%     
- Complexity     6105     6107       +2     
============================================
  Files          2055     2056       +1     
  Lines        111389   111445      +56     
  Branches      16939    16939              
============================================
+ Hits          78337    78397      +60     
+ Misses        27566    27560       -6     
- Partials       5486     5488       +2     
Flag Coverage Δ
integration1 24.52% <14.28%> (+0.12%) ⬆️
integration2 24.16% <14.28%> (-0.04%) ⬇️
unittests1 68.00% <17.85%> (+<0.01%) ⬆️
unittests2 13.84% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...a/org/apache/pinot/spi/data/MultiValueVisitor.java 0.00% <0.00%> (ø)
...lter/predicate/NotInPredicateEvaluatorFactory.java 73.37% <16.66%> (-0.57%) ⬇️
...ter/predicate/EqualsPredicateEvaluatorFactory.java 77.19% <18.18%> (-6.31%) ⬇️
.../predicate/NotEqualsPredicateEvaluatorFactory.java 69.67% <20.00%> (-3.55%) ⬇️
.../filter/predicate/InPredicateEvaluatorFactory.java 75.17% <33.33%> (+5.24%) ⬆️

... and 28 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. several questions we can follow up.

  • is there any reason why RegexpLike and FST-Based factories not part of the visitor pattern change?
  • DataType visitor tests are not showing exactly how this will be useful yet.

@walterddr
Copy link
Contributor

CC @yupeng9 @xiangfu0 @Jackie-Jiang please kindly take another look

@gortiz
Copy link
Contributor Author

gortiz commented Mar 14, 2023

is there any reason why RegexpLike and FST-Based factories not part of the visitor pattern change?

Just because I don't need it right now. I didn't iterate over all predicates to look for useful cases. In order to do not add features we may not use, I thought it would be better if we add them whenever we need them. But I can try to do it.

DataType visitor tests are not showing exactly how this will be useful yet.

Given that DataType is an enum, we can use a switch and DataTypeVisitor is not that useful. I'm using it in a situation where the code in the switch depends on some context, so I do something like:

DataTypeVisitor<R> visitor;
if (condition.is.fulfilled) {
  visitor = new VisitorInCase1;
} else {
  visitor = new VisitorInCase2;
}
R result = dataType.accept(visitor)

But I can use Function instead:

Function<DataType, R> fun;
if (condition.is.fulfilled) {
  fun = dt -> {
    switch (dt) {
      case INT: return X1;
      case DOUBLE: return Y1;
      default: return Z1;
    }
  };
} else {
  fun = dt -> {
    switch (dt) {
      case INT: return X2;
      case DOUBLE: return Y2;
      default: return Z2;
    }
  };
}
R result = fun.apply(dataType);

The main difference is that the switch will not be protected by the compiler in case a new data type is added, but given that it isn't a very common change, I think I'm going to follow the second approach in order to create less confusion in this PR.

@gortiz
Copy link
Contributor Author

gortiz commented Mar 16, 2023

It doesn't seem that more people is going to review this. Can we merge it?

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise


public interface MultiValueVisitor<R> {

R visitInt(int[] value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my understanding of visitor, and also from this blog, visitor should not return anything. Currently we force the visitor to return the same type result when visiting different data types which seems kind of strange. Should we remove the return and always handle the visiting logic within the method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO many visitor makes return type generic and we put Void type when no return type is necessary. this should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, usually is better to return something in visitors. Otherwise visitors that return should store some state that the caller has to consult later with a getter method. That means that the visitor must be stateful and the code will be uglier. Instead, if you let visitor to return, it will be easier to use from caller perspective and you can always use Void when not needed.

In fact, in a previous life when I heavily use visitors, I usually introduce a second generic A (from Argument) that was used to pass contextual information to the visitor. Visit methods were defined as R visit(SomeClass value, A argument). By doing that visitor instances are almost always stateless. I didn't do that here because it is not a well common pattern and didn't want to make it too complex.

Copy link
Contributor

@walterddr walterddr Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's also a visitor return + context

  • visitor return carries visiting info (related to the visit sequence) and
  • the context carries contextual info that's unrelated to the visiting sequence

But largely depending on the use case, i do agree with Gonzalo here: i don't think we need to go that far here. (we went for this approach in query planner though)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants