-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Data visitors #10361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data visitors #10361
Conversation
|
@walterddr: I'm nominating you as reviewer because I know you like visitors ;) |
walterddr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont have a single concrete impl of the visitors in this PR so this PR is merely just a interface addition?
do we have a concrete usage of this in a follow up PR?
Yes, just the interface in this PR.
I'm working on a proprietary extension and I found this pattern would help me in the implementation. I didn't dedicate time trying to look for possible usages in Apache Pinot, but I'm sure we can use these interfaces in code we already wrote. I'm using them to be able to access the |
|
I've added some tests to show how to use the visitors and actually test that methods are been called. Also, I've changed the |
Codecov Report
@@ Coverage Diff @@
## master #10361 +/- ##
============================================
+ Coverage 70.32% 70.34% +0.01%
- Complexity 6105 6107 +2
============================================
Files 2055 2056 +1
Lines 111389 111445 +56
Branches 16939 16939
============================================
+ Hits 78337 78397 +60
+ Misses 27566 27560 -6
- Partials 5486 5488 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 28 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
walterddr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. several questions we can follow up.
- is there any reason why RegexpLike and FST-Based factories not part of the visitor pattern change?
- DataType visitor tests are not showing exactly how this will be useful yet.
|
CC @yupeng9 @xiangfu0 @Jackie-Jiang please kindly take another look |
Just because I don't need it right now. I didn't iterate over all predicates to look for useful cases. In order to do not add features we may not use, I thought it would be better if we add them whenever we need them. But I can try to do it.
Given that DataType is an enum, we can use a switch and DataTypeVisitor is not that useful. I'm using it in a situation where the code in the switch depends on some context, so I do something like: DataTypeVisitor<R> visitor;
if (condition.is.fulfilled) {
visitor = new VisitorInCase1;
} else {
visitor = new VisitorInCase2;
}
R result = dataType.accept(visitor)But I can use Function<DataType, R> fun;
if (condition.is.fulfilled) {
fun = dt -> {
switch (dt) {
case INT: return X1;
case DOUBLE: return Y1;
default: return Z1;
}
};
} else {
fun = dt -> {
switch (dt) {
case INT: return X2;
case DOUBLE: return Y2;
default: return Z2;
}
};
}
R result = fun.apply(dataType);The main difference is that the switch will not be protected by the compiler in case a new data type is added, but given that it isn't a very common change, I think I'm going to follow the second approach in order to create less confusion in this PR. |
|
It doesn't seem that more people is going to review this. Can we merge it? |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
|
|
||
| public interface MultiValueVisitor<R> { | ||
|
|
||
| R visitInt(int[] value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding of visitor, and also from this blog, visitor should not return anything. Currently we force the visitor to return the same type result when visiting different data types which seems kind of strange. Should we remove the return and always handle the visiting logic within the method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO many visitor makes return type generic and we put Void type when no return type is necessary. this should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, usually is better to return something in visitors. Otherwise visitors that return should store some state that the caller has to consult later with a getter method. That means that the visitor must be stateful and the code will be uglier. Instead, if you let visitor to return, it will be easier to use from caller perspective and you can always use Void when not needed.
In fact, in a previous life when I heavily use visitors, I usually introduce a second generic A (from Argument) that was used to pass contextual information to the visitor. Visit methods were defined as R visit(SomeClass value, A argument). By doing that visitor instances are almost always stateless. I didn't do that here because it is not a well common pattern and didn't want to make it too complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's also a visitor return + context
- visitor return carries visiting info (related to the visit sequence) and
- the context carries contextual info that's unrelated to the visiting sequence
But largely depending on the use case, i do agree with Gonzalo here: i don't think we need to go that far here. (we went for this approach in query planner though)
This PR adds 3 visitors:
DataTypeVisitor,SingleValueVisitorandMultiValueVisitor.The visitor pattern is well-known and heavily used in old versions of Java where
switchstatement cannot be applied to classes. With the inclusion of pattern matching and switch expressions in modern Java versions the visitor pattern is not that important.In Apache Pinot it can be useful on a lot of different object domains. In this PR I added to:
Eqpredicate) without having to know anything about the actual predicate apart.The later is quite useful in a proprietary plugin we are developing in StarTree.