-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Add common comparison scalar functions for filterConfig during ingestion (WIP) #8563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add common comparison scalar functions for filterConfig during ingestion (WIP) #8563
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8563 +/- ##
============================================
+ Coverage 64.00% 70.76% +6.76%
+ Complexity 4294 4289 -5
============================================
Files 1641 1686 +45
Lines 86307 88229 +1922
Branches 13142 13351 +209
============================================
+ Hits 55237 62435 +7198
+ Misses 27071 21422 -5649
- Partials 3999 4372 +373
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we don't support having different parameter type for the same scalar function, so we might need to use the same workaround similar to other functions by adding the type into the function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest adding a separate class for all the logical functions. See FilterKind enum for more supported logical functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't a robust way to compare double values for equality, I suggest using an epsilon of 1e-7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
Will return Double.compare(a, b) == 0 be right? From the looks of it, it'll converts both values to 754 notation and compare the bits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that would produce odd results too. doubles should be compared for equality within a tolerance because they are an approximation of a number and the error in the approximation should be accounted for.
For example, consider this program, which accumulates 1% 100 times:
double d = 0.00;
for (int i = 0; i < 100; i++) {
d += 0.01;
}
System.out.println(d == 1D);
System.out.println(Double.compare(d, 1D));
System.out.println(Math.abs(d - 1D) < 1e-15);it prints:
false
1
true
because 1% isn't finitely expressible in base 2. However, the quantity is clearly an approximation of 1. In order to make this function robust to arithmetic performed by transform functions (prior to persistence or at query time) equality should be verified relative to a small tolerance (1e-7 would be best).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. Thanks @richardstartin. Updated the pr
744c09f to
30b9eff
Compare
30b9eff to
7190ee6
Compare
| if (!_method.isVarArgs()) { | ||
| int numParameters = _parameterClasses.length; | ||
| Preconditions.checkArgument(arguments.length == numParameters, | ||
| "Wrong number of arguments for method: %s, expected: %s, actual: %s", | ||
| _method, numParameters, arguments.length); | ||
| for (int i = 0; i < numParameters; i++) { | ||
| // Skip conversion for null | ||
| Object argument = arguments[i]; | ||
| if (argument == null) { | ||
| continue; | ||
| } | ||
| // Skip conversion if argument can be directly assigned | ||
| Class<?> parameterClass = _parameterClasses[i]; | ||
| Class<?> argumentClass = argument.getClass(); | ||
| if (parameterClass.isAssignableFrom(argumentClass)) { | ||
| continue; | ||
| } | ||
|
|
||
| PinotDataType parameterType = _parameterTypes[i]; | ||
| PinotDataType argumentType = FunctionUtils.getArgumentType(argumentClass); | ||
| Preconditions.checkArgument(parameterType != null && argumentType != null, | ||
| "Cannot convert value from class: %s to class: %s", argumentClass, parameterClass); | ||
| arguments[i] = parameterType.convert(argument, argumentType); | ||
| } | ||
| // Skip conversion if argument can be directly assigned | ||
| Class<?> parameterClass = _parameterClasses[i]; | ||
| Class<?> argumentClass = argument.getClass(); | ||
| if (parameterClass.isAssignableFrom(argumentClass)) { | ||
| continue; | ||
| } else { | ||
| Class<?> parameterBaseClass = _parameterClasses[0].getComponentType(); | ||
| for (Object argument : arguments) { | ||
| if (argument == null) { | ||
| continue; | ||
| } | ||
| if (!parameterBaseClass.isAssignableFrom(argument.getClass())) { | ||
| // TODO add conversion if classes don't match | ||
| throw new IllegalArgumentException( | ||
| "Argument class" + argument.getClass() + "can't be converted to " + parameterBaseClass); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be tempted to decompose this into a method for each branch so it's a bit less of a wall of code to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| if (_method.isVarArgs()) { | ||
| Class<?> parameterBaseClass = _parameterClasses[0].getComponentType(); | ||
| // Convert arguments to base class | ||
| Class<?> klass = Arrays.stream(_method.getParameterTypes()).findFirst().get().getComponentType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Support for varargs will already impose overheads, but streaming the parameter types will be unacceptably slow. This operation should be performed once and cached in the FunctionInvoker. It's probably better to store a prototype array in a field too, which you can just clone at invocation time.
| if (functionInfoMap != null && functionInfoMap.containsKey(numParameters)) { | ||
| return functionInfoMap.get(numParameters); | ||
| } else { | ||
| return VARARGS_FUNCTION_INFO_MAP.getOrDefault(functionName, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Map#get does the same thing
|
I think there should be support for operators as well in filters such as |
| public static boolean and(Boolean... args) { | ||
| for (Boolean arg : args) { | ||
| if (!arg) { | ||
| return false; | ||
| } | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is to support logical conjugation of predicates, but is it the right approach? This requires all expressions to be evaluated before they are combined, and prevents early termination. Early termination is not just a performance concern, but one of flexibility and correctness - it should be possible to express predicates which aren't safe to evaluate if another predicate acts as a guard to prevent its evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on this. Will rework this to allow early termination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have tried to model these functions differently (to be able to support early termination) here https://github.com/apache/pinot/pull/8582/files
|
@richardstartin @Jackie-Jiang @KKcorps @npawar This PR adds support of "and" and "or" conjugates via varargs function. As described in @richardstartin's review comment, this prohibits early termination and forces all conditions to be evaluated before and / or can be applied on their results. |
Instructions:
featurebugfixperformanceuibackward-incompatrelease-notes(**)(*) Other labels to consider:
testingdependenciesdockerkubernetesobservabilitysecuritycode-styleextension-pointrefactorcleanup(**) Use
release-noteslabel for scenarios like: