Skip to content

Conversation

@saurabhd336
Copy link
Contributor

Instructions:

  1. The PR has to be tagged with at least one of the following labels (*):
    1. feature
    2. bugfix
    3. performance
    4. ui
    5. backward-incompat
    6. release-notes (**)
  2. Remove these instructions before publishing the PR.

(*) Other labels to consider:

  • testing
  • dependencies
  • docker
  • kubernetes
  • observability
  • security
  • code-style
  • extension-point
  • refactor
  • cleanup

(**) Use release-notes label for scenarios like:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed

@saurabhd336
Copy link
Contributor Author

GH issue: #8488

cc: @npawar @KKcorps

@saurabhd336 saurabhd336 changed the title Add common comparison scalar functions for filterConfig during ingestion Add common comparison scalar functions for filterConfig during ingestion (WIP) Apr 18, 2022
@codecov-commenter
Copy link

codecov-commenter commented Apr 18, 2022

Codecov Report

Merging #8563 (74cd5d1) into master (82f7aef) will increase coverage by 6.76%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master    #8563      +/-   ##
============================================
+ Coverage     64.00%   70.76%   +6.76%     
+ Complexity     4294     4289       -5     
============================================
  Files          1641     1686      +45     
  Lines         86307    88229    +1922     
  Branches      13142    13351     +209     
============================================
+ Hits          55237    62435    +7198     
+ Misses        27071    21422    -5649     
- Partials       3999     4372     +373     
Flag Coverage Δ
integration1 27.31% <0.00%> (?)
integration2 25.80% <0.00%> (?)
unittests1 67.00% <0.00%> (-0.04%) ⬇️
unittests2 14.14% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ot/common/function/scalar/ArithmeticFunctions.java 61.53% <0.00%> (-3.68%) ⬇️
...nt/local/startree/v2/store/StarTreeDataSource.java 40.00% <0.00%> (-13.34%) ⬇️
...ot/segment/local/startree/OffHeapStarTreeNode.java 72.22% <0.00%> (-5.56%) ⬇️
.../helix/core/minion/MinionInstancesCleanupTask.java 77.27% <0.00%> (-4.55%) ⬇️
...e/impl/forward/FixedByteMVMutableForwardIndex.java 90.90% <0.00%> (-3.04%) ⬇️
...ugin/inputformat/avro/KafkaAvroMessageDecoder.java 0.00% <0.00%> (ø)
...che/pinot/plugin/metrics/yammer/YammerMetered.java 25.00% <0.00%> (ø)
...plugin/segmentuploader/SegmentUploaderDefault.java 87.09% <0.00%> (ø)
...t/server/starter/ServerQueriesDisabledTracker.java 86.36% <0.00%> (ø)
...pinot/plugin/metrics/yammer/YammerJmxReporter.java 100.00% <0.00%> (ø)
... and 383 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82f7aef...74cd5d1. Read the comment docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we don't support having different parameter type for the same scalar function, so we might need to use the same workaround similar to other functions by adding the type into the function name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding a separate class for all the logical functions. See FilterKind enum for more supported logical functions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't a robust way to compare double values for equality, I suggest using an epsilon of 1e-7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.
Will return Double.compare(a, b) == 0 be right? From the looks of it, it'll converts both values to 754 notation and compare the bits

Copy link
Member

@richardstartin richardstartin Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that would produce odd results too. doubles should be compared for equality within a tolerance because they are an approximation of a number and the error in the approximation should be accounted for.

For example, consider this program, which accumulates 1% 100 times:

    double d = 0.00;
    for (int i = 0; i < 100; i++) {
      d += 0.01;
    }
    System.out.println(d == 1D);
    System.out.println(Double.compare(d, 1D));
    System.out.println(Math.abs(d - 1D) < 1e-15);

it prints:

false
1
true

because 1% isn't finitely expressible in base 2. However, the quantity is clearly an approximation of 1. In order to make this function robust to arithmetic performed by transform functions (prior to persistence or at query time) equality should be verified relative to a small tolerance (1e-7 would be best).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Thanks @richardstartin. Updated the pr

@saurabhd336 saurabhd336 force-pushed the scalarFunctionsInFilterConfig branch from 744c09f to 30b9eff Compare April 20, 2022 10:06
@saurabhd336 saurabhd336 force-pushed the scalarFunctionsInFilterConfig branch from 30b9eff to 7190ee6 Compare April 20, 2022 10:30
Comment on lines +91 to +125
if (!_method.isVarArgs()) {
int numParameters = _parameterClasses.length;
Preconditions.checkArgument(arguments.length == numParameters,
"Wrong number of arguments for method: %s, expected: %s, actual: %s",
_method, numParameters, arguments.length);
for (int i = 0; i < numParameters; i++) {
// Skip conversion for null
Object argument = arguments[i];
if (argument == null) {
continue;
}
// Skip conversion if argument can be directly assigned
Class<?> parameterClass = _parameterClasses[i];
Class<?> argumentClass = argument.getClass();
if (parameterClass.isAssignableFrom(argumentClass)) {
continue;
}

PinotDataType parameterType = _parameterTypes[i];
PinotDataType argumentType = FunctionUtils.getArgumentType(argumentClass);
Preconditions.checkArgument(parameterType != null && argumentType != null,
"Cannot convert value from class: %s to class: %s", argumentClass, parameterClass);
arguments[i] = parameterType.convert(argument, argumentType);
}
// Skip conversion if argument can be directly assigned
Class<?> parameterClass = _parameterClasses[i];
Class<?> argumentClass = argument.getClass();
if (parameterClass.isAssignableFrom(argumentClass)) {
continue;
} else {
Class<?> parameterBaseClass = _parameterClasses[0].getComponentType();
for (Object argument : arguments) {
if (argument == null) {
continue;
}
if (!parameterBaseClass.isAssignableFrom(argument.getClass())) {
// TODO add conversion if classes don't match
throw new IllegalArgumentException(
"Argument class" + argument.getClass() + "can't be converted to " + parameterBaseClass);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be tempted to decompose this into a method for each branch so it's a bit less of a wall of code to read

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

if (_method.isVarArgs()) {
Class<?> parameterBaseClass = _parameterClasses[0].getComponentType();
// Convert arguments to base class
Class<?> klass = Arrays.stream(_method.getParameterTypes()).findFirst().get().getComponentType();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support for varargs will already impose overheads, but streaming the parameter types will be unacceptably slow. This operation should be performed once and cached in the FunctionInvoker. It's probably better to store a prototype array in a field too, which you can just clone at invocation time.

if (functionInfoMap != null && functionInfoMap.containsKey(numParameters)) {
return functionInfoMap.get(numParameters);
} else {
return VARARGS_FUNCTION_INFO_MAP.getOrDefault(functionName, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map#get does the same thing

@KKcorps
Copy link
Contributor

KKcorps commented Apr 21, 2022

I think there should be support for operators as well in filters such as ==, >= etc. rather than only relying on functional equivalents. However, that might increase the scope of this PR significantly so we can take that up later.

Comment on lines +28 to +35
public static boolean and(Boolean... args) {
for (Boolean arg : args) {
if (!arg) {
return false;
}
}
return true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is to support logical conjugation of predicates, but is it the right approach? This requires all expressions to be evaluated before they are combined, and prevents early termination. Early termination is not just a performance concern, but one of flexibility and correctness - it should be possible to express predicates which aren't safe to evaluate if another predicate acts as a guard to prevent its evaluation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this. Will rework this to allow early termination.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have tried to model these functions differently (to be able to support early termination) here https://github.com/apache/pinot/pull/8582/files

@saurabhd336
Copy link
Contributor Author

@richardstartin @Jackie-Jiang @KKcorps @npawar This PR adds support of "and" and "or" conjugates via varargs function. As described in @richardstartin's review comment, this prohibits early termination and forces all conditions to be evaluated before and / or can be applied on their results.
I've tried to address that here #8582, modelling the conjugations a little differently. Suggest to review that PR instead. If that PR seems alright conceptually, I'll close this one and continue on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants