Skip to content

Conversation

@61yao
Copy link
Contributor

@61yao 61yao commented Sep 2, 2022

Coalesce takes a list of arguments and returns the first not null value. If all arguments are null, return a null.

This implementations transform all arguments into string and return the first non-null string value. Null is represented as string value "null"

@codecov-commenter
Copy link

codecov-commenter commented Sep 2, 2022

Codecov Report

Merging #9327 (43ac34c) into master (ff2a333) will increase coverage by 3.56%.
The diff coverage is 87.50%.

@@             Coverage Diff              @@
##             master    #9327      +/-   ##
============================================
+ Coverage     63.40%   66.97%   +3.56%     
- Complexity     4762     4894     +132     
============================================
  Files          1832     1403     -429     
  Lines         98146    73108   -25038     
  Branches      15020    11722    -3298     
============================================
- Hits          62231    48964   -13267     
+ Misses        31321    20582   -10739     
+ Partials       4594     3562    -1032     
Flag Coverage Δ
unittests1 66.97% <87.50%> (+0.05%) ⬆️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../transform/function/CoalesceTransformFunction.java 87.35% <87.35%> (ø)
...e/pinot/common/function/TransformFunctionType.java 100.00% <100.00%> (ø)
...r/transform/function/TransformFunctionFactory.java 86.89% <100.00%> (+0.09%) ⬆️
.../org/apache/pinot/query/routing/WorkerManager.java 83.87% <0.00%> (-6.87%) ⬇️
...he/pinot/segment/local/segment/store/IndexKey.java 65.00% <0.00%> (-5.00%) ⬇️
.../aggregation/function/ModeAggregationFunction.java 88.10% <0.00%> (ø)
...ller/recommender/data/generator/TimeGenerator.java
...ontroller/api/resources/PinotControllerLogger.java
...inion/event/DefaultMinionEventObserverFactory.java
...ker/routing/instanceselector/InstanceSelector.java
... and 430 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think implicitly treating everything as STRING will give confusing semantics imo. If we don't want to support this function on few data types, then throwing exception is better.

Reference - https://docs.snowflake.com/en/sql-reference/functions/coalesce.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. For the first version, we can check if all the input are numbers or strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check for argument type. but the return value has to be a string, otherwise we are not able to represent null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Currently we don't support null values for transform function yet. You may read TransformBlockValSet.getNullBitmap(), which count the result as null when any argument is null. This won't work for coalesce.

For now, we may use NullValueUtils.getDefaultNullValue() for each data type to present the null. Then we should figure out a way to support the real null in transform function.

Copy link
Contributor

@walterddr walterddr Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edited) represented using default nullvalue probably is not a good idea since all numeric data type uses zero as default null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@walterddr Not really. We use min value as the default for numeric types except for big decimal, which doesn't have a min value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use NullValueUtils.getDefaultNullValue() and only supports numeric and string types. Also it requires the type to be same for all arguments.

@61yao 61yao force-pushed the coalesce branch 3 times, most recently from adbc410 to 21d21ae Compare September 10, 2022 01:19
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this check. We may use getResultMetadata() to rule out the unsupported types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the second check but we need the type to be identifier to get the null bit map?

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Let's change the name to storedType to be more explicit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) We usually follow the sequence of INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING (same order as the enum for easier tracking

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also use intellij autocomplete to generate the branches will automatically be in that order :-) )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Let's calculate the result metadata in the init() and store it in a member variable. It has 2 benefits:

  1. Fail fast when the type cannot be supported
  2. Prevent calculating result metadata multiple times

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Follow the same sequence (INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING) as the interface for easier tracking

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Put this method before getDoublelTransformResults() for easier tracking

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants