Skip to content

Conversation

@nizarhejazi
Copy link
Contributor

Proper NULL handling in:

  • SELECT with and without ORDER BY
  • DISTINCT
  • GROUP BY
  • DISTINCT

For Single-valued columns of all data types with:

  • Raw encoding
  • Dictionary encoding

Unit tests:

Updated BigDecimalQueriesTest to work with null values.
Added BooleanNullEnabledNoDictQueriesTest.
Added AllNullQueriesTest where all values are null.
Added NullEnabledQueriesTest where some values are nulls.

@codecov-commenter
Copy link

codecov-commenter commented Jun 20, 2022

Codecov Report

Merging #8927 (b661f25) into master (84478b6) will decrease coverage by 6.44%.
The diff coverage is 79.72%.

❗ Current head b661f25 differs from pull request most recent head 85e806a. Consider uploading reports for the commit 85e806a to get more accurate results

@@             Coverage Diff              @@
##             master    #8927      +/-   ##
============================================
- Coverage     70.13%   63.68%   -6.45%     
+ Complexity     4741     4719      -22     
============================================
  Files          1831     1784      -47     
  Lines         96382    94815    -1567     
  Branches      14408    14371      -37     
============================================
- Hits          67594    60381    -7213     
- Misses        24125    30101    +5976     
+ Partials       4663     4333     -330     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 66.96% <79.72%> (+0.11%) ⬆️
unittests2 15.36% <0.00%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../java/org/apache/pinot/common/utils/DataTable.java 95.45% <ø> (ø)
...che/pinot/core/common/datablock/BaseDataBlock.java 77.41% <0.00%> (-0.28%) ⬇️
...che/pinot/core/common/datatable/BaseDataTable.java 78.94% <ø> (ø)
...reaming/StreamingSelectionOnlyCombineOperator.java 0.00% <0.00%> (-65.31%) ⬇️
...ator/streaming/StreamingSelectionOnlyOperator.java 0.00% <0.00%> (-90.00%) ⬇️
...ctionaryBasedSingleColumnDistinctOnlyExecutor.java 0.00% <0.00%> (ø)
...e/pinot/core/query/reduce/RowBasedBlockValSet.java 15.94% <0.00%> (-0.24%) ⬇️
...main/java/org/apache/pinot/spi/data/FieldSpec.java 82.03% <ø> (ø)
...va/org/apache/pinot/spi/utils/CommonConstants.java 28.57% <ø> (ø)
.../raw/RawBytesSingleColumnDistinctOnlyExecutor.java 42.10% <28.57%> (-37.90%) ⬇️
... and 480 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84478b6...85e806a. Read the comment docs.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In high level, we should not decide whether to enable query null handling based on the whether there is null value vector. This flag should be passed from the broker as a query option, and we can introduce another table config to enable it. This can ensure all the segments are executed using the same mode.

_dataBlockCache = dataBlockCache;
_column = column;
_dataSource = dataSource;
NullValueVectorReader nullValueReader = _dataSource == null ? null : _dataSource.getNullValueVector();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data source should never be null. The computation should be lazy, and the operator should decide whether to read it based on if null handling is enabled

Copy link
Contributor Author

@nizarhejazi nizarhejazi Jul 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to do computation lazily, and removed null check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the logic to getNullBitmap() method. Please note that this method is called only when nullHandlingEnabled is set to true.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are plenty of comments, but the PR is in good shape overall. Good job!

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good.
cc @siddharthteotia to also take a look since this change might have performance impact

@Jackie-Jiang Jackie-Jiang merged commit 8acb17f into apache:master Jul 19, 2022
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Jul 19, 2022
@Jackie-Jiang
Copy link
Contributor

Related to #8697

Comment on lines +256 to +264
for (Object[] row : rows) {
for (int i = 0; i < numColumns; i++) {
Object columnValue = row[i];
if (columnValue == null) {
row[i] = colDefaultNullValues[i];
nullBitmaps[i].add(rowId);
}
}
rowId++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we merge this part with the main loop below? any specific reason we had to loop through the rows one more time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature null support release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants