@xiangfu0 xiangfu0 commented Sep 6, 2023

  • Adding more tuple sketch scalar functions
  • Adding more tuple sketch integration tests with union and join
  • Adding more theta sketch integration tests with union and join

Sample queries:
  1. Intersection on sketch bytes with filters
SELECT 
    GET_INT_TUPLE_SKETCH_ESTIMATE(
        INT_SUM_TUPLE_SKETCH_INTERSECTION(
          DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes) FILTER (WHERE id = 1 OR id = 2),
          DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes) FILTER (WHERE id = 2 OR id = 3)
        )
    )
FROM myTable
  2. TupleSketch after a union of multiple subqueries
SELECT 
    DISTINCT_COUNT_TUPLE_SKETCH(metTupleSketchBytes),
    DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes),
    SUM_VALUES_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes), 
    AVG_VALUE_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes)
FROM (
    SELECT metTupleSketchBytes FROM myTable WHERE id = 4
    UNION ALL
    SELECT metTupleSketchBytes FROM myTable WHERE id = 5
    UNION ALL
    SELECT metTupleSketchBytes FROM myTable WHERE id = 6
    UNION ALL
    SELECT metTupleSketchBytes FROM myTable WHERE id = 7
)
  3. ThetaSketch after join
SELECT a.dimValue, distinctCountThetaSketch(b.thetaSketchCol)
FROM
(SELECT dimName, dimValue, thetaSketchCol FROM myTable WHERE dimName = 'gender' AND dimValue = 'Female') a 
JOIN 
(SELECT dimName, dimValue, thetaSketchCol FROM myTable WHERE dimName = 'gender' AND dimValue = 'Male') b 
ON a.dimName = b.dimName
GROUP BY a.dimValue;
  4. TupleSketch with Intersection/Union after join
SELECT
    GET_INT_TUPLE_SKETCH_ESTIMATE(
        INT_SUM_TUPLE_SKETCH_INTERSECTION(
            DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(a.metTupleSketchBytes),
            DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(b.metTupleSketchBytes)
        )
    ),
    GET_INT_TUPLE_SKETCH_ESTIMATE(
        INT_SUM_TUPLE_SKETCH_UNION(
            DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(a.metTupleSketchBytes),
            DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(b.metTupleSketchBytes)
        )
    )
FROM
    (SELECT id, metTupleSketchBytes FROM myTable WHERE id < 8 ) a
JOIN
    (SELECT id, metTupleSketchBytes FROM myTable WHERE id > 3 ) b
ON
    a.id = b.id
  5. ThetaSketch with Intersection/Union after join
SELECT 
    GET_THETA_SKETCH_ESTIMATE(
        THETA_SKETCH_INTERSECT(
            DISTINCT_COUNT_RAW_THETA_SKETCH(a.thetaSketchCol, ''),
            DISTINCT_COUNT_RAW_THETA_SKETCH(b.thetaSketchCol, '')
        )
    ),
    GET_THETA_SKETCH_ESTIMATE(
        THETA_SKETCH_UNION(
            DISTINCT_COUNT_RAW_THETA_SKETCH(a.thetaSketchCol, ''), 
            DISTINCT_COUNT_RAW_THETA_SKETCH(b.thetaSketchCol, '')
        )
    ) 
FROM 
    (SELECT dimName, dimValue, thetaSketchCol FROM myTable WHERE dimName = 'gender' AND dimValue = 'Female') a
JOIN
    (SELECT dimName, dimValue, thetaSketchCol FROM myTable WHERE dimName = 'gender' AND dimValue = 'Male') b
ON
    a.dimName = b.dimName
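
For readers less familiar with the set operations these sample queries exercise, the following is a minimal toy model, not the Pinot or DataSketches implementation. It assumes an integer-sum tuple sketch can be pictured as an exact mapping from key to integer summary, where union and intersection both add the summaries of matching keys. Real sketches are sampled and return approximate estimates. The table contents, key names, and helper functions (`union`, `intersection`, `estimate`, `merge_rows`) are all hypothetical and exist only for illustration. The example mirrors the shape of sample query 1: two filtered aggregations over the same rows, followed by an intersection and a union.

```python
from functools import reduce

# Toy stand-in for an integer-sum tuple sketch: an exact dict of key -> summary.
# Assumption: this models only the set semantics, not sampling or error bounds.


def union(a, b):
    """Keys from either side; summaries of keys present in both are added."""
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out


def intersection(a, b):
    """Keys present on both sides; their summaries are added."""
    return {key: a[key] + b[key] for key in a.keys() & b.keys()}


def estimate(sketch):
    """Distinct-key count (exact here; an estimate for a real sketch)."""
    return len(sketch)


def merge_rows(rows, predicate):
    """Models an aggregation with FILTER (WHERE ...): union the matching rows."""
    matching = (sketch for row_id, sketch in rows.items() if predicate(row_id))
    return reduce(union, matching, {})


# Hypothetical table: row id -> per-row sketch.
rows = {
    1: {"u1": 1, "u2": 1},
    2: {"u2": 1, "u3": 1},
    3: {"u3": 1, "u4": 1},
}

left = merge_rows(rows, lambda i: i in (1, 2))   # id = 1 OR id = 2
right = merge_rows(rows, lambda i: i in (2, 3))  # id = 2 OR id = 3

print(estimate(intersection(left, right)))  # 2 (keys u2 and u3)
print(estimate(union(left, right)))         # 4 (keys u1 through u4)
```

In this model the join-based queries differ only in how `left` and `right` are produced; the intersection and union steps are the same.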

codecov-commenter commented Sep 6, 2023

Codecov Report

Merging #11517 (8de0a62) into master (b25b62a) will decrease coverage by 0.06%.
Report is 11 commits behind head on master.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master   #11517      +/-   ##
============================================
- Coverage     63.07%   63.02%   -0.06%     
- Complexity      207     1108     +901     
============================================
  Files          2320     2320              
  Lines        124598   124691      +93     
  Branches      19022    19036      +14     
============================================
- Hits          78596    78581      -15     
- Misses        40408    40511     +103     
- Partials       5594     5599       +5     
Flag           Coverage Δ
integration    <0.01% <0.00%> (ø)
integration1   <0.01% <0.00%> (ø)
integration2   0.00% <0.00%> (ø)
java-11        62.99% <0.00%> (+12.94%) ⬆️
java-17        14.48% <0.00%> (-48.44%) ⬇️
java-20        62.87% <0.00%> (+12.94%) ⬆️
temurin        63.02% <0.00%> (-0.06%) ⬇️
unittests      63.01% <0.00%> (-0.06%) ⬇️
unittests1     67.41% <0.00%> (-0.10%) ⬇️
unittests2     14.50% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Files Changed                                            Coverage Δ
...he/pinot/core/function/scalar/SketchFunctions.java    59.43% <0.00%> (-24.57%) ⬇️

... and 27 files with indirect coverage changes

@xiangfu0 xiangfu0 changed the title from "Adding more theta sketch integration test" to "Adding more tuple sketch scalar functions and integration tests" Sep 6, 2023
@xiangfu0 xiangfu0 changed the title from "Adding more tuple sketch scalar functions and integration tests" to "[multistage]Adding more tuple sketch scalar functions and integration tests" Sep 6, 2023
@xiangfu0 xiangfu0 added the feature, query, and multi-stage (Related to the multi-stage query engine) labels Sep 6, 2023
@xiangfu0 xiangfu0 requested a review from snleee September 6, 2023 17:54
Adding more sketch integration test
@xiangfu0 xiangfu0 merged commit d211d89 into apache:master Sep 10, 2023

Labels

feature, multi-stage (Related to the multi-stage query engine), query, testing

4 participants