Skip to content

Fuse multi quantile funcs with same arguments to one function.#26657

Merged
vdimir merged 14 commits intoClickHouse:masterfrom
hexiaoting:fuse_quantile
Sep 21, 2021
Merged

Fuse multi quantile funcs with same arguments to one function.#26657
vdimir merged 14 commits intoClickHouse:masterfrom
hexiaoting:fuse_quantile

Conversation

@hexiaoting
Copy link
Copy Markdown
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

If query has multiple quantile aggregate functions with the same arguments but different level parameter:

SELECT quantile(0.5)(x), quantile(0.9)(x), quantile(0.95)(x) FROM ...
Rewrite it to

SELECT quantiles(0.5, 0.9, 0.95)(x)[1], quantiles(0.5, 0.9, 0.95)(x)[2], quantiles(0.5, 0.9, 0.95)(x)[3] FROM ...
So the aggregation will share the state.

The functions that can be fused in this implementation:
quantile, quantileDeterministic, quantileTiming, quantileTDigest, quantileTDigestWeighted, quantileExact, quantileExactWeighted, quantileBFloat16

close: #25961

@robot-clickhouse robot-clickhouse added the pr-performance Pull request with some performance improvements label Jul 21, 2021
@vdimir vdimir self-assigned this Jul 23, 2021
Copy link
Copy Markdown
Member

@vdimir vdimir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your changes!

Some comments, main points are:

  • it's good to add some tests with aliased columns
  • fuzzer repport

set optimize_fuse_quantile=true;
SELECT 'quantile:';
EXPLAIN SYNTAX SELECT quantile(0.2)(d), quantile(0.3)(d) FROM datetime;
SELECT quantile(0.2)(d), quantile(0.3)(d) FROM datetime;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's good to add tests were aliases are used like SELECT b, quantile(0.5)(x) as a, quantile(0.9)(x) as y, quantile(0.95)(x) FROM (select number as x, number % 2 as b from numbers(10)) group by b;
and also where gathered function are not only in select list, but in order by, join on or other section

SELECT '---------After fuse result-----------';
set optimize_fuse_quantile=true;
SELECT 'quantile:';
EXPLAIN SYNTAX SELECT quantile(0.2)(d), quantile(0.3)(d) FROM datetime;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we got two AS y in the result in (quantiles(0.5, 0.9, 0.95)(x) AS y)[1] AS y?

EXPLAIN SYNTAX
SELECT
    b,
    quantile(0.5)(x) AS y,
    quantile(0.9)(x),
    quantile(0.95)(x)
FROM
(
    SELECT
        number AS x,
        number % 2 AS b
    FROM numbers(10)
)
GROUP BY b;

┌─explain──────────────────────────────────────────┐
│ SELECT                                           │
│     b,                                           │
│     (quantiles(0.5, 0.9, 0.95)(x) AS y)[1] AS y, │
│     y[2],                                        │
│     y[3]                                         │
│ FROM                                             │
│ (                                                │
│     SELECT                                       │
│         number AS x,                             │
│         number % 2 AS b                          │
│     FROM numbers(10)                             │
│ )                                                │
│ GROUP BY b                                       │
└──────────────────────────────────────────────────┘

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

{NameQuantileBFloat16::name, NameQuantilesBFloat16::name}
};

/// Gather all the quantilexxx functions
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Gather all the quantilexxx functions
/// Gather all the `quantile*` functions

Comment on lines +620 to +622
for (auto * func : functions)
{
func->name = "arrayElement";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at the fuzzer failure, it finds error somewhere here


SELECT '---------After fuse result-----------';
set optimize_fuse_quantile=true;
SELECT 'quantile:';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it may be a bit consing that we have column name quantile(0.9)(x) in query SELECT ..., quantile(0.9)(x), ... but in result it prints arrayElement(quantiles(0.5, 0.9, 0.95)(x), 1). We can try to add alias ... AS "quantile(0.9)(x)". But I'm not sure that it's a big issue, maybe we can leave it as is, though.

Comment on lines +65 to +75
auto it = fuse_quantile.find(function.name);
if (it != fuse_quantile.end())
{
it->second.addFuncNode(&function);
}
else
{
FuseQuantileAggregatesData funcs{};
funcs.addFuncNode(&function);
fuse_quantile[function.name] = funcs;
}
Copy link
Copy Markdown
Member

@vdimir vdimir Jul 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply write fuse_quantile[function.name].addFuncNode, map entry will be initialized with default FuseQuantileAggregatesData constructor

https://godbolt.org/z/7KGdGadz8

Suggested change
auto it = fuse_quantile.find(function.name);
if (it != fuse_quantile.end())
{
it->second.addFuncNode(&function);
}
else
{
FuseQuantileAggregatesData funcs{};
funcs.addFuncNode(&function);
fuse_quantile[function.name] = funcs;
}
fuse_quantile[function.name].addFuncNode(&function);

@vdimir
Copy link
Copy Markdown
Member

vdimir commented Aug 2, 2021

@hexiaoting Do you wanna continue working on this PR? Or I may try to finish it

@hexiaoting
Copy link
Copy Markdown
Contributor Author

@hexiaoting Do you wanna continue working on this PR? Or I may try to finish it

sorry for the delay, I will continue

@vdimir
Copy link
Copy Markdown
Member

vdimir commented Sep 13, 2021

Sorry, I missed last commit, going to review it.

Following query is not working with option enabled:

SELECT `quantile(0.95)(x)`
FROM ( SELECT quantile(0.5)(x) as a, quantile(0.9)(x) as y, quantile(0.95)(x) FROM (select number as x, number % 2 as b from numbers(10)) )

But maybe it's ok, because query looks weired and we have option to disable it.

Also I'm thinking of creating more general option, like that enables all fuse syntax optimization (we already have one) not to create lots of for each function.

@hexiaoting
Copy link
Copy Markdown
Contributor Author

Sorry, I missed last commit, going to review it.

Following query is not working with option enabled:

SELECT `quantile(0.95)(x)`
FROM ( SELECT quantile(0.5)(x) as a, quantile(0.9)(x) as y, quantile(0.95)(x) FROM (select number as x, number % 2 as b from numbers(10)) )

But maybe it's ok, because query looks weired and we have option to disable it.

Also I'm thinking of creating more general option, like that enables all fuse syntax optimization (we already have one) not to create lots of for each function.

I will use only one fuse option setting.

Copy link
Copy Markdown
Member

@vdimir vdimir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your changes

@vdimir
Copy link
Copy Markdown
Member

vdimir commented Sep 16, 2021

Some Fuzzer with UBSan failures not related to changes, reproduces on master #29075

@vdimir vdimir merged commit bc7c41e into ClickHouse:master Sep 21, 2021
@hexiaoting hexiaoting deleted the fuse_quantile branch September 22, 2021 02:15
@alexey-milovidov
Copy link
Copy Markdown
Member

@vdimir Unfortunately, a bug has been found: #29965

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fuse some aggregate functions, part 2.

4 participants