Skip to content

Add -SimpleState combinator#16853

Merged
Avogar merged 2 commits intoClickHouse:masterfrom
amosbird:ss
Dec 17, 2020
Merged

Add -SimpleState combinator#16853
Avogar merged 2 commits intoClickHouse:masterfrom
amosbird:ss

Conversation

@amosbird
Copy link
Copy Markdown
Collaborator

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Provide a new aggregator combinator : -SimpleState to build SimpleAggregateFunction types via query. It's useful for defining MaterializedView of AggregatingMergeTree engine, and will benefit projections too.

Detailed description / Documentation draft:

Documents are updated.

If you apply this combinator, the aggregate function returns the same value but with a different type. This is an SimpleAggregateFunction(...) that can be stored in a table to work with AggregatingMergeTree table engines.

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Nov 11, 2020
@filimonov
Copy link
Copy Markdown
Contributor

filimonov commented Nov 11, 2020

I also prefer to do

CREATE TABLE xxx Engine=AggregatingMergeTree() ORDER BY ... AS AS SELECT x,y, xxxState(...) FROM yyy WHERE 0=1;
CREATE MATERIALIZED VIEW xxx_mv TO xxx AS SELECT x,y, xxxState(...) FROM yyy;

to avoid typing definitions for every columns. And i was missing that functionality.

@amosbird
Copy link
Copy Markdown
Collaborator Author

amosbird commented Nov 11, 2020

@filimonov You can achieve that via column transformers

CREATE MATERIALIZED VIEW xxx_mv TO xxx AS SELECT x,y, COLUMNS(<your favorite regexp>) APPLY (xxxState) FROM yyy;

It's in question whether I should extend it to ORDER BY for storage definition.

@azat
Copy link
Copy Markdown
Member

azat commented Nov 11, 2020

It's useful for defining MaterializedView of AggregatingMergeTree engine, and will benefit projections too.

@amosbird what is not possible to do right now that will be allowed after this patch? Right now the type for the materialized view will use original type:

create table data1 (key Int, value SimpleAggregateFunction(max, Int)) engineAggregatingMergeTree() order by key;
create materialized view data1_mv engine=AggregatingMergeTree() order by key as select * from data1;

SHOW CREATE TABLE data1_mv2;

┌─statement────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE MATERIALIZED VIEW default.data1_mv2
(
    `key` Int32,
    `value` SimpleAggregateFunction(max, Int32)
)
ENGINE = AggregatingMergeTree()
ORDER BY key
SETTINGS index_granularity = 8192 AS
SELECT *
FROM default.data1 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

-- same types for inner table.

Something like this only?

create table data2 engine=AggregatingMergeTree() order by n as select number n, anySimpleState(number) a from numbers(1) group by number;

SHOW CREATE TABLE data2 FORMAT Pretty;

┌─statement─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.data2
(
    `n` UInt64,
    `a` SimpleAggregateFunction(any, UInt64) /* to get SimpleAggregateFunction here? */
)
ENGINE = AggregatingMergeTree()
ORDER BY n
SETTINGS index_granularity = 8192 │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

@amosbird
Copy link
Copy Markdown
Collaborator Author

Something like this only?

I would say something like this

create materialized view data_mv engine=AggregatingMergeTree() order by key as select key, maxSimpleState(value) value from data group by key;

@azat
Copy link
Copy Markdown
Member

azat commented Nov 11, 2020

I would say something like this

Doesn't this the same? (But a little bit longer though)

CREATE MATERIALIZED VIEW data2_mv
ENGINE = AggregatingMergeTree()
ORDER BY key AS
SELECT
    key,
    CAST(value, 'SimpleAggregateFunction(max, Int)') AS value
FROM data1

As for performance:

INSERT INTO FUNCTION null('key Int, value SimpleAggregateFunction(max, Int)') SELECT
    any(number) AS key,
    maxSimpleState(number)
FROM numbers(10000000000)
SETTINGS max_threads = 1

Query id: c866ac71-18df-4a94-80af-bc26c4f761dd

Ok.

0 rows in set. Elapsed: 32.275 sec. Processed 10.00 billion rows, 80.01 GB (309.87 million rows/s., 2.48 GB/s.) 

INSERT INTO FUNCTION null('key Int, value SimpleAggregateFunction(max, Int)') SELECT                                                                                                  
    number AS key,
    CAST(number, 'SimpleAggregateFunction(max, Int)') AS value
FROM numbers(10000000000)                                                                  
SETTINGS max_threads = 1
                                             
Query id: 81593891-8a5c-4654-a042-5caf54af782e
                                             
Ok.
                                                                                           
0 rows in set. Elapsed: 14.828 sec. Processed 10.00 billion rows, 80.01 GB (674.47 million rows/s., 5.40 GB/s.) 

(And the using -SimpleState is obviously slower, since it does aggregation)

@amosbird
Copy link
Copy Markdown
Collaborator Author

amosbird commented Nov 11, 2020

Doesn't this the same? (But a little bit longer though)

That's the motivation (and also refer to the example of column transformers).

As for performance:

It's a good warning but not a persuasive example. -SimpleState is mainly for defining tables and materialized views (and projections in the future). It's also consistent with -State.

@amosbird
Copy link
Copy Markdown
Collaborator Author

test error looks weird

2020-11-11 13:29:07 01247_optimize_distributed_group_by_sharding_key:                       [ FAIL ] 2.43 sec. - return code 101
2020-11-11 13:29:07 Received exception from server (version 20.12.1):
2020-11-11 13:29:07 Code: 101. DB::Exception: Received from localhost:9000. DB::Exception: Received from 127.0.0.2:9000. DB::Exception: Unexpected packet Data received from client. 
2020-11-11 13:29:07 
2020-11-11 13:29:07 , result:
2020-11-11 13:29:07 
2020-11-11 13:29:07 -
2020-11-11 13:29:07 0
2020-11-11 13:29:07 1
2020-11-11 13:29:07 0
2020-11-11 13:29:07 1
2020-11-11 13:29:07 optimize_skip_unused_shards
2020-11-11 13:29:07 0
2020-11-11 13:29:07 1
2020-11-11 13:29:07 0
2020-11-11 13:29:07 1
2020-11-11 13:29:07 GROUP BY number
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 GROUP BY number distributed_group_by_no_merge
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 GROUP BY number, 1
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 GROUP BY 1
2020-11-11 13:29:07 4	0
2020-11-11 13:29:07 GROUP BY number ORDER BY number DESC
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 GROUP BY toString(number)
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 GROUP BY number%2
2020-11-11 13:29:07 2	0
2020-11-11 13:29:07 2	1
2020-11-11 13:29:07 countDistinct
2020-11-11 13:29:07 2
2020-11-11 13:29:07 countDistinct GROUP BY number
2020-11-11 13:29:07 1
2020-11-11 13:29:07 1
2020-11-11 13:29:07 1
2020-11-11 13:29:07 1
2020-11-11 13:29:07 DISTINCT
2020-11-11 13:29:07 0
2020-11-11 13:29:07 1
2020-11-11 13:29:07 0
2020-11-11 13:29:07 1
2020-11-11 13:29:07 HAVING
2020-11-11 13:29:07 HAVING LIMIT
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 LIMIT
2020-11-11 13:29:07 1	0
2020-11-11 13:29:07 LIMIT OFFSET
2020-11-11 13:29:07 1	1
2020-11-11 13:29:07 WHERE LIMIT OFFSET
2020-11-11 13:29:07 

@azat
Copy link
Copy Markdown
Member

azat commented Nov 11, 2020

That's the motivation (and also refer to the example of column transformers).

Got it, thanks!

It's a good warning but not a persuasive example.

Yeah, this is synthetic and I posted them just to add at least some details

@azat
Copy link
Copy Markdown
Member

azat commented Nov 11, 2020

test error looks weird

Indeed:

2020.11.11 13:29:06.062314 [ 5605 ] {} <Debug> TCPHandler: Processed in 0.013796241 sec.
...
2020.11.11 13:29:06.465597 [ 5605 ] {} <Error> TCPHandler: Code: 101, e.displayText() = DB::Exception: Unexpected packet Data received from client, Stack trace:

Looks like previous query finished, but new query (by some reason) was started from the Data/Scalar packet

@azat
Copy link
Copy Markdown
Member

azat commented Nov 14, 2020

test error looks weird

2020-11-11 13:29:07 01247_optimize_distributed_group_by_sharding_key:                       [ FAIL ] 2.43 sec. - return code 101
2020-11-11 13:29:07 Received exception from server (version 20.12.1):
2020-11-11 13:29:07 Code: 101. DB::Exception: Received from localhost:9000. DB::Exception: Received from 127.0.0.2:9000. DB::Exception: Unexpected packet Data received from client. 

@amosbird should be fixed with #17006

@amosbird
Copy link
Copy Markdown
Collaborator Author

Test failures seem unrelated.

@Avogar Avogar self-assigned this Dec 15, 2020
@Avogar Avogar merged commit d82c23d into ClickHouse:master Dec 17, 2020
azat added a commit to azat/ClickHouse that referenced this pull request Jan 27, 2022
CI reports [1]:

    Indirect leak of 648 byte(s) in 9 object(s) allocated from:
    ...
        2 0x12b96503 in DB::AggregateFunctionSimpleState::getReturnType() const obj-x86_64-linux-gnu/../src/AggregateFunctions/AggregateFunctionSimpleState.h:47:15
    ...

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/33957/08f4f45fd9da923ae3e3fdd8a527c297d35247eb/stress_test__address__actions_.html

After we can get this query by using query_log artifact:

    $ wget https://s3.amazonaws.com/clickhouse-test-reports/33957/08f4f45fd9da923ae3e3fdd8a527c297d35247eb/stress_test__address__actions_/query_log_dump.tar
    $ tar -xf query_log_dump.tar
    $ clickhouse-local --path var/lib/clickhouse/
    SELECT query
    FROM system.query_log
    ARRAY JOIN used_aggregate_function_combinators AS func
    WHERE has(used_aggregate_functions, 'groupBitOr') AND has(used_aggregate_function_combinators, 'SimpleState') AND (type != 'QueryStart')

    Query id: 5b7722b3-f77e-4e7e-bd0b-586d6d32a899

    ┌─query────────────────────────────────────────────────────────────────────────────┐
    │ with groupBitOrSimpleState(number) as c select toTypeName(c), c from numbers(1); │
    └──────────────────────────────────────────────────────────────────────────────────┘

Fixes: 01570_aggregator_combinator_simple_state.sql
Fixes: ClickHouse#16853
Signed-off-by: Azat Khuzhin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants