Skip to content

A setting aggregate_function_input_format to simplify insertion into columns with the AggregateFunction data type #87827

@alexey-milovidov

Description

@alexey-milovidov

Company or project name

ClickHouse

Use case

AggregateFunction is a parameterized data type, representing the internal state of aggregate functions in ClickHouse. It allows storing, manipulating, and merging the intermediate states of aggregate functions like first-class values, enabling incremental aggregation. This is a unique feature of ClickHouse, available since its first release.

However, AggregateFunction's serialization format is an implementation detail of each aggregate function. It is represented as a binary string in all data formats, which makes direct insertion of values of this type not feasible (the users have to do INSERT SELECT instead, which calculates the aggregation states in the SELECT query).

The goal is to enable direct insertion by providing explicit values of arguments to the aggregate function, which will be aggregated to form the state, much like the initializeAggregation and arrayReduce functions do.

Describe the solution you'd like

Add a setting, aggregate_function_input_format, with the following possible values:

  • state - binary string with the serialized state (the default);
  • value - the format will expect a single value of the argument of the aggregate function, or in the case of multiple arguments, a tuple of them; they will be deserialized using the corresponding IDataType or DataTypeTuple of them.
  • array - the format will expect an Array of values, as described in the values option above; all the elements of the array will be aggregated to form the state.

For example, given the table of this structure:

user_id UInt64,
avg_session_length AggregateFunction(avg, UInt32)

the user can SET aggregate_function_input_format = 'value'
and insert the data as follows:

INSERT INTO table FORMAT CSV
123,456

or the user can SET aggregate_function_input_format = 'array'
and insert the data as follows:

INSERT INTO table FORMAT CSV
123,[456,789]

Describe alternatives you've considered

No response

Additional context

This task is listed in the official ClickHouse Roadmap 2020.

Metadata

Metadata

Labels

featurewarmup taskThe task for new ClickHouse team members. Low risk, moderate complexity, no urgency.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions