Adding Setting Aggregate function input format to allow Insert queries into tables with AggregateFunction columns #88088

Merged
GrigoryPervakov merged 51 commits into ClickHouse:master from punithns97:master on Nov 26, 2025

Conversation

@punithns97
Contributor

@punithns97 punithns97 commented Oct 3, 2025

Resolves #87827.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Adds session-level setting aggregate_function_input_format to improve INSERT queries into tables with AggregateFunction columns, allowing insertion of data as serialized state, raw values, or arrays.

Documentation entry for user-facing changes

Adds a session-level setting aggregate_function_input_format with the following possible values:

  • state - a binary string containing the serialized state (the default)
  • value - the format expects a single value of the aggregate function's argument or, for multiple arguments, a tuple of them; the value is deserialized to form the corresponding state
  • array - the format expects an Array of values, each as described for the value option above; all elements of the array are aggregated to form the state
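
The difference between the value and array modes can be sketched with a toy model. The snippet below is only an illustration under stated assumptions: AvgState, stateFromValue, and stateFromArray are hypothetical names, and the sum-and-count layout is a simplification, not ClickHouse's real state representation or the code added by this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for the state of avg(UInt32): a running sum and a count.
// Assumption for illustration only; not ClickHouse's actual state layout.
struct AvgState
{
    std::uint64_t sum = 0;
    std::uint64_t count = 0;

    // Fold one more input value into the state.
    void add(std::uint32_t x)
    {
        sum += x;
        ++count;
    }

    // Finalize the state into the average. The toy model does not define
    // the result of an empty state, so it asserts instead.
    double result() const
    {
        assert(count > 0);
        return static_cast<double>(sum) / static_cast<double>(count);
    }
};

// 'value' mode: a single input value becomes a state that has seen
// exactly that one value.
AvgState stateFromValue(std::uint32_t value)
{
    AvgState state;
    state.add(value);
    return state;
}

// 'array' mode: every element of the array is aggregated into one state.
AvgState stateFromArray(const std::vector<std::uint32_t> & values)
{
    AvgState state;
    for (std::uint32_t v : values)
        state.add(v);
    return state;
}
```

In this model, the state mode would correspond to receiving an already built AvgState in serialized form, with no aggregation performed at insert time.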

Details

The goal of this PR is to let AggregateFunction columns be populated from other input formats such as JSON, CSV, and TSV.

Example use

For a table with this structure:

CREATE TABLE test_agg_single (
    user_id UInt64,
    avg_session_length AggregateFunction(avg, UInt32)
)

The user can SET aggregate_function_input_format = 'value' and perform queries such as:

INSERT INTO test_agg_single VALUES (124, '456'), (125, '789'), (126, '321');

Or the user can SET aggregate_function_input_format = 'array':

INSERT INTO test_agg_single VALUES (127, '[100,200,300]'), (128, '[400,500]'), (129, '[600]');

@CLAassistant

CLAassistant commented Oct 3, 2025

CLA assistant check
All committers have signed the CLA.

@punithns97
Contributor Author

Same as #88049.
Raising a new PR because I changed accounts.

@GrigoryPervakov added the "can be tested" label (Allows running workflows for external contributors) on Oct 6, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Oct 6, 2025

Workflow [PR], commit [eebf00a]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (arm_asan, targeted) | | failure | | |
| | 03167_base64_url_functions_sh | FAIL | cidb | |
| | 03167_base64_url_functions_sh | FAIL | cidb | |
| | 00596_limit_on_expanded_ast | FAIL | cidb, flaky | |
| | 00596_limit_on_expanded_ast | FAIL | cidb, flaky | |
| | 00596_limit_on_expanded_ast | FAIL | cidb, flaky | |
| | 00596_limit_on_expanded_ast | FAIL | cidb, flaky | |
| | 00596_limit_on_expanded_ast | FAIL | cidb, flaky | |
| | 00352_external_sorting_and_constants | FAIL | cidb | |
| | 00596_limit_on_expanded_ast | FAIL | cidb, flaky | |
| | 00596_limit_on_expanded_ast | FAIL | cidb, flaky | |
| | 4 more test cases not shown | | | |
| Stateless tests (amd_msan, parallel) | | failure | | |
| | 03208_array_of_json_read_subcolumns_2_memory | FAIL | cidb | |
| BuzzHouse (amd_debug) | | failure | | |
| | Logical error: 'Inconsistent AST formatting: the query: | FAIL | cidb | |
| BuzzHouse (amd_msan) | | failure | | |
| | Let op! | ERROR | cidb | |
| BuzzHouse (amd_ubsan) | | failure | | |
| | /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Common/ThreadProfileEvents.cpp:564:13: runtime error: 1.84467e+19 is outside the range of representable values of type 'unsigned long' | FAIL | cidb | |
| Performance Comparison (amd_release, master_head, 1/6) | | failure | | |
| | Start | failure | | |
| Performance Comparison (amd_release, master_head, 2/6) | | failure | | |
| | Start | failure | | |
| Performance Comparison (amd_release, master_head, 3/6) | | failure | | |
| | Start | failure | | |
| Performance Comparison (amd_release, master_head, 4/6) | | failure | | |
| | Check Results | failure | | |
| Performance Comparison (amd_release, master_head, 5/6) | | failure | | |
| | Check Results | failure | | |

@clickhouse-gh bot added the "pr-improvement" label (Pull request with some product improvements) on Oct 6, 2025
@punithns97
Contributor Author

punithns97 commented Oct 9, 2025

@Avogar @GrigoryPervakov Please review. I fixed the test cases and updated the PR.
The failing test cases don't seem to be related to these changes. They are failing in previous PRs as well.

@punithns97
Contributor Author

@GrigoryPervakov The failing checks are not related to my changes. The tests seem to be flaky, as mentioned here.

@punithns97
Contributor Author

@GrigoryPervakov Thanks for reviewing the PR and approving the changes. According to the Reports, the build failures don't seem to be related to these changes.

Can we proceed with merging the PR?

@GrigoryPervakov
Member

> Can we proceed with merging the PR?

Yes, all failing tests are confirmed as flaky or as having hit the time limit.

@GrigoryPervakov added this pull request to the merge queue on Nov 26, 2025
Merged via the queue into ClickHouse:master with commit 46edc61 on Nov 26, 2025
118 of 130 checks passed
@robot-ch-test-poll1 added the "pr-synced-to-cloud" label (The PR is synced to the cloud repo) on Nov 26, 2025
@fm4v
Member

fm4v commented Dec 1, 2025

@punithns97 Hi, could input values for an AggregateFunction column be passed as something other than a string? Or, when aggregate_function_input_format = 'array', could the input be a regular array?

@punithns97
Contributor Author

punithns97 commented Dec 1, 2025

> Hi, could input values for an AggregateFunction column be passed as something other than a string? Or, when aggregate_function_input_format = 'array', could the input be a regular array?

@fm4v It accepts real (non-string) arrays for AggregateFunction columns when the input is parsed by a format that honors aggregate_function_input_format (e.g. JSONEachRow, TabSeparated, CSV). However, it does not implicitly convert SQL VALUES array literals (INSERT ... VALUES (id, [1,2,3])) into an AggregateFunction state.

// Single argument - parse the value directly
auto temp_column = argument_types[0]->createColumn();
ReadBufferFromString buf(value_str);
argument_types[0]->getDefaultSerialization()->deserializeTextCSV(*temp_column, buf, settings);
Member

I don't understand this - why CSV?

@alexey-milovidov
Member

It should work for all input formats, including RowBinary. If it's not the case, please revert this PR and re-implement it.

@alexey-milovidov
Member

I can argue that this is especially important for the RowBinary format, and having a partial implementation misses the point.

@punithns97
Contributor Author

punithns97 commented Dec 3, 2025

> It should work for all input formats, including RowBinary. If it's not the case, please revert this PR and re-implement it.
> I can argue that this is especially important for the RowBinary format, and having a partial implementation misses the point.

This works only with text-based formats. To support binary formats, can I raise another PR to supplement this one?
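
To illustrate why a text-only deserialization path does not cover a binary format, here is a minimal sketch of reading the same UInt32 value from decimal text and from four fixed-width little-endian bytes, which is how RowBinary-style formats lay out fixed-size integers. The function names parseTextUInt32 and readBinaryUInt32LE are hypothetical and are not part of ClickHouse or of this PR.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Text path: parse decimal digits such as "456".
// Returns false on empty input, non-digit characters, or overflow.
bool parseTextUInt32(const std::string & text, std::uint32_t & out)
{
    if (text.empty())
        return false;

    std::uint64_t acc = 0;
    for (char c : text)
    {
        if (c < '0' || c > '9')
            return false;
        acc = acc * 10 + static_cast<std::uint64_t>(c - '0');
        if (acc > 0xFFFFFFFFull)
            return false;
    }

    out = static_cast<std::uint32_t>(acc);
    return true;
}

// Binary path: four bytes, least significant byte first.
std::uint32_t readBinaryUInt32LE(const std::array<std::uint8_t, 4> & bytes)
{
    return static_cast<std::uint32_t>(bytes[0])
        | (static_cast<std::uint32_t>(bytes[1]) << 8)
        | (static_cast<std::uint32_t>(bytes[2]) << 16)
        | (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

Feeding the binary bytes of 456 (C8 01 00 00) to the text parser fails, so a value or array mode that always goes through a text deserializer would need a separate code path for binary input.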

Labels

  • can be tested - Allows running workflows for external contributors
  • pr-improvement - Pull request with some product improvements
  • pr-synced-to-cloud - The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

A setting aggregate_function_input_format to simplify insertion into columns with the AggregateFunction data type

6 participants