Skip to content

ClickHouse can be inefficient with latency/egress throughput for inserts over native interface #75719

@andreev-io

Description

@andreev-io

Company or project name

ClickHouse

Describe the unexpected behaviour

@CheSema and I discovered that when performing inserts over the native interface, ClickHouse server will send table metadata to the client before the client sends data for the insert. While clients may be reusing TCP connections indefinitely for thousands of inserts, each individual (async) insert will wait to receive table metadata. The issue is particularly acute when performing async inserts that insert a single row at a time and is two-fold:

  1. For a table of a sufficiently complicated schema, ClickHouse sends to the client far more than the client sends to ClickHouse, and this unnecessarily happens for each insert. In bandwidth-constrained environments (such as in any cloud provider where egress is not free), this is problematic.
  2. The client needs to wait for the metadata from the server on each insert, affecting latency.

Architecture of network send/receive over the HTTP interface is architected differently and doesn't exhibit the same behaviour, since there is no table metadata forwarding from the server to client over HTTP.

How to reproduce

Example query_log:

hostname:                   <redacted>
type:                       QueryFinish
event_date:                 2025-02-07
event_time:                 2025-02-07 13:33:29
event_time_microseconds:    2025-02-07 13:33:29.698443
query_start_time:           2025-02-07 13:33:29
query_start_⋯icroseconds:   2025-02-07 13:33:29.636553
query_duration_ms:          61
read_rows:                  0
read_bytes:                 0
written_rows:               0
written_bytes:              0
result_rows:                0
result_bytes:               0
memory_usage:               0
current_database:          <redacted>
query:                      INSERT INTO <table> (<columns>) FORMAT Native
formatted_query:
normalized_query_hash:      11837435207090871028 -- 11.84 quintillion
query_kind:                 Insert
databases:                  ['<redacted>']
tables:                     ['<redacted>']
columns:                    []
partitions:                 []
projections:                []
views:                      []
exception_code:             0
exception:
stack_trace:
is_initial_query:           1
user:                       <redacted>
query_id:                   <redacted>
address:                    <redacted>
port:                       52542
initial_user:               <redacted>
initial_query_id:           <redacted>
initial_address:            <redacted>
initial_port:               52542
initial_query_start_time:   2025-02-07 13:33:29
initial_quer⋯icroseconds:   2025-02-07 13:33:29.636553
interface:                  1
is_secure:                  0
os_user:
client_hostname:            <redacted>
client_name:                <redacted>
client_revision:            54460
client_version_major:       2
client_version_minor:       30
client_version_patch:       0
http_method:                0
http_user_agent:
http_referer:
forwarded_for:              <redacted>
quota_key:
distributed_depth:          0
revision:                   54491
log_comment:
thread_ids:                 [5442]
peak_threads_usage:         1
ProfileEvents:              {'Query':1,'InsertQuery':1,'InitialQuery':1,'AsyncInsertQuery':1,'ReadCompressedBytes':72,'CompressedReadBufferBlocks':2,'CompressedReadBufferBytes':20,'IOBufferAllocs':3,'IOBufferAllocBytes':1048913,'NetworkReceiveElapsedMicroseconds':58902,'NetworkSendElapsedMicroseconds':101,'NetworkReceiveBytes':38,'NetworkSendBytes':7584,'ContextLock':20,'ContextLockWaitMicroseconds':16,'RWLockAcquiredReadLocks':2,'PartsLockHoldMicroseconds':2,'RealTimeMicroseconds':61887,'UserTimeMicroseconds':2162,'SystemTimeMicroseconds':24,'SoftPageFaults':1,'OSCPUVirtualTimeMicroseconds':2185,'OSReadChars':454,'OSWriteChars':3982,'LogTrace':1,'LogDebug':1,'LoggerElapsedNanoseconds':80830,'InterfaceNativeSendBytes':7584,'InterfaceNativeReceiveBytes':38}
Settings:                   {'min_external_table_block_size_rows':'1048449','min_external_table_block_size_bytes':'268402944','max_insert_threads':'4','max_threads':'9','max_parsing_threads':'\'auto(9)\'','use_hedged_requests':'0','azure_strict_upload_part_size':'0','azure_max_blocks_in_multipart_upload':'50000','azure_min_upload_part_size':'16777216','azure_max_upload_part_size':'5368709120','s3_max_part_number':'10000','s3_max_single_operation_copy_size':'33554432','azure_upload_part_size_multiply_factor':'2','azure_upload_part_size_multiply_parts_count_threshold':'500','azure_max_inflight_parts_for_one_file':'20','azure_max_single_part_copy_size':'268435456','s3_skip_empty_files':'0','azure_allow_parallel_part_upload':'1','hdfs_throw_on_zero_files_match':'0','azure_throw_on_zero_files_match':'0','s3_ignore_file_doesnt_exist':'0','hdfs_ignore_file_doesnt_exist':'0','azure_ignore_file_doesnt_exist':'0','azure_sdk_max_retries':'10','azure_sdk_retry_initial_backoff_ms':'10','azure_sdk_retry_max_backoff_ms':'1000','s3_validate_request_settings':'1','s3_connect_timeout_ms':'1000','s3queue_migrate_old_metadata_to_buckets':'0','azure_skip_empty_files':'0','dictionary_validate_primary_key_type':'0','distributed_insert_skip_read_only_replicas':'0','distributed_foreground_insert':'1','allow_reorder_prewhere_conditions':'1','alter_sync':'0','allow_suspicious_variant_types':'1','allow_suspicious_primary_key':'1','enable_memory_bound_merging_of_aggregation_results':'1','function_locate_has_mysql_compatible_argument_order':'0','enable_parsing_to_custom_serialization':'0','do_not_merge_across_partitions_select_final':'0','split_parts_ranges_into_intersecting_and_non_intersecting_final':'1','split_intersecting_parts_ranges_into_layers_final':'1','mysql_map_string_to_text_in_show_columns':'0','mysql_map_fixed_string_to_text_in_show_columns':'0','materialize_skip_indexes_on_insert':'1','materialize_statistics_on_insert':'1','log_queries':'1','log_queries_probability':'1','log_processors_profiles':'0','merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_injection_probability':'0','max_http_get_redirects':'10','join_output_by_rowlist_perkey_rows_threshold':'0','min_free_disk_bytes_to_perform_insert':'0','min_free_disk_ratio_to_perform_insert':'0','ignore_on_cluster_for_replicated_named_collections_queries':'0','enable_zstd_qat_codec':'0','query_profiler_real_time_period_ns':'0','enable_vertical_final':'0','max_bytes_before_external_group_by':'19327352832','prefer_external_sort_block_bytes':'0','max_bytes_before_external_sort':'19327352832','cross_join_min_rows_to_compress':'0','cross_join_min_bytes_to_compress':'0','max_memory_usage':'38654705664','backup_restore_keeper_retry_max_backoff_ms':'60000','backup_restore_s3_retry_attempts':'1000','query_metric_log_interval':'0','read_in_order_use_buffering':'0','cancel_http_readonly_queries_on_client_close':'1','max_parts_to_move':'0','max_table_size_to_drop':'1000000000000','max_partition_size_to_drop':'1000000000000','postgresql_connection_attempt_timeout':'2','postgresql_connection_pool_retries':'2','default_table_engine':'ReplicatedMergeTree','mutations_sync':'0','lightweight_deletes_sync':'2','optimize_functions_to_subcolumns':'0','optimize_time_filter_with_preimage':'1','throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert':'0','allow_materialized_view_with_bad_select':'1','max_parser_backtracks':'0','max_recursive_cte_evaluation_depth':'1000','optimize_trivial_insert_select':'0','query_cache_system_table_handling':'save','query_cache_tag':'','optimize_rewrite_sum_if_to_count_if':'0','collect_hash_table_stats_during_joins':'0','max_size_to_preallocate_for_joins':'0','allow_experimental_kafka_offsets_storage_in_keeper':'0','database_replicated_allow_only_replicated_engine':'1','database_replicated_allow_replicated_engine_arguments':'1','database_replicated_allow_explicit_uuid':'1','database_replicated_allow_heavy_create':'1','cloud_mode':'1','cloud_mode_engine':'2','cloud_mode_database_engine':'1','distributed_ddl_output_mode':'none_only_active','distributed_ddl_entry_format_version':'6','enable_named_columns_in_function_tuple':'0','query_plan_merge_filters':'0','query_plan_convert_outer_join_to_inner_join':'0','query_plan_optimize_prewhere':'1','merge_tree_min_rows_for_concurrent_read_for_remote_filesystem':'163840','merge_tree_min_bytes_for_concurrent_read_for_remote_filesystem':'251658240','merge_tree_min_bytes_per_task_for_remote_reading':'4194304','merge_tree_min_read_task_size':'8','async_insert':'1','wait_for_async_insert':'0','async_insert_max_data_size':'104857600','async_insert_max_query_number':'5000','async_insert_poll_timeout_ms':'10','async_insert_use_adaptive_busy_timeout':'0','async_insert_busy_timeout_min_ms':'50','async_insert_busy_timeout_max_ms':'10000','async_insert_busy_timeout_increase_rate':'0.2','async_insert_busy_timeout_decrease_rate':'0.2','filesystem_cache_name':'','enable_filesystem_cache_on_write_operations':'1','filesystem_cache_reserve_space_wait_lock_timeout_milliseconds':'1000','filesystem_cache_prefer_bigger_buffer_size':'1','temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds':'600000','use_page_cache_for_disks_without_file_cache':'0','read_from_page_cache_if_exists_otherwise_bypass_cache':'0','page_cache_inject_eviction':'0','load_marks_asynchronously':'1','allow_prefetched_read_pool_for_remote_filesystem':'1','filesystem_prefetch_max_memory_usage':'3865470566','filesystem_prefetches_limit':'200','compatibility':'24.1','read_through_distributed_cache':'0','write_through_distributed_cache':'0','distributed_cache_throw_on_error':'0','distributed_cache_log_mode':'on_error','distributed_cache_fetch_metrics_only_from_current_az':'1','distributed_cache_connect_max_tries':'100','distributed_cache_receive_response_wait_milliseconds':'60000','distributed_cache_receive_timeout_milliseconds':'10000','distributed_cache_wait_connection_from_pool_milliseconds':'100','distributed_cache_bypass_connection_pool':'0','distributed_cache_pool_behaviour_on_limit':'allocate_bypassing_pool','distributed_cache_read_alignment':'0','distributed_cache_max_unacked_inflight_packets':'10','distributed_cache_data_packet_ack_window':'5','keeper_max_retries':'10','keeper_retry_initial_backoff_ms':'100','keeper_retry_max_backoff_ms':'5000','insert_keeper_max_retries':'20','ignore_drop_queries_probability':'0','traverse_shadow_remote_data_paths':'0','geo_distance_returns_float64_on_float64_arguments':'0','allow_get_client_http_header':'0','cast_string_to_dynamic_use_inference':'0','enable_blob_storage_log':'1','use_json_alias_for_old_object_type':'1','default_normal_view_sql_security':'INVOKER','default_materialized_view_sql_security':'DEFINER','default_view_definer':'CURRENT_USER','use_async_executor_for_materialized_views':'0','allow_deprecated_error_prone_window_functions':'1','allow_deprecated_snowflake_conversion_functions':'1','extract_key_value_pairs_max_pairs_per_row':'0','restore_replace_external_engines_to_null':'0','restore_replace_external_table_functions_to_null':'0','restore_replace_external_dictionary_source_to_null':'0','create_if_not_exists':'0','enforce_strict_identifier_format':'0','mongodb_throw_on_unsupported_query':'0','allow_experimental_materialized_postgresql_table':'0','allow_experimental_time_series_table':'0','allow_experimental_vector_similarity_index':'0','allow_experimental_dynamic_type':'0','allow_experimental_json_type':'0','allow_experimental_shared_set_join':'0','hnsw_candidate_list_size_for_search':'64','join_to_sort_minimum_perkey_rows':'0','join_to_sort_maximum_table_rows':'0','allow_experimental_join_right_table_sorting':'0','use_hive_partitioning':'0','allow_statistics_optimize':'0','allow_experimental_statistics':'0','allow_experimental_parallel_reading_from_replicas':'0','parallel_replicas_mode':'read_tasks','parallel_replicas_custom_key_range_lower':'0','parallel_replicas_custom_key_range_upper':'0','cluster_for_parallel_replicas':'default','parallel_replicas_allow_in_with_subquery':'0','parallel_replicas_prefer_local_join':'1','parallel_replicas_mark_segment_size':'128','allow_archive_path_syntax':'0','parallel_replicas_local_plan':'0','parallel_replicas_index_analysis_only_on_coordinator':'0','allow_experimental_full_text_index':'0','allow_experimental_join_condition':'0','allow_experimental_analyzer':'0','analyzer_compatibility_join_using_top_level_identifier':'0','implicit_select':'0','allow_experimental_shared_merge_tree':'0','allow_experimental_database_replicated':'0','allow_experimental_refreshable_materialized_view':'0','s3queue_allow_experimental_sharded_mode':'0','http_max_chunk_size':'0','enable_deflate_qpl_codec':'0','output_format_csv_serialize_tuple_into_separate_columns':'1','input_format_csv_deserialize_separate_columns_into_tuple':'1','input_format_force_null_for_omitted_fields':'0','input_format_parquet_bloom_filter_push_down':'0','input_format_parquet_use_native_reader':'0','input_format_orc_reader_time_zone_name':'GMT','input_format_orc_dictionary_as_low_cardinality':'0','input_format_parquet_enable_row_group_prefetch':'0','input_format_hive_text_allow_variable_number_of_columns':'0','input_format_csv_try_infer_strings_from_quoted_tuples':'1','input_format_parquet_max_block_size':'8192','input_format_parquet_prefer_block_bytes':'0','input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects':'0','input_format_json_throw_on_bad_escape_sequence':'1','input_format_json_ignore_unnecessary_fields':'0','input_format_try_infer_variants':'0','type_json_skip_duplicated_paths':'0','input_format_json_max_depth':'1000000','input_format_json_empty_as_default':'0','input_format_try_infer_datetimes_only_datetime64':'1','input_format_try_infer_exponent_floats':'1','input_format_tsv_crlf_end_of_line':'0','input_format_native_decode_types_in_binary_format':'0','output_format_native_encode_types_in_binary_format':'0','output_format_native_write_json_as_string':'0','date_time_input_format':'best_effort','date_time_64_output_format_cut_trailing_zeros_align_to_groups_of_thousands':'0','input_format_binary_decode_types_in_binary_format':'0','output_format_binary_encode_types_in_binary_format':'0','input_format_binary_read_json_as_string':'0','output_format_binary_write_json_as_string':'0','output_format_pretty_max_value_width_apply_for_single_value':'1','output_format_pretty_display_footer_column_names':'0','output_format_pretty_display_footer_column_names_min_rows':'0','output_format_parquet_string_as_string':'0','output_format_parquet_compression_method':'lz4','output_format_parquet_use_custom_encoder':'0','output_format_parquet_write_page_index':'0','format_template_row_format':'','format_template_resultset_format':'','output_format_pretty_row_numbers':'0','output_format_pretty_highlight_digit_groups':'0','output_format_pretty_single_large_number_tip_threshold':'0','rows_before_aggregation':'0','output_format_arrow_string_as_string':'0','output_format_orc_string_as_string':'0','output_format_orc_compression_method':'lz4','output_format_orc_dictionary_key_size_threshold':'0','output_format_values_escape_quote_with_quote':'0','validate_experimental_and_suspicious_types_inside_nested_types':'0','show_create_query_identifier_quoting_rule':'when_necessary','show_create_query_identifier_quoting_style':'Backticks'}
used_aggregate_functions:   []
used_aggrega⋯combinators:   []
used_database_engines:      []
used_data_type_families:    ['LowCardinality','UInt16','String']
used_dictionaries:          []
used_formats:               []
used_functions:             ['_CAST','replicate']
used_storages:              []
used_table_functions:       []
used_row_policies:          []
used_privileges:            [<redacted>]
missing_privileges:         []
transaction_id:             (0,0,'00000000-0000-0000-0000-000000000000')
query_cache_usage:          None
asynchronous_read_counters: {'max_parallel_read_tasks':184,'max_parallel_prefetch_tasks':100,'total_prefetch_tasks':574574}
DistributedC⋯ofileEvents:   {}

This can be easily reproduced with

CREATE TABLE default.trips
(
    `trip_id` UInt32,
    `vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15),
    `pickup_date` Date,
    `pickup_datetime` DateTime,
    `dropoff_date` Date,
    `dropoff_datetime` DateTime,
    `store_and_fwd_flag` UInt8,
    `rate_code_id` UInt8,
    `pickup_longitude` Float64,
    `pickup_latitude` Float64,
    `dropoff_longitude` Float64,
    `dropoff_latitude` Float64,
    `passenger_count` UInt8,
    `trip_distance` Float64,
    `fare_amount` Float32,
    `extra` Float32,
    `mta_tax` Float32,
    `tip_amount` Float32,
    `tolls_amount` Float32,
    `ehail_fee` Float32,
    `improvement_surcharge` Float32,
    `total_amount` Float32,
    `payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4),
    `trip_type` UInt8,
    `pickup` String,
    `dropoff` String,
    `cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3),
    `pickup_nyct2010_gid` Int8,
    `pickup_ctlabel` Float32,
    `pickup_borocode` Int8,
    `pickup_ct2010` String,
    `pickup_boroct2010` String,
    `pickup_cdeligibil` String,
    `pickup_ntacode` FixedString(4),
    `pickup_ntaname` String,
    `pickup_puma` UInt16,
    `dropoff_nyct2010_gid` UInt8,
    `dropoff_ctlabel` Float32,
    `dropoff_borocode` UInt8,
    `dropoff_ct2010` String,
    `dropoff_boroct2010` String,
    `dropoff_cdeligibil` String,
    `dropoff_ntacode` FixedString(4),
    `dropoff_ntaname` String,
    `dropoff_puma` UInt16
)
ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toYYYYMM(pickup_date)
ORDER BY pickup_datetime
SETTINGS index_granularity = 8192

Populate table with some data:

INSERT INTO trips
   SELECT *
   FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames')
   LIMIT 1000000;

Export one row into a file in the native format

SELECT * FROM trips LIMIT 1 INTO OUTFILE 'data.clickhouse' FORMAT Native

Insert this one row

INSERT INTO trips FROM INFILE 'data.clickhouse' FORMAT Native

Observe large InterfaceNativeSendBytes in ProfileEvent in query_log for the insert query ('InterfaceNativeSendBytes':13865):

{'Query':1,'InsertQuery':1,'InitialQuery':1,'FileOpen':1,'WriteBufferFromFileDescriptorWrite':1,'WriteBufferFromFileDescriptorWriteBytes':40249,'ReadCompressedBytes':54880,'CompressedReadBufferBlocks':3,'CompressedReadBufferBytes':204221,'IOBufferAllocs':17,'IOBufferAllocBytes':4612706,'FunctionExecute':8,'DiskWriteElapsedMicroseconds':53,'NetworkReceiveElapsedMicroseconds':5118,'NetworkSendElapsedMicroseconds':98,'NetworkReceiveBytes':54848,'NetworkSendBytes':13865,'LocalThreadPoolJobs':1,'InsertedRows':19,'InsertedBytes':204884,'ZooKeeperTransactions':9,'ZooKeeperRemove':1,'ZooKeeperExists':1,'ZooKeeperMulti':7,'SelectedRows':19,'SelectedBytes':204884,'MergeTreeDataWriterRows':19,'MergeTreeDataWriterUncompressedBytes':204884,'MergeTreeDataWriterCompressedBytes':39453,'MergeTreeDataWriterBlocks':1,'MergeTreeDataWriterSortingBlocksMicroseconds':3,'InsertedCompactParts':1,'ContextLock':24,'RWLockAcquiredReadLocks':2,'PartsLockHoldMicroseconds':41,'RealTimeMicroseconds':112009,'UserTimeMicroseconds':3905,'SystemTimeMicroseconds':603,'SoftPageFaults':195,'OSCPUWaitMicroseconds':18,'OSCPUVirtualTimeMicroseconds':4506,'OSWriteBytes':49152,'OSReadChars':1572,'OSWriteChars':89616,'S3WriteMicroseconds':39374,'S3WriteRequestsCount':1,'DiskS3WriteMicroseconds':39374,'DiskS3WriteRequestsCount':1,'S3PutObject':1,'DiskS3PutObject':1,'WriteBufferFromS3Microseconds':39725,'WriteBufferFromS3Bytes':40249,'CachedWriteBufferCacheWriteBytes':40249,'CachedWriteBufferCacheWriteMicroseconds':228,'FilesystemCacheReserveMicroseconds':138,'FileSegmentCompleteMicroseconds':15,'FileSegmentWriteMicroseconds':78,'FileSegmentHolderCompleteMicroseconds':1,'FilesystemCacheHoldFileSegments':1,'FilesystemCacheUnusedHoldFileSegments':1,'MetadataFromKeeperCacheHit':7,'MetadataFromKeeperCacheMiss':6,'MetadataFromKeeperTransactionCommit':1,'MetadataFromKeeperOperations':6,'MetadataFromKeeperIndividualOperations':5,'SharedMergeTreeMetadataCacheHintLoadedFromCache':1,'LogTrace':9,'LogDebug':8,'LoggerElapsedNanoseconds':377902,'InterfaceNativeSendBytes':13865,'InterfaceNativeReceiveBytes':54848,'DiskConnectionsReused':1,'DiskConnectionsPreserved':1,'HTTPConnectionsReused':1,'HTTPConnectionsPreserved':1,'ReadWriteBufferFromHTTPRequestsSent':1,'ReadWriteBufferFromHTTPBytes':670,'ConcurrencyControlSlotsGranted':1,'ConcurrencyControlSlotsAcquired':1}

In this case send is smaller than receive, but is still excessively large in absolute terms and is not amortised when the insert is performed a million times over a reused connection. The experiment can be easily modified for the send to be larger than received.

Expected behavior

Don't send many kilobytes of data on inserts of a single row. The volume of send currently consists of table metadata, profile events, and progress updates, all of which can be optimised.

Error message and/or stacktrace

No response

Additional context

Current behavior is an architectural choice and it can be improved in multiple ways:

  • Amortise metadata sent to clients across different inserts made within/across TCP connections of the same client.
  • Make it possible for clients to still perform async inserts without receiving metadata from the server before performing the insert.
  • Introduce controls over how profile events and progress blocks are sent.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions