-
Notifications
You must be signed in to change notification settings - Fork 8.3k
ClickHouse can be inefficient with latency/egress throughput for inserts over native interface #75719
Description
Company or project name
ClickHouse
Describe the unexpected behaviour
@CheSema and I discovered that when performing inserts over the native interface, ClickHouse server will send table metadata to the client before the client sends data for the insert. While clients may be reusing TCP connections indefinitely for thousands of inserts, each individual (async) insert will wait to receive table metadata. The issue is particularly acute when performing async inserts that insert a single row at a time and is two-fold:
- For a table of a sufficiently complicated schema, ClickHouse sends to the client far more than the client sends to ClickHouse, and this unnecessarily happens for each insert. In bandwidth-constrained environments (such as in any cloud provider where egress is not free), this is problematic.
- The client needs to wait for the metadata from the server on each insert, affecting latency.
Architecture of network send/receive over the HTTP interface is architected differently and doesn't exhibit the same behaviour, since there is no table metadata forwarding from the server to client over HTTP.
How to reproduce
Example query_log:
hostname: <redacted>
type: QueryFinish
event_date: 2025-02-07
event_time: 2025-02-07 13:33:29
event_time_microseconds: 2025-02-07 13:33:29.698443
query_start_time: 2025-02-07 13:33:29
query_start_⋯icroseconds: 2025-02-07 13:33:29.636553
query_duration_ms: 61
read_rows: 0
read_bytes: 0
written_rows: 0
written_bytes: 0
result_rows: 0
result_bytes: 0
memory_usage: 0
current_database: <redacted>
query: INSERT INTO <table> (<columns>) FORMAT Native
formatted_query:
normalized_query_hash: 11837435207090871028 -- 11.84 quintillion
query_kind: Insert
databases: ['<redacted>']
tables: ['<redacted>']
columns: []
partitions: []
projections: []
views: []
exception_code: 0
exception:
stack_trace:
is_initial_query: 1
user: <redacted>
query_id: <redacted>
address: <redacted>
port: 52542
initial_user: <redacted>
initial_query_id: <redacted>
initial_address: <redacted>
initial_port: 52542
initial_query_start_time: 2025-02-07 13:33:29
initial_quer⋯icroseconds: 2025-02-07 13:33:29.636553
interface: 1
is_secure: 0
os_user:
client_hostname: <redacted>
client_name: <redacted>
client_revision: 54460
client_version_major: 2
client_version_minor: 30
client_version_patch: 0
http_method: 0
http_user_agent:
http_referer:
forwarded_for: <redacted>
quota_key:
distributed_depth: 0
revision: 54491
log_comment:
thread_ids: [5442]
peak_threads_usage: 1
ProfileEvents: {'Query':1,'InsertQuery':1,'InitialQuery':1,'AsyncInsertQuery':1,'ReadCompressedBytes':72,'CompressedReadBufferBlocks':2,'CompressedReadBufferBytes':20,'IOBufferAllocs':3,'IOBufferAllocBytes':1048913,'NetworkReceiveElapsedMicroseconds':58902,'NetworkSendElapsedMicroseconds':101,'NetworkReceiveBytes':38,'NetworkSendBytes':7584,'ContextLock':20,'ContextLockWaitMicroseconds':16,'RWLockAcquiredReadLocks':2,'PartsLockHoldMicroseconds':2,'RealTimeMicroseconds':61887,'UserTimeMicroseconds':2162,'SystemTimeMicroseconds':24,'SoftPageFaults':1,'OSCPUVirtualTimeMicroseconds':2185,'OSReadChars':454,'OSWriteChars':3982,'LogTrace':1,'LogDebug':1,'LoggerElapsedNanoseconds':80830,'InterfaceNativeSendBytes':7584,'InterfaceNativeReceiveBytes':38}
Settings: {'min_external_table_block_size_rows':'1048449','min_external_table_block_size_bytes':'268402944','max_insert_threads':'4','max_threads':'9','max_parsing_threads':'\'auto(9)\'','use_hedged_requests':'0','azure_strict_upload_part_size':'0','azure_max_blocks_in_multipart_upload':'50000','azure_min_upload_part_size':'16777216','azure_max_upload_part_size':'5368709120','s3_max_part_number':'10000','s3_max_single_operation_copy_size':'33554432','azure_upload_part_size_multiply_factor':'2','azure_upload_part_size_multiply_parts_count_threshold':'500','azure_max_inflight_parts_for_one_file':'20','azure_max_single_part_copy_size':'268435456','s3_skip_empty_files':'0','azure_allow_parallel_part_upload':'1','hdfs_throw_on_zero_files_match':'0','azure_throw_on_zero_files_match':'0','s3_ignore_file_doesnt_exist':'0','hdfs_ignore_file_doesnt_exist':'0','azure_ignore_file_doesnt_exist':'0','azure_sdk_max_retries':'10','azure_sdk_retry_initial_backoff_ms':'10','azure_sdk_retry_max_backoff_ms':'1000','s3_validate_request_settings':'1','s3_connect_timeout_ms':'1000','s3queue_migrate_old_metadata_to_buckets':'0','azure_skip_empty_files':'0','dictionary_validate_primary_key_type':'0','distributed_insert_skip_read_only_replicas':'0','distributed_foreground_insert':'1','allow_reorder_prewhere_conditions':'1','alter_sync':'0','allow_suspicious_variant_types':'1','allow_suspicious_primary_key':'1','enable_memory_bound_merging_of_aggregation_results':'1','function_locate_has_mysql_compatible_argument_order':'0','enable_parsing_to_custom_serialization':'0','do_not_merge_across_partitions_select_final':'0','split_parts_ranges_into_intersecting_and_non_intersecting_final':'1','split_intersecting_parts_ranges_into_layers_final':'1','mysql_map_string_to_text_in_show_columns':'0','mysql_map_fixed_string_to_text_in_show_columns':'0','materialize_skip_indexes_on_insert':'1','materialize_statistics_on_insert':'1','log_queries':'1','log_queries_probability':'1','log_processors_profiles':'0','merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_injection_probability':'0','max_http_get_redirects':'10','join_output_by_rowlist_perkey_rows_threshold':'0','min_free_disk_bytes_to_perform_insert':'0','min_free_disk_ratio_to_perform_insert':'0','ignore_on_cluster_for_replicated_named_collections_queries':'0','enable_zstd_qat_codec':'0','query_profiler_real_time_period_ns':'0','enable_vertical_final':'0','max_bytes_before_external_group_by':'19327352832','prefer_external_sort_block_bytes':'0','max_bytes_before_external_sort':'19327352832','cross_join_min_rows_to_compress':'0','cross_join_min_bytes_to_compress':'0','max_memory_usage':'38654705664','backup_restore_keeper_retry_max_backoff_ms':'60000','backup_restore_s3_retry_attempts':'1000','query_metric_log_interval':'0','read_in_order_use_buffering':'0','cancel_http_readonly_queries_on_client_close':'1','max_parts_to_move':'0','max_table_size_to_drop':'1000000000000','max_partition_size_to_drop':'1000000000000','postgresql_connection_attempt_timeout':'2','postgresql_connection_pool_retries':'2','default_table_engine':'ReplicatedMergeTree','mutations_sync':'0','lightweight_deletes_sync':'2','optimize_functions_to_subcolumns':'0','optimize_time_filter_with_preimage':'1','throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert':'0','allow_materialized_view_with_bad_select':'1','max_parser_backtracks':'0','max_recursive_cte_evaluation_depth':'1000','optimize_trivial_insert_select':'0','query_cache_system_table_handling':'save','query_cache_tag':'','optimize_rewrite_sum_if_to_count_if':'0','collect_hash_table_stats_during_joins':'0','max_size_to_preallocate_for_joins':'0','allow_experimental_kafka_offsets_storage_in_keeper':'0','database_replicated_allow_only_replicated_engine':'1','database_replicated_allow_replicated_engine_arguments':'1','database_replicated_allow_explicit_uuid':'1','database_replicated_allow_heavy_create':'1','cloud_mode':'1','cloud_mode_engine':'2','cloud_mode_database_engine':'1','distributed_ddl_output_mode':'none_only_active','distributed_ddl_entry_format_version':'6','enable_named_columns_in_function_tuple':'0','query_plan_merge_filters':'0','query_plan_convert_outer_join_to_inner_join':'0','query_plan_optimize_prewhere':'1','merge_tree_min_rows_for_concurrent_read_for_remote_filesystem':'163840','merge_tree_min_bytes_for_concurrent_read_for_remote_filesystem':'251658240','merge_tree_min_bytes_per_task_for_remote_reading':'4194304','merge_tree_min_read_task_size':'8','async_insert':'1','wait_for_async_insert':'0','async_insert_max_data_size':'104857600','async_insert_max_query_number':'5000','async_insert_poll_timeout_ms':'10','async_insert_use_adaptive_busy_timeout':'0','async_insert_busy_timeout_min_ms':'50','async_insert_busy_timeout_max_ms':'10000','async_insert_busy_timeout_increase_rate':'0.2','async_insert_busy_timeout_decrease_rate':'0.2','filesystem_cache_name':'','enable_filesystem_cache_on_write_operations':'1','filesystem_cache_reserve_space_wait_lock_timeout_milliseconds':'1000','filesystem_cache_prefer_bigger_buffer_size':'1','temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds':'600000','use_page_cache_for_disks_without_file_cache':'0','read_from_page_cache_if_exists_otherwise_bypass_cache':'0','page_cache_inject_eviction':'0','load_marks_asynchronously':'1','allow_prefetched_read_pool_for_remote_filesystem':'1','filesystem_prefetch_max_memory_usage':'3865470566','filesystem_prefetches_limit':'200','compatibility':'24.1','read_through_distributed_cache':'0','write_through_distributed_cache':'0','distributed_cache_throw_on_error':'0','distributed_cache_log_mode':'on_error','distributed_cache_fetch_metrics_only_from_current_az':'1','distributed_cache_connect_max_tries':'100','distributed_cache_receive_response_wait_milliseconds':'60000','distributed_cache_receive_timeout_milliseconds':'10000','distributed_cache_wait_connection_from_pool_milliseconds':'100','distributed_cache_bypass_connection_pool':'0','distributed_cache_pool_behaviour_on_limit':'allocate_bypassing_pool','distributed_cache_read_alignment':'0','distributed_cache_max_unacked_inflight_packets':'10','distributed_cache_data_packet_ack_window':'5','keeper_max_retries':'10','keeper_retry_initial_backoff_ms':'100','keeper_retry_max_backoff_ms':'5000','insert_keeper_max_retries':'20','ignore_drop_queries_probability':'0','traverse_shadow_remote_data_paths':'0','geo_distance_returns_float64_on_float64_arguments':'0','allow_get_client_http_header':'0','cast_string_to_dynamic_use_inference':'0','enable_blob_storage_log':'1','use_json_alias_for_old_object_type':'1','default_normal_view_sql_security':'INVOKER','default_materialized_view_sql_security':'DEFINER','default_view_definer':'CURRENT_USER','use_async_executor_for_materialized_views':'0','allow_deprecated_error_prone_window_functions':'1','allow_deprecated_snowflake_conversion_functions':'1','extract_key_value_pairs_max_pairs_per_row':'0','restore_replace_external_engines_to_null':'0','restore_replace_external_table_functions_to_null':'0','restore_replace_external_dictionary_source_to_null':'0','create_if_not_exists':'0','enforce_strict_identifier_format':'0','mongodb_throw_on_unsupported_query':'0','allow_experimental_materialized_postgresql_table':'0','allow_experimental_time_series_table':'0','allow_experimental_vector_similarity_index':'0','allow_experimental_dynamic_type':'0','allow_experimental_json_type':'0','allow_experimental_shared_set_join':'0','hnsw_candidate_list_size_for_search':'64','join_to_sort_minimum_perkey_rows':'0','join_to_sort_maximum_table_rows':'0','allow_experimental_join_right_table_sorting':'0','use_hive_partitioning':'0','allow_statistics_optimize':'0','allow_experimental_statistics':'0','allow_experimental_parallel_reading_from_replicas':'0','parallel_replicas_mode':'read_tasks','parallel_replicas_custom_key_range_lower':'0','parallel_replicas_custom_key_range_upper':'0','cluster_for_parallel_replicas':'default','parallel_replicas_allow_in_with_subquery':'0','parallel_replicas_prefer_local_join':'1','parallel_replicas_mark_segment_size':'128','allow_archive_path_syntax':'0','parallel_replicas_local_plan':'0','parallel_replicas_index_analysis_only_on_coordinator':'0','allow_experimental_full_text_index':'0','allow_experimental_join_condition':'0','allow_experimental_analyzer':'0','analyzer_compatibility_join_using_top_level_identifier':'0','implicit_select':'0','allow_experimental_shared_merge_tree':'0','allow_experimental_database_replicated':'0','allow_experimental_refreshable_materialized_view':'0','s3queue_allow_experimental_sharded_mode':'0','http_max_chunk_size':'0','enable_deflate_qpl_codec':'0','output_format_csv_serialize_tuple_into_separate_columns':'1','input_format_csv_deserialize_separate_columns_into_tuple':'1','input_format_force_null_for_omitted_fields':'0','input_format_parquet_bloom_filter_push_down':'0','input_format_parquet_use_native_reader':'0','input_format_orc_reader_time_zone_name':'GMT','input_format_orc_dictionary_as_low_cardinality':'0','input_format_parquet_enable_row_group_prefetch':'0','input_format_hive_text_allow_variable_number_of_columns':'0','input_format_csv_try_infer_strings_from_quoted_tuples':'1','input_format_parquet_max_block_size':'8192','input_format_parquet_prefer_block_bytes':'0','input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects':'0','input_format_json_throw_on_bad_escape_sequence':'1','input_format_json_ignore_unnecessary_fields':'0','input_format_try_infer_variants':'0','type_json_skip_duplicated_paths':'0','input_format_json_max_depth':'1000000','input_format_json_empty_as_default':'0','input_format_try_infer_datetimes_only_datetime64':'1','input_format_try_infer_exponent_floats':'1','input_format_tsv_crlf_end_of_line':'0','input_format_native_decode_types_in_binary_format':'0','output_format_native_encode_types_in_binary_format':'0','output_format_native_write_json_as_string':'0','date_time_input_format':'best_effort','date_time_64_output_format_cut_trailing_zeros_align_to_groups_of_thousands':'0','input_format_binary_decode_types_in_binary_format':'0','output_format_binary_encode_types_in_binary_format':'0','input_format_binary_read_json_as_string':'0','output_format_binary_write_json_as_string':'0','output_format_pretty_max_value_width_apply_for_single_value':'1','output_format_pretty_display_footer_column_names':'0','output_format_pretty_display_footer_column_names_min_rows':'0','output_format_parquet_string_as_string':'0','output_format_parquet_compression_method':'lz4','output_format_parquet_use_custom_encoder':'0','output_format_parquet_write_page_index':'0','format_template_row_format':'','format_template_resultset_format':'','output_format_pretty_row_numbers':'0','output_format_pretty_highlight_digit_groups':'0','output_format_pretty_single_large_number_tip_threshold':'0','rows_before_aggregation':'0','output_format_arrow_string_as_string':'0','output_format_orc_string_as_string':'0','output_format_orc_compression_method':'lz4','output_format_orc_dictionary_key_size_threshold':'0','output_format_values_escape_quote_with_quote':'0','validate_experimental_and_suspicious_types_inside_nested_types':'0','show_create_query_identifier_quoting_rule':'when_necessary','show_create_query_identifier_quoting_style':'Backticks'}
used_aggregate_functions: []
used_aggrega⋯combinators: []
used_database_engines: []
used_data_type_families: ['LowCardinality','UInt16','String']
used_dictionaries: []
used_formats: []
used_functions: ['_CAST','replicate']
used_storages: []
used_table_functions: []
used_row_policies: []
used_privileges: [<redacted>]
missing_privileges: []
transaction_id: (0,0,'00000000-0000-0000-0000-000000000000')
query_cache_usage: None
asynchronous_read_counters: {'max_parallel_read_tasks':184,'max_parallel_prefetch_tasks':100,'total_prefetch_tasks':574574}
DistributedC⋯ofileEvents: {}
This can be easily reproduced with
CREATE TABLE default.trips
(
`trip_id` UInt32,
`vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15),
`pickup_date` Date,
`pickup_datetime` DateTime,
`dropoff_date` Date,
`dropoff_datetime` DateTime,
`store_and_fwd_flag` UInt8,
`rate_code_id` UInt8,
`pickup_longitude` Float64,
`pickup_latitude` Float64,
`dropoff_longitude` Float64,
`dropoff_latitude` Float64,
`passenger_count` UInt8,
`trip_distance` Float64,
`fare_amount` Float32,
`extra` Float32,
`mta_tax` Float32,
`tip_amount` Float32,
`tolls_amount` Float32,
`ehail_fee` Float32,
`improvement_surcharge` Float32,
`total_amount` Float32,
`payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4),
`trip_type` UInt8,
`pickup` String,
`dropoff` String,
`cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3),
`pickup_nyct2010_gid` Int8,
`pickup_ctlabel` Float32,
`pickup_borocode` Int8,
`pickup_ct2010` String,
`pickup_boroct2010` String,
`pickup_cdeligibil` String,
`pickup_ntacode` FixedString(4),
`pickup_ntaname` String,
`pickup_puma` UInt16,
`dropoff_nyct2010_gid` UInt8,
`dropoff_ctlabel` Float32,
`dropoff_borocode` UInt8,
`dropoff_ct2010` String,
`dropoff_boroct2010` String,
`dropoff_cdeligibil` String,
`dropoff_ntacode` FixedString(4),
`dropoff_ntaname` String,
`dropoff_puma` UInt16
)
ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toYYYYMM(pickup_date)
ORDER BY pickup_datetime
SETTINGS index_granularity = 8192
Populate table with some data:
INSERT INTO trips
SELECT *
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames')
LIMIT 1000000;
Export one row into a file in the native format
SELECT * FROM trips LIMIT 1 INTO OUTFILE 'data.clickhouse' FORMAT Native
Insert this one row
INSERT INTO trips FROM INFILE 'data.clickhouse' FORMAT Native
Observe large InterfaceNativeSendBytes in ProfileEvent in query_log for the insert query ('InterfaceNativeSendBytes':13865):
{'Query':1,'InsertQuery':1,'InitialQuery':1,'FileOpen':1,'WriteBufferFromFileDescriptorWrite':1,'WriteBufferFromFileDescriptorWriteBytes':40249,'ReadCompressedBytes':54880,'CompressedReadBufferBlocks':3,'CompressedReadBufferBytes':204221,'IOBufferAllocs':17,'IOBufferAllocBytes':4612706,'FunctionExecute':8,'DiskWriteElapsedMicroseconds':53,'NetworkReceiveElapsedMicroseconds':5118,'NetworkSendElapsedMicroseconds':98,'NetworkReceiveBytes':54848,'NetworkSendBytes':13865,'LocalThreadPoolJobs':1,'InsertedRows':19,'InsertedBytes':204884,'ZooKeeperTransactions':9,'ZooKeeperRemove':1,'ZooKeeperExists':1,'ZooKeeperMulti':7,'SelectedRows':19,'SelectedBytes':204884,'MergeTreeDataWriterRows':19,'MergeTreeDataWriterUncompressedBytes':204884,'MergeTreeDataWriterCompressedBytes':39453,'MergeTreeDataWriterBlocks':1,'MergeTreeDataWriterSortingBlocksMicroseconds':3,'InsertedCompactParts':1,'ContextLock':24,'RWLockAcquiredReadLocks':2,'PartsLockHoldMicroseconds':41,'RealTimeMicroseconds':112009,'UserTimeMicroseconds':3905,'SystemTimeMicroseconds':603,'SoftPageFaults':195,'OSCPUWaitMicroseconds':18,'OSCPUVirtualTimeMicroseconds':4506,'OSWriteBytes':49152,'OSReadChars':1572,'OSWriteChars':89616,'S3WriteMicroseconds':39374,'S3WriteRequestsCount':1,'DiskS3WriteMicroseconds':39374,'DiskS3WriteRequestsCount':1,'S3PutObject':1,'DiskS3PutObject':1,'WriteBufferFromS3Microseconds':39725,'WriteBufferFromS3Bytes':40249,'CachedWriteBufferCacheWriteBytes':40249,'CachedWriteBufferCacheWriteMicroseconds':228,'FilesystemCacheReserveMicroseconds':138,'FileSegmentCompleteMicroseconds':15,'FileSegmentWriteMicroseconds':78,'FileSegmentHolderCompleteMicroseconds':1,'FilesystemCacheHoldFileSegments':1,'FilesystemCacheUnusedHoldFileSegments':1,'MetadataFromKeeperCacheHit':7,'MetadataFromKeeperCacheMiss':6,'MetadataFromKeeperTransactionCommit':1,'MetadataFromKeeperOperations':6,'MetadataFromKeeperIndividualOperations':5,'SharedMergeTreeMetadataCacheHintLoadedFromCache':1,'LogTrace':9,'LogDebug':8,'LoggerElapsedNanoseconds':377902,'InterfaceNativeSendBytes':13865,'InterfaceNativeReceiveBytes':54848,'DiskConnectionsReused':1,'DiskConnectionsPreserved':1,'HTTPConnectionsReused':1,'HTTPConnectionsPreserved':1,'ReadWriteBufferFromHTTPRequestsSent':1,'ReadWriteBufferFromHTTPBytes':670,'ConcurrencyControlSlotsGranted':1,'ConcurrencyControlSlotsAcquired':1}
In this case send is smaller than receive, but is still excessively large in absolute terms and is not amortised when the insert is performed a million times over a reused connection. The experiment can be easily modified for the send to be larger than received.
Expected behavior
Don't send many kilobytes of data on inserts of a single row. The volume of send currently consists of table metadata, profile events, and progress updates, all of which can be optimised.
Error message and/or stacktrace
No response
Additional context
Current behavior is an architectural choice and it can be improved in multiple ways:
- Amortise metadata sent to clients across different inserts made within/across TCP connections of the same client.
- Make it possible for clients to still perform async inserts without receiving metadata from the server before performing the insert.
- Introduce controls over how profile events and progress blocks are sent.