Setting: max_readers for S3/url/hdfs cluster table engines #52437

@danthegoodman1

Use case

Increase the maximum number of readers used for reading files from the s3(Cluster) table engine and similar. On a cluster with 96 cores in total, running an s3Cluster select on 96 files, I observe only 50-something S3 readers being used (see https://clickhousedb.slack.com/archives/CU478UEQZ/p1689859840124579 for more context, including the long query output).

Describe the solution you'd like

A max_readers setting, propagated to every node in the cluster (or applied on the single node for s3()), that sets the maximum number of readers each node uses to pull S3 files. For example:

SELECT count() FROM s3Cluster('{cluster}', 'https://s3.us-east-1.amazonaws.com/altinity-clickhouse-data/nyc_taxi_rides/data/tripdata/data-*.csv.gz', 'CSVWithNames', 
	'pickup_date Date, id UInt64, vendor_id String, tpep_pickup_datetime DateTime, tpep_dropoff_datetime DateTime, passenger_count UInt8, trip_distance Float32, pickup_longitude Float32, pickup_latitude Float32, rate_code_id String, store_and_fwd_flag String, dropoff_longitude Float32, dropoff_latitude Float32, payment_type LowCardinality(String), fare_amount Float32, extra String, mta_tax Float32, tip_amount Float32, tolls_amount Float32, improvement_surcharge Float32, total_amount Float32, pickup_location_id UInt16, dropoff_location_id UInt16, junk1 String, junk2 String', 
	'gzip')
settings max_readers=16

This would ensure I get up to 16 readers per node. Alternatively, the setting could be a cluster-wide aggregate, in which case I'd set it to 96 for a 6x16vCPU cluster.
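
To make the two interpretations concrete, here is a rough Python sketch of the arithmetic. The helper name `readers_per_node` is hypothetical and not part of ClickHouse, and `max_readers` is only the setting proposed in this issue, not an existing one. It assumes an aggregate value would be split evenly across nodes.

```python
def readers_per_node(max_readers: int, nodes: int, aggregate: bool) -> int:
    """Hypothetical helper: how many readers each node would get.

    aggregate=False: max_readers is a per-node limit and is used as-is.
    aggregate=True:  max_readers is a cluster-wide total, assumed here
                     to be split evenly across the nodes.
    """
    if aggregate:
        return max_readers // nodes
    return max_readers


NODES = 6  # the 6x16vCPU cluster from the example above

# Per-node interpretation: max_readers=16
print(readers_per_node(16, NODES, aggregate=False))

# Aggregate interpretation: max_readers=96 across the whole cluster
print(readers_per_node(96, NODES, aggregate=True))
```

Under these assumptions both forms give each node the same number of readers, so the choice is mostly about which one is easier to reason about when the cluster size changes.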

Describe alternatives you've considered

None. This looks like a hard-coded limit that hurts performance :(

Additional context

https://clickhousedb.slack.com/archives/CU478UEQZ/p1689859840124579
