
S3Queue is producing 1k+ ListBlob calls per second #54998

@rv32ima

Describe the unexpected behaviour
I have an S3Queue table defined like this:

CREATE TABLE events_queue (
  `Timestamp` DateTime64(9),
  `SchemaVersion` LowCardinality(String),
  `EventId` String,
  `FullName_Name` LowCardinality(String),
  `FullName_Namespace` LowCardinality(String),
  `Entity_Id` String,
  `Entity_Type` LowCardinality(String),
  `EventData` String
)
ENGINE = S3Queue('[redacted]', '[redacted]', '[redacted]', 'Parquet')
PARTITION BY toYYYYMMDD(`Timestamp`)
ORDER BY (`Timestamp`, `EventId`)
PRIMARY KEY (`Timestamp`, `EventId`)
SETTINGS
  mode = 'unordered',
  keeper_path = '/clickhouse/playfab_s3',
  after_processing = 'delete',
  s3queue_polling_size=5;

Streaming import works fine, but my AWS billing shows abnormally high egress charges and an abnormally high number of requests per second. ClickHouse appears to be generating thousands of ListBlob calls per second, which it shouldn't be doing.

How to reproduce

  • ClickHouse server version: 23.8.2
  • Create a table with the S3Queue engine (for example, the table above)
  • Watch the number of requests skyrocket 🚀
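To measure the request rate from the ClickHouse side instead of the AWS bill, a query along these lines may help. It is a sketch that assumes the server exposes its S3 request counters in `system.events` under names starting with `S3` (for example `S3ListObjects`). Those counters are cumulative since server start, so the rate comes from taking two snapshots a known number of seconds apart.

```sql
-- Snapshot the cumulative S3 request counters (counted since server start).
-- Run this twice, N seconds apart, and divide the change in `value` for
-- the list-related counter by N to approximate calls per second.
-- Assumption: S3 request counters are named with an 'S3' prefix.
SELECT event, value, description
FROM system.events
WHERE event LIKE 'S3%'
ORDER BY event;
```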

Expected behavior
ClickHouse should poll at a more reasonable rate. Leaving an S3Queue table running for 3 days racked up around $100 or more in egress and request fees for us. This many requests per second is absurd and almost amounts to a DoS attack.
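For comparison, the S3Queue engine documents settings that control how often the bucket is polled: `s3queue_polling_min_timeout_ms`, `s3queue_polling_backoff_ms` and `s3queue_polling_max_timeout_ms`. The sketch below is hypothetical and untested against 23.8.2: it creates a second table with the same columns and a slower polling schedule. The table name `events_queue_throttled`, the new `keeper_path`, and the timeout values are illustrative assumptions. If the excessive listing is a bug rather than a tuning issue, these settings may not reduce it.

```sql
-- Hypothetical: same columns as events_queue, slower polling schedule.
-- A different keeper_path means processed-file state is tracked separately
-- from the original table; the path below is illustrative only.
CREATE TABLE events_queue_throttled AS events_queue
ENGINE = S3Queue('[redacted]', '[redacted]', '[redacted]', 'Parquet')
SETTINGS
  mode = 'unordered',
  keeper_path = '/clickhouse/playfab_s3_throttled',
  after_processing = 'delete',
  s3queue_polling_size = 5,
  -- minimum wait between polls, in milliseconds (assumed value)
  s3queue_polling_min_timeout_ms = 10000,
  -- documented backoff step added to the wait (assumed value)
  s3queue_polling_backoff_ms = 5000,
  -- upper bound on the wait between polls (assumed value)
  s3queue_polling_max_timeout_ms = 60000;
```

Any materialized view that reads from `events_queue` would need to be recreated against the new table for this to take effect.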

Labels: unexpected behaviour (Result is unexpected, but not entirely wrong at the same time.)
