
S3 Wildcard Issue When Using OR (Not usable) #49929

@warleysa


There is a performance issue when using the S3 wildcard described below.

ClickHouse versions tested (we have about 20 servers in total):

  • clickhouse/clickhouse-server:23.4.2.11
  • clickhouse/clickhouse-server:23.1.3.5
  • clickhouse/clickhouse-server:23.2.1.2537

Wildcard used, from the documentation: {some_string,another_string,yet_another_one} substitutes any one of the strings 'some_string', 'another_string', or 'yet_another_one'.
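By that documented rule, the wildcard path in this report should resolve to exactly two folders. The sketch below illustrates the substitution rule only. It is not ClickHouse's implementation, and expand_braces is a hypothetical helper. It expands each {a,b,...} group into its alternatives and does not handle nested braces or numeric ranges.

```python
import re
from itertools import product


def expand_braces(pattern: str) -> list[str]:
    """Expand every {a,b,...} group in pattern into concrete strings.

    Hypothetical helper illustrating the documented substitution rule;
    nested braces and numeric ranges such as {1..3} are not handled.
    """
    # re.split with a capturing group interleaves literal text (even
    # indices) with the contents of each brace group (odd indices).
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            choices.append(part.split(","))  # one of these alternatives
        else:
            choices.append([part])  # literal text, used as-is
    # Every combination of one choice per segment is one expanded string.
    return ["".join(combo) for combo in product(*choices)]
```

For the path in this report, expand_braces("presto_data/config/bts_id={555251,555256}/*") returns ["presto_data/config/bts_id=555251/*", "presto_data/config/bts_id=555256/*"]. This is the same pair of folders that the without-wildcard case queries one at a time.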

Without Wildcard:

  • Expected result: return all data under the presto_data/config/bts_id=555251 folder (5 files, 4 MB total)
  • S3 Path: us-east-1.amazonaws.com/presto_data/config/bts_id=555251/*
  • Speed: Finishes in about 1 second

With Wildcard:

  • Expected result: return all data under the presto_data/config/bts_id=555251 folder (5 files, 4 MB total) AND the presto_data/config/bts_id=555256 folder (4 files, 3 MB total), i.e. 9 files, about 7 MB
  • S3 Path: us-east-1.amazonaws.com/presto_data/config/bts_id={555251,555256}/*
  • Speed: never finishes; fails with a memory limit error after about 150 seconds
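To show the expected match set concretely, the following sketch translates the path pattern into a regular expression and filters a list of object keys. It assumes the ClickHouse-documented behavior that * and ? never match a "/" and that a {a,b,...} group matches exactly one of its alternatives. glob_to_regex is a hypothetical helper, and the key names are made up because the report does not list the actual files. This is not how ClickHouse evaluates globs internally.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path pattern using {a,b}, * and ? into a compiled regex.

    Hypothetical helper. Assumes * and ? never match '/', and a {a,b,...}
    group matches exactly one of its comma-separated strings.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "{":
            end = pattern.index("}", i)  # raises ValueError if unclosed
            alternatives = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
            continue
        if ch == "*":
            out.append("[^/]*")  # any run of characters within one path segment
        elif ch == "?":
            out.append("[^/]")  # exactly one character, not '/'
        else:
            out.append(re.escape(ch))  # literal character
        i += 1
    return re.compile("".join(out))


# Hypothetical object keys; the real bucket listing is not shown in the report.
keys = [
    "presto_data/config/bts_id=555251/file_0",
    "presto_data/config/bts_id=555256/file_0",
    "presto_data/config/bts_id=555252/file_0",
]
rx = glob_to_regex("presto_data/config/bts_id={555251,555256}/*")
matched = [k for k in keys if rx.fullmatch(k)]
```

Here matched keeps only the keys under the two listed bts_id folders. Any key under another bts_id value, or in a subfolder of either listed folder, is rejected.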

Thanks!

Labels: comp-object-storage, performance

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions