Skip to content

Query to remote table with enable_analyzer=1 try to read all table from remote or Distributed, instead of executing query remotely #81718

@astudnev

Description

@astudnev

Company or project name

bitquery

Describe the situation

Query to remote table with enable_analyzer=1 try to read all table from remote or Distributed, instead of executing query remotely

Compare execution of a query:

charch37 :) SELECT max(`Block_Date`) AS `_Date_0`, count() AS `count()` FROM remote('chstrm36', 'eth', 'dex_trades_tx', 'default', '[PWD]') SETTINGS enable_analyzer=0

SELECT
    max(Block_Date) AS _Date_0,
    count() AS `count()`
FROM remote('chstrm36', 'eth', 'dex_trades_tx', 'default', '[PWD]')
         SETTINGS enable_analyzer = 0

Query id: ea088f8d-1dca-4deb-a8ef-513f86bce214

    ┌────_Date_0─┬───count()─┐
1. │ 2025-06-12 │ 394548119 │
   └────────────┴───────────┘

1 row in set. Elapsed: 5.044 sec. Processed 395.86 million rows, 794.61 MB (78.48 million rows/s., 157.54 MB/s.)
    Peak memory usage: 1.04 GiB.

    charch37 :) SELECT max(`Block_Date`) AS `_Date_0`, count() AS `count()` FROM remote('chstrm36', 'eth', 'dex_trades_tx', 'default', '[PWD]') SETTINGS enable_analyzer=1

SELECT
    max(Block_Date) AS _Date_0,
    count() AS `count()`
FROM remote('chstrm36', 'eth', 'dex_trades_tx', 'default', '[PWD]')
         SETTINGS enable_analyzer = 1

Query id: d0e5543e-413f-436e-a02f-4127f52dc582

Cancelling query.
Query was cancelled.

0 rows in set. Elapsed: 45.494 sec. Processed 28.98 million rows, 80.72 GB (636.94 thousand rows/s., 1.77 GB/s.)
    Peak memory usage: 4.47 GiB.

Same happens with using Distributed table.

Difference in huge: note the amount of disk read on the second case, and the query take so long that i can not wait for it

Which ClickHouse versions are affected?

25.3.3.42 / 25.6.1.2609 / 25.4.5.24

How to reproduce

  1. create remote table
  2. query with remote() with analyser on and off and compare

Expected performance

Perfomance exected be the same with analyzer ON

Additional context

  1. when same query executed locally on server , where remote table located, it s executed OK
  2. when Distributed query executed locally on server , where remote table located, it s executed OK

I also tried to query remote() from latest version 25.6.1.2609

behavior exactly the same, but it shows more stats:

Image

Note here the first line of Gb downloaded - with analyzer OFF this line does not appear

On remote server query with analyser ON is

SELECT max(`__table1`.`Block_Date`) AS `_Date_0`, count() AS `count()` FROM `eth`.`dex_trades_tx` AS `__table1`

and with OFF is

SELECT max(`Block_Date`) AS `_Date_0`, count() AS `count()` FROM `eth`.`dex_trades_tx`

Note that executing both of these queries take same time and resources, no difference.

For me the problem is not in the query it executes on remote server, but looks analyser tries to download something and it is not related to remote query per se

Metadata

Metadata

Assignees

Labels

analyzerIssues and pull-requests related to new analyzercomp-distributedDistributed table engine & query routing across shards (sharding/load balancing).performance

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions