Skip to content

FULL OUTER JOIN works incorrectly with Cluster functions #89996

@alsugiliazova

Description

@alsugiliazova

Describe the unexpected behaviour

A FULL OUTER JOIN between two identical s3Cluster table functions on the same dataset returns only left-side rows with NULL values on the right side, instead of matching pairs. The same join using plain s3 table functions produces the correct result with matched rows.

Same with icebergS3Cluster.

SELECT *
FROM s3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data2/data/**.parquet', 'admin', 'password') AS t1
FULL OUTER JOIN s3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data2/data/**.parquet', 'admin', 'password') AS t2 ON t1.boolean_col = t2.boolean_col
ORDER BY tuple(*) ASC

Query id: 561496ba-623f-4e34-a5a8-6da1f6e8567a

   ┌─boolean_col─┬─long_col─┬─t2.boolean_col─┬─t2.long_col─┐
1. │ false       │     2000 │ ᴺᵁᴸᴸ           │        ᴺᵁᴸᴸ │
2. │ true        │     1000 │ ᴺᵁᴸᴸ           │        ᴺᵁᴸᴸ │
   └─────────────┴──────────┴────────────────┴─────────────┘

2 rows in set. Elapsed: 0.025 sec.

Expected:

SELECT *
FROM s3('http://minio:9000/warehouse/data2/data/**.parquet', 'admin', 'password') AS t1
FULL OUTER JOIN s3('http://minio:9000/warehouse/data2/data/**.parquet', 'admin', 'password') AS t2 ON t1.boolean_col = t2.boolean_col
ORDER BY tuple(*) ASC

Query id: 1da58346-23c0-4bce-aa79-e53094a4b455

   ┌─boolean_col─┬─long_col─┬─t2.boolean_col─┬─t2.long_col─┐
1. │ false       │     2000 │ false          │        20002. │ true        │     1000 │ true           │        1000 │
   └─────────────┴──────────┴────────────────┴─────────────┘

2 rows in set. Elapsed: 0.015 sec. 

Additional context

SELECT *
FROM s3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data2/data/**.parquet', 'admin', 'password')

Query id: 5f8c80ba-c897-4772-ae74-8349304d664c

   ┌─boolean_col─┬─long_col─┐
1. │ true        │     10002. │ false       │     2000 │
   └─────────────┴──────────┘

2 rows in set. Elapsed: 0.019 sec. 
SELECT *
FROM icebergS3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data2', 'admin', 'password') AS t1
FULL OUTER JOIN icebergS3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data2', 'admin', 'password') AS t2 ON t1.boolean_col = t2.boolean_col
ORDER BY tuple(*) ASC

Query id: 128b9db1-f53e-45c7-a009-75ab47f7266d

   ┌─boolean_col─┬─long_col─┬─t2.boolean_col─┬─t2.long_col─┐
1. │ false       │     2000 │ ᴺᵁᴸᴸ           │        ᴺᵁᴸᴸ │
2. │ true        │     1000 │ ᴺᵁᴸᴸ           │        ᴺᵁᴸᴸ │
   └─────────────┴──────────┴────────────────┴─────────────┘

2 rows in set. Elapsed: 0.026 sec. 

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions