-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Adding unnecesary "SELECT * FROM ()" multiplies 3x interpretation time #35490
Description
Given a UNION query, if we add unnecesary "SELECT * FROM ()" for each UNION subquery we can see a significan increase in the query time.
Describe the situation
We have 8 tables with similar schema (4 fields UInt64), we execute the following query to obtain all the records from all tables in a single query:
SELECT a1, a2, a3, a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, ... FROM test_select.table_a UNION ALL
SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, b1, b2, b3, b4, 0 AS c1, ... FROM test_select.table_b UNION ALL
... same for the rest of 8 tables
I have been measuring the interpretation time (executeQueryImpl() -> InterpreterFactory::get(...) for this query in my local environment. The best time is 739 µs.
While with the same query, but just adding SELECT * FROM() for each one of the subqueries, like this:
SELECT * FROM (SELECT a1, a2, a3, a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, ... FROM test_select.table_a) UNION ALL
SELECT * FROM (SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, b1, b2, b3, b4, 0 AS c1, ... FROM test_select.table_b) UNION ALL
...
It is taking 2289 µs. It means more than 3 times the previous query measurement.
How to reproduce
Tested with different versions with similar results. From 21.13.1.1 to 22.2.1.1.
Table creation
DROP DATABASE IF EXISTS test_select;
CREATE DATABASE test_select;
CREATE TABLE test_select.table_a (`a1` Int32, `a2` Int32, `a3` Int32, `a4` Int32) Engine=MergeTree() order by a1;
CREATE TABLE test_select.table_b (`b1` Int32, `b2` Int32, `b3` Int32, `b4` Int32) Engine=MergeTree() order by b1;
CREATE TABLE test_select.table_c (`c1` Int32, `c2` Int32, `c3` Int32, `c4` Int32) Engine=MergeTree() order by c1;
CREATE TABLE test_select.table_d (`d1` Int32, `d2` Int32, `d3` Int32, `d4` Int32) Engine=MergeTree() order by d1;
CREATE TABLE test_select.table_e (`e1` Int32, `e2` Int32, `e3` Int32, `e4` Int32) Engine=MergeTree() order by e1;
CREATE TABLE test_select.table_f (`f1` Int32, `f2` Int32, `f3` Int32, `f4` Int32) Engine=MergeTree() order by f1;
CREATE TABLE test_select.table_g (`g1` Int32, `g2` Int32, `g3` Int32, `g4` Int32) Engine=MergeTree() order by g1;
CREATE TABLE test_select.table_h (`h1` Int32, `h2` Int32, `h3` Int32, `h4` Int32) Engine=MergeTree() order by h1;Query 1 - Not additional wrappers
SELECT a1, a2, a3, a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_a UNION ALL
SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, b1, b2, b3, b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_b UNION ALL
SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, c1, c2, c3, c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_c UNION ALL
SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, d1, d2, d3, d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_d UNION ALL
SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, e1, e2, e3, e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_e UNION ALL
SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, f1, f2, f3, f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_f UNION ALL
SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, g1, g2, g3, g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_g UNION ALL
SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, h1, h2, h3, h4 FROM test_select.table_h;Query 2 - With additional (SELECT * FROM x) for each UNION subquery <- Much more slower
SELECT * FROM (SELECT a1, a2, a3, a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_a) UNION ALL
SELECT * FROM (SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, b1, b2, b3, b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_b) UNION ALL
SELECT * FROM (SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, c1, c2, c3, c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_c) UNION ALL
SELECT * FROM (SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, d1, d2, d3, d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_d) UNION ALL
SELECT * FROM (SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, e1, e2, e3, e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_e) UNION ALL
SELECT * FROM (SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, f1, f2, f3, f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_f) UNION ALL
SELECT * FROM (SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, g1, g2, g3, g4, 0 AS h1, 0 AS h2, 0 AS h3, 0 AS h4 FROM test_select.table_g) UNION ALL
SELECT * FROM (SELECT 0 AS a1, 0 AS a2, 0 AS a3, 0 AS a4, 0 AS b1, 0 AS b2, 0 AS b3, 0 AS b4, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS d1, 0 AS d2, 0 AS d3, 0 AS d4, 0 AS e1, 0 AS e2, 0 AS e3, 0 AS e4, 0 AS f1, 0 AS f2, 0 AS f3, 0 AS f4, 0 AS g1, 0 AS g2, 0 AS g3, 0 AS g4, h1, h2, h3, h4 FROM test_select.table_h);Expected performance
Of course, by adding more subqueries I would expect an increase in interpretation time, but IMHO 3x is too much.
Additional context
We face it in a real scenario, with a much more complex query, we have tried to isolate the problem with this simpler example. In the real query we have the same 3.1x relationship but suffering a time increase from 100 to 150ms.