Skip to content

sumMap values return strangely huge number when query on distributed table with long data range  #35359

@dongxiao-yang

Description

@dongxiao-yang

Hi , I'm trying to use sumMap function to do a 'word count' compute but find something strange.

The data field formart looks like below , switch_pcpBuckets1Min Array(Int32) represent which minute user is active druing this hour.

SELECT
    interval_startTimeMs,
    switch_pcpBuckets1Min
FROM test_all
LIMIT 10

Query id: 8aa42523-a6aa-4d1f-8f52-6bf62d3ed1a7

┌────interval_startTimeMs─┬─switch_pcpBuckets1Min───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 2022-03-12 02:00:00.000 │ [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]                                                                                                                      │
│ 2022-03-12 02:00:00.000 │ []                                                                                                                                                                          │
│ 2022-03-12 02:00:00.000 │ [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59] │
│ 2022-03-12 02:00:00.000 │ []                                                                                                                                                                          │
│ 2022-03-12 02:00:00.000 │ []                                                                                                                                                                          │
│ 2022-03-12 02:00:00.000 │ [31,32,33,34,35,36,37,38,39,40]                                                                                                                                             │
│ 2022-03-12 02:00:00.000 │ [1,2,3,4,5,6,7,8]                                                                                                                                                           │
│ 2022-03-12 02:00:00.000 │ [0,1]                                                                                                                                                                       │
│ 2022-03-12 02:00:00.000 │ [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59]                                                               │
│ 2022-03-12 02:00:00.000 │ [49,50,51,52,53,54,55,56,57,58,59]                                                                                                                                          │
└─────────────

I have 3 ck nodes with local table on each , and one distributed table base on them. The data distribution between is cluster is like below , each node stored 8 hours data every day :

ck1: day1.00,day1.03,day1.06,day1.09....day1.18,day1.21,day2.00,day2.03,....day2.18,day2.21.......
ck2: day1.01,day1.04,day1.07,day1.10....day1.19,day1.22,day2.01,day2.04,....day2.18,day2.22.......
ck3: day1.02,day1.05,day1.08,day1.11....day1.20,day1.23,day2.02,day2.05,....day2.18,day2.23.......

when I apply sumMap (cast(arrayMap(x -> (x,1) , switch_pcpBuckets1Min) , 'Map(UInt8,Int64)')) funtion on distributed table I got some strange values

SELECT
    interval_startTimeMs,
    sumMap(CAST(arrayMap(x -> (x, 1), switch_pcpBuckets1Min), 'Map(UInt8,Int64)'))
FROM test_all
WHERE notEmpty(switch_pcpBuckets1Min) AND (interval_startTimeMs >= '2022-03-13 15:00:00.000') AND (interval_startTimeMs < '2022-03-15 11:00:00.000')
GROUP BY interval_startTimeMs
ORDER BY interval_startTimeMs ASC

result
image

or

........
│ 2022-03-14 13:00:00.000 │ {0:1888498,1:1806408,2:1738233,3:1714562,4:1716166,5:1718846,6:1721077,7:1723416,8:1726652,9:1729414,10:1732370,11:1735659,12:1739089,13:1743080,14:1746563,15:1750973,16:1752769,17:1756472,18:1759138,19:1762198,20:1769251,21:1767879,22:1770599,23:1775166,24:1776618,25:1779658,26:1782262,27:1785372,28:1788181,29:1791996,30:1806137,31:1828201,32:1799067,33:1798101,34:1799876,35:1803645,36:1806954,37:1809457,38:1812673,39:1814831,40:1818984,41:1821903,42:1824868,43:1826708,44:1829861,45:1832978,46:1836875,47:1835713,48:1837610,49:1838824,50:1840808,51:1843022,52:1846121,53:1849784,54:1851913,55:1854833,56:1857469,57:1859817,58:1859672,59:1857749} │
│ 2022-03-14 14:00:00.000 │ {0:1973499,1:2038673,2:1889800,3:1866611,4:1864982,5:1868397,6:1870355,7:1873609,8:1878854,9:1878353,10:1884231,11:1886529,12:1887418,13:1890487,14:1894295,15:1896781,16:1899235,17:1901512,18:1906641,19:1907196,20:1910120,21:1912812,22:1915795,23:1917535,24:1921423,25:1924899,26:1927412,27:1929536,28:1931833,29:1933543,30:1969748,31:1949704,32:1937275,33:1937204,34:1939046,35:1943434,36:1948582,37:1951652,38:1951330,39:1953289,40:1956900,41:1958959,42:1960718,43:1963856,44:1966952,45:1970839,46:1969867,47:1972716,48:1974749,49:1974589,50:1977219,51:1979250,52:1981908,53:1983727,54:1984699,55:1986857,56:1988738,57:1990400,58:1990058,59:1990354} │
│ 2022-03-14 15:00:00.000 │ {0:1374179596971150604,1:216736832572504376,2:3399704436437297448,3:1663540288323457296,4:4267786510494217524,5:2531622362380377372,6:795458214266537220,7:3399704436437297448,8:4267786510494217524,9:1663540288323457296,10:3110343745084990756,11:1952900979675763988,12:2242261671028070680,13:1663540288323457296,14:1084818905618843912,15:506097522914230528,16:4267786510494217524,17:3689065127789604140,18:3110343745084990756,19:2531622362380377372,20:1952900979675763988,21:1374179596971150604,22:795458214266537220,23:795458214266537220,24:216736832572504376,25:3978425819141910832,26:506097522914230528,27:3110343745084990756,28:1374179596971150604,29:3978425819141910832,30:2242261671028070680,31:506097522914230528,32:216736832572504376,33:3978425819141910832,34:3399704436437297448,35:2820983053732684064,36:2242261671028070680,37:2242261671028070680,38:1663540288323457296,39:1084818905618843912,40:506097522914230528,41:4267786510494217524,42:3689065127789604140,43:3110343745084990756,44:3399704436437297448,45:2820983053732684064,46:2531622362380377372,47:1952900979675763988,48:795458214266537220,49:3978425819141910832,50:1084818905618843912,51:3689065127789604140,52:1952900979675763988,53:216736832572504376,54:2820983053732684064,55:3689065127789604140,56:1084818905618843912,57:2531622362380377372,58:1374179596971150604,59:2820983053732684064} │
│ 2022-03-14 16:00:00.000 │ {0:2262290,1:2349170,2:2174879,3:2134685,4:2131627,5:2136194,6:2138389,7:2142454,8:2144920,9:2148797,10:2156529,11:2159239,12:2164067,13:2168475,14:2172825,15:2176691,16:2180780,17:2183832,18:2190913,19:2191753,20:2197542,21:2200060,22:2202295,23:2206060,24:2210022,25:2213551,26:2216838,27:2219448,28:2223532,29:2225276,30:2247946,31:2290295,32:2230129,33:2229426,34:2230748,35:2235986,36:2237546,37:2240905,38:2244287,39:2246736,40:2249971,41:2252959,42:2257151,43:2261020,44:2264851,45:2269398,46:2267360,47:2273254,48:2271068,49:2273006,50:2273937,51:2275677,52:2278537,53:2281759,54:2285372,55:2290137,56:2288032,57:2289823,58:2290632,59:2281998} │
│ 2022-03-14 17:00:00.000 │ {0:2342135,1:2470707,2:2358015,3:2286061,4:2282772,5:2282175,6:2286603,7:2289242,8:2292591,9:2294802,10:2299370,11:2302519,12:2305614,13:2311185,14:2314095,15:2321138,16:2322415,17:2324769,18:2328666,19:2331253,20:2341946,21:2338189,22:2341113,23:2344412,24:2347824,25:2351667,26:2354843,27:2355172,28:2358307,29:2357599,30:2389072,31:2397644,32:2362451,33:2357039,34:2357853,35:2361892,36:2364202,37:2366512,38:2368998,39:2371447,40:2373504,41:2372957,42:2380379,43:2380959,44:2381836,45:2384311,46:2384152,47:2384083,48:2384127,49:2384485,50:2387646,51:2387461,52:2386520,53:2390678,54:2391310,55:2395460,56:2391889,57:2391053,58:2388713,59:2378835}
..........

As you can see from the screenshot , the result of hour 2022-03-14 15:00:00.000 values are strangely huge like 4267786510494217524 , 3689065127789604140 . But when I narrow down search range from 2 days to 3 hours

SELECT
    interval_startTimeMs,
    sumMap(CAST(arrayMap(x -> (x, 1), switch_pcpBuckets1Min), 'Map(UInt8,Int64)'))
FROM test_all
WHERE notEmpty(switch_pcpBuckets1Min) AND (interval_startTimeMs >= '2022-03-14 13:00:00.000') AND (interval_startTimeMs < '2022-03-14 16:00:00.000')
GROUP BY interval_startTimeMs
ORDER BY interval_startTimeMs ASC

Query id: f0619e61-a3bd-410c-b363-c827250e9f75

┌────interval_startTimeMs─┬─sumMap(CAST(arrayMap(lambda(tuple(x), tuple(x, 1)), switch_pcpBuckets1Min), 'Map(UInt8,Int64)'))───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 2022-03-14 13:00:00.000 │ {0:1888498,1:1806408,2:1738233,3:1714562,4:1716166,5:1718846,6:1721077,7:1723416,8:1726652,9:1729414,10:1732370,11:1735659,12:1739089,13:1743080,14:1746563,15:1750973,16:1752769,17:1756472,18:1759138,19:1762198,20:1769251,21:1767879,22:1770599,23:1775166,24:1776618,25:1779658,26:1782262,27:1785372,28:1788181,29:1791996,30:1806137,31:1828201,32:1799067,33:1798101,34:1799876,35:1803645,36:1806954,37:1809457,38:1812673,39:1814831,40:1818984,41:1821903,42:1824868,43:1826708,44:1829861,45:1832978,46:1836875,47:1835713,48:1837610,49:1838824,50:1840808,51:1843022,52:1846121,53:1849784,54:1851913,55:1854833,56:1857469,57:1859817,58:1859672,59:1857749} │
│ 2022-03-14 14:00:00.000 │ {0:1973499,1:2038673,2:1889800,3:1866611,4:1864982,5:1868397,6:1870355,7:1873609,8:1878854,9:1878353,10:1884231,11:1886529,12:1887418,13:1890487,14:1894295,15:1896781,16:1899235,17:1901512,18:1906641,19:1907196,20:1910120,21:1912812,22:1915795,23:1917535,24:1921423,25:1924899,26:1927412,27:1929536,28:1931833,29:1933543,30:1969748,31:1949704,32:1937275,33:1937204,34:1939046,35:1943434,36:1948582,37:1951652,38:1951330,39:1953289,40:1956900,41:1958959,42:1960718,43:1963856,44:1966952,45:1970839,46:1969867,47:1972716,48:1974749,49:1974589,50:1977219,51:1979250,52:1981908,53:1983727,54:1984699,55:1986857,56:1988738,57:1990400,58:1990058,59:1990354} │
│ 2022-03-14 15:00:00.000 │ {0:2143412,1:2148294,2:2015013,3:1992616,4:1989841,5:1993092,6:1995793,7:1995933,8:1998535,9:2000326,10:2002517,11:2006287,12:2009980,13:2013617,14:2016581,15:2022306,16:2021699,17:2025477,18:2027663,19:2029937,20:2032646,21:2035411,22:2037660,23:2040148,24:2042953,25:2045425,26:2050923,27:2051131,28:2053060,29:2053878,30:2077570,31:2079524,32:2061122,33:2060250,34:2065343,35:2068150,36:2068601,37:2071234,38:2073857,39:2076418,40:2080423,41:2082912,42:2085963,43:2092586,44:2092529,45:2096102,46:2096640,47:2098708,48:2100604,49:2102188,50:2105055,51:2106478,52:2109066,53:2113995,54:2120969,55:2121535,56:2124629,57:2126631,58:2129707,59:2124955} │
└─────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

3 rows in set. Elapsed: 0.257 sec. Processed 31.75 million rows, 1.90 GB (123.41 million rows/s., 7.38 GB/s.)

The result of hour 2022-03-14 15:00:00.000 change back to right value.

The sql of create table looks like

CREATE TABLE test ()
...
...
 interval_startTimeMs DateTime64,
switch_pcpBuckets1Min Array(Int32),
...
)ENGINE = MergeTree
PARTITION BY toYYYYMMDD(interval_startTimeMs)
ORDER BY (interval_startTimeMs....

ClickHouse version :
SELECT version()

Query id: f03f4927-6719-4a92-829a-4732b0657a84

┌─version()─┐
│ 22.2.2.1 │
└───────────┘

For now I'm thinking this as a bug for sumMap , not sure if I'm right.
Thanks.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugConfirmed user-visible misbehaviour in official releasecomp-map-datatypeMap datatype implementation and semantics.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions