-
Notifications
You must be signed in to change notification settings - Fork 8.3k
AnyHash of arrays or tuples of nullable values #48365
Description
Currently these don't work:
SELECT cityHash64([1, null])
SELECT cityHash64((1, null))
In contrast, these work and return null:
SELECT cityHash64(null)
SELECT cityHash64(1, null)
What should the behavior be for arrays and tuples of nullables? Options:
(a) Treat nulls as values, with some hash. Say that hash of null is some constant (chosen at random).
(b) Null if any element is null.
(b) seems more consistent with typical behavior of nulls in SQL, like cityHash64(1, null) being null.
But (a) would be useful in practice for hashing whole tables or query results:
SELECT sum(cityHash64(tuple(*))) FROM (SELECT ...)
With (b), this would calculate a hash of the exact full results of the query, invariant to the order of rows (which is usually nondeterministic when using more than one thread). With (a), it can return the same hash for different data. E.g. if one column is always null, the whole sum of hashes will always be null, even if other columns contain actual nontrivial data.
Which one should we do? (a), (b), or something else?