Skip to content

AnyHash of arrays or tuples of nullable values #48365

@al13n321

Description

@al13n321

Currently these don't work:

SELECT cityHash64([1, null])
SELECT cityHash64((1, null))

In contrast, these work and return null:

SELECT cityHash64(null)
SELECT cityHash64(1, null)

What should the behavior be for arrays and tuples of nullables? Options:
(a) Treat nulls as values, with some hash. Say that hash of null is some constant (chosen at random).
(b) Null if any element is null.

(b) seems more consistent with typical behavior of nulls in SQL, like cityHash64(1, null) being null.

But (a) would be useful in practice for hashing whole tables or query results:

SELECT sum(cityHash64(tuple(*))) FROM (SELECT ...)

With (b), this would calculate a hash of the exact full results of the query, invariant to the order of rows (which is usually nondeterministic when using more than one thread). With (a), it can return the same hash for different data. E.g. if one column is always null, the whole sum of hashes will always be null, even if other columns contain actual nontrivial data.

Which one should we do? (a), (b), or something else?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions