Skip to content

Pick the correct partition when partition key is a function of a column used in condition #11796

@filimonov

Description

@filimonov
DROP TABLE users;
create table users (
    userid String
)
Engine = MergeTree
ORDER BY tuple() 
PARTITION BY cityHash64(userid) % 64;

INSERT INTO users SELECT hex(number) FROM numbers(10000000);
OPTIMIZE TABLE users FINAL;

select count() from users where userid = hex(1000);

... Selected 64 parts by date, 64 parts by key, 1278 marks to read from 64 ranges

select count() from users where identity(userid) = hex(1000);

... Selected 64 parts by date, 64 parts by key, 1278 marks to read from 64 ranges

The desired effect is like that:

DROP TABLE users;
create table users (
    userid String
)
Engine = MergeTree
ORDER BY tuple() 
PARTITION BY substring(userid,1,1) ;

INSERT INTO users SELECT hex(number) FROM numbers(10000000);

OPTIMIZE TABLE users FINAL;

select count() from users where userid = hex(1000);

... Selected 1 parts by date, 1 parts by key, 121 marks to read from 1 ranges

select count() from users where identity(userid) = hex(1000);

... Selected 16 parts by date, 16 parts by key, 1229 marks to read from 16 ranges

Loosely coupled with
#10220
#7948

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions