Support bloom filter for any type #5678

Merged
alexey-milovidov merged 7 commits into ClickHouse:master from zhang2014:feature/bloom_filter on Jun 29, 2019

zhang2014 commented Jun 19, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Category:

  • New Feature

Detailed description:
Add a bloom filter index for the Number, Enum, Date, DateTime, String and FixedString data types. The bloom filter index is used by the equals, notEquals, in and notIn functions.
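
For illustration, a minimal sketch of how the index might be declared on non-String columns and which predicates can use it. The table and column names are hypothetical and not taken from this PR. The explicit false positive argument follows the documented `bloom_filter([false_positive])` form and is assumed to be available on the server in use. Depending on the server version, data skipping indices may first need to be enabled with `SET allow_experimental_data_skipping_indices = 1`.

```sql
-- Hypothetical table: names are illustrative only.
CREATE TABLE bf_types_example
(
    `id` UInt64,
    `status` Enum8('new' = 1, 'done' = 2),
    `created` Date,
    `code` FixedString(4),
    -- Default false positive probability (2.5%).
    INDEX idx_status status TYPE bloom_filter GRANULARITY 1,
    INDEX idx_created created TYPE bloom_filter GRANULARITY 1,
    -- Assumed optional argument: a lower false positive probability (1%).
    INDEX idx_code code TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY id;

-- Predicates built from the four functions named in the description:
SELECT count() FROM bf_types_example WHERE status = 'done';      -- equals
SELECT count() FROM bf_types_example WHERE status != 'new';      -- notEquals
SELECT count() FROM bf_types_example
WHERE created IN (toDate('2019-06-19'), toDate('2019-06-20'));   -- in
SELECT count() FROM bf_types_example
WHERE code NOT IN ('ABCD', 'WXYZ');                              -- notIn
```

How many granules each query skips depends on the data, so no timings are implied here.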

zhang2014 force-pushed the feature/bloom_filter branch from 11004d0 to a50aea0 on June 19, 2019 15:13
zhang2014 marked this pull request as ready for review on June 19, 2019 15:32
zhang2014 changed the title from "support bloom filter for any type" to "Support bloom filter for any type" on Jun 19, 2019

zhang2014 commented Jun 20, 2019

Performance info:

CREATE TABLE bf_performance_test
(
    `order_key` Int64, 
    `bf_index` String, 
    `other_column_1` UInt64, 
    `other_column_2` UInt32, 
    `other_column_3` String, 
    `other_column_4` Float64, 
    INDEX idx bf_index TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY order_key
SETTINGS index_granularity = 8192;

INSERT INTO bf_performance_test SELECT 
    number AS order_key, 
    toString(number) AS bf_index, 
    number AS other_column_1, 
    toUInt32(number) AS other_column_2, 
    toString(number) AS other_column_3, 
    toFloat64(number) AS other_column_4
FROM system.numbers 
LIMIT 1000000000;

SELECT COUNT()
FROM bf_performance_test;

┌────COUNT()─┐
│ 1000000000 │
└────────────┘

1 rows in set. Elapsed: 2.652 sec. Processed 1.00 billion rows, 4.00 GB (377.10 million rows/s., 1.51 GB/s.) 



SELECT *
FROM bf_performance_test 
WHERE bf_index = '200000';

┌─order_key─┬─bf_index─┬─other_column_1─┬─other_column_2─┬─other_column_3─┬─other_column_4─┐
│    200000 │ 200000   │         200000 │         200000 │ 200000         │         200000 │
└───────────┴──────────┴────────────────┴────────────────┴────────────────┴────────────────┘

1 rows in set. Elapsed: 2.867 sec. Processed 23.61 million rows, 422.31 MB (8.24 million rows/s., 147.31 MB/s.) 

Processed rows ≈ total rows × false positive probability (default 2.5%). Here the query processed 23.61 million of 1.00 billion rows, about 2.4%.
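
As a rough check of that estimate, the arithmetic can be run directly in the client. This is only a sketch using the figures reported above; the column aliases are made up for readability.

```sql
SELECT
    1000000000 * 0.025 AS expected_rows_at_default_rate,  -- about 25 million rows
    23610000 / 1000000000 AS observed_fraction;           -- about 0.0236, close to 0.025
```

The observed fraction is slightly below the configured probability, which is consistent with 2.5% being the filter's expected false positive rate rather than an exact figure.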

@zhangcl6066

COOL

andyyzh commented Jun 26, 2019

nice!

alexey-milovidov merged commit 8221dd2 into ClickHouse:master on Jun 29, 2019
stavrolia added the pr-feature (Pull request with new product feature) label on Jul 2, 2019