Bloom filter support Map with key of String or FixedString type #28511

Merged: kitaisreal merged 8 commits into ClickHouse:full-text-bloom-filter-index-map-data-type from lingtaolf:feature/map_bloom_filter on Sep 21, 2021
Contributor

@lingtaolf lingtaolf commented Sep 2, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

tokenbf_v1 and ngrambf_v1 indexes now support Map columns whose key is of String or FixedString type. This improves data skipping in queries that filter on a map key.

CREATE TABLE map_tokenbf (
    row_id UInt32,
    map Map(String, String),
    -- ngrambf_v1(n-gram size, filter size in bytes, number of hash functions, seed)
    INDEX map_tokenbf map TYPE ngrambf_v1(4, 256, 2, 0) GRANULARITY 1
)
ENGINE = MergeTree()
ORDER BY row_id

With the table above, the query select * from map_tokenbf where map['K']='V' skips every granule that the index shows cannot contain the key K. How many rows are skipped depends on the index GRANULARITY and the table's index_granularity setting.
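The skipping behaviour described above can be illustrated with a minimal Python sketch. It is not the ClickHouse implementation: the `BloomFilter` class, the SHA-256 based hashing, and the helpers `build_granule_filters` and `granules_to_read` are all hypothetical names invented for illustration, and whole map keys are inserted as single tokens instead of being split into n-grams. The defaults (256 bytes, 2 hash functions, seed 0) only mirror the index parameters in the example table.

```python
import hashlib
from typing import Dict, Iterable, List


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bytes: int = 256, num_hashes: int = 2, seed: int = 0) -> None:
        self.size_bits = size_bytes * 8
        self.num_hashes = num_hashes
        self.seed = seed
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, token: str) -> Iterable[int]:
        # Assumption: SHA-256 stands in for whatever hash the real index uses.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{self.seed}:{i}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token: str) -> None:
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


def build_granule_filters(granules: List[List[Dict[str, str]]]) -> List[BloomFilter]:
    """Build one filter per granule from the keys of every map in that granule."""
    filters = []
    for rows in granules:
        bf = BloomFilter()
        for row in rows:
            for key in row:
                bf.add(key)
        filters.append(bf)
    return filters


def granules_to_read(filters: List[BloomFilter], key: str) -> List[int]:
    """Return the granules that cannot be ruled out for a map[key] = ... filter."""
    return [i for i, bf in enumerate(filters) if bf.might_contain(key)]


# Three toy granules; only the first and third contain the key "K".
granules = [
    [{"K": "V"}, {"a": "b"}],
    [{"x": "y"}],
    [{"K": "other"}],
]
filters = build_granule_filters(granules)
candidates = granules_to_read(filters, "K")
print(candidates)
```

A granule that contains the key is always returned. A granule without it is usually skipped but can still be read on a false positive, which is why the filter size and the granularity settings influence how many rows are actually skipped.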

Benchmark
Tested on our production data.
Rows on a single server: 497500000

query with tokenbf skipping index:

queries 10, QPS: 3.619, RPS: 104465928.009, MiB/s: 9927.368, result RPS: 3.619, result MiB/s: 0.000.

0.000%      0.219 sec.
10.000%     0.231 sec.
20.000%     0.247 sec.
30.000%     0.250 sec.
40.000%     0.277 sec.
50.000%     0.289 sec.
60.000%     0.289 sec.
70.000%     0.307 sec.
80.000%     0.312 sec.
90.000%     0.313 sec.
95.000%     0.319 sec.
99.000%     0.319 sec.
99.900%     0.319 sec.
99.990%     0.319 sec.

query without index:

queries 10, QPS: 0.128, RPS: 63571835.943, MiB/s: 18509.076, result RPS: 0.128, result MiB/s: 0.000.

0.000%      6.881 sec.
10.000%     7.038 sec.
20.000%     7.669 sec.
30.000%     7.695 sec.
40.000%     7.985 sec.
50.000%     8.050 sec.
60.000%     8.050 sec.
70.000%     8.155 sec.
80.000%     8.169 sec.
90.000%     8.194 sec.
95.000%     8.422 sec.
99.000%     8.422 sec.
99.900%     8.422 sec.
99.990%     8.422 sec.
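For a rough sense of scale, the two runs can be compared directly. The short Python sketch below uses only the QPS and 50th-percentile figures reported above; the variable names are illustrative.

```python
# Figures copied from the two benchmark runs reported above.
with_index = {"qps": 3.619, "median_sec": 0.289}
without_index = {"qps": 0.128, "median_sec": 8.050}

# Throughput gain and median latency reduction from the skipping index.
qps_ratio = with_index["qps"] / without_index["qps"]
latency_ratio = without_index["median_sec"] / with_index["median_sec"]

print(f"QPS ratio: {qps_ratio:.1f}x, median latency ratio: {latency_ratio:.1f}x")
```

Both ratios come out at roughly 28x for this data set and query. The gain on other workloads depends on how selective the map key filter is.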

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Sep 2, 2021
@kitaisreal kitaisreal self-assigned this Sep 2, 2021
@lingtaolf
Contributor Author

@kitaisreal Hi, could you please review my code when you're free? Thanks a lot!

@kitaisreal
Contributor

@lingtaolf could you please fix the fast test CI check?

@lingtaolf lingtaolf force-pushed the feature/map_bloom_filter branch from 2f93059 to db4fa6f Compare September 6, 2021 08:45
@lingtaolf lingtaolf force-pushed the feature/map_bloom_filter branch from db4fa6f to 6d8cce9 Compare September 6, 2021 08:47
@lingtaolf
Contributor Author

lingtaolf commented Sep 7, 2021

> @lingtaolf could you please fix the fast test CI check?

@kitaisreal Hi, some checks are still failing. Do I need to fix them all?

@kitaisreal
Contributor

@lingtaolf only test failures related to your code changes.

@kitaisreal kitaisreal changed the base branch from master to full-text-bloom-filter-index-map-data-type September 21, 2021 12:12
@kitaisreal kitaisreal merged this pull request into ClickHouse:full-text-bloom-filter-index-map-data-type Sep 21, 2021
@kitaisreal kitaisreal mentioned this pull request Sep 21, 2021
kitaisreal added a commit that referenced this pull request Sep 28, 2021
3 participants