Conversation

🤖: Benchmark completed
Nice, no regressions?
Seems like the coalesce batches work is paying off!
Force-pushed d236925 to 025d411
I have been thinking about memory usage as well, as that will be a major factor in whether we can cache the predicate results. I annotated Q22 with code to calculate the memory usage of the cached results:
Q22 (some of the best performance gains):

// Q22: SELECT "SearchPhrase", MIN("URL"), MIN("Title"), COUNT(*) AS c, COUNT(DISTINCT "UserID") FROM hits WHERE "Title" LIKE '%Google%' AND "URL" NOT LIKE '%.google.%' AND "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY c DESC LIMIT 10;
Query {
    name: "Q22",
    filter_columns: vec!["Title", "URL", "SearchPhrase"],
    projection_columns: vec!["SearchPhrase", "URL", "Title", "UserID"],
    predicates: vec![
        ClickBenchPredicate::like_Google(0),
        ClickBenchPredicate::nlike_google(1),
        ClickBenchPredicate::not_empty(2),
    ],
    expected_row_count: 46,
},

For hits_1.parquet, the data sizes are:
(I totally used @XiangpengHao's https://parquet-viewer.xiangpeng.systems/ for this analysis.) I will try to add some additional debugging / annotation code to see what the peak memory usage was (and, if I limit it to 1MB, whether that limit gets triggered for any query).
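As a rough sanity check on the caching idea, the dominant fixed cost per predicate is its bit-packed boolean filter mask: about one bit per row, so roughly ceil(rows / 8) bytes per predicate, before counting any cached column buffers. A minimal sketch in plain Rust (independent of the actual arrow-rs types; `cached_mask_bytes` is a hypothetical helper for illustration, not an API in this PR):

```rust
/// Rough size in bytes of one bit-packed boolean filter mask
/// covering `num_rows` rows (one bit per row, rounded up to a byte).
fn cached_mask_bytes(num_rows: usize) -> usize {
    (num_rows + 7) / 8
}

fn main() {
    // Illustrative row count only; real hits_N.parquet files vary.
    let num_rows = 1_000_000;
    let num_predicates = 3; // Q22 has three predicates

    let total = num_predicates * cached_mask_bytes(num_rows);
    // 3 masks * 125_000 bytes = 375_000 bytes for the masks alone
    println!("~{total} bytes for {num_predicates} cached masks");
}
```

Cached string columns (like "URL" and "Title" here) would dwarf the masks, which is why a configurable memory limit on the cache seems worth testing.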
Seems reasonable to me. I need to think about memory handling a bit more carefully now.
Superseded by #7850

Which issue does this PR close?

TODO: coalesce kernel (BatchCoalescer) #7761

Rationale for this change
I am working on not decoding predicate columns twice when evaluating filters in the reader.
In #7513 we prototyped several APIs that we have now started making real (like BatchCoalescer), so I made a new PR that uses those APIs and doesn't have hundreds of comments. I am pleased with how it is looking now; as before, I don't really plan to merge this PR as-is, I am using it as a design vehicle.
What changes are included in this PR?
Are there any user-facing changes?
Not yet.