ARROW-10240: [Rust] Optionally load data into memory before running benchmark query #8409
jhorstmann wants to merge 4 commits into apache:master from
rust/benchmarks/src/bin/tpch.rs
file_format: String,

/// Load the data into a MemTable before executing the query
#[structopt(short = "l", long = "load")]
There is probably a better/clearer name for this parameter.
rust/benchmarks/src/bin/tpch.rs
};

if opt.load {
    let memtable = MemTable::load(tableprovider.as_ref()).await?;
This is just a nit, but it would be nice to have some `println!`s here showing that the data is loading and how long it takes.
That's a very good idea
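A minimal sketch of the reviewer's suggestion using only the standard library: print a message before loading and report the elapsed time afterwards with `std::time::Instant`. The `Batch` type, `load_into_memory`, and `load_with_timing` are hypothetical stand-ins; in the benchmark the loading step is the async `MemTable::load` call from the diff, which this sketch does not reproduce.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a record batch: a single column of values.
type Batch = Vec<i64>;

/// Hypothetical loader that materializes every batch from a source into memory.
/// (Assumption: plays the role `MemTable::load` plays in the real benchmark.)
fn load_into_memory<I: Iterator<Item = Batch>>(source: I) -> Vec<Batch> {
    source.collect()
}

/// Loads the data, printing a progress message and the elapsed load time.
fn load_with_timing<I: Iterator<Item = Batch>>(name: &str, source: I) -> (Vec<Batch>, Duration) {
    println!("Loading table '{}' into memory ...", name);
    let start = Instant::now();
    let batches = load_into_memory(source);
    let elapsed = start.elapsed();
    let rows: usize = batches.iter().map(|b| b.len()).sum();
    println!(
        "Loaded table '{}' ({} batches, {} rows) in {} ms",
        name,
        batches.len(),
        rows,
        elapsed.as_millis()
    );
    (batches, elapsed)
}

fn main() {
    // Three lazily generated batches of 4 rows each.
    let source = (0..3).map(|i| (0..4).map(|j| i * 4 + j).collect::<Batch>());
    let (batches, _elapsed) = load_with_timing("lineitem", source);
    assert_eq!(batches.len(), 3);
}
```

Taking the timestamps immediately around the load keeps the reported load time separate from the query time measured afterwards, which matters when comparing in-memory and Parquet runs.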
The results are pretty interesting for me.
Without loading into memory: …
With loading into memory: …
I filed https://issues.apache.org/jira/browse/ARROW-10251 to fix the single-threaded loading in MemTable, but I'm not sure why the actual query time is slower for MemTables than for Parquet.
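ARROW-10251 is about loading in `MemTable` being single-threaded. The sketch below illustrates the general idea of reading partitions concurrently, assuming each partition (for example, one Parquet file) can be read independently of the others. It uses `std::thread::scope` (Rust 1.63+) instead of the async runtime the real code runs on, and `Partition`, `read_partition`, `load_sequential`, and `load_parallel` are hypothetical names; this is not the DataFusion code for that issue.

```rust
use std::thread;

/// Hypothetical stand-in for a record batch: a single column of values.
type Batch = Vec<i64>;

/// Hypothetical partition: a source whose batches can be read on their own.
type Partition = Vec<Batch>;

/// Simulates reading one partition into memory (here: cloning its batches).
fn read_partition(partition: &Partition) -> Vec<Batch> {
    partition.iter().cloned().collect()
}

/// Sequential loading: partitions are read one after another on the calling thread.
fn load_sequential(partitions: &[Partition]) -> Vec<Vec<Batch>> {
    partitions.iter().map(read_partition).collect()
}

/// Parallel loading: one scoped thread per partition; results keep partition order.
fn load_parallel(partitions: &[Partition]) -> Vec<Vec<Batch>> {
    thread::scope(|s| {
        // Spawn all readers first so they run concurrently.
        let handles: Vec<_> = partitions
            .iter()
            .map(|p| s.spawn(move || read_partition(p)))
            .collect();
        // Join in spawn order, which preserves the original partition order.
        handles
            .into_iter()
            .map(|h| h.join().expect("partition loader panicked"))
            .collect()
    })
}

fn main() {
    // Four partitions, each with three batches of two rows.
    let partitions: Vec<Partition> = (0..4)
        .map(|p| (0..3).map(|b| vec![p * 10 + b; 2]).collect())
        .collect();
    assert_eq!(load_parallel(&partitions), load_sequential(&partitions));
    println!("loaded {} partitions", partitions.len());
}
```

Joining the handles in the order they were spawned keeps the loaded partitions in their original order, so the parallel result is identical to the sequential one.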
That's indeed interesting. Could the issue actually be the batch size? It seems the `MemTable::scan` method ignores the batch size parameter and instead uses the hardcoded batch size that was used for loading.
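If `MemTable::scan` ignores the requested batch size, one way to honor it is to re-slice the batches produced at load time. A minimal sketch, assuming a batch is just a single column of `i64` values; `Batch` and `rechunk` are hypothetical and not part of the DataFusion API.

```rust
/// Hypothetical stand-in for a record batch: a single column of values.
type Batch = Vec<i64>;

/// Re-slices batches loaded with one batch size into batches of `batch_size`
/// rows, so a scan can honor the caller's batch size. The last batch may be
/// shorter. (Hypothetical helper, not the DataFusion API.)
fn rechunk(batches: &[Batch], batch_size: usize) -> Vec<Batch> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut out = Vec::new();
    let mut current: Batch = Vec::with_capacity(batch_size);
    for batch in batches {
        for &value in batch {
            current.push(value);
            if current.len() == batch_size {
                // Emit the full batch and start a fresh one.
                out.push(std::mem::replace(&mut current, Vec::with_capacity(batch_size)));
            }
        }
    }
    // Emit any remaining rows as a final, shorter batch.
    if !current.is_empty() {
        out.push(current);
    }
    out
}

fn main() {
    // Data loaded with a (hypothetical) hardcoded batch size of 5 rows.
    let loaded: Vec<Batch> = vec![(0..5).collect(), (5..10).collect(), (10..12).collect()];
    // A scan requesting batches of 4 rows.
    let scanned = rechunk(&loaded, 4);
    let sizes: Vec<usize> = scanned.iter().map(|b| b.len()).collect();
    println!("{:?}", sizes); // prints [4, 4, 4]
}
```

This copies values row by row for clarity; with real Arrow arrays, zero-copy slicing of the loaded arrays would be the more natural choice.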
It's looking much better now 🚀