Blog: Apache DataFusion is now the fastest single node engine for querying Apache Parquet files #33
alamb
left a comment
fyi @Weijun-H @dharanad @Lordworms, @goldmedal @wiedld, @tlm365 @my-vegetable-has-exploded @doupache, @jayzhan211, @xinlifoobar, @Kev1n8
@tshauck, @austin362667, @demetribu, @PsiACE, @devanbenz, @thinh2, @Omega359 @XiangpengHao, @ariesdevil, @tustvold , @RinChanNOWW, @a10y @Dandandan @viirya @itsjunetime, @eejbyfeldt and @Rachelint
@korowa @pmcgleenon
I mentioned you and your work in this blog post -- thank you again 🙏
For names, I copy/pasted whatever was publicly available on your GitHub profiles. If you would like different names / attributions (or none at all) please propose a change 🙏
Also, if you remember others who should be on this list, please let me know
a challenge!), and we have subsequently rallied to steadily improve the
performance release on release as shown in Figure 2.

[Mehmet Ozan Kabak]: https://www.linkedin.com/in/mehmet-ozan-kabak/)
_posts/2024-11-18-datafusion-fastest-single-node-parquet-clickbench.md
Co-authored-by: Bruce Ritchie <[email protected]> Co-authored-by: Patrick McGleenon <[email protected]>
That's the greatest news. Congrats!
For the ClickBench run for DataFusion what is the
I don't think the scripts change the default setting -- the scripts used are here: https://github.com/ClickHouse/ClickBench/tree/main/datafusion
Here is the PR to update for 43.0.0: ClickHouse/ClickBench#251
Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Patrick McGleenon <[email protected]> Co-authored-by: Jay Zhan <[email protected]>
…ite into alamb/clickbench_blog
…bench.md Co-authored-by: Tai Le Manh <[email protected]>
# Rallying The Community around Performance

In July, 2024 [Mehmet Ozan Kabak], CEO of [Synnada], [called on the community to
Is this central to this post? I don't mean to "discredit" the call by any means, but I'm not sure the work described in this post was driven by this comment?
I certainly was inspired by the comment to help me focus where I spent my time reviewing PRs and helping push them through, though it is a good point that this may imply it motivated others as well, when I don't really know what did.
Perhaps we could rephrase the motivation with something like this?
"Performance has long been a focus for DataFusion: one of the core benefits of DataFusion is its core performance, which both excites contributors and attracts users. There seems to have been a renewed focus on performance recently, including a call in July 2024 from Mehmet ...."?
…ite into alamb/clickbench_blog
Thank you to everyone who reviewed this PR. I plan to merge / publish it later today unless there are any other comments
Amazing work Andrew! You and all of the DataFusion contributors should be incredibly proud of this accomplishment.
Let's get this published to the world

Let's celebrate the accomplishment of getting to the top of the ClickBench leaderboard