[WEBSITE] DataFusion 16.0.0 blog post #294
|
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format? See also: |
|
It is a work in progress, but I think it is now coherent enough to gather some community input
alamb
left a comment
It would also be great to add a section to this document about planned feature work
| ## Community Growth | ||
| The three months since [our last update](https://arrow.apache.org/blog/2022/10/25/datafusion-13.0.0/) again saw significant growth in the DataFusion community. |
It would be great if someone could help clean this section up and clearly explain the growth of the community; There is a wonderful story there to tell
| closing the gap quickly. Performance highlights from the last three | ||
| months: | ||
| * XX% Faster Sorting and Merging using the new [Row Format](https://arrow.apache.org/blog/2022/11/07/multi-column-sorts-in-arrow-rust-part-1/) |
@tustvold do you have any suggestions about what numbers to use here?
| * Basic filter selectivity analysis (#3868) | ||
| In the coming few months, we plan to work on: |
Feedback from the rest of the community would be great
| - Implement current_date scalar function (#4022) | ||
| - Compressed CSV/JSON support (#3642) | ||
| The community has also been investing in sqllogic-based tests to help keep DataFusion's quality high with less work (TODO add some more detail / links) |
@xudong963 I wonder if you have any thoughts on how to word this better
| # Substrait | ||
| TODO motivating introduction of substrait and why this is interesting |
@andygrove perhaps you can help with content for the substrait area
There was a problem hiding this comment.
Substrait isn't going to make it into 16, and maybe this should be a separate post? I started a google doc https://docs.google.com/document/d/1vK0AyDBhIibmKZ2scGN3jBypBvqMPuztbBzZ1eh0dKM/edit?usp=sharing
There was a problem hiding this comment.
Well, maybe there is a chance it makes it in. Assuming the vote passes tomorrow, we could get the first PR merged 🤔
| DataFusion has basic Python bindings, which have the potential to bring DataFusion to more end users; the Python bindings remain a major missing piece | ||
| # python bindings and growing the community and ecosystem |
I tried to work in a mention of the python bindings and encouraging a champion for them to step forward, however it felt more like it should be a separate post 🤔 -- @andygrove what do you think about a post describing the python bindings, why they are cool, and trying to find people to help drive that project?
There was a problem hiding this comment.
I can't figure out how to work this in so I think we should write another post -- maybe on a different site
I took the content / notes I had and put them in a google doc: https://docs.google.com/document/d/1zNfK8pIOqgHURX2lHK0JhSKTaCH3t23tYDKpGLWdFRY/edit
There was a problem hiding this comment.
Thanks. I agree. I can work on the Python post
Co-authored-by: Andy Grove <[email protected]>
| Growth of new systems based on DataFusion, which serves as the engine in [many open source and commercial projects](https://github.com/apache/arrow-datafusion#known-uses) and was one of the early open source projects to provide this capability. | ||
| Several new databases built on datafusion (synnada.ai, greptimedb, probably others) |
Here is what I am aware of:
Databases: greptimedb (new), IOx (GA)
Data platform: Synnada (new)
Use case: Backend for PRQL (relatively new?)
There was a problem hiding this comment.
Thanks -- added in ffe2e0a. Still needs polish
|
I will start contributing to this tomorrow |
|
Ok I think this one is now ready for some more review -- it is plausibly ready to publish |
…ite into alamb/datafusion_update_16
|
I plan to merge this tomorrow unless there are any other comments |
Closes apache/datafusion#4804
This blog post highlights some improvements and features in DataFusion over the last 3 releases 😅
Rendered: https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0/