Add datafusion-python #69
Codecov Report
@@            Coverage Diff             @@
##           master      #69      +/-   ##
==========================================
- Coverage   76.43%   75.77%   -0.67%
==========================================
  Files         135      142       +7
  Lines       23264    23467     +203
==========================================
  Hits        17782    17782
- Misses       5482     5685     +203
Continue to review full report at Codecov.
Thank you @jorgecarleitao, I am really excited to see this and would love to see this merged into arrow-datafusion.
Some notes:
@@ -0,0 +1,72 @@
name: Build
The tag release probably won't work in the context of an ASF repo anymore?
Yep, we will need to work out the packaging; building the wheels is IMO still relevant, as it is not so easy in Rust (as far as I understand, support for this is still a bit WIP). Building the manylinux wheels was a feat.
```bash
pip install datafusion
```
Adding this here as a suggestion, but I'll take a look at packaging it as a conda package. I'll cc you on the PR once I have something working.
or via conda/mamba:
conda install -c conda-forge datafusion
mamba install -c conda-forge datafusion
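Whichever installer is used, one way to confirm that the package landed in the active environment is to query its installed metadata. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of this PR, and the sketch assumes the distribution is registered under the name `datafusion`.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Assumption: the package is registered under the name "datafusion".
    version = installed_version("datafusion")
    print(version if version is not None else "datafusion is not installed")
```

Returning `None` instead of raising keeps the check usable in scripts that only need a yes/no answer.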
@xhochy If you want, you can ping me on the staged-recipes PR once you create it. I was just reading up on the state of arrow vs. rust, and was surprised that datafusion isn't yet in conda-forge. ;-)
@@ -0,0 +1,98 @@
import unittest
Out of curiosity: Why not pytest?
It comes with Python, so there is no need to install other stuff. But no strong feelings here; we can refactor this whole thing. =)
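To illustrate the "comes with Python" point, here is a minimal sketch of a test module that needs nothing beyond the standard library. The function `add_one` and the test names are hypothetical and are not taken from this PR's test file.

```python
import unittest


def add_one(value: int) -> int:
    """Hypothetical function under test; not part of this PR."""
    return value + 1


class TestAddOne(unittest.TestCase):
    def test_increments(self):
        self.assertEqual(add_one(1), 2)

    def test_negative(self):
        self.assertEqual(add_one(-1), 0)


def run_tests() -> unittest.TestResult:
    """Load and run the test case without calling sys.exit()."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAddOne)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

pytest can also collect `unittest.TestCase` classes, so a later move to pytest would not require rewriting tests written in this style.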
Pushed the license and also hopefully fixed the CI.
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: Uwe L. Korn <[email protected]>
Ok, I have now fixed the CI, pushed the license headers, and bumped to the latest datafusion. There was a regression, documented in #226. Once we fix the regression, this can be released on PyPI as 0.2.2, since it contains no backward-incompatible changes 🎉
@alamb @Dandandan Any objection to merging this PR?
@andygrove please go ahead 🚀
alamb
left a comment
No objections to merging from me. I skimmed it quickly and all seems good.
I think a significant investment in documentation will be needed for this code, but it seems like a good start to me.
I hate to be a nuisance, but didn't this need to go through IP clearance?
We can revert if this is the case, but because Jorge was the only contributor (except for one contribution fixing a typo in a README), this didn't seem to be required in this case?
Probably best to check with general@incubator to determine the preferred protocol in this situation. I don't want to subject you to unneeded process, but it would be good to go by the book.
@wesm thank you. Not a nuisance at all; it is important to have this done correctly. The rationale here: I hold the copyright over the whole code base, except for a one-word typo fix in the README. The code was MIT licensed on jorgecarleitao/python-datafusion. As part of this PR, I pushed a commit that added the license headers to every file in the source code. As copyright holder, I thereby licensed all this code to ASF under the ICA.
This reverts commit 46bde0b.
PR to revert: #257
Thanks, I'm not enough of an expert to know what the correct protocol is. A vote may not be needed at all, but let's double check.
Yeah, my interpretation was that since @jorgecarleitao authored this code, I was treating this as "just a normal PR" (it happens to have lived somewhere else for a while but from an IP provenance perspective it seemed no different to a normal PR to me). However, I am not an expert in such matters.
* downgrade substrait (cherry picked from commit 40242b4)
* downgrade prost (cherry picked from commit 3ae6613)
* downgrade prost for ffi (cherry picked from commit 42c8585)
* Fix clippy warning
---------
Co-authored-by: Ahmed Mezghani <[email protected]>
This is a PR with the source code of python-datafusion, currently available at https://github.com/jorgecarleitao/datafusion-python and released on PyPI as `datafusion`.
The goal of this PR is to gauge interest in moving that code base closer to datafusion and into the ASF.
Some notes: