[SPARK-35506][PYTHON][INFRA] Run tests with Python 3.9 in GitHub Actions #32657
Kubernetes integration test starting
Kubernetes integration test status success
Test build #138904 has finished for PR 32657 at commit
Kubernetes integration test starting
Kubernetes integration test status success
cc @ueshin, @BryanCutler @viirya FYI
Test build #138921 has finished for PR 32657 at commit
      uses: actions/setup-python@v2
      with:
        python-version: 3.9
        architecture: x64
QQ: is it necessary to specify the architecture here?
Seems not, but let me leave it for consistency with the other places above, and to be explicit.
        architecture: x64
    - name: Install Python packages (Python 3.9)
      run: |
        python3.9 -m pip install numpy 'pyarrow<5.0.0' pandas scipy xmlrunner plotly>=4.8
Is it intentional to add test coverage for a new PyArrow version on Python 3.9 only?
Oh yeah. I should've commented here. Python 3.9 support was added in https://issues.apache.org/jira/browse/ARROW-10224. I tentatively tried PyArrow 4.0.0 and it worked, so I just set it to the highest working version for now.
cc @BryanCutler FYI
SGTM. The Arrow binary format remains the same so it's good to continue testing with the latest pyarrow.
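To make the `'pyarrow<5.0.0'` pin discussed above concrete, here is a minimal Python sketch of what such an upper bound means for plain `X.Y.Z` release strings. The helper names `parse_version` and `satisfies_upper_bound` are hypothetical and exist only for illustration; pip's real resolution follows PEP 440 and also handles pre-releases and other suffixes, which this sketch ignores.

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints.

    Simplification: pre-release or local suffixes (e.g. '5.0.0rc1')
    are not supported here.
    """
    return tuple(int(part) for part in text.split("."))


def satisfies_upper_bound(installed, upper_bound="5.0.0"):
    """Return True if `installed` is strictly below `upper_bound`."""
    return parse_version(installed) < parse_version(upper_bound)


# PyArrow 4.0.0 (the version tried in this PR) falls inside the pin,
# while a hypothetical 5.0.0 release would be excluded.
accepted = satisfies_upper_bound("4.0.0")
rejected = satisfies_upper_bound("5.0.0")
```

Under this reading, the pin lets CI pick up any 4.x release automatically while keeping an untested next major version out until someone raises the bound.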
dongjoon-hyun
left a comment
+1, LGTM.
        # TODO(SPARK-35510): This fails with Python 3.9. We should fix and reenable it.
        # self.assert_eq(
        #     len(psdf.quantile(q=0.5, numeric_only=True)),
        #     len(pdf.quantile(q=0.5, numeric_only=True)),
        # )
        # self.assert_eq(
        #     len(psdf.quantile(q=[0.25, 0.5, 0.75], numeric_only=True)),
        #     len(pdf.quantile(q=[0.25, 0.5, 0.75], numeric_only=True)),
        # )
Does this only fail with Python 3.9 on GitHub Actions? I saw that we updated tests for Python 3.9 before, so it seems this was not caught previously.
Oh, this test was added after we tested with Python 3.9 (as part of pandas-on-Spark).
Also, Koalas was not running tests against Python 3.9 because Arrow lacked Python 3.9 support. It seems they support it fine now :-).
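As a side note on the disabled assertions in this thread: the PR comments them out, but one alternative is to keep the test and skip it conditionally, so the skip is reported in the test output. The sketch below is not what the PR did. The class name `QuantileLengthTest`, the flag `PY39_OR_LATER`, and the placeholder test body are hypothetical, and the condition is keyed on the Python version only because that is what the TODO states.

```python
import sys
import unittest

# Assumption: the failure is tied to the Python version, as the TODO says.
PY39_OR_LATER = sys.version_info >= (3, 9)


class QuantileLengthTest(unittest.TestCase):
    @unittest.skipIf(PY39_OR_LATER, "SPARK-35510: fails with Python 3.9")
    def test_quantile_length(self):
        # Placeholder body. The real test compares psdf.quantile(...)
        # with pdf.quantile(...), which needs a Spark session.
        quantiles = [0.25, 0.5, 0.75]
        self.assertEqual(len(quantiles), 3)


# Run the test case programmatically and collect the result.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(QuantileLengthTest)
result = unittest.TestResult()
suite.run(result)
```

With this approach the test shows up as skipped, along with the JIRA reference, instead of silently disappearing from the run.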
Thanks guys! Merged to master.
…mns_should_be_discarded_if_numeric_only_is_true

### What changes were proposed in this pull request?

This PR proposes to fix and reenable `test_stats_on_non_numeric_columns_should_be_discarded_if_numeric_only_is_true`, which was disabled when we upgraded CI to Python 3.9 in #32657.

This seems to be caused by a behaviour change in the latest NumPy; see also `https://github.com/numpy/numpy/pull/16273#discussion_r641264085`. pandas inherits this behaviour, but it doesn't make sense when `numeric_only` is set to `True` in pandas. I will track and follow the status of the issue between pandas and NumPy.

For the time being, I propose to exclude only the boolean case in the percentile/quartile test case.

### Why are the changes needed?

To keep the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

I roughly tested it locally, but it should pass in CI.

Closes #32690 from HyukjinKwon/SPARK-35510.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
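The idea of excluding only the boolean case before computing quantiles can be illustrated with a plain-Python analogy. This is not the pandas or pandas-on-Spark code touched by the commit. The helper `numeric_non_bool_columns` and the sample `table` are hypothetical. One detail worth noting is that `bool` is a subclass of `int` in Python, so a naive numeric check lets booleans through unless they are excluded explicitly.

```python
import statistics


def numeric_non_bool_columns(table):
    """Keep columns whose values are all int or float, but never bool."""
    kept = {}
    for name, values in table.items():
        if all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        ):
            kept[name] = values
    return kept


# Hypothetical sample data: one numeric, one boolean, one string column.
table = {
    "a": [1, 2, 3, 4],
    "flag": [True, False, True, False],
    "label": ["x", "y", "z", "w"],
}

# Only column "a" survives the filter, so only its median is computed.
medians = {
    name: statistics.median(values)
    for name, values in numeric_non_bool_columns(table).items()
}
```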
What changes were proposed in this pull request?
This PR enables GitHub Actions to test PySpark with Python 3.9.
Why are the changes needed?
To verify the support of Python 3.9.
Does this PR introduce any user-facing change?
No, test-only.
How was this patch tested?
Existing tests should cover this.