Add support for Pandas 2.0 #6378
Conversation
tconkling
left a comment
I'm good with all the changes except for some hesitation around the unpinning. My vote would be to merge all these other changes, and submit a second "unpin pandas" PR with the setup.py changes that we discuss as a team at the next standup.
lib/setup.py
Outdated
  "numpy",
  "packaging>=14.1",
- "pandas<2,>=0.25",
+ "pandas>=0.25",
The only thing I'm concerned about is that, if we unpin pandas < 2, we may introduce pandas 2.0-only changes into the codebase, and break Streamlit for anyone running old pandas. (Of course: we may already be doing this by claiming to support pandas < 1).
How confident are we that we won't do this, in the absence of automated tests that run against multiple versions of pandas? And a corollary: should we run automated tests against multiple versions of pandas before we remove the "pandas < 2" requirement?
Our Python-min tests already exercise older Pandas versions, since they still run on Python 3.7 and Pandas 2.x will not be available for 3.7. But this is only a side effect of us supporting a mostly outdated Python version. I think we should prioritize implementing automated tests against older versions of our dependencies. We could probably get a simple approach implemented without too much effort.
> How confident are we that we won't do this
I'm confident that I won't introduce Pandas 2.0-only features, but there is of course a risk (one that already exists between Pandas 0.25 and 1.0), and not all changes in Pandas are obvious.
@tconkling I added the pin for the major version of Pandas again. I think we can remove it in a later release, since Pandas 2.0 is not released yet anyway.
lib/setup.py
Outdated
  "numpy",
  "packaging>=14.1",
- "pandas<2,>=0.25",
+ "pandas==1.0.*",
Should this be pandas>=1.0, <2? (Also, any concerns about "dropping" support for old Pandas?)
Yep :) I just ran some quick tests in this branch to see whether we can get older Pandas versions running in our CI, but it already fails at installation :( The pin should be back to the original again (= "pandas<2,>=0.25").
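To make the difference between the three specifiers discussed in this thread concrete, here is a minimal sketch of which versions each one admits. It is a simplified stand-in and not pip's resolver: `release_tuple` and the `allows_*` helpers are hypothetical names, and pre-release suffixes such as `rc1` are simply dropped rather than handled according to PEP 440.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. "2.0.0rc1" -> (2, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def allows_original_pin(version: str) -> bool:
    """Approximates "pandas<2,>=0.25" (the pin restored in this PR)."""
    return (0, 25) <= release_tuple(version) < (2,)


def allows_unpinned(version: str) -> bool:
    """Approximates "pandas>=0.25" (no upper bound)."""
    return release_tuple(version) >= (0, 25)


def allows_patch_pin(version: str) -> bool:
    """Approximates "pandas==1.0.*" (only the 1.0.x series)."""
    # Pad short versions such as "1" so that they compare like "1.0".
    padded = release_tuple(version) + (0, 0)
    return padded[:2] == (1, 0)
```

In this sketch the original pin accepts `0.25.3` and `1.5.3` but rejects `2.0.0rc1`, while `==1.0.*` rejects everything outside the 1.0.x series, which is why it would effectively drop support for both older and newer 1.x releases.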
* develop:
  StreamlitEndpoints.buildAppPageURL (streamlit#6386)
  Rename query param ?testing to ?_stcore_testing (streamlit#6392)
  supress warning when call `get_script_run_ctx` from `gather_metrics` decorator (streamlit#6384)
  Improve typing annotations for file_uploader and text_input (streamlit#6371)
  Add support for Pandas 2.0 (streamlit#6378)
  Remove UriUtil.buildMediaUri (streamlit#6379)
  Allow pytest unit test to be discoverable via IDE v2 (streamlit#6374)
  Allow session state to only allow serializable items (streamlit#6165)
📚 Context
This PR prepares Streamlit to work with all Pandas versions from 0.25 to 2.0.
Pandas 2.0 will be released soon and comes with many breaking changes. I have investigated how well Streamlit works with Pandas 2.0. In summary, functionality-wise, the breaking changes only slightly impact the legacy dataframe, and I was able to resolve this with a simple change. Furthermore, our tests would not have been able to run with Pandas 2.0, which is also fixed in this PR.
Note: The final state of this PR currently does not run with Pandas 2.0 since it is only available as a release candidate. But it was extensively tested with `2.0.0rc1` during all of the commits of this PR.
🧠 Description of Changes
`Float64Index`, `UInt64Index` and `Int64Index`. These are deprecated indices that got removed in Pandas 2.0.
🧪 Testing Done
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.