@lukasmasuch (Collaborator) commented Mar 27, 2023

📚 Context

This PR prepares Streamlit to work with all Pandas versions from 0.25 to 2.0.

Pandas 2.0 will be released soon and comes with many breaking changes. I investigated how well Streamlit works with Pandas 2.0. In summary, the breaking changes only slightly affect the functionality of the legacy dataframe, and a simple change resolves this. In addition, our tests could not have run with Pandas 2.0; this PR fixes that as well.

Note: The final state of this PR does not yet run with Pandas 2.0, since Pandas 2.0 is only available as a release candidate. However, it was extensively tested with 2.0.0rc1 throughout all commits of this PR.

🧠 Description of Changes

  • Allow usage of Pandas 2.0 in setup.py.
  • Remove usage of Float64Index, UInt64Index, and Int64Index. These deprecated index classes were removed in Pandas 2.0.
  • Explicitly set a RangeIndex for empty dataframes so that they behave consistently across all Pandas versions.
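The two index-related changes above can be sketched as follows. This is a minimal illustration assuming pandas 0.25 or later, not the code from this PR; the helper names `is_integer_index`, `is_float_index`, and `with_range_index_if_empty` are hypothetical. Instead of `isinstance` checks against the removed index classes, it inspects the index dtype via `pandas.api.types`, and it sets an empty `RangeIndex` explicitly, because pandas 2.0 gives empty frames a `RangeIndex` by default whereas older versions use an empty object-dtype `Index`.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def is_integer_index(index: pd.Index) -> bool:
    """Hypothetical dtype-based stand-in for isinstance(index, (pd.Int64Index, pd.UInt64Index))."""
    return is_integer_dtype(index.dtype)


def is_float_index(index: pd.Index) -> bool:
    """Hypothetical dtype-based stand-in for isinstance(index, pd.Float64Index)."""
    return is_float_dtype(index.dtype)


def with_range_index_if_empty(df: pd.DataFrame) -> pd.DataFrame:
    """Give an empty frame a RangeIndex, whatever the pandas version's default is."""
    if len(df.index) == 0 and not isinstance(df.index, pd.RangeIndex):
        df = df.copy()
        df.index = pd.RangeIndex(start=0, stop=0)
    return df
```

Because the check is dtype-based, it behaves the same on pandas 1.x and 2.x, where integer data may live in a plain `Index` rather than an `Int64Index`.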

🧪 Testing Done

  • Screenshots included
  • Added/Updated unit tests
  • Added/Updated e2e tests

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@lukasmasuch marked this pull request as ready for review March 28, 2023 10:04
@lukasmasuch added the security-assessment-completed label (Security assessment has been completed for PR) Mar 28, 2023
@lukasmasuch changed the title from "[WIP] Add support for Pandas 2.0" to "Add support for Pandas 2.0" Mar 28, 2023
@tconkling (Contributor) left a comment

I'm good with all the changes except for some hesitation around the unpinning. My vote would be to merge all these other changes, and submit a second "unpin pandas" PR with the setup.py changes that we discuss as a team at the next standup.

lib/setup.py (outdated)
     "numpy",
     "packaging>=14.1",
-    "pandas<2,>=0.25",
+    "pandas>=0.25",
Contributor

The only thing I'm concerned about is that, if we unpin pandas < 2, we may introduce pandas 2.0-only changes into the codebase and break Streamlit for anyone running an older pandas. (Of course, we may already be doing this by claiming to support pandas < 1.)

How confident are we that we won't do this, in the absence of automated tests that run against multiple versions of pandas? And a corollary: should we run automated tests against multiple versions of pandas before we remove the "pandas < 2" requirement?

@lukasmasuch (Collaborator, Author) Mar 28, 2023

Our Python min-version tests effectively cover older Pandas versions, since they still run on Python 3.7 and Pandas 2.x will not be available for 3.7. But this is only a side effect of supporting a mostly outdated Python version. I think we should prioritize a way to run automated tests against older versions of our dependencies. We could probably implement a simple approach without too much effort.
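One lightweight way to run automated tests against several Pandas versions, as suggested above, is to generate a (Python, Pandas) test matrix and skip combinations that cannot be installed. This is a hypothetical sketch, not Streamlit's CI setup: the version lists and the names `PYTHON_VERSIONS`, `PANDAS_VERSIONS`, `major_minor`, `is_supported`, and `build_matrix` are assumptions, and the only compatibility rule it encodes is the one stated here, that Pandas 2.x is not available for Python 3.7.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical version lists, for illustration only.
PYTHON_VERSIONS = ["3.7", "3.8"]
PANDAS_VERSIONS = ["0.25.3", "1.5.3", "2.0.0rc1"]


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings like "3.7" or "2.0.0rc1"."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(python: str, pandas: str) -> bool:
    """Encode the single rule from the discussion: no Pandas 2.x below Python 3.8."""
    return not (major_minor(pandas) >= (2, 0) and major_minor(python) < (3, 8))


def build_matrix(pythons: List[str], pandas_versions: List[str]) -> List[Tuple[str, str]]:
    """Cross every Python version with every Pandas version, dropping unsupported pairs."""
    return [
        (python, pandas)
        for python, pandas in product(pythons, pandas_versions)
        if is_supported(python, pandas)
    ]


if __name__ == "__main__":
    for python, pandas in build_matrix(PYTHON_VERSIONS, PANDAS_VERSIONS):
        print(f"Python {python}: pip install pandas=={pandas}")
```

Each resulting pair could drive one CI job that installs the given Pandas version before running the unit tests. Further rules, such as older Pandas releases lacking builds for newer Python versions, would have to be added by hand.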

> How confident are we that we won't do this

I'm confident that I won't introduce Pandas 2.0-only functionality, but there is of course a risk (one that already exists with Pandas 0.25 vs. 1.0), and not all changes in Pandas are obvious.

@lukasmasuch (Collaborator, Author)

@tconkling I added the pin on the Pandas major version back. We can remove it in a later release, since Pandas 2.0 is not released yet anyway.

lib/setup.py (outdated)
     "numpy",
     "packaging>=14.1",
-    "pandas<2,>=0.25",
+    "pandas==1.0.*",
Contributor

Should this be pandas>=1.0, <2? (Also, any concerns about "dropping" support for old Pandas?)

@lukasmasuch (Collaborator, Author) Mar 28, 2023

Yep :) I just did some quick tests in this branch to see whether we can run our CI with older Pandas versions, but it already fails during installation :( It should be reverted to "pandas<2,>=0.25".
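The difference between the version specifiers discussed in this thread can be checked with a small evaluator. This is a hypothetical, deliberately limited sketch: `parse` and `matches` are illustrative names, it understands only the `>=`, `<`, and `==X.*` clauses used here and plain numeric versions such as `1.5.3`, and it ignores pre-release rules. For real checks, the `packaging` library's `SpecifierSet` implements full PEP 440 semantics.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn "1.5" into (1, 5, 0) so versions of different lengths compare correctly."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def matches(version: str, spec: str) -> bool:
    """Return True if `version` satisfies every comma-separated clause of `spec`."""
    v = parse(version)
    for clause in (c.strip() for c in spec.split(",")):
        if clause.startswith(">="):
            ok = v >= parse(clause[2:])
        elif clause.startswith("<") and not clause.startswith("<="):
            ok = v < parse(clause[1:])
        elif clause.startswith("==") and clause.endswith(".*"):
            # Prefix match: "==1.0.*" accepts any 1.0.x release.
            prefix = tuple(int(p) for p in clause[2:-2].split("."))
            ok = v[: len(prefix)] == prefix
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not ok:
            return False
    return True


if __name__ == "__main__":
    candidates = ["0.24.2", "0.25.3", "1.0.5", "1.5.3", "2.0.0"]
    for spec in ["<2,>=0.25", ">=0.25", "==1.0.*"]:
        accepted = [v for v in candidates if matches(v, spec)]
        print(f"pandas{spec}: {accepted}")
```

Running it prints `pandas<2,>=0.25: ['0.25.3', '1.0.5', '1.5.3']`, `pandas>=0.25: ['0.25.3', '1.0.5', '1.5.3', '2.0.0']`, and `pandas==1.0.*: ['1.0.5']`, which shows why `==1.0.*` would drop every release outside 1.0.x.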

@lukasmasuch merged commit a703fab into develop Mar 29, 2023
tconkling added a commit to tconkling/streamlit that referenced this pull request Mar 29, 2023
* develop:
  StreamlitEndpoints.buildAppPageURL (streamlit#6386)
  Rename query param ?testing to ?_stcore_testing (streamlit#6392)
  supress warning when call `get_script_run_ctx` from `gather_metrics` decorator (streamlit#6384)
  Improve typing annotations for file_uploader and text_input (streamlit#6371)
  Add support for Pandas 2.0 (streamlit#6378)
  Remove UriUtil.buildMediaUri (streamlit#6379)
  Allow pytest unit test to be discoverable via IDE v2 (streamlit#6374)
  Allow session state to only allow serializable items (streamlit#6165)
@sfc-gh-kbregula mentioned this pull request Apr 4, 2023
@seanabreau mentioned this pull request Apr 4, 2023
@connortann mentioned this pull request Apr 18, 2023
@sfc-gh-kmcgrady deleted the fix/pandas-2 branch October 5, 2023 19:30
Labels

security-assessment-completed Security assessment has been completed for PR

4 participants