Add support for pandas 3.x #13812

Merged
lukasmasuch merged 16 commits into develop from lukasmasuch/pandas-3-support
Mar 23, 2026

Conversation

@lukasmasuch
Collaborator

@lukasmasuch lukasmasuch commented Feb 4, 2026

Describe your changes

Update pandas dependency upper bound from <3 to <4 to support pandas 3.0 and later versions. Streamlit's codebase is already fully compatible with pandas 3.x, including all breaking changes such as Copy-on-Write semantics, string dtype inference, and datetime resolution changes.

Github Issues

Testing Plan

  • All 8072 Python unit tests pass with pandas 3.0.0
  • Type checking (mypy and ty) passes
  • All lint and format checks pass
  • No code changes were needed; the existing codebase handles all pandas 3.x breaking changes

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@lukasmasuch lukasmasuch requested a review from a team as a code owner February 4, 2026 02:22
Copilot AI review requested due to automatic review settings February 4, 2026 02:22
@snyk-io
Contributor

snyk-io bot commented Feb 4, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

@github-actions
Contributor

github-actions bot commented Feb 4, 2026

✅ PR preview is ready!

Name Link
📦 Wheel file https://core-previews.s3-us-west-2.amazonaws.com/pr-13812/streamlit-1.55.0-py3-none-any.whl
📦 @streamlit/component-v2-lib Download from artifacts
🕹️ Preview app pr-13812.streamlit.app (☁️ Deploy here if not accessible)

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the pandas dependency to support pandas 3.x by changing the upper bound from <3 to <4. The existing Streamlit codebase is already fully compatible with all pandas 3.x breaking changes, including Copy-on-Write semantics, string dtype inference, and datetime resolution changes.

Changes:

  • Updated pandas dependency upper bound from <3 to <4 in lib/pyproject.toml

@lukasmasuch lukasmasuch added the security-assessment-completed, change:feature (PR contains new feature or enhancement implementation), and impact:users (PR changes affect end users) labels Feb 4, 2026
@lukasmasuch
Collaborator Author

We might want to wait for pydeck to resolve this pandas 3 issue: visgl/deck.gl#9986

@lukasmasuch lukasmasuch marked this pull request as draft February 4, 2026 12:55
@lukasmasuch lukasmasuch changed the title from "Support pandas 3.x" to "[WIP] Support pandas 3.x" Feb 4, 2026
@bodacious-bill

Looks like PyDeck finished their update and merged the fix into 9.2.7

https://github.com/visgl/deck.gl/releases/tag/v9.2.7

@lukasmasuch
Collaborator Author

But it's not yet shipped as part of pydeck :(

lukasmasuch and others added 3 commits March 18, 2026 15:18
Update pandas upper bound from <3 to <4 to support pandas 3.0 and later versions. Streamlit's codebase is already fully compatible with pandas 3.x breaking changes including Copy-on-Write semantics, string dtype inference, and datetime resolution changes.

Co-Authored-By: Claude (claude-haiku-4-5) <[email protected]>

This commit addresses various compatibility issues with pandas 3.x:

- Update hashing.py: Use regex patterns to match both pandas 2.x
  (`pandas.core.frame.DataFrame`) and pandas 3.x (`pandas.DataFrame`)
  type paths for DataFrame and Series hashing.

- Update metrics_util.py: Add pandas 3.x type paths to the object
  name mapping since pandas 3.x changed __module__ from `pandas.core.*`
  to `pandas.*`.

- Update column_config_utils.py: Handle `large_string` PyArrow type
  which pandas 3.x uses for string columns instead of `string`.

- Update map.py: Convert string columns to object dtype before
  mapping color values to tuples, since pandas 3.x StringDtype
  cannot hold tuple values.

- Update test files: Handle pandas 3.x behavior where string columns
  use NA instead of None, and use flexible type checks for PyArrow
  arrays which may be StringArray or LargeStringArray.

- Update CI workflow: Add step to upgrade pandas to latest version
  for Python >= 3.11 (pandas 3.x requires Python >= 3.11), with
  UV_NO_SYNC=1 to prevent downgrade during uv run commands.

- Add pydeck test skip: Skip pydeck-related tests on pandas 3.x due
  to upstream pydeck incompatibility (vars() on DataFrame issue).

Co-Authored-By: Claude (claude-opus-4-5) <[email protected]>

Add _prepare_pydeck_for_json() function that converts pandas DataFrames
in pydeck layers to lists of dicts before JSON serialization. This works
around a pandas 3.x issue where DataFrames no longer have a __dict__
attribute that vars() can access, which breaks pydeck's default_serialize
function in json_tools.py.

This removes the need to skip pydeck tests on pandas 3.x.

Co-Authored-By: Claude (claude-opus-4-5) <[email protected]>
@lukasmasuch lukasmasuch force-pushed the lukasmasuch/pandas-3-support branch from 274a444 to 85c1f52 Compare March 18, 2026 14:19
lukasmasuch and others added 7 commits March 18, 2026 15:34
Replace the CI workaround that manually upgraded pandas with a uv
override-dependencies setting. This tells uv to automatically use
pandas 3.x for Python 3.11+ while keeping pandas 2.x for Python 3.10.

Also refine the pydeck compatibility fix to only apply for pandas >= 3.0.0.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

The override-dependencies approach caused pandas to not be installed
on Python 3.10 due to unexpected resolution behavior. Reverting to
the explicit CI upgrade step which works correctly.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Define separate override-dependencies for Python < 3.11 (pandas 2.x)
and Python >= 3.11 (pandas 3.x) to help uv resolve correctly for
both cases.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Ray is not yet compatible with pandas 3.x (SettingWithCopyWarning
was removed in pandas 3.0). Skip the test until Ray releases a
compatible version.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Ray is not compatible with pandas 3.x and has had recurring CI issues
with initialization hangs. Remove the integration test and dependency
entirely - Ray dataset support is still tested via mocks.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Move the version check outside of _prepare_pydeck_for_json so the
conditional nature of the workaround is clear at the call site.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@lukasmasuch lukasmasuch changed the title from "[WIP] Support pandas 3.x" to "Support pandas 3.x" Mar 18, 2026
@lukasmasuch lukasmasuch changed the title from "Support pandas 3.x" to "Add support for pandas 3.x" Mar 18, 2026
@lukasmasuch lukasmasuch marked this pull request as ready for review March 18, 2026 15:36
@lukasmasuch lukasmasuch added the ai-review label (If applied to PR or issue will run AI review workflow) Mar 18, 2026
@github-actions github-actions bot removed the ai-review label (If applied to PR or issue will run AI review workflow) Mar 18, 2026
@github-actions
Contributor

Summary

This PR updates the pandas dependency upper bound from <3 to <4, officially adding support for pandas 3.x. It addresses several pandas 3.x breaking changes including module path changes (pandas.core.* → pandas.*), new StringDtype inference behavior, PyArrow large_string type support, and pydeck serialization incompatibilities. The root pyproject.toml uses override-dependencies to force pandas 3.x on Python >= 3.11 for CI testing. The Ray integration test dependency is removed.

Code Quality

The implementation is clean, well-targeted, and follows existing patterns. Key observations:

  1. pyproject.toml:42, missing upper bound on override (blocking): The override-dependencies entry pandas>=3.0.0; python_version >= '3.11' lacks an upper bound, while lib/pyproject.toml:70 caps pandas at <4. Although this only affects dev/CI environments (not user installations), the inconsistency could break CI unexpectedly once pandas 4.x is released. Adding ,<4 is a trivial fix. (Raised as blocking by gpt-5.3-codex-high and verified; gemini-3.1-pro noted the override setup without flagging the bound, and opus-4.6-thinking did not raise it.)

  2. re.compile() called inline in hot path (hashing.py:419, 442): The regex patterns for matching pandas type names are compiled on every _to_bytes invocation. While Python's re module caches compiled patterns internally, pre-compiling as module-level Final constants would be cleaner and consistent with the file's existing conventions (e.g., _PANDAS_ROWS_LARGE). (Raised by opus-4.6-thinking — non-blocking.)

  3. In-place mutation of pydeck objects (deck_gl_json_chart.py:645-677): _prepare_pydeck_for_json replaces layer.data from a DataFrame to a list of dicts in-place. If a user passes the same Deck object to st.pydeck_chart multiple times, subsequent calls would receive lists instead of DataFrames. The docstring documents this, but it could be surprising. (Raised by opus-4.6-thinking — low risk given typical usage patterns.)

  4. Positive highlights: The map.py fix converting to object dtype before mapping color tuples is clean and well-commented. The is_large_string addition in column_config_utils.py is correct. Defensive getattr usage in the pydeck helper and the overall regex-based type matching approach are robust.
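
As a minimal sketch of the regex-based type matching discussed in point 2, the snippet below compiles the two patterns quoted in this review once at module level and applies them to a fully qualified type name. The helper names (qualified_name, is_pandas_dataframe, is_pandas_series) and the stand-in classes are hypothetical and are not taken from hashing.py; the stand-ins only mimic the two module layouts so the sketch runs without pandas.

```python
import re
from typing import Final

# Patterns as quoted in the review; compiled once at import time.
_PANDAS_SERIES_RE: Final = re.compile(r"^pandas(\.core\.series)?\.Series$")
_PANDAS_DATAFRAME_RE: Final = re.compile(r"^pandas(\.core\.frame)?\.DataFrame$")


def qualified_name(obj: object) -> str:
    """Return "<module>.<qualname>" for the type of obj (hypothetical helper)."""
    tp = type(obj)
    return f"{tp.__module__}.{tp.__qualname__}"


def is_pandas_dataframe(obj: object) -> bool:
    return _PANDAS_DATAFRAME_RE.match(qualified_name(obj)) is not None


def is_pandas_series(obj: object) -> bool:
    return _PANDAS_SERIES_RE.match(qualified_name(obj)) is not None


# Stand-in classes that mimic the old and new module layouts.
OldStyleFrame = type("DataFrame", (), {"__module__": "pandas.core.frame"})
NewStyleFrame = type("DataFrame", (), {"__module__": "pandas"})
Unrelated = type("DataFrame", (), {"__module__": "polars"})
```

Because the patterns are anchored with ^ and $, a class that merely shares the name DataFrame in another package does not match.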

Test Coverage

  • Existing tests updated correctly: Assertions in dataframe_util_test.py and data_editor_test.py properly handle both pandas 2.x and 3.x behavior (pd.isna() instead of is None, isinstance(data, pa.Array) for pyarrow arrays).
  • Hashing tests exercise the new regex-based type matching on both pandas versions.
  • Coverage gap — _prepare_pydeck_for_json: The new helper function has no dedicated unit tests. Existing pydeck tests exercise the code path end-to-end (verifying serialized JSON output), but there are no targeted tests for edge cases (weakref-wrapped DataFrames, None pydeck objects, layers without data). (Raised by opus-4.6-thinking.)
  • Ray integration test removed: All three reviewers noted this. The removal is operationally justified (Ray was causing CI hangs), but is_ray_dataset loses direct integration test coverage. Mock-based tests still cover the logic, but consider either:
    • Keeping the test with @pytest.mark.skipif for unsupported environments (gemini-3.1-pro)
    • Documenting the Ray support scope going forward (gpt-5.3-codex-high)
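
The "keep the test but skip it" option can be sketched as follows. To stay dependency-free, this uses the standard library's unittest.skipIf as a stand-in for @pytest.mark.skipif; the class name, test name, and test body are hypothetical and do not reproduce the removed test_verify_ray_integration. It also assumes that "Ray is usable" can be approximated by "the ray package is importable", which a real condition would likely combine with a pandas version check.

```python
import importlib.util
import unittest

# Assumption: "Ray is usable" is approximated by "the ray package is importable".
RAY_MISSING = importlib.util.find_spec("ray") is None


class RayIntegrationSketch(unittest.TestCase):
    @unittest.skipIf(RAY_MISSING, "ray is not installed in this environment")
    def test_ray_is_importable(self) -> None:
        # Placeholder body; a real test would build a Ray dataset and
        # pass it through the code under test.
        self.assertIsNotNone(importlib.util.find_spec("ray"))


def run_sketch() -> unittest.TestResult:
    """Run the sketch test case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RayIntegrationSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this shape the test is reported as skipped rather than silently absent, so the loss of coverage stays visible in CI output.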

Backwards Compatibility

No breaking changes for users. All three reviewers agree. The dependency constraint pandas>=1.4.0,<4 is a strict superset of the previous pandas>=1.4.0,<3. All code changes use version-agnostic approaches (regex patterns, pd.isna(), dtype checks accepting both object and StringDtype), ensuring correct behavior with both pandas 2.x and 3.x.

Security & Risk

No security concerns. All three reviewers agree. Changes are limited to internal data processing, type checking, hashing logic, and dependency configuration. No changes to auth, routing, WebSocket, embedding, or input processing.

Regression risk is low-to-moderate: The pydeck in-place mutation could affect edge cases with reused Deck objects, but this is unlikely in typical usage.

External test recommendation

  • Recommend external_test: No (unanimous)
  • Triggered categories: None
  • Confidence: High
  • All changes are in Python backend data handling with no modifications to routing, auth, WebSocket, embedding, asset serving, or security headers.

Accessibility

No frontend UI changes. No accessibility impact. (Unanimous)

Reviewer Agreement Matrix

| Finding | gemini-3.1-pro | gpt-5.3-codex-high | opus-4.6-thinking |
| --- | --- | --- | --- |
| Override-deps missing <4 | Noted constraint setup | Blocking issue | Not raised |
| Ray test removal concern | Raised (keep w/ skipif) | Raised (follow-up plan) | Noted (acceptable) |
| Regex pre-compilation | - | - | Raised (non-blocking) |
| Pydeck in-place mutation | - | - | Raised (low risk) |
| Missing pydeck helper tests | - | - | Raised (non-blocking) |
| Code quality overall | Good | Good | Good |
| Backwards compatibility | Maintained | Maintained | Maintained |
| Security risk | None | None | None |
| External tests needed | No | No | No |

Consolidated Recommendations

  1. (Blocking) Add upper bound <4 to the override-dependencies entry in root pyproject.toml:42 to align with lib/pyproject.toml and prevent future CI drift:

    "pandas>=3.0.0,<4; python_version >= '3.11'"
    
  2. (Non-blocking) Pre-compile regex patterns in hashing.py as module-level constants:

    _PANDAS_SERIES_RE: Final = re.compile(r"^pandas(\.core\.series)?\.Series$")
    _PANDAS_DATAFRAME_RE: Final = re.compile(r"^pandas(\.core\.frame)?\.DataFrame$")
  3. (Non-blocking) Consider adding targeted unit tests for _prepare_pydeck_for_json edge cases.

  4. (Non-blocking) Clarify the Ray support strategy: either keep the test with @pytest.mark.skipif so it is skipped in unsupported environments, or document the reduced test coverage as intentional.
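
To illustrate why the missing upper bound in recommendation 1 matters, here is a deliberately simplified sketch that compares release tuples. It is not a PEP 440 implementation and not how uv resolves versions; the function names are hypothetical, and only plain "X.Y.Z" strings are handled.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain "X.Y.Z" release string into a tuple of ints.

    Simplification: pre-release and local version segments are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def allowed_unbounded(version: str) -> bool:
    """Mimic "pandas>=3.0.0" (no upper bound)."""
    return parse_release(version) >= (3, 0, 0)


def allowed_bounded(version: str) -> bool:
    """Mimic "pandas>=3.0.0,<4"."""
    return (3, 0, 0) <= parse_release(version) < (4,)
```

Under the unbounded form a hypothetical 4.0.0 release is accepted, while the bounded form rejects it, which matches the cap in lib/pyproject.toml.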

Verdict

CHANGES_REQUESTED: While the overall implementation is solid and two of three reviewers approved, the missing upper bound on the pandas override-dependency (pyproject.toml:42) is a valid blocking concern that should be addressed before merge. The fix is trivial (adding ,<4) and prevents potential future CI instability. All other recommendations are non-blocking improvements.


Consolidated review by opus-4.6-thinking.


📋 Review by `gemini-3.1-pro`

Summary

The PR updates the pandas dependency upper bound from <3 to <4 to support pandas 3.0. It also includes several compatibility fixes across the codebase to handle pandas 3.x breaking changes, such as changes to string dtype inference, the __module__ path of pandas objects, and pydeck serialization issues. Additionally, it removes the ray integration test.

Code Quality

The code changes are clean and well-targeted to address pandas 3.x compatibility.

  • lib/streamlit/elements/deck_gl_json_chart.py: The workaround for pydeck serialization is well-documented and correctly handles the pandas 3.x __dict__ attribute removal.
  • lib/streamlit/runtime/caching/hashing.py: The use of regex to match both pandas 2.x and 3.x module paths (pandas.core.series.Series vs pandas.Series) is a robust solution.
  • lib/streamlit/elements/map.py: Converting the color column to object dtype before mapping to tuples is a necessary fix since pandas 3.x infers string columns as StringDtype, which cannot hold tuples.
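
The map.py point can be sketched as below, assuming pandas is installed. The hex_to_rgb helper and the column name are hypothetical and are not taken from map.py; the sketch only shows the general idea of casting to object dtype before mapping string values to tuples.

```python
import pandas as pd


def hex_to_rgb(value: str) -> tuple[int, int, int]:
    """Convert "#rrggbb" into an (r, g, b) tuple of ints."""
    digits = value.lstrip("#")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


df = pd.DataFrame({"color": ["#ff0000", "#00ff00"]})

# Cast to object first so the result column is free to hold tuples,
# regardless of which dtype pandas inferred for the strings.
df["color"] = df["color"].astype(object).map(hex_to_rgb)
```

The explicit cast keeps the code independent of whether the strings were inferred as object or as a dedicated string dtype.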

One question:

  • pyproject.toml and lib/tests/streamlit/dataframe_util_test.py: The ray dependency and its integration test (test_verify_ray_integration) were completely removed. While the comment in pyproject.toml previously mentioned pinning Ray to avoid CI hangs, removing it entirely means is_ray_dataset is no longer tested in CI. Is this intentional, or should Ray be kept for Python < 3.12 environments?

Test Coverage

The PR author states that all Python unit tests pass with pandas 3.0.0. The changes in the test files (dataframe_util_test.py and data_editor_test.py) correctly update assertions to account for pandas 3.x behavior (e.g., using pd.isna instead of is None for missing values in string columns, and checking for pa.Array instead of an exact type match for pyarrow arrays). However, as noted above, the ray integration test was removed.

Backwards Compatibility

The changes maintain backwards compatibility with pandas 2.x while adding support for pandas 3.x. The override-dependencies setting in pyproject.toml correctly ensures that pandas 2.x is used for Python < 3.11, as pandas 3.x requires Python >= 3.11.

Security & Risk

No security concerns or regression risks identified. The changes are limited to data serialization, type checking, and hashing logic.

External test recommendation

  • Recommend external_test: No
  • Triggered categories: None
  • Evidence:
    • The changes only affect internal data processing, hashing, and pandas compatibility. No routing, auth, websocket, or embedding logic is modified.
  • Suggested external_test focus areas: N/A
  • Confidence: High
  • Assumptions and gaps: None

Accessibility

N/A - No frontend UI changes.

Recommendations

  1. Clarify the removal of the ray integration test. If Ray is incompatible with pandas 3.x or Python 3.12+, consider keeping the test but skipping it conditionally based on the Python/pandas version (e.g., using @pytest.mark.skipif), rather than removing it entirely, to ensure is_ray_dataset remains tested where supported.

Verdict

APPROVED: The changes correctly implement pandas 3.x support with appropriate compatibility workarounds and maintain backwards compatibility.


This is an automated AI review by gemini-3.1-pro.

📋 Review by `gpt-5.3-codex-high`

Summary

This PR expands pandas compatibility to 3.x by updating dependency constraints and applying targeted runtime/test adjustments (pydeck serialization, map color handling, Arrow string-kind detection, pandas type detection in hashing/metrics, and test expectation updates).

The direction is good and most changes are focused and low-risk, but there is one dependency-spec issue that should be fixed before merge.

Code Quality

The implementation is generally clean and follows existing patterns, with clear comments around pandas-3 behavior changes.

Issue found:

  • pyproject.toml:40-43 (blocking): tool.uv.override-dependencies sets pandas>=3.0.0 for Python >=3.11 without an upper bound, while lib/pyproject.toml:70 explicitly caps pandas at <4. Because this is an override, it can allow pandas 4.x in dev/CI once released, bypassing the intended package cap and creating future breakage risk.

Test Coverage

Coverage is mostly reasonable for the touched behaviors:

  • Existing tests in lib/tests/streamlit/elements/pydeck_test.py, lib/tests/streamlit/elements/map_test.py, lib/tests/streamlit/elements/lib/column_config_utils_test.py, lib/tests/streamlit/runtime/caching/hashing_test.py, and lib/tests/streamlit/runtime/metrics_util_test.py exercise the affected areas.
  • Updated assertions in lib/tests/streamlit/dataframe_util_test.py and lib/tests/streamlit/elements/data_editor_test.py align with pandas-3 dtype/NA behavior.

Coverage gap to note:

  • pyproject.toml:141-155 and lib/tests/streamlit/dataframe_util_test.py:470-597 remove Ray integration dependency/testing. This is understandable operationally, but it reduces real-integration confidence for continued Ray dataset support.

Backwards Compatibility

User-facing behavior remains largely backward compatible:

  • pandas 2.x remains supported.
  • pandas 3.x compatibility is improved in multiple code paths.

Main compatibility risk is tooling/dev-CI drift from the unbounded override (not immediate user runtime breakage, but likely future instability).

Security & Risk

No direct security-sensitive areas were modified (no auth/session/websocket/asset-serving/CORS/CSP changes), and I did not identify injection/XSS-style risks in this diff.

Primary risk is regression risk from dependency resolution drift and reduced Ray integration coverage.

External test recommendation

  • Recommend external_test: No
  • Triggered categories: None
  • Key evidence from changed files:
    • lib/streamlit/elements/deck_gl_json_chart.py, lib/streamlit/elements/map.py, lib/streamlit/runtime/caching/hashing.py, lib/streamlit/runtime/metrics_util.py: internal data handling/type detection changes only.
    • lib/pyproject.toml, pyproject.toml: dependency and test-environment configuration changes.
    • No changes to routing, auth/cookies/CSRF, websocket transport, embedding boundary, static asset serving, storage, or security headers.
  • Suggested external test focus areas: None required for this PR.
  • Confidence: High
  • Assumptions/gaps: Assessment is based on this PR diff only; no runtime e2e execution performed here.

Accessibility

No frontend UI changes were made, so there are no new accessibility impacts in this PR.

Recommendations

  1. Update pyproject.toml:42 to include <4 (for example, pandas>=3.0.0,<4; python_version >= '3.11') so dev/CI constraints stay aligned with lib/pyproject.toml.
  2. Consider adding a follow-up plan for Ray support confidence (either stable integration coverage or explicit support-scope documentation), since this PR removes Ray integration testing.

Verdict

CHANGES REQUESTED: The pandas override constraint in root pyproject.toml should be bounded to <4 before merge to avoid future resolver/CI breakage.


This is an automated AI review by gpt-5.3-codex-high.

📋 Review by `opus-4.6-thinking`

Summary

This PR updates the pandas dependency upper bound from <3 to <4, officially adding support for pandas 3.x. The changes address several pandas 3.x breaking changes:

  1. Module path changes (pandas.core.* → pandas.*): Updated type checks in hashing and metrics utilities to handle both old and new paths.
  2. StringDtype inference: Pandas 3.x infers string columns as StringDtype instead of object, requiring fixes in st.map color handling and test assertions.
  3. PyArrow large_string type: Added is_large_string check to column config utilities.
  4. Pydeck serialization: Added a workaround for pydeck's broken vars()-based serialization with pandas 3.x DataFrames.
  5. Missing value representation: Pandas 3.x uses NA instead of None for missing values in string columns, requiring updated test assertions.
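
As an alternative to regex matching, the module path change in item 1 can also be handled with an explicit lookup table that lists both layouts. The following is a sketch only: the dictionary name, the short names, and the display_name helper are hypothetical and do not reproduce the contents of metrics_util.py.

```python
from typing import Final

# Hypothetical mapping: both the pandas 2.x and the pandas 3.x type paths
# resolve to the same short name.
_TYPE_PATH_TO_NAME: Final[dict[str, str]] = {
    # pandas 2.x paths
    "pandas.core.frame.DataFrame": "DataFrame",
    "pandas.core.series.Series": "Series",
    # pandas 3.x paths
    "pandas.DataFrame": "DataFrame",
    "pandas.Series": "Series",
}


def display_name(type_path: str) -> str:
    """Return the short name for a known type path, else the path itself."""
    return _TYPE_PATH_TO_NAME.get(type_path, type_path)
```

An explicit table is more verbose than a pattern, but every supported path is visible at a glance and unknown paths fall through unchanged.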

The root pyproject.toml uses override-dependencies to force pandas 3.x on Python >= 3.11 and pandas 2.x on Python < 3.11 (since pandas 3 requires Python >= 3.11). The Ray integration test dependency is removed.

Code Quality

Overall the code quality is good. Changes are well-targeted and follow existing patterns. A few observations:

  1. re.compile() called on every hash invocation (lib/streamlit/runtime/caching/hashing.py, lines 419 and 442): The regex patterns r"^pandas(\.core\.series)?\.Series$" and r"^pandas(\.core\.frame)?\.DataFrame$" are compiled inline each time _to_bytes is called. While Python's re module maintains an internal cache for compiled patterns, it would be cleaner and more explicit to pre-compile these as module-level constants (e.g., _PANDAS_SERIES_PATTERN and _PANDAS_DATAFRAME_PATTERN). This is consistent with how other module-level constants like _PANDAS_ROWS_LARGE are already used in the same file.

  2. _prepare_pydeck_for_json mutates the user's pydeck object (lib/streamlit/elements/deck_gl_json_chart.py, lines 645-677): The function replaces layer.data from a DataFrame to a list of dicts in-place. If a user passes the same Deck object to st.pydeck_chart multiple times, the first call would convert DataFrames to records, and subsequent calls would receive lists instead of DataFrames. The docstring documents this, but the mutation could be surprising. Consider either documenting this in the public API docstring or making a shallow copy of the layers data before conversion.

  3. Defensive getattr usage in _prepare_pydeck_for_json is appropriate and handles None cases well, including the weakref dereference check.

  4. map.py fix (line 454): Converting the column to object dtype before mapping color tuples is a clean fix for the StringDtype issue. The comment explains the reasoning well.

  5. column_config_utils.py (line 173): Adding is_large_string is correct since pandas 3.x + pyarrow can produce large_string types for string columns.
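
The defensive getattr and weakref handling praised in point 3 might look roughly like the sketch below. FakeFrame, FakeLayer, and resolve_layer_data are hypothetical stand-ins written for illustration; they are not pydeck classes and not the actual body of _prepare_pydeck_for_json.

```python
import weakref
from typing import Any


class FakeFrame:
    """Stand-in for a DataFrame; only here so the sketch needs no pandas."""


class FakeLayer:
    """Stand-in for a layer object that may or may not carry data."""

    def __init__(self, data: Any = None) -> None:
        self.data = data


def resolve_layer_data(layer: Any) -> Any:
    """Return the layer's data, dereferencing a weak reference if present."""
    data = getattr(layer, "data", None)  # tolerates objects without .data
    if isinstance(data, weakref.ReferenceType):
        data = data()  # yields None once the referent has been collected
    return data
```

Using getattr with a default means that None, objects without a data attribute, and layers with no data all take the same harmless path.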

Test Coverage

  • Existing tests updated: The test modifications in dataframe_util_test.py and data_editor_test.py correctly adapt assertions to handle both pandas 2.x and 3.x behavior (e.g., pd.isna() instead of is None, isinstance(data, pa.Array) for pyarrow arrays).

  • Missing unit tests for _prepare_pydeck_for_json: The new helper function has no dedicated unit tests. While the existing pydeck tests (test_basic) exercise the code path indirectly (DataFrames are passed through layers), there are no tests specifically verifying:

    • Handling of weakref-wrapped DataFrames
    • Handling of None pydeck objects
    • Handling of layers without data
    • Behavior when called with non-DataFrame layer data (e.g., URLs)

    The existing test_basic pydeck test does verify the end-to-end output is correct (asserting the serialized JSON contains the expected records), which provides reasonable coverage. However, a dedicated test for the helper would improve confidence and serve as regression protection.

  • Hashing tests: The existing hashing tests (test_pandas_dataframe, test_pandas_series, test_pandas_large_dataframe) use actual pandas objects, so they will exercise the new regex-based type matching on both pandas 2.x and 3.x, providing good coverage.

  • Ray integration test removed: The test_verify_ray_integration test is removed along with the ray dependency. This is reasonable since Ray was causing CI hangs, and Ray dataset support is covered by other mock-based tests.

Backwards Compatibility

No breaking changes for users. The dependency constraint pandas>=1.4.0,<4 is a strict superset of the previous pandas>=1.4.0,<3. Users who had pandas 2.x installed will continue to work without any changes. Users who install pandas 3.x will benefit from the compatibility fixes.

The override-dependencies in the root pyproject.toml only affects the development environment (forcing pandas 3.x on Python >= 3.11 to ensure CI tests both versions). This does not affect end users' installations.

All code changes use version-agnostic approaches (regex patterns, pd.isna(), dtype checks that accept both object and StringDtype), so they work correctly with both pandas 2.x and 3.x.

Potential concern: The data_editor.py file (lines 496-498) uses is_type() with hardcoded pandas.core.indexes.numeric.* paths that weren't updated. However, these are for deprecated Index types already removed in pandas 2.0, so they serve only as backwards compatibility checks for very old pandas versions and are not affected by this change.

Security & Risk

No security concerns. The changes are limited to:

  • Version constraint updates in dependency configuration
  • Internal type checking and data conversion logic
  • Test assertion adjustments

No new external dependencies, no changes to network handling, no changes to user input processing, no new eval/exec usage.

Regression risk is low-to-moderate:

  • The pydeck workaround mutates the Deck object in-place, which could affect edge cases where users reuse Deck objects. However, this is unlikely in typical Streamlit usage patterns.
  • The re.compile patterns in hashing.py match both pandas.core.series.Series and pandas.Series, covering both pandas 2.x and 3.x paths.

External test recommendation

  • Recommend external_test: No
  • Triggered categories: None
  • Evidence:
    • All changes are in Python backend data handling (lib/streamlit/elements/, lib/streamlit/runtime/)
    • No changes to routing, auth, WebSocket, embedding, asset serving, or security headers
    • No frontend changes
    • Dependency version constraint update only
  • Suggested external_test focus areas: None
  • Confidence: High
  • Assumptions and gaps: Assumes pydeck rendering in externally hosted environments uses the same serialization path. Since the change pre-converts DataFrames to records (which was already the end state in pydeck's serialization), this should be transparent.

Accessibility

No frontend changes in this PR. All modifications are in Python backend code and test files. No accessibility impact.

Recommendations

  1. Pre-compile regex patterns in hashing.py: Move the re.compile() calls to module-level constants to avoid recompiling on every _to_bytes invocation. While Python caches compiled patterns internally, explicit constants are clearer and follow the existing pattern in the file:

    _PANDAS_SERIES_RE: Final = re.compile(r"^pandas(\.core\.series)?\.Series$")
    _PANDAS_DATAFRAME_RE: Final = re.compile(r"^pandas(\.core\.frame)?\.DataFrame$")
  2. Consider adding unit tests for _prepare_pydeck_for_json: A small test exercising the weakref path, the None pydeck path, and a basic DataFrame conversion would improve coverage for this new function.

  3. Document or mitigate in-place mutation of pydeck objects: The _prepare_pydeck_for_json function modifies layer data in-place. Consider either:

    • Adding a note in the st.pydeck_chart docstring about this behavior, or
    • Replacing layer.data = data.to_dict(orient="records") with a copy-based approach to avoid mutating the user's object.

    This is a minor concern since most users create Deck objects inline, but it's worth noting for robustness.

  4. (Nit) Consider whether the weakref import should be conditional: The import weakref at the top of deck_gl_json_chart.py is always imported but only used in the pandas 3.x code path. This is a negligible cost and acceptable, but worth noting.
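
The copy-based alternative from recommendation 3 can be sketched with stand-in classes. FakeFrame, FakeLayer, FakeDeck, and prepare_copy are hypothetical names; whether a shallow copy is safe for real pydeck Deck and Layer objects is an assumption that this sketch does not verify.

```python
import copy
from typing import Any


class FakeFrame:
    """Stand-in for a DataFrame exposing a to_dict-like method."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self._rows = rows

    def to_dict(self, orient: str = "records") -> list[dict[str, Any]]:
        # Only the "records" orientation is modelled in this stand-in.
        if orient != "records":
            raise ValueError("this stand-in only supports orient='records'")
        return [dict(row) for row in self._rows]


class FakeLayer:
    def __init__(self, data: Any) -> None:
        self.data = data


class FakeDeck:
    def __init__(self, layers: list[FakeLayer]) -> None:
        self.layers = layers


def prepare_copy(deck: FakeDeck) -> FakeDeck:
    """Return a deck whose layers hold records, leaving the input untouched."""
    prepared = copy.copy(deck)  # shallow copy of the deck itself
    prepared.layers = []
    for layer in deck.layers:
        layer_copy = copy.copy(layer)  # shallow copy of each layer
        if isinstance(layer_copy.data, FakeFrame):
            layer_copy.data = layer_copy.data.to_dict(orient="records")
        prepared.layers.append(layer_copy)
    return prepared
```

Because only the copies are rewritten, passing the same deck twice yields the same result both times, which is the property the in-place version gives up.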

Verdict

APPROVED: Well-executed pandas 3.x compatibility update with correct handling of all major breaking changes. The code changes are backwards compatible, well-commented, and the test adjustments properly accommodate both pandas 2.x and 3.x behavior. The recommendations above are minor improvements that don't block merging.


This is an automated AI review by opus-4.6-thinking.

@github-actions github-actions bot added the do-not-merge label (PR is blocked from merging) Mar 18, 2026
lukasmasuch and others added 2 commits March 18, 2026 16:59
Suppress the pandas FutureWarning about DataFrame concatenation with
empty/NA entries that appears when users add rows in st.data_editor.
The warning is triggered by pandas 2.1-2.x when using .loc[] to assign
a new row, and is not actionable by users. The warning was removed in
pandas 3.x where the behavior change has been implemented.

Fixes #14321

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Add `<4` upper bound to the pandas override-dependency for Python 3.11+
to align with lib/pyproject.toml and prevent future CI breakage when
pandas 4.x is released.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@lukasmasuch lukasmasuch added the ai-review label (If applied to PR or issue will run AI review workflow) Mar 18, 2026
@github-actions github-actions bot removed the ai-review label (If applied to PR or issue will run AI review workflow) Mar 18, 2026
@github-actions
Contributor

Summary

This PR updates Streamlit to support pandas 3.x by widening the dependency upper bound from <3 to <4 and applying targeted compatibility fixes across multiple subsystems. Pandas 3.x introduces several breaking changes, including Copy-on-Write semantics, StringDtype inference, large_string Arrow types, module path relocations (pandas.core.* → pandas.*), and removal of __dict__ from DataFrames. These required surgical fixes in pydeck serialization, map color handling, Arrow type detection, caching/hashing, metrics, and the data editor. The PR also removes the Ray integration test and dependency (likely incompatible with pandas 3.x). Unit tests are updated to accommodate both pandas 2.x and 3.x behaviors.

Code Quality

All three reviewers agreed the code is well-structured, minimal, and cleanly addresses the compatibility issues. Specific highlights:

  • _prepare_pydeck_for_json (deck_gl_json_chart.py): Clean helper with proper None guards, weakref handling, and a clear docstring referencing the upstream issue. Follows module convention with _ prefix.
  • Regex patterns in hashing.py: Concise patterns correctly matching both pandas 2.x (pandas.core.series.Series) and 3.x (pandas.Series) module paths.
  • metrics_util.py: Both old and new module paths mapped with clear grouping comments.
  • data_editor.py: Appropriately scoped warnings.catch_warnings() with clear documentation of why the suppression exists.
  • map.py: Correct astype(object) conversion before assigning tuple values to StringDtype columns.

Minor style nit (raised by opus-4.6-thinking): The is_pandas_version_less_than import at deck_gl_json_chart.py:523 is inside the method body rather than at the top of the file. This deviates slightly from project convention but is acceptable as a conditional import guarding a workaround.

Test Coverage

Unit tests are updated appropriately to handle both pandas 2.x and 3.x behaviors:

  • Relaxed string dtype checks to accept both object and StringDtype.
  • Changed is None assertions to pd.isna() to handle pd.NA (pandas 3.x) vs None (pandas 2.x).
  • Added isinstance(converted_data, pa.Array) checks to handle LargeStringArray from pandas 3.x.

Gaps identified (consensus across reviewers):

  • No dedicated unit test for the new _prepare_pydeck_for_json function, particularly the weakref handling path. (Raised by gpt-5.3-codex-high and opus-4.6-thinking)
  • No unit test for the map.py color column astype(object) change. (Raised by opus-4.6-thinking)
  • No e2e tests added, though this is reasonable given the backend-only scope. (Raised by gpt-5.3-codex-high)

The PR description confirms all 8072 Python unit tests pass with pandas 3.0.0, providing confidence in overall coverage.

Ray test removal: All reviewers found this acceptable given the likely pandas 3.x incompatibility. One reviewer (opus-4.6-thinking) suggested extracting this to a separate PR for cleaner separation of concerns—a reasonable but non-blocking suggestion.

Backwards Compatibility

All three reviewers confirmed full backwards compatibility. Key points of agreement:

  • The pandas constraint was widened (from <3 to <4), not narrowed. Users on pandas 2.x are unaffected.
  • override-dependencies in pyproject.toml correctly pins pandas 3.x only for Python ≥3.11 (matching pandas 3's own requirement).
  • All code changes are additive guards (version checks, boolean `or` conditions, astype conversions) that don't alter pandas 2.x behavior.
  • The pydeck workaround is gated behind not is_pandas_version_less_than("3.0.0").
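The version gate in the last bullet can be pictured with a minimal sketch. The helper below is a hypothetical stand-in for Streamlit's is_pandas_version_less_than, whose implementation is not shown in this conversation; it compares only the numeric release segments and ignores pre-release suffixes.

```python
import re


def is_version_less_than(installed: str, target: str) -> bool:
    """Hypothetical stand-in: compare dotted release versions numerically."""

    def release_tuple(version: str) -> tuple[int, ...]:
        # Keep only the leading numeric release segment, e.g. "3.0.0rc1" -> (3, 0, 0).
        match = re.match(r"\d+(?:\.\d+)*", version)
        if match is None:
            raise ValueError(f"Unparseable version: {version!r}")
        return tuple(int(part) for part in match.group(0).split("."))

    a, b = release_tuple(installed), release_tuple(target)
    # Pad with zeros so that "3.0" and "3.0.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def needs_pydeck_workaround(pandas_version: str) -> bool:
    # The workaround only runs on pandas 3.0.0 and later, so 2.x code paths stay untouched.
    return not is_version_less_than(pandas_version, "3.0.0")
```

Gating on the installed version rather than on feature detection keeps the pandas 2.x path byte-for-byte identical, which is what the backwards-compatibility claim above relies on.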

One concern raised (by gemini-3.1-pro and opus-4.6-thinking): _prepare_pydeck_for_json mutates the user's pydeck.Deck object in-place by converting layer.data from DataFrames to list[dict]. In Streamlit's rerun-based execution model this is unlikely to cause issues, but if a user constructs a Deck once and passes it to multiple st.pydeck_chart calls in the same script run, the second call would see list[dict] instead of DataFrames. This is a minor concern worth documenting in a code comment.

Security & Risk

No security concerns identified (unanimous). All changes are limited to:

  • Pandas version constraint updates
  • Data type handling (Arrow type checks, dtype conversions)
  • JSON serialization workarounds (DataFrame → dict conversion)
  • Warning suppression for a specific, non-actionable FutureWarning

No changes to WebSocket handling, authentication, file serving, CORS, CSP, or other security-sensitive areas.

Regression risk is low-to-medium. Main risks:

  • Undiscovered pandas 3.x behavioral differences in edge cases not covered by tests.
  • Pydeck upstream fix could change behavior when the workaround is still active.

External test recommendation

  • Recommend external_test: No (unanimous)
  • Triggered categories: None
  • Confidence: High
  • All changes are in Python backend data handling with no impact on routing, WebSocket, auth, embedding, CORS, CSP, static assets, or service workers.

Accessibility

No frontend/UI changes are included in this PR. No accessibility impact. (Unanimous)

Reviewer Agreement & Disagreements

Strong Agreement

  • Verdict: All three reviewers approved the PR.
  • Security: No concerns (unanimous).
  • Backwards compatibility: Fully maintained (unanimous).
  • External tests: Not needed (unanimous).
  • Code quality: Well-structured, clean, minimal changes (unanimous).

Complementary Findings (no conflicts)

  • PR description inaccuracy (gemini-3.1-pro): The PR description claims "no code changes were needed" but several workarounds were required. Worth correcting for future reference.
  • In-place mutation concern (gemini-3.1-pro, opus-4.6-thinking): The pydeck workaround mutates user objects in-place, which could affect multi-call patterns.
  • Missing unit tests (gpt-5.3-codex-high, opus-4.6-thinking): Both recommended dedicated tests for _prepare_pydeck_for_json and the weakref path.
  • Ray removal separation (opus-4.6-thinking): Suggested extracting to a separate PR for cleaner history.
  • Future-proofing (opus-4.6-thinking): When pydeck fixes the upstream vars() issue, the workaround should be removed.
  • Warning suppression test (gpt-5.3-codex-high): Suggested testing _assign_row_values warning suppression to catch future pandas warning-message changes.

No Conflicts

There were no disagreements between reviewers on any point. All findings are complementary.

Consolidated Recommendations

  1. Add unit tests for _prepare_pydeck_for_json — covering DataFrame-to-dict conversion, the weakref handling path, and None input. (Raised by 2/3 reviewers)
  2. Document or mitigate the in-place mutation in _prepare_pydeck_for_json — either add a code comment or consider converting DataFrames without mutating the original layer objects. (Raised by 2/3 reviewers)
  3. Fix the PR description — it states "no code changes were needed" but several compatibility workarounds were implemented. (Raised by 1/3 reviewers)
  4. Add a TODO/version-check for pydeck upstream fix — when pydeck resolves the vars() issue, the workaround should be removed. (Raised by 1/3 reviewers)
  5. Consider separating the Ray removal — tangentially related to pandas 3 support; would be cleaner as a separate PR. (Raised by 1/3 reviewers)

None of these are blocking issues.

Verdict

APPROVED: All three reviewers approved unanimously. The PR makes well-scoped, minimal changes to support pandas 3.x while maintaining full backwards compatibility with pandas 2.x. All code changes are correctly gated by version checks or compatible with both versions. No security, accessibility, or backwards compatibility concerns were identified. The recommendations above are improvements for follow-up, not blockers.


This is a consolidated AI review by opus-4.6-thinking, synthesizing reviews from gemini-3.1-pro, gpt-5.3-codex-high, and opus-4.6-thinking.


📋 Review by `gemini-3.1-pro`

Summary

This PR adds support for pandas 3.x by updating the dependency upper bound to <4 and fixing several compatibility issues related to pandas 3.x breaking changes, such as module path changes, string dtype inference, and pydeck serialization. It also removes a flaky Ray integration test and its associated version pin.

Code Quality

The code changes are well-structured and cleanly address the pandas 3.x compatibility issues.

  • The workaround for pydeck serialization in lib/streamlit/elements/deck_gl_json_chart.py correctly handles the removal of __dict__ in pandas 3.x DataFrames.
  • The use of re.compile in lib/streamlit/runtime/caching/hashing.py to match both old (pandas.core.*) and new (pandas.*) module paths is an elegant solution.
  • The updates to lib/streamlit/elements/map.py to explicitly cast to object before assigning tuples correctly handles the new StringDtype behavior.

Test Coverage

The unit tests have been appropriately updated to account for pandas 3.x behavior (e.g., pd.isna instead of is None, and checking for is_string_dtype alongside is_object_dtype). The removal of the Ray integration test (test_verify_ray_integration) seems acceptable if it was causing CI hangs, though it does reduce coverage for Ray datasets slightly.

Backwards Compatibility

The changes are fully backwards compatible. The pyproject.toml updates correctly use environment markers to ensure pandas 2.x is used for Python < 3.11, while allowing pandas 3.x for Python 3.11+. The code changes maintain support for older pandas versions.

Security & Risk

No security concerns identified. The changes are limited to data serialization, type checking, and test updates.

External test recommendation

  • Recommend external_test: No
  • Triggered categories: None
  • Evidence:
    • lib/pyproject.toml: Dependency version bump.
    • lib/streamlit/elements/*: Data conversion and serialization fixes for pandas 3.x.
    • lib/streamlit/runtime/*: Hashing and metrics path updates.
  • Suggested external_test focus areas: N/A
  • Confidence: High
  • Assumptions and gaps: None. The changes do not touch routing, auth, websockets, or embedding boundaries.

Accessibility

N/A - No frontend UI changes.

Recommendations

  1. The PR description mentions "No code changes were needed; the existing codebase handles all pandas 3.x breaking changes", but several code changes were actually required (e.g., pydeck serialization, map color column casting). It might be worth updating the PR description to accurately reflect these workarounds for future reference.

Verdict

APPROVED: The changes correctly implement pandas 3.x support while maintaining backwards compatibility and fixing necessary serialization and type inference edge cases.


This is an automated AI review by gemini-3.1-pro.

📋 Review by `gpt-5.3-codex-high`

Summary

This PR updates Streamlit to support pandas 3.x by widening dependency constraints and adding compatibility fixes in pydeck serialization, map color handling, Arrow string-kind detection, caching hash type checks, and metrics object-name mapping. It also updates related unit tests and removes the Ray integration dependency/test from the integration dependency group.

Code Quality

The implementation is focused and follows existing patterns in lib/streamlit and lib/tests.

  • No blocking code-quality issues were identified in the changed runtime paths.
  • The pandas 2.x/3.x branching is generally explicit and readable (for example in lib/streamlit/elements/deck_gl_json_chart.py:L521-L527 and lib/streamlit/runtime/caching/hashing.py:L418-L443).
  • Potential future maintainability consideration: the pydeck workaround mutates layer.data in place (lib/streamlit/elements/deck_gl_json_chart.py:L645-L677), which is acceptable here but worth keeping in mind if users reuse the same Deck object across operations.

Test Coverage

Coverage is good for several behavior shifts, but there are small remaining gaps.

  • Updated tests correctly account for pandas 3.x dtype/value behavior changes in:
    • lib/tests/streamlit/dataframe_util_test.py:L348-L351 and L648-L655
    • lib/tests/streamlit/elements/data_editor_test.py:L236-L237 and L910-L917
  • Existing suites already exercise affected areas like pydeck/map/hashing/column-config (lib/tests/streamlit/elements/pydeck_test.py, lib/tests/streamlit/elements/map_test.py, lib/tests/streamlit/runtime/caching/hashing_test.py, lib/tests/streamlit/elements/lib/column_config_utils_test.py).
  • No e2e coverage was added. Given scope (backend/data-compatibility), this is reasonable, but a targeted smoke e2e for pandas-3 st.map/st.pydeck_chart would further reduce regression risk.

Backwards Compatibility

Overall backwards compatibility looks good.

  • pandas 2.x compatibility is preserved while adding pandas 3.x support:
    • Dependency range widened in lib/pyproject.toml:L68-L70.
    • Dev resolver override split by Python version in pyproject.toml:L37-L43.
    • Runtime type-path logic supports both old and new pandas module paths (lib/streamlit/runtime/caching/hashing.py:L418-L443, lib/streamlit/runtime/metrics_util.py:L47-L54).
  • The removal of Ray from integration dependencies/tests (pyproject.toml:L140-L154 and deleted test_verify_ray_integration) reduces real integration verification for Ray objects, but does not remove runtime support paths.

Security & Risk

No direct security concerns were found.

  • No changes touch authentication/session handling, websocket handshake logic, request routing, CORS/XSRF/cookies, file-serving, or CSP/security headers.
  • No new dependency introducing external service calls or dynamic code execution (eval/exec/Function) was added.
  • Main residual risk is functional regression risk in pandas-edge serialization/type paths rather than security exposure.

External test recommendation

  • Recommend external_test: No
  • Triggered categories: None
  • Evidence:
    • lib/streamlit/elements/deck_gl_json_chart.py: pandas 3 serialization workaround for pydeck layer data; no routing/auth/session/cross-origin boundary changes.
    • lib/streamlit/elements/map.py: dataframe dtype conversion before color mapping; local data transformation only.
    • lib/streamlit/runtime/caching/hashing.py: type-detection updates for pandas module-path changes; internal cache hashing only.
    • lib/streamlit/runtime/metrics_util.py: object-name mapping updates only.
    • lib/pyproject.toml and pyproject.toml: dependency constraint updates only.
  • Suggested external_test focus areas:
    • None required by checklist hit. Optional confidence smoke: verify st.pydeck_chart and st.map render correctly on an externally hosted app using pandas 3.x.
  • Confidence: High
  • Assumptions and gaps: Assessment is based on static diff/code review only; tests/build were intentionally not executed per instructions.

Accessibility

No frontend/UI code changes were included in this PR, so there are no direct accessibility deltas to assess.

Recommendations

  1. Add a targeted unit test for the pydeck weakref branch in lib/streamlit/elements/deck_gl_json_chart.py:L669-L673 to lock in pandas-3 behavior for pydeck's DataFrame wrapping path.
  2. Consider adding a lightweight regression test around _assign_row_values warning suppression in lib/streamlit/elements/widgets/data_editor.py:L343-L354 so future pandas warning-message changes are caught early.
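A rough sketch of what the regression test in recommendation 2 might look like, using only the standard library. Here noisy_assign and assign_row_quietly are hypothetical stand-ins for the pandas assignment and for _assign_row_values, the warning text is invented for illustration, and the filter matches on category; the real code may filter on the message instead.

```python
import warnings


def noisy_assign(row: dict, values: dict) -> None:
    """Hypothetical stand-in for an assignment that emits a FutureWarning."""
    warnings.warn("illustrative: behavior will change in a future version", FutureWarning)
    row.update(values)


def assign_row_quietly(row: dict, values: dict) -> None:
    # Scope the filter to this one call so unrelated warnings still surface elsewhere.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        noisy_assign(row, values)


def count_future_warnings(func, *args) -> int:
    # Record everything the call emits, then count FutureWarnings only.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args)
    return sum(issubclass(w.category, FutureWarning) for w in caught)
```

A test built on this pattern would assert that the unwrapped call produces one FutureWarning and the wrapped call produces none, so a change in how the warning is raised shows up as a test failure rather than as noise in user apps.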

Verdict

APPROVED: The patch is coherent, low-risk from a security perspective, and appears ready to merge with optional follow-up hardening in a few targeted tests.


This is an automated AI review by gpt-5.3-codex-high. Please verify the feedback and use your judgment.

📋 Review by `opus-4.6-thinking`

Summary

This PR updates the pandas dependency upper bound from <3 to <4 to support pandas 3.0+. Pandas 3.x introduces several breaking changes (Copy-on-Write semantics, StringDtype inference for string columns, large_string Arrow types, module path changes from pandas.core.* to pandas.*, and removal of __dict__ from DataFrames) that required targeted compatibility fixes across multiple Streamlit subsystems:

  • Dependency constraints: lib/pyproject.toml bumped to <4; root pyproject.toml uses override-dependencies to pin pandas 2.x on Python 3.10 (pandas 3 requires ≥3.11) and pandas 3.x on Python 3.11+.
  • Pydeck serialization: New _prepare_pydeck_for_json workaround converts DataFrames in layers to list[dict] before serialization to avoid pydeck's broken vars() call on pandas 3.x DataFrames.
  • Column config: Added is_large_string() check for Arrow types.
  • Map color handling: Pre-converts string columns to object dtype before assigning tuple color values.
  • Data editor: Suppresses a transient FutureWarning from pandas 2.1–2.x.
  • Caching/hashing: Uses regex patterns to match both pandas.core.* and pandas.* module paths.
  • Metrics: Added pandas 3.x module paths to the name mapping.
  • Tests: Relaxed assertions to accommodate StringDtype, pd.NA, and LargeStringArray.
  • Ray removal: Removed Ray integration test and dependency (likely incompatible with pandas 3.x).

Code Quality

The code is well-structured and the changes are minimal, surgical, and well-commented. Specific observations:

  1. _prepare_pydeck_for_json (deck_gl_json_chart.py:645–677): Clean helper with proper None guards, weakref handling, and clear docstring. The upstream issue is referenced. The function name starts with _ (private), following the module convention.

  2. Regex patterns in hashing.py:419,442: The patterns r"^pandas(\.core\.series)?\.Series$" and r"^pandas(\.core\.frame)?\.DataFrame$" are correct and concise. They properly handle both pandas 2.x (pandas.core.series.Series) and 3.x (pandas.Series) module paths.

  3. metrics_util.py:47–56: Both old and new module paths are mapped to the same short names. The grouping with comments (# pandas 2.x paths / # pandas 3.x paths) improves readability.

  4. data_editor.py:341–354: The warnings.catch_warnings() context manager is appropriately scoped. The comment clearly explains why the warning is suppressed and that it was removed in pandas 3.x.

  5. map.py:452–454: The astype(object) conversion before map(to_int_color_tuple) is the correct approach for handling StringDtype columns that need to hold tuple values.

  6. Minor note: In deck_gl_json_chart.py:523–526, the is_pandas_version_less_than import is inside the method body rather than at the top of the file. This is acceptable since it's a conditional import guarding a workaround, and dataframe_util is already imported elsewhere. But it deviates from the project convention of preferring top-level imports. This is a very minor style nit.
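The two patterns quoted in item 2 can be checked in isolation. This is a small sketch that matches fully qualified type names given as plain strings; how hashing.py actually derives those names is not shown in this review, so the helper functions here are illustrative only.

```python
import re

# Patterns as quoted in the review: the optional group covers the pandas 2.x "core" path.
SERIES_TYPE = re.compile(r"^pandas(\.core\.series)?\.Series$")
DATAFRAME_TYPE = re.compile(r"^pandas(\.core\.frame)?\.DataFrame$")


def is_pandas_series_name(qualified_name: str) -> bool:
    # True for both "pandas.core.series.Series" (2.x) and "pandas.Series" (3.x).
    return SERIES_TYPE.match(qualified_name) is not None


def is_pandas_dataframe_name(qualified_name: str) -> bool:
    # True for both "pandas.core.frame.DataFrame" (2.x) and "pandas.DataFrame" (3.x).
    return DATAFRAME_TYPE.match(qualified_name) is not None
```

Because both patterns are anchored with ^ and $, names from other libraries or from unrelated pandas submodules do not match.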

Test Coverage

Unit tests are updated appropriately:

  • dataframe_util_test.py:349–351: Relaxed the string dtype check to accept both object and StringDtype with an or assertion. This correctly handles both pandas 2.x and 3.x behavior.
  • dataframe_util_test.py:648–655 and data_editor_test.py:910–917: Both test files add isinstance(converted_data, pa.Array) checks for PYARROW_ARRAY format, since pandas 3.x may return LargeStringArray instead of StringArray. This is the right approach.
  • data_editor_test.py:237: Changed is None to pd.isna() to handle pd.NA (pandas 3.x) vs None (pandas 2.x).

Missing test coverage:

  • No dedicated unit tests for the _prepare_pydeck_for_json function. While E2E pydeck tests exist, a unit test verifying the DataFrame-to-dict conversion (especially the weakref handling path) would increase confidence.
  • No unit tests for the map.py color column astype(object) change.
  • The existing E2E tests for st.pydeck_chart, st.map, and st.data_editor should cover the runtime behavior, and the PR description confirms all 8072 Python unit tests pass with pandas 3.0.0.

Ray test removal: The test_verify_ray_integration test was removed along with the ray dependency. This appears related to Ray incompatibility with pandas 3.x. This is reasonable since Ray compatibility is outside Streamlit's control.

Backwards Compatibility

No breaking changes for existing users:

  • The pandas constraint was widened (from <3 to <4), not narrowed. Users on pandas 2.x will continue to work.
  • The override-dependencies in pyproject.toml pins pandas 3.x only for Python ≥3.11 (matching pandas 3's own requirement). Python 3.10 users stay on pandas 2.x.
  • All code changes are additive guards (version checks, boolean `or` conditions, astype conversions) that don't alter behavior for pandas 2.x.
  • The pydeck workaround is gated behind not is_pandas_version_less_than("3.0.0"), so it only activates on pandas 3.x.

One concern: _prepare_pydeck_for_json mutates the user's pydeck.Deck object in-place by converting layer.data from DataFrames to list[dict]. In Streamlit's execution model (scripts rerun on each interaction), this is unlikely to cause issues in practice. However, if a user constructs a Deck once and passes it to multiple st.pydeck_chart calls in the same script run, the second call would see list[dict] data instead of DataFrames. This is a minor concern given pydeck's typical usage patterns but worth documenting in a code comment.
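One way to picture the non-mutating alternative is the sketch below. FakeFrame, FakeLayer, and FakeDeck are hypothetical stand-ins, not pydeck or pandas classes, and whether real pydeck objects can be shallow-copied this way is an unverified assumption.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeFrame:
    """Hypothetical stand-in for a DataFrame; only to_dict(orient="records") is modeled."""

    rows: list[dict[str, Any]]

    def to_dict(self, orient: str = "records") -> list[dict[str, Any]]:
        if orient != "records":
            raise ValueError("sketch only supports orient='records'")
        return [dict(row) for row in self.rows]


@dataclass
class FakeLayer:
    """Hypothetical stand-in for a pydeck layer with a data attribute."""

    data: Any = None


@dataclass
class FakeDeck:
    """Hypothetical stand-in for a deck holding a list of layers."""

    layers: list[FakeLayer] = field(default_factory=list)


def prepare_copy_for_json(deck: FakeDeck) -> FakeDeck:
    # Shallow-copy the deck and each layer so the caller's objects keep their frames.
    prepared = copy.copy(deck)
    prepared.layers = [copy.copy(layer) for layer in deck.layers]
    for layer in prepared.layers:
        if isinstance(layer.data, FakeFrame):
            layer.data = layer.data.to_dict(orient="records")
    return prepared
```

With this shape, a second call in the same script run still sees the original frame objects, at the cost of one extra shallow copy per layer.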

Security & Risk

No security concerns identified. The changes are limited to:

  • Pandas version constraint updates
  • Data type handling (Arrow type checks, dtype conversions)
  • JSON serialization workarounds (DataFrame → dict conversion)
  • Warning suppression for a specific, non-actionable FutureWarning

No changes to WebSocket handling, authentication, file serving, CORS, CSP, or other security-sensitive areas.

Regression risk is low-to-medium. The main risks are:

  • Undiscovered pandas 3.x behavioral differences in edge cases not covered by the 8072 unit tests.
  • Pydeck upstream fix could change behavior when the _prepare_pydeck_for_json workaround is still active.
  • The override-dependencies mechanism forces specific pandas versions in the dev/CI environment, which may mask compatibility issues for users who install different pandas versions.

External test recommendation

  • Recommend external_test: No
  • Triggered categories: None
  • Evidence:
    • All changes are in Python backend data handling (lib/streamlit/elements/, lib/streamlit/runtime/)
    • No changes to routing, WebSocket handling, auth, embedding, CORS, CSP, static assets, or service workers
    • No frontend changes whatsoever
    • No changes to server endpoints or session management
  • Suggested external_test focus areas: N/A
  • Confidence: High
  • Assumptions and gaps: Assumes pandas version differences don't affect Arrow IPC serialization format in ways that would affect WebSocket transport. This is a safe assumption since Arrow serialization is version-independent.

Accessibility

No frontend changes are included in this PR. All changes are to Python backend code. No accessibility impact.

Recommendations

  1. Add unit test for _prepare_pydeck_for_json: A targeted unit test verifying DataFrame-to-dict conversion, including the weakref path and None input, would strengthen coverage for this new function.

  2. Document mutation in _prepare_pydeck_for_json: Add a note that the function modifies the user's pydeck object in-place, which could affect subsequent use of the same object within a single script run. Alternatively, consider converting DataFrames without mutating the original layer objects (e.g., by creating a modified copy of layer data for serialization).

  3. Consider extracting Ray removal to a separate PR: The Ray integration test and dependency removal (test_verify_ray_integration, ray>=2.9.0,<2.44) seems tangentially related to pandas 3 support. Separating it would make this PR more focused and the change easier to revert independently if needed.

  4. Minor: The is_pandas_version_less_than import at deck_gl_json_chart.py:523 is inside the method body. Consider moving it to the top-level imports or to the function-level scope of _prepare_pydeck_for_json for consistency with the project's import style.

  5. Future-proofing: When pydeck releases a fix for the pandas 3.x vars() issue (tracked upstream as visgl/deck.gl#9986, "[Bug] pydeck breaks streamlit with pandas 3"), the _prepare_pydeck_for_json workaround should be removed. Consider adding a version check against pydeck or a TODO comment with the upstream issue.

Verdict

APPROVED: The PR makes well-scoped, minimal changes to support pandas 3.x while maintaining full backwards compatibility with pandas 2.x. All code changes are correctly gated by version checks or compatible with both versions. The recommendations above are minor improvements, not blockers.


This is an automated AI review by opus-4.6-thinking.

@github-actions github-actions bot removed the do-not-merge label Mar 18, 2026
lukasmasuch and others added 3 commits March 23, 2026 20:00
Pandas 3.x uses large_string (64-bit offsets) by default instead of
string (32-bit), which produces ~18% larger Arrow IPC serialization.
This is a pandas ecosystem change, not triggered by Streamlit code.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Address code review feedback:
- Add 8 unit tests covering DataFrame conversion, weakref handling,
  None input, multiple layers, and edge cases
- Enhance docstring to document in-place mutation behavior and
  multi-call pattern implications
- Add reference to upstream pydeck issue

Co-Authored-By: Claude Opus 4.6 <[email protected]>

@pytest.mark.require_integration
@pytest.mark.timeout(60) # 60 second timeout to prevent CI hangs
def test_verify_ray_integration(self):
Collaborator

question: Was this removed because it's incompatible with the Pandas version? Are we losing valuable test coverage by this removal?

Collaborator Author

The ray test was already a bit broken for some time and pinned to an older version; with Pandas 3, it got more flaky. But I don't think it's worth investing time in fixing this since the usage of ray objects in our dataframe commands is very, very low. We could fully remove the support, but the integration is very lightweight as well.

Collaborator

Thanks for the info, I'm fine with removing if we don't think it's valuable!

Collaborator Author

Sounds good, I removed the ray support here as well

Ray Dataset support was rarely used and adds maintenance burden.
This removes all Ray-related code including type detection, conversion
functions, metrics tracking, and associated tests.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@lukasmasuch lukasmasuch enabled auto-merge (squash) March 23, 2026 20:38
@lukasmasuch lukasmasuch merged commit 154b070 into develop Mar 23, 2026
43 checks passed
@lukasmasuch lukasmasuch deleted the lukasmasuch/pandas-3-support branch March 23, 2026 20:49

Labels

change:feature (PR contains new feature or enhancement implementation)
impact:users (PR changes affect end users)

Development

Successfully merging this pull request may close these issues.

Support Pandas 3

4 participants