
Allow programmatically setting st.dataframe selections #13594

Merged
lukasmasuch merged 34 commits into develop from feature/set-dataframe-selections
Mar 4, 2026


@lukasmasuch
Collaborator

@lukasmasuch lukasmasuch commented Jan 15, 2026

Describe your changes

Allow programmatically setting the st.dataframe selection state via session state, and add a new selection_default parameter for specifying initial selections.

  • Session state writes to the widget key are validated and forwarded to the frontend as a one-shot selectionState proto field
  • New selection_default parameter provides initial selection on first render without overriding subsequent user selections
  • Backend validates row indices, column names, and cell positions against actual data dimensions, filtering invalid entries
  • Programmatic cell selection is limited to single-cell mode (multi-cell ranges require rectangular range info that can't be reconstructed)
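As a rough illustration of the payload these bullets describe, the sketch below builds a selection-state dictionary in the shape quoted later in this thread (`{"selection": {"rows": [...], "columns": [...], "cells": [...]}}`). The helper name `build_selection_state` is hypothetical and not part of Streamlit, and the `(row index, column name)` layout of each cell is an assumption rather than something this description confirms.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_selection_state(
    rows: Optional[List[int]] = None,
    columns: Optional[List[str]] = None,
    cells: Optional[List[Tuple[int, str]]] = None,
) -> Dict[str, Any]:
    """Build a selection-state dict (hypothetical helper, not a Streamlit API)."""
    return {
        "selection": {
            "rows": list(rows or []),
            "columns": list(columns or []),
            # Assumption: each cell is a (row index, column name) pair.
            "cells": list(cells or []),
        }
    }


state = build_selection_state(rows=[0, 2])
print(state)
```

In an app, a dictionary like this would presumably be written to `st.session_state` under the widget key, or passed as `selection_default`, as the bullets above outline.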

GitHub Issue Link (if applicable)

Testing Plan

  • Unit Tests (Python): lib/tests/streamlit/elements/arrow_dataframe_test.py — 27+ new tests for _validate_selection_state and integration tests for selection_default
  • Unit Tests (TypeScript): useWidgetState.test.ts — 14+ new tests for getProgrammaticSelectionState and selection default loading; useSelectionHandler.test.ts — consolidated with it.each
  • E2E Tests: e2e_playwright/st_dataframe_selections_test.py — Tests for programmatic row/column/cell selection, clearing, and selection defaults

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@snyk-io
Contributor

snyk-io bot commented Jan 15, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scanner | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|---|
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| | Licenses | 0 | 0 | 0 | 0 | 0 issues |

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions
Contributor

github-actions bot commented Jan 15, 2026

✅ PR preview is ready!

| Name | Link |
|---|---|
| 📦 Wheel file | https://core-previews.s3-us-west-2.amazonaws.com/pr-13594/streamlit-1.54.0-py3-none-any.whl |
| 📦 @streamlit/component-v2-lib | Download from artifacts |
| 🕹️ Preview app | pr-13594.streamlit.app (☁️ Deploy here if not accessible) |

@lukasmasuch lukasmasuch changed the title from "Feature/set-dataframe-selections" to "[Prototype] Allow programmatically setting the st.dataframe state" Jan 15, 2026
@lukasmasuch lukasmasuch added the security-assessment-completed, change:feature (PR contains new feature or enhancement implementation), and impact:users (PR changes affect end users) labels Jan 15, 2026
@lukasmasuch lukasmasuch changed the title from "[Prototype] Allow programmatically setting the st.dataframe state" to "[Prototype] Allow programmatically setting st.dataframe selections" Jan 15, 2026
lukasmasuch and others added 5 commits January 15, 2026 04:28
- Fix Python test `StArrowTableAPITest::test_table` to use `new_element.table`
  and `proto.arrow_data.data` matching the new `TableProto` structure
- Fix TypeScript type errors in `useWidgetState.test.ts` by replacing undefined
  `ArrowProto` references with the already-imported `DataframeProto`

Co-authored-by: Cursor <[email protected]>
@github-actions
Contributor

github-actions bot commented Feb 8, 2026

📉 Frontend coverage change detected

The frontend unit test (vitest) coverage has decreased by 0.0100%

  • Current PR: 87.6900% (14404 lines, 1773 missed)
  • Latest develop: 87.7000% (14358 lines, 1766 missed)

✅ Coverage change is within normal range.

📊 View detailed coverage comparison

lukasmasuch and others added 2 commits February 8, 2026 01:58
The `text_content()` method returns `str | None`, but `to_have_text()`
expects a non-None value. Adding an assert narrows the type.

Co-authored-by: Cursor <[email protected]>
@lukasmasuch lukasmasuch changed the title from "[Prototype] Allow programmatically setting st.dataframe selections" to "Allow programmatically setting st.dataframe selections" Feb 8, 2026
@lukasmasuch lukasmasuch marked this pull request as ready for review February 8, 2026 16:56
Copilot AI review requested due to automatic review settings February 8, 2026 16:56
@streamlit streamlit deleted a comment from cursor bot Feb 8, 2026
Contributor

Copilot AI left a comment


Pull request overview

Enables programmatic control of st.dataframe selection by allowing users to write a selection-state object into st.session_state[key], which is then validated on the backend and applied on the frontend.

Changes:

  • Add selection_state to the Dataframe protobuf to carry backend-driven selection state to the frontend.
  • Add backend validation (_validate_selection_state) and send validated selection JSON to the frontend when session state changes.
  • Add frontend support to apply programmatic selections, avoid feedback loops, and extend unit + e2e coverage.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| proto/streamlit/proto/Dataframe.proto | Adds selection_state field to transport programmatic selection state to the frontend. |
| lib/streamlit/elements/arrow.py | Validates session-state selection payloads and forwards programmatic selection state to the frontend; allows session-state writes for this widget. |
| lib/tests/streamlit/elements/arrow_dataframe_test.py | Removes the “no session_state writes” expectation and adds focused unit tests for selection-state validation. |
| frontend/lib/src/components/widgets/DataFrame/hooks/useWidgetState.ts | Adds parsing helper + programmatic selection retrieval/sync behavior. |
| frontend/lib/src/components/widgets/DataFrame/hooks/useWidgetState.test.ts | Adds unit tests covering programmatic selection parsing + widget manager syncing. |
| frontend/lib/src/components/widgets/DataFrame/hooks/useSelectionHandler.ts | Adds an option to skip syncing selection updates to widget state (used to avoid programmatic feedback loops). |
| frontend/lib/src/components/widgets/DataFrame/DataFrame.tsx | Applies programmatic selection state from the element and prevents redundant backend syncs. |
| e2e_playwright/st_dataframe_selections.py | Adds an app section demonstrating programmatic selection set/clear via session state. |
| e2e_playwright/st_dataframe_selections_test.py | Adds e2e coverage for programmatic selection and improves a fragment rerun assertion. |


@cursor cursor bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@github-actions
Contributor

github-actions bot commented Feb 8, 2026

Summary

Adds programmatic dataframe selection via Session State, including backend validation, a new proto field, frontend application logic, and expanded unit/e2e coverage for selection behavior.

Code Quality

The overall structure is clean and consistent with existing patterns, but programmatic row selections are applied using raw row indices without accounting for sorted views. Since selection state is synced using original row indices, a sorted grid can highlight the wrong rows after a programmatic update.

  selectionState.selection?.rows?.forEach(row => {
    rowSelection = rowSelection.add(row)
  })

  selectionState.selection.rows = newSelection.rows
    .toArray()
    .map(row => getOriginalIndex(row))

Test Coverage

Good breadth across Python validation, TS hooks, and e2e coverage for programmatic selection. However, a new e2e test uses wait_for_timeout, which violates the repo’s Playwright best practices and can introduce flakiness.

    # The selection uses a debounce of 150ms; the React effect that applies the
    # programmatic selection to the grid's visual state runs after DOM commit,
    # so we need a brief wait before the manual click.
    app.wait_for_timeout(250)

Backwards Compatibility

Adding an optional proto field is backwards compatible; existing selection behavior is preserved for users who don’t set programmatic selection state.

Security & Risk

No direct security concerns. Main regression risk is incorrect row highlighting when programmatic selection occurs while a dataframe is sorted, which can cause user confusion and incorrect downstream logic.

Accessibility

No new UI elements or interaction patterns were introduced; accessibility impact appears neutral.

Recommendations

  1. Replace wait_for_timeout with wait_until/expect-based synchronization to follow Playwright best practices and reduce flakiness.
  2. Map programmatic row selections from original indices to current displayed indices when sorting is active (e.g., invert getOriginalIndex or build a lookup) so the correct rows are highlighted.
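The lookup suggested in the second recommendation could be sketched roughly as follows. This is a Python illustration of the idea only, not the frontend code: `display_to_original` is a hypothetical stand-in for whatever `getOriginalIndex` computes per displayed row, and both function names are invented for this example.

```python
from typing import Dict, List


def build_original_to_display(display_to_original: List[int]) -> Dict[int, int]:
    """Invert a display -> original row mapping into original -> display."""
    return {orig: disp for disp, orig in enumerate(display_to_original)}


def to_display_rows(original_rows: List[int], display_to_original: List[int]) -> List[int]:
    """Translate original row indices into the indices currently displayed."""
    lookup = build_original_to_display(display_to_original)
    # Rows that are not present in the current view are dropped.
    return [lookup[row] for row in original_rows if row in lookup]


# A grid sorted so that the displayed order is original rows 2, 0, 1.
display_to_original = [2, 0, 1]
print(to_display_rows([0, 2], display_to_original))  # [1, 0]
```

Building the lookup once per update keeps the translation linear in the number of rows, instead of searching the view for every selected row.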

Verdict

CHANGES_REQUESTED: Please address the e2e wait strategy and the sorted-row mapping issue before merge.


This is an automated AI review using gpt-5.2-codex-high. Please verify the feedback and use your judgment.

@github-actions github-actions bot added the do-not-merge (PR is blocked from merging) label Feb 8, 2026
@lukasmasuch lukasmasuch added the ai-review (If applied to PR or issue will run AI review workflow) label Feb 9, 2026
@github-actions github-actions bot removed the ai-review (If applied to PR or issue will run AI review workflow) label Feb 9, 2026
@github-actions
Contributor

github-actions bot commented Feb 9, 2026

Summary

This PR adds the ability to programmatically set st.dataframe selection state via st.session_state. Users can now pre-set, change, or clear selections by assigning a dictionary like st.session_state["key"] = {"selection": {"rows": [...], "columns": [...], "cells": [...]}}. The implementation spans all layers:

  • Proto: New optional string selection_state = 11 field on the Dataframe message.
  • Backend: A _validate_selection_state() function validates user-provided selection state, and the _dataframe() method serializes it to the proto when widget_state.value_changed is True.
  • Frontend: A new useEffect in DataFrame.tsx consumes the one-shot element.selectionState signal. A shared parseSelectionStateToGridSelection() helper in useWidgetState.ts handles parsing for both initial load and programmatic selection, with proper sort-index mapping.
  • Tests: Comprehensive Python unit tests, frontend unit tests, and E2E tests.

Code Quality

Backend (lib/streamlit/elements/arrow.py)

The _validate_selection_state function is well-designed and defensive:

  • Validates the top-level structure strictly (raises on invalid types).
  • Filters out invalid rows (non-int, out of bounds, negative), invalid columns (non-string, non-existent), and invalid cells gracefully.
  • Uses dict.fromkeys for order-preserving deduplication — a clean Python idiom.
  • Correctly enforces single-mode truncation (valid_rows[:1], valid_columns[:1], valid_cells[:1]).
  • Works correctly with AttributeDictionary (inherits from dict) since that's what widget_state.value returns after deserialization.
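The row-filtering behavior listed above can be approximated with the sketch below. It is not the actual `_validate_selection_state` implementation; `filter_row_indices` and its parameters are hypothetical names, and only the row case is shown.

```python
from typing import Any, List


def filter_row_indices(rows: Any, num_rows: int, single_mode: bool = False) -> List[int]:
    """Keep in-bounds integer row indices, deduplicated in first-seen order."""
    if not isinstance(rows, list):
        # Non-list input is treated as "no rows" rather than an error.
        return []
    # Note: isinstance(r, int) also admits bool, since bool subclasses int.
    valid = [r for r in rows if isinstance(r, int) and 0 <= r < num_rows]
    # dict.fromkeys preserves insertion order, so duplicates collapse stably.
    deduped = list(dict.fromkeys(valid))
    # Single-selection modes keep only the first valid entry.
    return deduped[:1] if single_mode else deduped


print(filter_row_indices([2, "a", -1, 2, 7, 0], num_rows=3))  # [2, 0]
print(filter_row_indices([2, 0], num_rows=3, single_mode=True))  # [2]
```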

Frontend (useWidgetState.ts)

  • The parseSelectionStateToGridSelection() helper is well-factored, shared between loadInitialSelectionState and getProgrammaticSelectionState with clear parametric differences (returnEmptySelection, originalToDisplayIndex).
  • The reverse mapping in getProgrammaticSelectionState correctly handles sorted grids by iterating display indices and building original → display lookups.
  • getProgrammaticSelectionState syncs to widget manager with fromUi: false, correctly indicating this isn't a user interaction and shouldn't trigger a rerun.

Frontend (DataFrame.tsx)

  • The one-shot pattern (element.selectionState = null) follows the existing element.setValue pattern in useBasicWidgetState. While directly mutating a prop is not idiomatic React, it's an established convention in this codebase for consuming one-shot signals.
  • The shouldSync: false in processSelectionChange correctly avoids double-syncing since getProgrammaticSelectionState already persists to the widget manager.
  • The loadInitialSelectionState correctly skips when element.selectionState is set, deferring to the programmatic selection effect.

Proto (Dataframe.proto)

  • Field 11 is a new, unused field number — safe addition.
  • optional string is the correct type for a JSON-serialized payload that's only sent when explicitly set.
  • Good inline documentation explaining the semantics (when present vs. absent).

Test refactor (data_editor_test.py)

  • Consolidating 4 separate test_num_rows_* methods into a single @parameterized.expand is a clean improvement aligned with the testing best practices.

Test Coverage

Python unit tests (arrow_dataframe_test.py): Excellent coverage of _validate_selection_state with 18+ test cases covering valid input, invalid input, boundary conditions, deduplication, type filtering, single-mode limits, combined modes, and structural errors. Good use of @parameterized.expand and both positive and negative assertions.

Frontend unit tests (useWidgetState.test.ts): Comprehensive new test suite with 428 lines covering editing state, sync state, selection loading, programmatic selection (rows, columns, cells, clearing, sorted grids, malformed JSON), and form clearing. Good negative assertions (e.g., malformed JSON doesn't persist).

Frontend unit tests (useSelectionHandler.test.ts): The diff shows a refactoring of existing tests into cleaner it.each patterns, which is a positive change.

E2E tests (st_dataframe_selections_test.py): Three new test functions covering:

  • Pre-set row selection, programmatic change, and manual modification after programmatic change.
  • Clearing selection programmatically.
  • Column + cell selection (non-row selection types).
  • Visual snapshot tests for both row and column+cell programmatic selections.
  • Negative assertions (old selection not present after change).

Minor note on E2E flakiness: The app.wait_for_timeout(250) on line 862 violates the "never use wait_for_timeout" guideline. However, the thorough comment explaining why (<canvas> rendering doesn't expose selection state as DOM attributes, so expect/wait_until cannot observe it) justifies this as a legitimate exception. The 250ms buffer over the 150ms debounce should be sufficient.

The existing fragment test fix (test_multi_row_and_multi_column_selection_in_fragment) improves robustness by switching from get_by_text("Runs: 1").to_be_visible() to a filtered get_by_test_id locator with to_have_text(), reducing potential ambiguity.

Backwards Compatibility

This PR is fully backward compatible:

  1. Proto: New optional field 11 — old frontends ignore it, old backends don't send it.
  2. Python API: No new parameters on st.dataframe. The feature works through the existing st.session_state mechanism (setting a key before or via callback).
  3. Frontend: The new useEffect only activates when element.selectionState is present, which only happens with updated backends.
  4. Widget identity: The key_as_main_identity set doesn't change, so existing widget IDs are stable.
  5. Return value: When value_changed is True, the method returns the validated state (matching the format users already expect). When False, it returns widget_state.value as before.

Security & Risk

  1. Input validation: The _validate_selection_state function is thorough — it validates types, checks bounds, and filters invalid entries. Structurally invalid input (non-dict top-level, non-dict selection) raises StreamlitAPIException with clear messages.
  2. No injection risk: Selection state is serialized with json.dumps and only used for data selection (no HTML rendering).
  3. Malformed JSON handling: The frontend gracefully returns undefined for malformed JSON (parseSelectionStateToGridSelection catches JSON parse errors) and does not persist malformed state to the widget manager.
  4. No privilege escalation: Setting session state is already an existing user capability; this just adds a new valid shape for the value.
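To make points 2 and 3 concrete, here is a Python analogue of the serialize-then-parse-defensively flow. The real parsing happens in TypeScript on the frontend, so `parse_selection_json` is a hypothetical name and this is only a sketch of the pattern, assuming cells are `(row index, column name)` pairs.

```python
import json
from typing import Any, Dict, Optional


def parse_selection_json(payload: str) -> Optional[Dict[str, Any]]:
    """Return the parsed selection state, or None if the payload is unusable."""
    try:
        parsed = json.loads(payload)
    except json.JSONDecodeError:
        # Malformed JSON is ignored instead of raising.
        return None
    if not isinstance(parsed, dict) or not isinstance(parsed.get("selection"), dict):
        return None
    return parsed


wire = json.dumps({"selection": {"rows": [0, 2], "columns": [], "cells": [(1, "a")]}})
print(parse_selection_json(wire))
print(parse_selection_json("{not json"))  # None
```

One detail worth keeping in mind with this approach: tuples do not survive a JSON round trip, so a cell written as `(1, "a")` comes back as the list `[1, "a"]`.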

Accessibility

No new interactive UI elements are introduced. The programmatic selection applies the same visual highlighting (checkmarks, column highlights) that user-initiated selections use, so existing accessibility properties of glide-data-grid are preserved. The toolbar "Clear selection" button continues to work correctly with programmatically set selections.

Recommendations

  1. E2E wait_for_timeout: While the justification is sound, consider adding a brief note in the test that if this proves flaky in CI, the timeout value may need to be increased. This makes future debugging easier.

  2. Documentation: The DataframeSelectionState docstring (lines 96-158 in arrow.py) now mentions programmatic setting ("Selection states can be programmatically set through session state"), which is good. Consider adding a concrete example to the DataframeState docstring (lines 161-183) showing how to set a selection via session state, since that's the entry point users will interact with.

  3. Multi-cell programmatic selection: The PR correctly documents that multi-cell programmatic selection is not supported (because rectangular ranges can't be reconstructed from individual cell positions). This limitation could be mentioned in the user-facing docstring for DataframeSelectionState to set expectations.

Verdict

APPROVED: This is a well-designed, thoroughly tested, and backward-compatible feature addition. The validation logic is defensive, the frontend follows established patterns, test coverage is comprehensive across all layers, and the PR addresses a clear user need (issue #10128). The minor recommendations above are suggestions for polish, not blockers.


This is an automated AI review using opus-4.6-thinking. Please verify the feedback and use your judgment.

@jrieke
Collaborator

jrieke commented Feb 11, 2026

I think it's a good idea to support setting the value via session state, but I was wondering whether we should also include a parameter for it to make it more discoverable and align with how to set the default value for other widgets. E.g. selection_default (similar to selection_mode on st.dataframe and default on other widgets). What do you think?

@lukasmasuch lukasmasuch added the ai-review (If applied to PR or issue will run AI review workflow) label Feb 25, 2026
@lukasmasuch
Collaborator Author

E.g. selection_default (similar to selection_mode on st.dataframe and default on other widgets). What do you think?

I added selection_default as a parameter 👍

@github-actions github-actions bot removed the ai-review (If applied to PR or issue will run AI review workflow) label Feb 25, 2026
@github-actions
Contributor

Summary

This PR adds programmatic selection support for st.dataframe via two mechanisms:

  1. Session state writes: st.session_state["key"] = {"selection": {"rows": [0, 2]}} enables dynamic selection changes on re-run.
  2. selection_default parameter: Provides initial selection on first render without overriding subsequent user selections.

The implementation spans the full stack: a new _validate_selection_state backend function sanitizes user-provided selection states; two new optional proto fields (selection_state, selection_default) carry the data to the frontend; and the frontend's useWidgetState hook parses, applies, and syncs the programmatic selection to the grid widget. Programmatic cell selection is intentionally limited to single-cell mode since multi-cell rectangular ranges cannot be reconstructed from individual cell positions.

Code Quality

Both reviewers agreed the code is well-structured, follows existing codebase patterns, and cleanly separates backend/frontend responsibilities.

Backend highlights:

  • _validate_selection_state is thorough and defensive: validates types, bounds, column existence, deduplicates entries, and enforces single-selection mode limits.
  • The widget_state.value_changed check is the correct hook for detecting programmatic session state changes.
  • The selection_default validation-then-serialization flow correctly filters invalid entries before sending to the frontend.

Frontend highlights:

  • parseSelectionStateToGridSelection cleanly extracts shared parsing logic for both initial load and programmatic selection paths.
  • The originalToDisplay mapping correctly handles sorted grid state, translating backend original indices to display indices.
  • The one-shot useEffect pattern (consuming then clearing element.selectionState) mirrors existing widget patterns in the codebase.

Minor issue — boolean acceptance as row indices (non-blocking):
One reviewer flagged that isinstance(row_idx, int) (line 348) and isinstance(cell[0], int) (line 385) accept bool values because Python's bool is a subclass of int. This means True would be treated as row index 1 and False as 0. After verification, this is technically correct — isinstance(True, int) returns True in Python. However, I assess this as non-blocking for the following reasons:

  • The scenario requires a user to intentionally pass True/False as a row index, which is unlikely in practice.
  • The behavior doesn't cause crashes or data corruption — it simply maps booleans to their integer equivalents.
  • The isinstance(x, int) and not isinstance(x, bool) guard pattern is not used anywhere else in this codebase.
  • This can be addressed as a low-priority follow-up.
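For reference, the behavior under discussion and the guard pattern mentioned above look like this in isolation. `is_row_index` is a hypothetical helper name used only for this sketch.

```python
def is_row_index(value: object) -> bool:
    """Return True only for real integers, excluding bool (a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


print(isinstance(True, int))  # True
print([v for v in [0, True, 2, False, "1"] if is_row_index(v)])  # [0, 2]
```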

Test Coverage

Both reviewers agreed coverage is thorough across all layers:

  • Python unit tests (27+): Validation cases, selection_default integration, out-of-bounds filtering, deduplication, type checking, mode limits, and typing tests.
  • TypeScript unit tests (14+): Programmatic state parsing, default loading, malformed JSON handling, sorted index remapping, and it.each consolidation.
  • E2E tests (4): Default initial value, programmatic row selection via session state, clearing selection, and column/cell selection — with snapshots.
  • Test refactoring: The data_editor_test.py consolidation of four test_num_rows_* methods into @parameterized.expand is a nice cleanup.

Minor gap: No test asserts that boolean row/cell indices are rejected (relates to the non-blocking issue above).

Backwards Compatibility

Both reviewers agreed there are no breaking changes:

  • selection_default defaults to None, preserving existing call signatures.
  • Proto changes are purely additive (two new optional fields that old frontends ignore).
  • _validate_selection_state is a private function, not part of the public API.
  • The element ID computation includes selection_default in its hash, so changing the default correctly resets widget state.

Security & Risk

Both reviewers agreed security posture is sound:

  • Input validation treats all user input as untrusted with isinstance checks at every level.
  • No injection risk — selection state uses JSON-serialized strings over standard proto transport.
  • The one-shot mutation pattern is controlled and cannot be exploited externally.
  • Changes are well-isolated to the dataframe selection path.

Accessibility

Both reviewers agreed there are no accessibility regressions. The feature drives the same visual selection state that users already set via mouse/keyboard — no new interactive elements or ARIA attributes are introduced.

Reviewer Agreement & Disagreements

| Topic | gpt-5.3-codex-high | opus-4.6-thinking | Resolution |
|---|---|---|---|
| Overall code quality | Strong | Strong | Agreed |
| Test coverage | Thorough | Thorough | Agreed |
| Backwards compatibility | No issues | No issues | Agreed |
| Security | No major issues | No major issues | Agreed |
| Bool-as-int acceptance | Blocking (correctness bug) | Not flagged | Non-blocking: valid observation, low practical impact; recommend follow-up |
| Sorting caveat docs | Not flagged | Suggest documenting more prominently | Agree: nice-to-have improvement |
| Multi-cell silent ignore | Not flagged | Suggest logged warning | Agree: minor UX improvement |

Recommendations

  1. (Non-blocking) Consider adding and not isinstance(x, bool) guards to row/cell index validation to reject booleans explicitly, with a corresponding unit test.
  2. (Non-blocking) Document the sorting caveat in the selection_default parameter docstring (row selections reset when users sort).
  3. (Non-blocking) Consider logging a warning when cells are passed in selection_default with multi-cell mode, since they are silently ignored.
  4. (Non-blocking) The wait_for_timeout(250) in the E2E test is well-justified but could become flaky on slow CI — monitor and consider a polling approach if needed.
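If a polling approach is ever pursued for item 4, its general shape might resemble the sketch below. `poll_until` is a hypothetical, framework-independent helper and not the repository's own wait utility. Which condition could actually be observed for a canvas-rendered grid is left open here, so the example uses a simple counter as a stand-in.

```python
import time
from typing import Callable


def poll_until(condition: Callable[[], bool], timeout_s: float = 2.0, interval_s: float = 0.05) -> bool:
    """Re-check `condition` until it holds or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval_s)
    # One final check so a condition met right at the deadline still counts.
    return condition()


attempts = {"count": 0}


def selection_applied() -> bool:
    # Stand-in condition: becomes true on the third check.
    attempts["count"] += 1
    return attempts["count"] >= 3


print(poll_until(selection_applied, timeout_s=1.0, interval_s=0.01))  # True
```

Compared with a fixed sleep, polling returns as soon as the condition holds and only pays the full timeout on failure.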

Verdict

APPROVED: This is a well-designed, thoroughly tested feature with robust input validation, proper frontend integration, and no backwards compatibility concerns. The boolean-as-int edge case raised by one reviewer is technically valid but non-blocking — it's an unlikely user scenario with benign behavior that can be addressed as a follow-up.


Consolidated review by opus-4.6-thinking. This review synthesizes assessments from 2 of 2 expected models.


📋 Review by `gpt-5.3-codex-high`

Summary

This PR introduces programmatic selection support for st.dataframe via session state, adds a selection_default parameter for first-render defaults, and wires both through backend validation, protobuf transport, and frontend application logic. It also includes substantial Python/unit/frontend/e2e coverage and snapshot updates.

Code Quality

Overall structure is solid: the selection-mode normalization split is cleaner, backend/frontend responsibilities are well separated, and the one-shot frontend handling for selectionState is consistent with existing widget event patterns.

One correctness issue needs to be fixed before merge:

  • lib/streamlit/elements/arrow.py:348 and lib/streamlit/elements/arrow.py:385 accept booleans as row indices because bool is a subclass of int in Python.
    • Example: {"selection": {"rows": [True]}} is currently treated as row index 1, and False as 0.
    • This violates the intended validation semantics (row/cell indices should be integer indices, not booleans) and can lead to surprising/incorrect selections.

Test Coverage

Coverage is strong and well-targeted:

  • Python tests add broad validation cases and selection_default integration paths.
  • Frontend tests cover default loading, programmatic state parsing, malformed JSON handling, and sorted index remapping.
  • E2E tests cover programmatic row/column/cell selection, clearing, and initial default behavior.

Gap:

  • I did not find a regression test asserting that boolean row/cell indices are rejected/filtered (the edge case above).

Backwards Compatibility

The API change is additive (selection_default is optional), and existing selection APIs remain intact. Behavior change from read-only session-state selection values to writable values is intentional and aligns with the PR goal. No broad backward-compatibility break is apparent.

Security & Risk

No major security issue identified. Main regression risk is selection-state correctness from loosely-typed Python inputs; the bool-as-int acceptance is the key remaining risk.

Accessibility

No material accessibility regressions observed in the changed frontend logic. The changes are primarily state-handling and selection synchronization, not new interactive controls or semantics.

Recommendations

  1. Update _validate_selection_state to reject booleans for row indices and cell row indices (e.g. isinstance(x, int) and not isinstance(x, bool)).
  2. Add explicit unit tests for boolean index inputs in rows/cells to prevent regressions.

Verdict

CHANGES REQUESTED: The feature is close, but boolean values being accepted as row indices is a correctness bug that should be fixed before merge.


This is an automated AI review by gpt-5.3-codex-high. Please verify the feedback and use your judgment.

📋 Review by `opus-4.6-thinking`

Summary

This PR adds the ability to programmatically set st.dataframe selection state via two mechanisms:

  1. Session state writes: st.session_state["key"] = {"selection": {"rows": [0, 2]}} — enables dynamic selection changes on re-run.
  2. selection_default parameter: Provides initial selection on first render without overriding subsequent user selections.

The implementation spans the full stack: a new _validate_selection_state backend function sanitizes user-provided selection states; two new optional proto fields (selection_state, selection_default) carry the data to the frontend; and the frontend's useWidgetState hook parses, applies, and syncs the programmatic selection to the grid widget. Programmatic cell selection is intentionally limited to single-cell mode since multi-cell rectangular ranges cannot be reconstructed from individual cell positions.

Code Quality

The code is well-structured and follows existing patterns in the codebase.

Backend (lib/streamlit/elements/arrow.py):

  • _validate_selection_state is thorough and defensive: it validates types, bounds, column existence, deduplicates entries, and enforces single-selection mode limits. The use of dict.fromkeys() for order-preserving deduplication is idiomatic Python.
  • The selection_default validation-then-serialization flow at lines 922–930 correctly filters invalid entries before sending to the frontend.
  • The widget_state.value_changed check at line 967 is the right hook point for detecting programmatic session state changes, and the validated state is serialized to proto.selection_state as a one-shot signal.

Frontend (useWidgetState.ts):

  • The parseSelectionStateToGridSelection shared helper (lines 71–155) cleanly extracts common parsing logic for both initial load and programmatic selection paths. The returnEmptySelection parameter correctly distinguishes between "no selection found" (initial load) and "user explicitly cleared" (programmatic).
  • The originalToDisplay mapping in getProgrammaticSelectionState (lines 586–592) correctly handles sorted grid state, where backend original indices must be translated to display indices.
  • The loadInitialSelectionState callback properly gates on element.selectionState to avoid conflicts between programmatic selection and selection defaults (line 471).

Frontend (DataFrame.tsx):

  • The programmatic selection useEffect (lines 386–419) follows the one-shot pattern by clearing element.selectionState after consuming it (line 394). While mutating the proto object is generally an anti-pattern in React, this mirrors existing patterns in the codebase (e.g., element.setValue in useBasicWidgetState) and is safe because the proto is not used as a React state/prop for rendering decisions.
  • The effect's dependency array is comprehensive and includes processSelectionChange, which changes reference on every selection change. The early-return guard (if (!element.selectionState) return) prevents re-application after the one-shot is consumed.

Minor observations:

  • The eslint-disable comments for streamlit-custom/no-hardcoded-theme-values on width: 1 and height: 1 in parseSelectionStateToGridSelection (lines 143–146) are appropriate since these represent a single-cell range, not a theme value.
  • The data_editor test refactoring (data_editor_test.py) consolidating four test_num_rows_* methods into a single @parameterized.expand is a nice cleanup that follows the testing guidelines.

Test Coverage

Test coverage is thorough across all layers:

Python unit tests (27+ tests):

  • TestValidateSelectionState covers: valid/invalid row indices, column name validation, single-mode limits, non-dict errors, non-list graceful handling, deduplication of rows/columns/cells, non-string filtering, missing keys, combined modes, and empty selections.
  • Integration tests for selection_default verify proto serialization, out-of-bounds filtering, and the on_select requirement.
  • Type tests added in dataframe_types.py for the new selection_default parameter.

TypeScript unit tests (14+ tests):

  • getProgrammaticSelectionState: row selection, column selection, cell selection, multi-cell exclusion, empty selection clearing, invalid JSON handling, invalid shape handling, and sort-aware index mapping.
  • loadInitialSelectionState: selection default loading, preference of stored selection over defaults, skip when programmatic selection is set.
  • useSelectionHandler.test.ts: consolidated with it.each for selection mode detection — no functional changes.

E2E tests:

  • test_selection_default_initial_value: Verifies default is applied on frontend, toggling off a pre-selected row yields correct result, includes negative assertion.
  • test_programmatic_row_selection_via_session_state: Verifies pre-set selection, button-triggered changes, and manual interaction after programmatic change. Includes snapshot.
  • test_programmatic_clear_row_selection_via_session_state: Verifies clearing selection programmatically.
  • test_programmatic_column_and_cell_selection: Verifies non-row selection types with snapshot.

The wait_for_timeout(250) in the programmatic row selection test (line 897) is justified by the canvas-based rendering of glide-data-grid, which doesn't expose selection state as DOM attributes. The inline comment explains the reasoning well.

Backwards Compatibility

No breaking changes:

  • The selection_default parameter defaults to None, preserving existing call signatures.
  • Proto changes are purely additive: two new optional fields (11, 12) that old frontends will ignore.
  • The selection_state field is only populated on reruns where widget_state.value_changed is True, so existing behavior is unaffected.
  • _validate_selection_state is a private function (underscore-prefixed), not part of the public API.
  • The DataframeState TypedDict already existed; it now has additional documentation mentioning session state writability.
  • The element ID computation includes selection_default in its hash, so changing the default will reset widget state — this is the correct behavior.

Security & Risk

  • Input validation is robust: _validate_selection_state treats all user input as untrusted, checking types (isinstance) at every level, validating bounds, and filtering invalid entries rather than crashing.
  • No injection risk: Selection state is JSON-serialized strings; the proto transport and JSON parsing on the frontend are standard patterns.
  • One-shot mutation: The element.selectionState = null mutation in the frontend effect is a controlled pattern that prevents stale programmatic selections from being re-applied across renders. It cannot be exploited externally.
  • Low regression risk: The changes are well-isolated to the dataframe selection path and don't affect data editing, column configuration, or other dataframe features.

Accessibility

No new accessibility concerns:

  • The feature programmatically sets the same selection state that users already set via mouse/keyboard interaction. No new interactive elements or ARIA attributes are introduced.
  • The existing checkbox-based row selection UI and column highlight visuals are preserved; the programmatic API simply drives the same visual state.

Recommendations

  1. Consider documenting the sorting caveat more prominently. The DataframeSelectionState docstring includes a .. warning:: about row selections being reset when users sort. This is a significant UX caveat for the new programmatic selection feature. Consider mentioning it in the selection_default parameter docstring as well, since users setting default selections may not expect them to be cleared on sort.

  2. The wait_for_timeout(250) in E2E is well-justified but introduces a minor flakiness risk on slow CI. If this becomes flaky, consider a polling approach that checks the canvas pixel state or the output markdown, though the current approach is pragmatic for canvas-based rendering.

  3. Minor: multi-cell exclusion from selection_default. The docstring mentions that multi-cell selections cannot be set programmatically, but if a user passes cells in a selection_default with multi-cell mode, they are silently ignored. A logged warning could help users understand why their default cells aren't appearing.
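The polling approach mentioned in recommendation 2 could look roughly like the helper below. This is a generic sketch, not code from the PR: `wait_until` is a hypothetical name, and the predicate would have to be something the test can actually observe, such as the text of the output markdown. For plain DOM checks, Playwright's own `expect` assertions already retry until a timeout, so a hand-rolled loop is only relevant for conditions they cannot express.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.05,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True on success and False on timeout, so the caller decides
    how to fail the test.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Compared with a fixed `wait_for_timeout(250)`, a loop like this returns as soon as the condition holds and tolerates a slow CI machine up to the chosen timeout.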

Verdict

APPROVED: A well-designed, thoroughly tested feature that adds programmatic dataframe selection via session state and a selection_default parameter, with robust input validation, proper frontend integration, and no backwards compatibility concerns.


This is an automated AI review by opus-4.6-thinking.

@github-actions
Contributor

github-actions bot commented Mar 2, 2026

Summary

This PR adds the ability to programmatically set st.dataframe selection state through two mechanisms:

  1. selection_default parameter: Specifies initial selections on first render without overriding subsequent user selections.
  2. Session state writes: Users can assign selection dictionaries to st.session_state["key"] to programmatically update selections at any time (one-shot pattern).

Changes span the full stack: protobuf schema (two new optional fields), Python backend (validation + serialization), React frontend (one-shot programmatic selection consumption), and comprehensive tests (Python unit, TypeScript unit, E2E, typing, and snapshot tests).

Code Quality

All three reviewers agreed the code is well-structured and follows existing codebase patterns. Specific strengths noted across reviews:

  • The _validate_selection_state() function is thorough and defensive, handling non-dict values, non-list fields, out-of-range indices, non-string column names, duplicates, and single-mode truncation.
  • The originalToDisplay map in getProgrammaticSelectionState uses an efficient O(N) approach for mapping row indices when the grid is sorted.
  • The shared parseSelectionStateToGridSelection() function avoids duplication between initial load and programmatic selection.
  • Docstrings are well-written in NumPy style and document the programmatic usage clearly.
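To make the defensive filtering described for `_validate_selection_state()` concrete, here is a simplified sketch. It is not the PR's implementation: the function name `validate_selection`, its signature, the flat `{"rows": ..., "columns": ...}` input shape, the choice of `TypeError` for non-dict input, and the exclusion of `bool` values are all assumptions made for illustration, and cell validation is omitted.

```python
from typing import Any, Dict, List


def validate_selection(
    selection: Any,
    num_rows: int,
    column_names: List[str],
    single_row: bool = False,
    single_column: bool = False,
) -> Dict[str, list]:
    """Return a cleaned copy of `selection`, dropping invalid entries."""
    if not isinstance(selection, dict):
        # Assumption: a non-dict value is a programming error worth surfacing.
        raise TypeError("selection must be a dict")

    rows: List[int] = []
    raw_rows = selection.get("rows", [])
    if isinstance(raw_rows, list):  # non-list values are ignored gracefully
        for r in raw_rows:
            # bool is a subclass of int, so exclude it explicitly.
            is_index = isinstance(r, int) and not isinstance(r, bool)
            if is_index and 0 <= r < num_rows and r not in rows:
                rows.append(r)

    columns: List[str] = []
    raw_columns = selection.get("columns", [])
    if isinstance(raw_columns, list):
        for c in raw_columns:
            if isinstance(c, str) and c in column_names and c not in columns:
                columns.append(c)

    # Single-selection modes keep only the first valid entry.
    if single_row:
        rows = rows[:1]
    if single_column:
        columns = columns[:1]
    return {"rows": rows, "columns": columns}
```

For example, with three rows and columns `["a", "b"]`, the input `{"rows": [0, 0, 5, -1, "x", 2], "columns": ["a", 3, "zzz", "a", "b"]}` is reduced to rows `[0, 2]` and columns `["a", "b"]`: duplicates, out-of-range indices, wrong types, and unknown names are filtered instead of raising.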

Minor concerns (non-blocking):

  1. Prop mutation in DataFrame.tsx (element.selectionState = null): This directly mutates the proto element object. While there is existing precedent in the same file (with a TODO acknowledging the issue), and the eslint suppression is appropriate, a comment linking to the existing TODO would help future developers.

  2. Non-string column label edge case (gpt-5.3-codex-high raised, others did not flag): When data_df.columns contains non-string labels (e.g., integer column names), column_names = list(data_df.columns) captures the raw types, while the validation function requires isinstance(col_name, str). This means programmatic column/cell selections would be silently filtered out for DataFrames with non-string column labels. After verification: This concern is technically valid — pandas allows non-string column labels, and Arrow stringifies them during serialization. However, this is a narrow edge case rather than a critical bug because: (a) the vast majority of DataFrames use string column names, (b) the behavior is silent filtering (no crash), and (c) the existing selection return path from the frontend already returns string column names, so this is a pre-existing representational gap, not a regression introduced by this PR. Recommendation: Address this in a follow-up PR by normalizing column names to their string representation before validation, and add a test for non-string column labels.

Test Coverage

All three reviewers agreed test coverage is excellent. Highlights:

  • Python unit tests: 27+ tests for _validate_selection_state covering valid inputs, invalid indices, invalid column names, mode limits, bad types, missing keys, empty selections, duplicates, and combined modes. Integration tests for selection_default proto serialization and error behavior.
  • TypeScript unit tests: 14+ tests for getProgrammaticSelectionState (disabled modes, row/column/cell selection, empty clearing, invalid columns, multi-cell exclusion, sort mapping, malformed JSON). Tests for loadInitialSelectionState (defaults, stored state preference, programmatic override). Parameterized it.each pattern used appropriately.
  • E2E tests: Playwright tests for default selection, programmatic set/clear, and combined column+cell scenarios with snapshot verification.
  • Type tests: dataframe_types.py updated with assert_type checks.
  • The wait_for_timeout(250) usage in E2E is well-justified in comments (canvas-based rendering prevents DOM assertions).

Gap noted by gpt-5.3-codex-high: No test covers programmatic selection with non-string column labels. This is a valid observation but non-blocking given the edge case nature.

Backwards Compatibility

All three reviewers agreed the changes are fully backwards compatible:

  • selection_default defaults to None, preserving existing behavior.
  • New proto fields (11, 12) are optional, safely ignored by older frontends.
  • writes_allowed=True in check_widget_policies follows the established widget pattern.
  • DataframeSelectionSerde uses a dataclass field with None default.
  • Test refactoring in data_editor_test.py is purely organizational with no behavioral change.

Security & Risk

All three reviewers agreed there are no security concerns. The validation logic is robust — type checks, range checks, column existence checks, and deduplication are applied defensively. Invalid entries are silently filtered. The one-shot signal pattern prevents stale state re-application.

Accessibility

All three reviewers agreed there are no accessibility regressions. No new interactive UI elements are introduced; programmatic selections update the same GridSelection state that user interactions do.

Reviewer Agreement & Disagreement

| Aspect | gemini-3.1-pro | gpt-5.3-codex-high | opus-4.6-thinking |
|---|---|---|---|
| Code quality | Excellent | Issue with non-string columns | Excellent (minor prop mutation note) |
| Test coverage | Thorough | Strong (gap for non-string cols) | Excellent |
| Backwards compat | Fully compatible | Compatible (edge case concern) | Fully compatible |
| Security | No concerns | No concerns | No concerns |
| Verdict | APPROVED | CHANGES REQUESTED | APPROVED |

Key disagreement: gpt-5.3-codex-high identified a potential issue with non-string DataFrame column labels causing programmatic selections to be silently dropped. The other two reviewers did not flag this. After independent verification, the concern is technically valid but represents a narrow edge case that does not constitute a regression or critical bug — it is a gap in the new feature for an uncommon input pattern that can be addressed in a follow-up.

Recommendations

  1. (Follow-up) Normalize column_names to string representation before validation to support DataFrames with non-string column labels. Add corresponding tests.
  2. (Non-blocking) Add a comment in DataFrame.tsx linking the element.selectionState = null mutation to the existing TODO about prop mutation at line 202.

Verdict

APPROVED: The majority of reviewers (2/3) approve, and the single concern raised by gpt-5.3-codex-high — while valid — is a narrow edge case affecting non-string column labels, not a critical/blocking issue. The feature is well-implemented, thoroughly tested across all layers, maintains full backwards compatibility, and has robust input validation. The non-string column label gap is recommended for a follow-up PR.


This is a consolidated AI review by opus-4.6-thinking.


📋 Review by `gemini-3.1-pro`

Summary

This PR introduces the ability to programmatically set the st.dataframe selection state via st.session_state and adds a new selection_default parameter to specify initial selections. The backend validates the selection state against the dataframe's dimensions and selection modes, filtering out invalid entries. The frontend correctly applies the programmatic selection and syncs it with the widget manager.

Code Quality

The code is well-structured and follows the established patterns in the Streamlit codebase.

  • The use of element.selectionState = null in DataFrame.tsx to clear the one-shot signal is consistent with how other widgets handle programmatic updates (e.g., setValue in useBasicWidgetState.ts).
  • The validation logic in _validate_selection_state is robust and handles invalid inputs defensively.
  • The use of the originalToDisplay map in getProgrammaticSelectionState is an efficient O(N) approach to map original row indices to display indices when the grid is sorted, avoiding potentially expensive O(K * N) lookups.
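The complexity argument in the last bullet can be sketched as follows. The real code is TypeScript in `getProgrammaticSelectionState`; this Python version only illustrates the technique, and `map_original_to_display` and `display_order` are hypothetical names.

```python
from typing import Dict, List


def map_original_to_display(
    display_order: List[int], selected_original: List[int]
) -> List[int]:
    """Translate original row indices into display positions.

    `display_order[d]` is the original index of the row currently shown at
    display position `d` (for example after the user sorted a column).
    """
    # Build the inverse lookup once: O(N) for N rows.
    original_to_display: Dict[int, int] = {
        original: display for display, original in enumerate(display_order)
    }
    # Each of the K lookups is O(1); unknown indices are skipped.
    return [
        original_to_display[original]
        for original in selected_original
        if original in original_to_display
    ]
```

The naive alternative, calling `display_order.index(original)` for each of the K selected rows, scans the list every time and therefore costs O(K * N).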

Test Coverage

The changes are thoroughly tested across the stack:

  • Python Unit Tests: Extensive parameterized tests cover the validation logic (_validate_selection_state), edge cases (invalid types, out-of-bounds indices), and the selection_default behavior.
  • TypeScript Unit Tests: useWidgetState.test.ts covers the new getProgrammaticSelectionState function, including mapping original indices, handling invalid JSON, and clearing selections. useSelectionHandler.test.ts was nicely refactored using parameterized tests.
  • E2E Tests: Playwright tests verify the programmatic selection behavior, including setting, clearing, and default selections. The use of wait_for_timeout is well-justified in the comments due to the canvas-based rendering of glide-data-grid.

Backwards Compatibility

The changes are fully backwards compatible. The new selection_default parameter is optional, and the programmatic selection feature does not break existing behavior. The protobuf changes add new optional fields (selection_state and selection_default), which is safe and follows the protobuf guidelines.

Security & Risk

No security concerns identified. The backend validates all programmatic selections against the actual dataframe dimensions and column names, preventing invalid indices or names from causing issues or crashes in the frontend.

Accessibility

No new UI elements were added, so there are no new accessibility concerns. The changes maintain the existing accessibility features of the dataframe component.

Recommendations

The implementation is solid, well-tested, and ready to be merged. I have no further recommendations.

Verdict

APPROVED: The PR is well-implemented, thoroughly tested, and follows the project's best practices.


This is an automated AI review by gemini-3.1-pro.

📋 Review by `gpt-5.3-codex-high`

Summary

This PR introduces programmatic st.dataframe selection updates via session state and adds a new selection_default parameter for first-render defaults. It updates backend validation/serialization, adds protobuf fields, applies one-shot programmatic selection updates in the frontend, and expands Python/frontend/e2e test coverage.

Code Quality

One functional issue needs to be addressed before merge:

  • Column-name type mismatch breaks programmatic column/cell selection for non-string dataframe columns.
    In lib/streamlit/elements/arrow.py:892, validation uses raw pandas column labels (list(data_df.columns)), which can be non-string (e.g. int). But _validate_selection_state only accepts string column identifiers (lib/streamlit/elements/arrow.py:374-375, lib/streamlit/elements/arrow.py:395-396). Frontend selection payloads are string-based (frontend/lib/src/components/widgets/DataFrame/hooks/useWidgetState.ts:379-381), so valid programmatic selections can be silently filtered out when dataframe column labels are not strings.

Test Coverage

Coverage is strong overall:

  • Python unit tests comprehensively cover _validate_selection_state and selection_default behavior.
  • Frontend unit tests cover initial/default/programmatic selection loading, mapping across sort order, malformed payload handling, and sync behavior.
  • E2E tests cover default selection, programmatic set/clear, and combined column+cell scenarios.

Gap:

  • No test covers programmatic selection behavior with non-string dataframe column labels (e.g. integer columns), which is where the current bug appears.

Backwards Compatibility

For existing string-column dataframes, behavior is largely backward compatible and aligns with the new feature intent.
For dataframes with non-string column labels, programmatic column/cell selections can be dropped silently, which is a user-visible compatibility issue for valid pandas inputs.

Security & Risk

No direct security vulnerabilities are apparent in these changes.
Main risk is behavioral regression/silent mismatch between Python state and UI selection for certain schemas (non-string column labels).

Accessibility

No accessibility regressions identified in this PR. The frontend changes are state-management and selection-sync logic; they do not introduce new interactive controls or semantic markup changes.

Recommendations

  1. Normalize backend validation column names to the same string representation used by selection payloads before membership checks in _validate_selection_state.
  2. Add backend and frontend regression tests for programmatic column/cell selection with non-string dataframe column labels.
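The mismatch and the normalization proposed in recommendation 1 can be reproduced without pandas. This is a sketch under stated assumptions: `filter_columns` and `normalize_column_names` are hypothetical helpers, and plain `str()` is assumed to match the string form used by the selection payloads, which the review does not confirm.

```python
from typing import Iterable, List


def filter_columns(requested: Iterable, column_names: List) -> List[str]:
    """Keep only requested names that are strings and known columns."""
    return [c for c in requested if isinstance(c, str) and c in column_names]


def normalize_column_names(labels: Iterable) -> List[str]:
    """Convert raw column labels (possibly ints) to strings.

    Assumption: str(label) matches the frontend's representation.
    """
    return [str(label) for label in labels]


# Raw labels as a DataFrame with integer column names might expose them.
raw_labels = [0, 1, "name"]
requested = ["0", "name"]  # string-based payload

# Without normalization, "0" is not found because the stored label is int 0.
dropped = filter_columns(requested, raw_labels)

# With normalization, both requested columns survive validation.
kept = filter_columns(requested, normalize_column_names(raw_labels))
```

Note that normalizing would make labels such as `0` and `"0"` indistinguishable, so a real fix would also need to decide how to treat such collisions.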

Verdict

CHANGES REQUESTED: The feature implementation is close, but programmatic column/cell selection is currently incorrect for valid dataframes with non-string column labels.


This is an automated AI review by gpt-5.3-codex-high.

📋 Review by `opus-4.6-thinking`

Summary

This PR adds the ability to programmatically set st.dataframe selection state via two mechanisms:

  1. selection_default parameter: A new parameter for specifying initial selections on first render without overriding subsequent user selections.
  2. Session state writes: Users can assign selection dictionaries to st.session_state["key"] to programmatically change selections at any time (one-shot pattern).

Key changes span all layers of the stack:

  • Protobuf (Dataframe.proto): Two new optional fields selection_state (field 11) and selection_default (field 12).
  • Python backend (arrow.py): New selection_default parameter on st.dataframe, _validate_selection_state() function, integration with widget_state.value_changed to detect session state writes.
  • Frontend (useWidgetState.ts, DataFrame.tsx): New getProgrammaticSelectionState and updated loadInitialSelectionState functions, a useEffect to consume the one-shot selectionState signal.
  • Tests: Comprehensive unit tests (Python and TypeScript), E2E tests, snapshot tests, and typing tests.

Code Quality

The code is well-structured and follows existing patterns in the codebase.

Strengths:

  • The _validate_selection_state() function is thorough and defensive — it handles non-dict values, non-list fields, out-of-range indices, non-string column names, duplicates, and single-mode truncation.
  • The shared parseSelectionStateToGridSelection() function avoids duplication between initial load and programmatic selection.
  • The isSelectionState type guard is clean and reusable.
  • Docstrings are well-written in NumPy style and the DataframeState / DataframeSelectionState docstrings have been updated to document the programmatic usage.

Minor concerns:

  1. Prop mutation in DataFrame.tsx (line 389): element.selectionState = null directly mutates the proto element object. While there is precedent in this file for prop mutation (element.editingMode = ... at line 202), this violates React's immutability principle. The existing pattern has a TODO comment acknowledging the issue. The mutation here is used as a "one-shot" signal consumption pattern to prevent re-applying the same programmatic selection on subsequent renders. This works because protobuf objects are mutable plain objects in the JS runtime, but it's fragile if React ever re-renders with the same element reference. The eslint suppression comment is appropriate.

  2. wait_for_timeout(250) in E2E test (st_dataframe_selections_test.py, line 897): The test uses a hardcoded timeout. The comment explains why (canvas-based rendering makes DOM assertions impossible for internal selection state), which is a valid justification per E2E testing guidelines that state wait_for_timeout "should only be used when there is a specific purpose."

  3. Cells validation limited to single-cell mode: The PR correctly documents that multi-cell programmatic selection is not supported because rectangular ranges can't be reconstructed from individual cell positions. This is a reasonable limitation clearly explained in both code comments and docstrings.

Test Coverage

Test coverage is excellent and follows all relevant AGENTS.md guidelines:

Python unit tests (arrow_dataframe_test.py):

  • 27+ tests for _validate_selection_state covering: valid inputs, invalid row indices, negative indices, invalid column names, single-mode limits, non-dict values, non-list fields, missing keys, empty selections, duplicates, non-string types, combined modes.
  • Integration tests for selection_default: proto serialization, return value on first render, error when used without on_select.
  • Type tests added in dataframe_types.py.

Frontend unit tests (useWidgetState.test.ts):

  • 14+ tests for getProgrammaticSelectionState: covers disabled modes, row/column/cell selection, empty selection for clearing, invalid column names, multi-cell mode exclusion, sort mapping (original-to-display index), malformed JSON, invalid shapes.
  • Tests for loadInitialSelectionState: covers selection default loading, preference of stored state over defaults, skipping defaults when programmatic selection is set.
  • Uses it.each for parameterized tests as recommended.

Frontend unit tests (useSelectionHandler.test.ts):

  • Consolidated selection mode detection tests using it.each pattern.

E2E tests (st_dataframe_selections_test.py):

  • Tests selection_default initial value and user interaction after default.
  • Tests programmatic row selection via session state (pre-set, change, clear).
  • Tests programmatic column + cell selection.
  • Snapshot tests for visual verification.
  • Includes negative assertions per E2E guidelines.

Backwards Compatibility

This change is fully backwards compatible:

  • The new selection_default parameter defaults to None, preserving existing behavior.
  • The proto fields selection_state (11) and selection_default (12) are optional, so older frontends will simply ignore them.
  • The writes_allowed=True parameter in check_widget_policies enables session state writes for selection widgets — this was already the mechanism used by other widgets.
  • The DataframeSelectionSerde now accepts an optional selection_default via a dataclass field with None default.
  • The data_editor_test.py refactoring (consolidating test_num_rows_* into parameterized test) is a pure test cleanup with no behavioral change.
  • The DataframeState and DataframeSelectionState docstrings have been updated to document programmatic usage without breaking existing users.

Security & Risk

  • Input validation is robust: _validate_selection_state() validates all user-provided data defensively — type checks for dict, list, int, str; range checks for row indices; column name existence checks; deduplication. Invalid entries are silently filtered rather than causing errors, which is the right approach for data coming from st.session_state.
  • No arbitrary code execution risk: The selection state is JSON-serialized strings validated on both backend and frontend.
  • One-shot signal pattern: The selection_state proto field is only populated when value_changed is True, preventing stale state from being re-applied.
  • No security concerns identified.

Accessibility

  • No new interactive UI elements are introduced — the changes operate on the existing st.dataframe selection mechanism.
  • Programmatic selection updates the same GridSelection state that user interactions do, so screen readers and keyboard navigation should work identically.
  • No accessibility regressions identified.

Recommendations

  1. Consider documenting the wait_for_timeout rationale more prominently: The E2E test at line 897 includes a detailed comment explaining why wait_for_timeout(250) is needed. This is good practice and adequate per guidelines.

  2. Minor: The element.selectionState = null mutation pattern (line 389 of DataFrame.tsx) could benefit from a more explicit comment linking to the existing TODO about prop mutation at line 202, so future developers understand these are the same accepted pattern. However, this is not blocking.

  3. Consider adding a test for selection_default with on_select="ignore": test_selection_default_requires_on_select already verifies that the error is raised, but it relies on the default on_select="ignore" implicitly instead of passing it explicitly. The test is adequate as-is.

Verdict

APPROVED: This is a well-implemented, thoroughly tested feature that enables programmatic control of st.dataframe selections. The code follows existing patterns, has robust input validation, maintains full backwards compatibility, and includes comprehensive test coverage across all layers (Python unit tests, TypeScript unit tests, E2E tests, type tests, and snapshot tests).


This is an automated AI review by opus-4.6-thinking.

@lukasmasuch lukasmasuch merged commit e20270b into develop Mar 4, 2026
44 checks passed
@lukasmasuch lukasmasuch deleted the feature/set-dataframe-selections branch March 4, 2026 11:06

Labels

change:feature PR contains new feature or enhancement implementation impact:users PR changes affect end users

Development

Successfully merging this pull request may close these issues.

Pre set selections for st.dataframe

4 participants