Conversation
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
✅ PR preview is ready!
|
st.bar_chartsort parameter to st.bar_chart + disable default sorting
📈 Python coverage change detectedThe Python unit test coverage has increased by 0.0165%
✅ Coverage change is within normal range. Coverage by files
|
| assert chart_spec["mark"] in [altair_type, {"type": altair_type}] | ||
| assert chart_spec["encoding"]["x"]["type"] == "ordinal" | ||
| assert chart_spec["encoding"]["x"]["sort"] == ["c", "b", "a"] | ||
| assert chart_spec["encoding"]["x"]["sort"] is None |
There was a problem hiding this comment.
Note sure if this test still makes sense now or if there's another way to check that the ordered categories are applied properly, given that we're disabling sorting now by setting sort=None.
There was a problem hiding this comment.
I just ran this manually and it seems like it's ordered b -> c -> a, i.e. in the order of the data. Not sure how before this even worked to get the right order from the Pandas dataframe, given that Altair should sort the categories alphabetically by default?
There was a problem hiding this comment.
not sure how before this even worked to get the right order from the Pandas dataframe, given that Altair should sort the categories alphabetically by default?
I believe Altair has some extra treatment if the column is categorical with ordered=True. Also see this old issue and PR related to this aspect:
- Using ordered categorical columns with built-in charts causes an error #7776
- Fix handling of ordinal columns for builtin-charts #7771
I'm wondering if we should add some special handling in this case (if the column is categorical + ordered=True)?
There was a problem hiding this comment.
Resolved this by not doing any special casing but allowing sort=True to get the old behavior with Altair's automatic sorting back. (The drawback of special casing is that then there's no way to disable this sorting...).
| sort : str or None | ||
| How to sort the bars. If ``None`` (default), bars are shown in data | ||
| order (no sorting). If this is the name of a column (e.g. ``"col1"``), | ||
| bars are sorted by that column in ascending order. If this is the | ||
| name of a column prefixed with a minus sign (e.g. ``"-col1"``), bars are | ||
| sorted by that column in descending order. |
There was a problem hiding this comment.
One consideration: We could also support bool here with True being the current behaviour.
There was a problem hiding this comment.
Hmm yeah pretty neat idea. I changed it now so it uses False by default and allows setting it to True to have Altair's automatic sorting. This also resolves the ordered categorical issue above, since you can just set sort=True to have the same sorting of ordered categorical values as before.
There was a problem hiding this comment.
I've also been thinking about changing the default to True to break fewer existing charts, but I think False by default is much less confusing. Plotly and matplotlib also don't apply any automatic sorting for bar charts by default.
There was a problem hiding this comment.
LGTM 👍 but it might be good to validate if it makes sense to add special treatment for ordered categorical columns, see this comment: #12339 (comment)
There was a problem hiding this comment.
Pull Request Overview
This PR adds a new sort parameter to st.bar_chart to give users explicit control over how bars are ordered, while also disabling Altair's default alphabetical sorting behavior. This addresses user confusion and provides more predictable chart behavior.
Key changes:
- Adds
sortparameter supporting boolean values and column name strings (with optional "-" prefix for descending order) - Disables automatic sorting by default (
sort=False) for more predictable behavior - Updates existing snapshot tests to reflect the new default behavior
Reviewed Changes
Copilot reviewed 6 out of 57 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/streamlit/elements/vega_charts.py | Adds the sort parameter to the bar_chart function signature and documentation |
| lib/streamlit/elements/lib/built_in_chart_utils.py | Core implementation of sort functionality with column parsing, validation, and encoding logic |
| lib/tests/streamlit/elements/vega_charts_test.py | Comprehensive unit tests for the new sort parameter behavior |
| lib/streamlit/elements/arrow.py | Updates add_rows functionality to preserve sort parameter |
| e2e_playwright/st_bar_chart.py | Adds test cases for various sort scenarios |
| e2e_playwright/st_bar_chart_test.py | Updates test expectations and adds snapshot tests for sort behavior |
| def _parse_sort_column(df: pd.DataFrame, sort_from_user: bool | str) -> str | None: | ||
| if sort_from_user is False or sort_from_user is True: | ||
| return None | ||
|
|
||
| sort_column = sort_from_user.removeprefix("-") | ||
| if sort_column not in df.columns: | ||
| raise StreamlitColumnNotFoundError(df, sort_column) | ||
|
|
||
| return sort_column |
There was a problem hiding this comment.
The function doesn't handle the case where sort_from_user is an empty string. An empty string would pass the boolean checks and then removeprefix('-') would return an empty string, which would likely not be found in df.columns and raise an error. Consider adding an explicit check for empty strings.
There was a problem hiding this comment.
This would already trigger an error because the empty string wouldn't be in the dataframe columns.
| # String: sort by column name (optional '-' prefix for descending) | ||
| sort_order: Literal["ascending", "descending"] | ||
| if sort_from_user.startswith("-"): | ||
| sort_order = "descending" | ||
| else: | ||
| sort_order = "ascending" | ||
| sort_field = sort_from_user.removeprefix("-") |
There was a problem hiding this comment.
Similar to _parse_sort_column, this function doesn't handle empty strings. If sort_from_user is an empty string, it would create a SortField with an empty field name, which could cause issues in Altair. Consider adding validation to ensure the string is not empty.
| # String: sort by column name (optional '-' prefix for descending) | |
| sort_order: Literal["ascending", "descending"] | |
| if sort_from_user.startswith("-"): | |
| sort_order = "descending" | |
| else: | |
| sort_order = "ascending" | |
| sort_field = sort_from_user.removeprefix("-") | |
| sort_field = sort_from_user.removeprefix("-") | |
| if not sort_field: | |
| raise StreamlitAPIException( | |
| "Sort column name cannot be empty. Please provide a valid column name for sorting." | |
| ) |
There was a problem hiding this comment.
Already caught by the check above.
| def _parse_sort_column(df: pd.DataFrame, sort_from_user: bool | str) -> str | None: | ||
| if sort_from_user is False or sort_from_user is True: | ||
| return None | ||
|
|
||
| sort_column = sort_from_user.removeprefix("-") | ||
| if sort_column not in df.columns: | ||
| raise StreamlitColumnNotFoundError(df, sort_column) | ||
|
|
||
| return sort_column | ||
|
|
There was a problem hiding this comment.
The function _parse_sort_column lacks a docstring. Per Python Guide docstring requirements, functions should include docstrings using Numpydoc style. Although this function is prefixed with underscore indicating private scope, it supports chart utilities that users access through st.bar_chart. Add a docstring explaining the function's purpose, parameters, and return value in Numpydoc format.
| def _parse_sort_column(df: pd.DataFrame, sort_from_user: bool | str) -> str | None: | |
| if sort_from_user is False or sort_from_user is True: | |
| return None | |
| sort_column = sort_from_user.removeprefix("-") | |
| if sort_column not in df.columns: | |
| raise StreamlitColumnNotFoundError(df, sort_column) | |
| return sort_column | |
| def _parse_sort_column(df: pd.DataFrame, sort_from_user: bool | str) -> str | None: | |
| """ | |
| Parse the sort column parameter for chart utilities. | |
| Parameters | |
| ---------- | |
| df : pd.DataFrame | |
| The DataFrame containing the data to be plotted. | |
| sort_from_user : bool or str | |
| The sort parameter provided by the user. If a string, it specifies | |
| the column to sort by (with optional "-" prefix for descending order). | |
| If a boolean, it's ignored and None is returned. | |
| Returns | |
| ------- | |
| str or None | |
| The name of the column to sort by without any direction prefix, | |
| or None if sort_from_user is a boolean. | |
| Raises | |
| ------ | |
| StreamlitColumnNotFoundError | |
| If the specified sort column doesn't exist in the DataFrame. | |
| """ | |
| if sort_from_user is False or sort_from_user is True: | |
| return None | |
| sort_column = sort_from_user.removeprefix("-") | |
| if sort_column not in df.columns: | |
| raise StreamlitColumnNotFoundError(df, sort_column) | |
| return sort_column | |
Spotted by Diamond (based on custom rule: Python Guide)
Is this helpful? React 👍 or 👎 to let us know.
sort parameter to st.bar_chart + disable default sortingsort parameter to st.bar_chart
| the chart's Altair spec. As a result this is easier to use for many | ||
| "just plot this" scenarios, while being less customizable. | ||
|
|
||
| If ``st.line_chart`` does not guess the data specification |
There was a problem hiding this comment.
Removing these lines because they aren't really accurate anymore today given that you can use x and y to specify which columns to use.
Describe your changes
This PR adds a
sortparameter tost.bar_chartto sort the bars by a column (sort="col1"for ascending andsort="-col1"for descending order).The default is
sort=Truefor now, which keeps Altair's default sorting. We might change this later on to disable sorting by default but want to discuss it first.GitHub Issue Link (if applicable)
Closes #385
Closes #7111
Testing Plan
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.