Fix vega chart unrecognized dataset error#11911
Conversation
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
✅ PR preview is ready!
|
There was a problem hiding this comment.
Pull Request Overview
This PR addresses issues with unknown data in the Vega chart by improving how and when the chart view is created and how datasets are updated.
- Added an
isCreatingViewflag to prevent data updates during view instantiation. - Switched from
view.data()to explicitview.remove()+view.insert()when dataset shapes change. - Updated guards in
createViewandupdateViewto respect the new creation state.
Comments suppressed due to low confidence (1)
frontend/lib/src/components/elements/ArrowVegaLiteChart/useVegaEmbed.ts:204
- New removal and insertion logic for datasets merits unit tests to verify behavior when data shapes change, ensuring regressions are caught early.
view.remove(name, truthy)
|
This pull request fixes the empty altair plot bugs 👍 |
|
Hey @lukasmasuch any chance this fix could make it into an upcoming release? It works perfectly and fixes the missing dataset and empty altair plot issue weve been having. We'd love to use the new data-editor list columns which were just added, as they are essential for our use case :) |
|
@itsToggle sorry for not finishing this up sooner. The 1.49 was already yesterday :( But I will try to get this in for 1.50. |
|
Oh thank you so much! We are really excited that the data-interaction side of streamlit has been getting a lot of attention recently :) |
|
Bump? :) |
|
|
||
| // Load the initial set of data into the chart. | ||
| const dataArrays = getDataArrays(datasetsRef.current) | ||
| const dataArrays = getDataArrays(datasets) |
There was a problem hiding this comment.
Under heavy reruns (e.g., many sidebar widgets), these refs can lag and hold the old chart’s datasets, which don’t match the new spec’s dataset names, causing Vega to throw on insert.
I don't think there is a downside to directly using the initial datasets here.
frontend/lib/src/components/elements/ArrowVegaLiteChart/useVegaEmbed.ts
Outdated
Show resolved
Hide resolved
frontend/lib/src/components/elements/ArrowVegaLiteChart/useVegaEmbed.ts
Outdated
Show resolved
Hide resolved
| try { | ||
| // Finalize the previous view so it can be garbage collected. | ||
| finalizeView() | ||
|
|
||
| const options = { | ||
| // Adds interpreter support for Vega expressions that is compliant with CSP | ||
| ast: true, | ||
| expr: expressionInterpreter, | ||
|
|
||
| // Disable default styles so that vega doesn't inject <style> tags in the | ||
| // DOM. We set these styles manually for finer control over them and to | ||
| // avoid inlining styles. | ||
| tooltip: { disableDefaultStyle: true }, | ||
| defaultStyle: false, | ||
| forceActionsMenu: true, | ||
| } | ||
|
|
||
| // Finalize the previous view so it can be garbage collected. | ||
| finalizeView() | ||
|
|
||
| const options = { | ||
| // Adds interpreter support for Vega expressions that is compliant with CSP | ||
| ast: true, | ||
| expr: expressionInterpreter, | ||
|
|
||
| // Disable default styles so that vega doesn't inject <style> tags in the | ||
| // DOM. We set these styles manually for finer control over them and to | ||
| // avoid inlining styles. | ||
| tooltip: { disableDefaultStyle: true }, | ||
| defaultStyle: false, | ||
| forceActionsMenu: true, | ||
| } | ||
|
|
||
| const { vgSpec, view, finalize } = await embed( | ||
| containerRef.current, | ||
| spec, | ||
| options | ||
| ) | ||
| const { vgSpec, view, finalize } = await embed( | ||
| containerRef.current, | ||
| spec, | ||
| options | ||
| ) | ||
|
|
||
| vegaView.current = maybeConfigureSelections(view) | ||
| vegaView.current = maybeConfigureSelections(view) | ||
|
|
||
| vegaFinalizer.current = finalize | ||
| vegaFinalizer.current = finalize | ||
|
|
||
| // Load the initial set of data into the chart. | ||
| const dataArrays = getDataArrays(datasetsRef.current) | ||
| // Load the initial set of data into the chart. | ||
| const dataArrays = getDataArrays(latestDatasetsRef.current) | ||
|
|
||
| // Heuristic to determine the default dataset name. | ||
| const datasetNames = dataArrays ? Object.keys(dataArrays) : [] | ||
| if (datasetNames.length === 1) { | ||
| const [datasetName] = datasetNames | ||
| defaultDataName.current = datasetName | ||
| } else if (datasetNames.length === 0 && vgSpec.data) { | ||
| defaultDataName.current = DEFAULT_DATA_NAME | ||
| } | ||
| // Heuristic to determine the default dataset name. | ||
| const datasetNames = dataArrays ? Object.keys(dataArrays) : [] | ||
| if (datasetNames.length === 1) { | ||
| const [datasetName] = datasetNames | ||
| defaultDataName.current = datasetName | ||
| } else if (datasetNames.length === 0 && vgSpec.data) { | ||
| defaultDataName.current = DEFAULT_DATA_NAME | ||
| } | ||
|
|
||
| const dataObj = getInlineData(dataRef.current) | ||
| if (dataObj) { | ||
| vegaView.current.insert(defaultDataName.current, dataObj) | ||
| } | ||
| if (dataArrays) { | ||
| for (const [name, dataArg] of Object.entries(dataArrays)) { | ||
| vegaView.current.insert(name, dataArg) | ||
| const dataObj = getInlineData(latestDataRef.current) | ||
| if (dataObj) { | ||
| vegaView.current.insert(defaultDataName.current, dataObj) | ||
| } | ||
| if (dataArrays) { | ||
| for (const [name, dataArg] of Object.entries(dataArrays)) { | ||
| vegaView.current.insert(name, dataArg) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| await vegaView.current.runAsync() | ||
| await vegaView.current.runAsync() | ||
|
|
||
| // Fix bug where the "..." menu button overlaps with charts where width is | ||
| // set to -1 on first load. | ||
| await vegaView.current.resize().runAsync() | ||
| // Fix bug where the "..." menu button overlaps with charts where width is | ||
| // set to -1 on first load. | ||
| await vegaView.current.resize().runAsync() | ||
|
|
||
| return vegaView.current | ||
| // Record the data used to initialize this view so subsequent updates | ||
| // have an accurate previous state to diff against. | ||
| prevDataRef.current = latestDataRef.current | ||
| prevDatasetsRef.current = latestDatasetsRef.current | ||
|
|
||
| return vegaView.current | ||
| } finally { |
There was a problem hiding this comment.
Nothing logical has changed with these lines, its just wrapped a try/finally
📈 Frontend coverage change detectedThe frontend unit test (vitest) coverage has increased by 0.0800%
🎉 Great job on improving test coverage! |
There was a problem hiding this comment.
The only change with all these charts is the side "padding". It seems like the new rendering approach fills out the full available space. I'm not exactly sure why, but it doesn't seem to be broken.
sfc-gh-nbellante
left a comment
There was a problem hiding this comment.
Any idea what caused all the snapshot diffs?
With the change, it uses the full available space with less padding which seems fine. However, the investigation into the root cause is still ongoing. |
It doesn't look bad per-say, it just makes me nervous when things unintentionally change. |
| // The finally block ensures execution flow continues even if view.remove() fails | ||
| // This allows us to safely exit the function while still propagating any errors | ||
| } | ||
| view.insert(name, getDataArray(dataArg)) |
There was a problem hiding this comment.
question: I've never seen this pattern before. Can you help me understand how the code gets executed in the instance that view.remove(name, truthy) throws an error?
There was a problem hiding this comment.
I just copied this pattern from another place in this file, but decided to revert it since it causes some issues with add_rows. But the pattern actually doesn't make a lot of sense :) I changed the other usage to a try/catch.
| void updateView(data, datasets) | ||
|
|
||
| // We only want to update the view if the data or datasets change. | ||
| // updateView isn't stable because its updated via the isCreatingView flag. | ||
| // With updateView as dependency, the chart seems to | ||
| // expand within the parent container (less left/right padding). | ||
|
|
||
| // eslint-disable-next-line react-hooks/react-compiler | ||
| // eslint-disable-next-line react-hooks/exhaustive-deps | ||
| }, [data, datasets]) |
There was a problem hiding this comment.
Having updateView as a dependency is causing the difference in snapshots. I think what happens is that updateView gets called, an additional time -> which recalculates the chart and better fills the available space.
But I'm ignoring updateView here for this PR to keep the status quo on how charts are rendered.
| // The finally block ensures execution flow continues even if view.remove() fails | ||
| // This allows us to safely exit the function while still propagating any errors | ||
| } catch { | ||
| // The dataset was already removed, so we do nothing |
There was a problem hiding this comment.
Does this change mean the the errors will not be propagated now? Is that an intended effect of the change?
There was a problem hiding this comment.
Yeah, but I think logging errors here doesn't really matter -> if the dataset doesn't exist, it's fine to just not do anything here. In the current version, if the dataset is missing, it would likely result in some kind of frontend error.
|
|
||
| // Initialize the data and datasets refs with the current data and datasets | ||
| // This is predominantly used to handle the case where we want to reference | ||
| // these in createView before the first render. |
There was a problem hiding this comment.
Should this comment say updateView? It seems the prevDataRef is used in updateView.
There was a problem hiding this comment.
I removed the comment since its not fully correct and more confusing. And correct, prevDataRef is only used in updateView. Regarding race condition see the comment below.
| // these in createView before the first render. | ||
| // Keep latest refs in sync and initialize previous refs before first view | ||
| useEffect(() => { | ||
| latestDataRef.current = data |
There was a problem hiding this comment.
Is the race condition that the ref was not updated when createView was called because vegaVew.current was null sometimes when this block was called?
There was a problem hiding this comment.
I think the issue was that if you had too many quick reruns, it could happen that it was trying to access a dataset that was already removed and/or update a dataset that wasn't fully added yet. The change enforces that no update can happen during creation and ensures that we are always using the actual latest data for creating the view. However, the code is a bit complex mainly because of our add_rows support which I'm hoping that we can deprecate at some point (it has an extremely low adoption, but adds a lot of complexity).
There was a problem hiding this comment.
That added a lot of complexity for adding width/height as well.
There was a problem hiding this comment.
OK, so using the two refs is more just for clarity it seems?
There was a problem hiding this comment.
There are two parts to this. We are using refs to keep the callbacks stable, otherwise we could just use the passed in dataset parameter but that creates some other issues since the callback identity would change. And the added pair of refs is needed to fully fix #11911 since there was a second less frequent race condition where that was caused by this line: https://github.com/streamlit/streamlit/pull/11911/files#diff-827a3eedfdd50708b3577301ac8e982adc7aaedbfc5f735e6967de2dcc6b2657R149
There was a problem hiding this comment.
I haven't fully analyzed how this race condition actually happens in detail. I hope this resolves it without any side effects, but let's see
Describe your changes
Attempts to fix an issue with vega charts showing a blank chart with an unrecognised data set error in the dev console. This is caused by two race conditions in the current implementation.
GitHub Issue Link (if applicable)
Testing Plan
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.