Skip to content

Fix vega chart unrecognized dataset error#11911

Merged
lukasmasuch merged 20 commits intodevelopfrom
fix/altair-unknown-data-issue
Sep 18, 2025
Merged

Fix vega chart unrecognized dataset error#11911
lukasmasuch merged 20 commits intodevelopfrom
fix/altair-unknown-data-issue

Conversation

@lukasmasuch
Copy link
Copy Markdown
Collaborator

@lukasmasuch lukasmasuch commented Jul 10, 2025

Describe your changes

Attempts to fix an issue with vega charts showing a blank chart with an unrecognised data set error in the dev console. This is caused by two race conditions in the current implementation.

GitHub Issue Link (if applicable)

Testing Plan

  • Added unit tests.
  • Since replicating this is not deterministic, its a bit hard to write proper e2e tests.

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@snyk-io
Copy link
Copy Markdown
Contributor

snyk-io bot commented Jul 10, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@lukasmasuch lukasmasuch changed the title Fix vega chart unknown data issue [WIP] Fix vega chart unknown data issue Jul 10, 2025
@lukasmasuch lukasmasuch requested a review from Copilot July 10, 2025 21:16
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jul 10, 2025

✅ PR preview is ready!

Name Link
📦 Wheel file https://core-previews.s3-us-west-2.amazonaws.com/pr-11911/streamlit-1.49.1-py3-none-any.whl
🕹️ Preview app pr-11911.streamlit.app (☁️ Deploy here if not accessible)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses issues with unknown data in the Vega chart by improving how and when the chart view is created and how datasets are updated.

  • Added an isCreatingView flag to prevent data updates during view instantiation.
  • Switched from view.data() to explicit view.remove() + view.insert() when dataset shapes change.
  • Updated guards in createView and updateView to respect the new creation state.
Comments suppressed due to low confidence (1)

frontend/lib/src/components/elements/ArrowVegaLiteChart/useVegaEmbed.ts:204

  • New removal and insertion logic for datasets merits unit tests to verify behavior when data shapes change, ensuring regressions are caught early.
        view.remove(name, truthy)

@itsToggle
Copy link
Copy Markdown

This pull request fixes the empty altair plot bugs 👍

@itsToggle
Copy link
Copy Markdown

Hey @lukasmasuch any chance this fix could make it into an upcoming release? It works perfectly and fixes the missing dataset and empty altair plot issue weve been having. We'd love to use the new data-editor list columns which were just added, as they are essential for our use case :)

@lukasmasuch
Copy link
Copy Markdown
Collaborator Author

@itsToggle sorry for not finishing this up sooner. The 1.49 was already yesterday :( But I will try to get this in for 1.50.

@itsToggle
Copy link
Copy Markdown

itsToggle commented Aug 21, 2025

Oh thank you so much! We are really excited that the data-interaction side of streamlit has been getting a lot of attention recently :)
Happy to test or help in any way!

@itsToggle
Copy link
Copy Markdown

Bump? :)

@lukasmasuch lukasmasuch changed the title [WIP] Fix vega chart unknown data issue [WIP] Fix vega chart unrecognized data set error Sep 17, 2025

// Load the initial set of data into the chart.
const dataArrays = getDataArrays(datasetsRef.current)
const dataArrays = getDataArrays(datasets)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under heavy reruns (e.g., many sidebar widgets), these refs can lag and hold the old chart’s datasets, which don’t match the new spec’s dataset names, causing Vega to throw on insert.

I don't think there is a downside to directly using the initial datasets here.

@lukasmasuch lukasmasuch changed the title [WIP] Fix vega chart unrecognized data set error Fix vega chart unrecognized data set error Sep 17, 2025
@lukasmasuch lukasmasuch requested a review from Copilot September 17, 2025 17:49
@lukasmasuch lukasmasuch added type:bug Something isn't working as expected security-assessment-completed impact:users PR changes affect end users labels Sep 17, 2025
@lukasmasuch lukasmasuch marked this pull request as ready for review September 17, 2025 17:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +122 to +183
try {
// Finalize the previous view so it can be garbage collected.
finalizeView()

const options = {
// Adds interpreter support for Vega expressions that is compliant with CSP
ast: true,
expr: expressionInterpreter,

// Disable default styles so that vega doesn't inject <style> tags in the
// DOM. We set these styles manually for finer control over them and to
// avoid inlining styles.
tooltip: { disableDefaultStyle: true },
defaultStyle: false,
forceActionsMenu: true,
}

// Finalize the previous view so it can be garbage collected.
finalizeView()

const options = {
// Adds interpreter support for Vega expressions that is compliant with CSP
ast: true,
expr: expressionInterpreter,

// Disable default styles so that vega doesn't inject <style> tags in the
// DOM. We set these styles manually for finer control over them and to
// avoid inlining styles.
tooltip: { disableDefaultStyle: true },
defaultStyle: false,
forceActionsMenu: true,
}

const { vgSpec, view, finalize } = await embed(
containerRef.current,
spec,
options
)
const { vgSpec, view, finalize } = await embed(
containerRef.current,
spec,
options
)

vegaView.current = maybeConfigureSelections(view)
vegaView.current = maybeConfigureSelections(view)

vegaFinalizer.current = finalize
vegaFinalizer.current = finalize

// Load the initial set of data into the chart.
const dataArrays = getDataArrays(datasetsRef.current)
// Load the initial set of data into the chart.
const dataArrays = getDataArrays(latestDatasetsRef.current)

// Heuristic to determine the default dataset name.
const datasetNames = dataArrays ? Object.keys(dataArrays) : []
if (datasetNames.length === 1) {
const [datasetName] = datasetNames
defaultDataName.current = datasetName
} else if (datasetNames.length === 0 && vgSpec.data) {
defaultDataName.current = DEFAULT_DATA_NAME
}
// Heuristic to determine the default dataset name.
const datasetNames = dataArrays ? Object.keys(dataArrays) : []
if (datasetNames.length === 1) {
const [datasetName] = datasetNames
defaultDataName.current = datasetName
} else if (datasetNames.length === 0 && vgSpec.data) {
defaultDataName.current = DEFAULT_DATA_NAME
}

const dataObj = getInlineData(dataRef.current)
if (dataObj) {
vegaView.current.insert(defaultDataName.current, dataObj)
}
if (dataArrays) {
for (const [name, dataArg] of Object.entries(dataArrays)) {
vegaView.current.insert(name, dataArg)
const dataObj = getInlineData(latestDataRef.current)
if (dataObj) {
vegaView.current.insert(defaultDataName.current, dataObj)
}
if (dataArrays) {
for (const [name, dataArg] of Object.entries(dataArrays)) {
vegaView.current.insert(name, dataArg)
}
}
}

await vegaView.current.runAsync()
await vegaView.current.runAsync()

// Fix bug where the "..." menu button overlaps with charts where width is
// set to -1 on first load.
await vegaView.current.resize().runAsync()
// Fix bug where the "..." menu button overlaps with charts where width is
// set to -1 on first load.
await vegaView.current.resize().runAsync()

return vegaView.current
// Record the data used to initialize this view so subsequent updates
// have an accurate previous state to diff against.
prevDataRef.current = latestDataRef.current
prevDatasetsRef.current = latestDatasetsRef.current

return vegaView.current
} finally {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing logical has changed with these lines, its just wrapped a try/finally

@lukasmasuch lukasmasuch added change:bugfix PR contains bug fix implementation and removed type:bug Something isn't working as expected labels Sep 17, 2025
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Sep 17, 2025

📈 Frontend coverage change detected

The frontend unit test (vitest) coverage has increased by 0.0800%

  • Current PR: 84.9100% (47567 lines, 7176 missed)
  • Latest develop: 84.8300% (47546 lines, 7209 missed)

🎉 Great job on improving test coverage!

📊 View detailed coverage comparison

@lukasmasuch lukasmasuch changed the title Fix vega chart unrecognized data set error Fix vega chart unrecognized dataset error Sep 17, 2025
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only change with all these charts is the side "padding". It seems like the new rendering approach fills out the full available space. I'm not exactly sure why, but it doesn't seem to be broken.

Copy link
Copy Markdown
Contributor

@sfc-gh-nbellante sfc-gh-nbellante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea what caused all the snapshot diffs?

@lukasmasuch
Copy link
Copy Markdown
Collaborator Author

Any idea what caused all the snapshot diffs?

With the change, it uses the full available space with less padding which seems fine. However, the investigation into the root cause is still ongoing.

@sfc-gh-nbellante
Copy link
Copy Markdown
Contributor

Any idea what caused all the snapshot diffs?

With the change, it uses the full available space with less padding which seems fine. However, the investigation into the root cause is still ongoing.

It doesn't look bad per-say, it just makes me nervous when things unintentionally change.

Comment on lines +221 to +224
// The finally block ensures execution flow continues even if view.remove() fails
// This allows us to safely exit the function while still propagating any errors
}
view.insert(name, getDataArray(dataArg))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I've never seen this pattern before. Can you help me understand how the code gets executed in the instance that view.remove(name, truthy) throws an error?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied this pattern from another place in this file, but decided to revert it since it causes some issues with add_rows. But the pattern actually doesn't make a lot of sense :) I changed the other usage to a try/catch.

Comment on lines +173 to +182
void updateView(data, datasets)

// We only want to update the view if the data or datasets change.
// updateView isn't stable because its updated via the isCreatingView flag.
// With updateView as dependency, the chart seems to
// expand within the parent container (less left/right padding).

// eslint-disable-next-line react-hooks/react-compiler
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [data, datasets])
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having updateView as a dependency is causing the difference in snapshots. I think what happens is that updateView gets called, an additional time -> which recalculates the chart and better fills the available space.
But I'm ignoring updateView here for this PR to keep the status quo on how charts are rendered.

// The finally block ensures execution flow continues even if view.remove() fails
// This allows us to safely exit the function while still propagating any errors
} catch {
// The dataset was already removed, so we do nothing
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change mean the the errors will not be propagated now? Is that an intended effect of the change?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I think logging errors here doesn't really matter -> if the dataset doesn't exist, it's fine to just not do anything here. In the current version, if the dataset is missing, it would likely result in some kind of frontend error.


// Initialize the data and datasets refs with the current data and datasets
// This is predominantly used to handle the case where we want to reference
// these in createView before the first render.
Copy link
Copy Markdown
Collaborator

@sfc-gh-lwilby sfc-gh-lwilby Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this comment say updateView? It seems the prevDataRef is used in updateView.

Copy link
Copy Markdown
Collaborator Author

@lukasmasuch lukasmasuch Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the comment since its not fully correct and more confusing. And correct, prevDataRef is only used in updateView. Regarding race condition see the comment below.

// these in createView before the first render.
// Keep latest refs in sync and initialize previous refs before first view
useEffect(() => {
latestDataRef.current = data
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the race condition that the ref was not updated when createView was called because vegaVew.current was null sometimes when this block was called?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue was that if you had too many quick reruns, it could happen that it was trying to access a dataset that was already removed and/or update a dataset that wasn't fully added yet. The change enforces that no update can happen during creation and ensures that we are always using the actual latest data for creating the view. However, the code is a bit complex mainly because of our add_rows support which I'm hoping that we can deprecate at some point (it has an extremely low adoption, but adds a lot of complexity).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That added a lot of complexity for adding width/height as well.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so using the two refs is more just for clarity it seems?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two parts to this. We are using refs to keep the callbacks stable, otherwise we could just use the passed in dataset parameter but that creates some other issues since the callback identity would change. And the added pair of refs is needed to fully fix #11911 since there was a second less frequent race condition where that was caused by this line: https://github.com/streamlit/streamlit/pull/11911/files#diff-827a3eedfdd50708b3577301ac8e982adc7aaedbfc5f735e6967de2dcc6b2657R149

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully analyzed how this race condition actually happens in detail. I hope this resolves it without any side effects, but let's see

@lukasmasuch lukasmasuch merged commit c4812ba into develop Sep 18, 2025
37 checks passed
@lukasmasuch lukasmasuch deleted the fix/altair-unknown-data-issue branch September 18, 2025 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:bugfix PR contains bug fix implementation impact:users PR changes affect end users

Projects

None yet

Development

Successfully merging this pull request may close these issues.

st.altair_chart() "Unrecognized data set" error interrupts chart rendering Empty Altair Plots (Unrecognized Dataset)

5 participants