Skip to content

Conversation

@raethlein
Copy link
Collaborator

@raethlein raethlein commented Dec 4, 2024

Describe your changes

Split into following PRs: #10132, #10147, #10130


Closes #9921

In issue #9921, the problem is that the st.rerun call inside of the dialog triggers a rerun. If you click on the button again before the dialog closes, another RequestRerun is sent to the server. By the time this second request is processed, the full-app rerun has already happened and cleared the fragment storage. So when the second RequestRerun is processed, the fragment id cannot be found anymore and the error is thrown.

This PR adds a previous_fragment_ids list to the FragmentStorage and only raises an error if the fragment id has never been seen. Otherwise, just log is emitted.
Discussion 1: We could always just emit a log and do not introduce the additional list.

After making this change, another issue surfaced: the error itself was gone, but the Dialog stayed open sometimes. There are two different root causes for this (both race conditions interplaying with the ScriptRunner threads etc.):

  1. In forward_msg_queue, the FinishedScript type was always set to STOPPED_EARLY_FOR_RERUN. Now, the status is not changed for full app reruns by fragment runs. This is correct since a fragment indeed does not stop full app reruns. This only happens when the fragment run is so fast that the browser queue is not cleared yet.
  2. In app_session, a fragment run might create a new ScriptRunner when the current ScriptRunner is in state STOPPED (in this case, success here is false and the new ScriptRunner is created). This will lead to all events from the previous script runner being ignored (see here). When the full app rerun ScriptRunner is done (STOPPED) but its events are not processed before the new ScriptRunner is created, its finished message is not sent to the frontend and no cleanup is happening.

GitHub Issue Link (if applicable)

Testing Plan

  • Explanation of why no additional tests are needed
  • Unit Tests (JS and/or Python): Added unit tests for the new behavior
  • E2E Tests: Reproducing the issues requires multiple tries as the issue is caused by some race conditions. This would lead to flaky E2E tests so the unit tests should be sufficient.
  • Any manual testing needed?

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@raethlein raethlein added security-assessment-completed Security assessment has been completed for PR change:bugfix PR contains bug fix implementation impact:users PR changes affect end users labels Dec 4, 2024
@raethlein raethlein changed the title Fix showing exception and potential race condition [Draft] Fix fragment runtime errors for just removed fragments Dec 4, 2024
@raethlein raethlein requested a review from kmcgrady December 4, 2024 18:01
Copy link
Collaborator

@kmcgrady kmcgrady left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the approach seems solid. My only question is should we consider clearing the fragment id from the past after some time (or after a rerun or two without it?). Just seems like this storage can get arbitrarily large. We may also be fine with this because the storage is in the AppSession which is tied to a browser tab living (which is very unlikely). Just wondering if we can make a guarantee where we know for sure that we should never see an old fragment id.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might benefit from a comment to clarify how this relates to the fragments case (like the one from the PR description)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, I have added a comment to explain this in more detail

@raethlein
Copy link
Collaborator Author

Yeah that's a great point. I think the best options might be to use a cache counter and clear them after clear was called X times on the fragment_storage. Similar to what you wrote about clearing after 2 reruns. Tying it to the clear function might be clearer / more easy to grasp than tying it to reruns which has to be tracked somewhere else etc.
Or we don't raise the exception ever (also if we have never seen the fragment id at all) and simply emit a log which would allow us to remove the previous list completely.

@raethlein raethlein marked this pull request as ready for review December 6, 2024 23:19
@raethlein raethlein changed the title [Draft] Fix fragment runtime errors for just removed fragments Fix fragment runtime errors for just removed fragments Dec 13, 2024
@raethlein raethlein force-pushed the fix/issue-9921 branch 2 times, most recently from 9d62354 to 6ac2fbf Compare December 14, 2024 10:06
@lukasmasuch
Copy link
Collaborator

lukasmasuch commented Jan 6, 2025

PR looks good, but it seems that the PR contains two slightly independent fixes. Would it be possible to split it into two PRs that might make it easier to handle follow-up issues that might arise from the fixes.

Discussion 1: We could always just emit a log and do not introduce the additional list.

I'm leaning slightly more towards keeping it simpler, just always logging it to the console and never raising an exception that's shown to the user. I don't think it adds a lot of benefit to show this error to the user in any situation and the developer can see it in the logs -> and adding the additional complexity just to be able to show the error sometimes on UI doesn't feel to be worth it.

@raethlein
Copy link
Collaborator Author

Good points, I have started to split the PRs, here is the first one: PR-10130 to change from throwing an error to just showing a log

raethlein added a commit that referenced this pull request Jan 8, 2025
…10130)

## Describe your changes

After the discussion in context of this PR:
#9965, we want to show a log
instead of raising an error in all cases to keep it simple and not
introduce another list / more tech debt.

## GitHub Issue Link (if applicable)

## Testing Plan

- Unit Tests (JS and/or Python)
- Remove a couple of unit tests that checked for the exception to be
thrown

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.
@raethlein
Copy link
Collaborator Author

Close in favor of the split PRs: #10132, #10147, #10130

@raethlein raethlein closed this Jan 10, 2025
edegp pushed a commit to edegp/streamlit that referenced this pull request Jan 19, 2025
…treamlit#10130)

## Describe your changes

After the discussion in context of this PR:
streamlit#9965, we want to show a log
instead of raising an error in all cases to keep it simple and not
introduce another list / more tech debt.

## GitHub Issue Link (if applicable)

## Testing Plan

- Unit Tests (JS and/or Python)
- Remove a couple of unit tests that checked for the exception to be
thrown

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.
akramsystems added a commit to akramsystems/streamlit that referenced this pull request Apr 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:bugfix PR contains bug fix implementation impact:users PR changes affect end users security-assessment-completed Security assessment has been completed for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Calling st.rerun in inside dialog in quick succession causes crash

4 participants