Skip to content

Conversation

@jason-simmons
Copy link
Member

PipelineCacheDataPersist runs on a worker thread pool. The raster thread may be executing rendering operations while PipelineCacheDataPersist is reading the state of the pipeline cache.

PipelineCacheDataPersist calls vkGetPipelineCacheData to get the size required for the cache data buffer and then calls it again to fill the buffer. If the cache state changed between those two calls, then the count of bytes written may be less than the size returned by the first call. In that case, PipelineCacheDataPersist should only write the portion of the buffer that was filled.

This PR also adds a mutex to PipelineCacheVK::PersistCacheToDisk. PipelineLibraryVK could queue multiple PersistCacheToDisk tasks to different worker pool threads. These tasks should not run concurrently.

See #172624

PipelineCacheDataPersist runs on a worker thread pool.  The raster thread may
be executing rendering operations while PipelineCacheDataPersist is reading
the state of the pipeline cache.

PipelineCacheDataPersist calls vkGetPipelineCacheData to get the size required
for the cache data buffer and then calls it again to fill the buffer.  If the
cache state changed between those two calls, then the count of bytes written
may be less than the size returned by the first call.  In that case,
PipelineCacheDataPersist should only write the portion of the buffer that was
filled.

This PR also adds a mutex to PipelineCacheVK::PersistCacheToDisk.
PipelineLibraryVK could queue multiple PersistCacheToDisk tasks to different
worker pool threads.  These tasks should not run concurrently.

See flutter#172624
@github-actions github-actions bot added engine flutter/engine related. See also e: labels. e: impeller Impeller rendering backend issues and features requests labels Jul 31, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces improvements to the Vulkan pipeline cache persistence logic. It handles scenarios where the pipeline cache size changes between the size query and the data fetch, ensuring data integrity, which is validated by a new unit test. Additionally, it enhances thread safety by adding a mutex to PipelineCacheVK::PersistCacheToDisk, preventing race conditions from concurrent calls.

Copy link
Member

@chinmaygarde chinmaygarde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all pipeline creation and cache persistence operations happen via PipelineCacheVK, I think we can entirely avoid pipeline creation operations (that might affect the cache) while the cache is being written. We can do this using an RWMutex that where the CreatePipeline calls acquire a read lock (this is safe because our pipelines and pipeline caches are not created using the EXTERNALLY_SYNCHONIZED_BIT) while PipelineCacheDataPersist acquires a write lock.

This way, pipeline creation can be done in parallel except when the cache is being persisted to disk. We won't get into the business of the cache being fiddled with in the middle of a write operation.

Also, when using the mutexes, please also decorate the guarded resources with IPLR_GUARDED_BY.

With this layer of protection, your code should be unnecessary right? Though, from the perspective of defensive coding, I think it still makes sense.

@chinmaygarde
Copy link
Member

Ooh, I also see that we documented PipelineCacheDataPersist as needing the cache value to the externally synchronized. I think I wrote that but didn't RTFM. Sorry.

@jason-simmons
Copy link
Member Author

The PipelineCacheVK::persist_mutex_ is serializing writes to the flutter.impeller.vkcache file. Added a comment.

The changes to PipelineCacheDataPersist in this PR are necessary. Regardless of how synchronization is done, PipelineCacheDataPersist should always write the file based on the number of bytes that were actually written by vkGetPipelineCacheData.

In particular, there were reports of drivers that were often returning VK_INCOMPLETE, implying that vkGetPipelineCacheData may not be completely filling the buffer (see 95e1012)

I'd prefer not to guard the VkPipelineCache with a reader-writer lock. Guarding all of PipelineCacheDataPersist with an RWMutex could cause pipeline creation tasks to be blocked by persist calls that are writing to disk.

If we only wrap the vkGetPipelineCacheData calls with the RWMutex, then we still need another mutex to serialize PipelineCacheDataPersist so that concurrent persist tasks do not have contention over the cache file.

With this PR, if any pipelines are created during or after a call to PipelineCacheDataPersist, then PipelineLibraryVK will mark the cache as dirty and eventually queue another cache persist task. The second task will run after the first task releases the persist_mutex_ and will write the latest cache data.

Copy link
Member

@chinmaygarde chinmaygarde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

@chinmaygarde chinmaygarde added the autosubmit Merge PR when tree becomes green via auto submit App label Aug 4, 2025
@auto-submit auto-submit bot added this pull request to the merge queue Aug 4, 2025
Merged via the queue into flutter:master with commit beda687 Aug 4, 2025
178 checks passed
@flutter-dashboard flutter-dashboard bot removed the autosubmit Merge PR when tree becomes green via auto submit App label Aug 4, 2025
engine-flutter-autoroll added a commit to engine-flutter-autoroll/packages that referenced this pull request Aug 4, 2025
auto-submit bot pushed a commit to flutter/packages that referenced this pull request Aug 4, 2025
Roll Flutter from 871849e4b6bf to beda687d63f2 (34 revisions)

flutter/flutter@871849e...beda687

2025-08-04 [email protected] [Impeller] Improvements to the Vulkan pipeline cache data writer (flutter/flutter#173014)
2025-08-04 [email protected] [Impeller] Terminate the fence waiter but do not reset it during ContextVK shutdown (flutter/flutter#173085)
2025-08-04 [email protected] Add the 'windowing' feature flag and use to wrap an implementation for regular windows that always throws (flutter/flutter#172478)
2025-08-04 [email protected] licenses_cpp: reland switch 4 (flutter/flutter#173139)
2025-08-04 [email protected] Roll Skia from 763bba9c33fd to dce9550a1356 (1 revision) (flutter/flutter#173219)
2025-08-04 [email protected] Improve robustness of comment detection when using flutter analyze --suggestions (flutter/flutter#172977)
2025-08-04 [email protected] Make sure that a LicensePage doesn't crash in 0x0 environment (flutter/flutter#172610)
2025-08-04 [email protected] Roll Packages from f0645d8 to 1a72287 (4 revisions) (flutter/flutter#173215)
2025-08-04 [email protected] Roll Skia from edf0f8a5bba6 to 763bba9c33fd (1 revision) (flutter/flutter#173213)
2025-08-04 [email protected] [web] Add Intl.Locale to parse browser languages. (flutter/flutter#172964)
2025-08-04 [email protected] Roll Skia from 439f80973f4a to edf0f8a5bba6 (2 revisions) (flutter/flutter#173204)
2025-08-04 [email protected] Roll Skia from 9d30cc96a3b2 to 439f80973f4a (1 revision) (flutter/flutter#173201)
2025-08-04 [email protected] Roll Skia from 39c70f883c0e to 9d30cc96a3b2 (1 revision) (flutter/flutter#173197)
2025-08-04 [email protected] Roll Skia from 09667386bcba to 39c70f883c0e (1 revision) (flutter/flutter#173194)
2025-08-04 [email protected] fix: get content hash for master on local engine branches (third attempt)  (flutter/flutter#173169)
2025-08-03 [email protected] Roll Fuchsia Linux SDK from wZuA8Dqty4scQkZuV... to ufssK8EgJ_9RpLFgu... (flutter/flutter#173190)
2025-08-03 [email protected] Roll Skia from e056e47e118b to 09667386bcba (2 revisions) (flutter/flutter#173185)
2025-08-03 [email protected] fix: tag fuchsia package after uploading (flutter/flutter#173140)
2025-08-02 [email protected] Roll Skia from 7ef888e30a28 to e056e47e118b (1 revision) (flutter/flutter#173170)
2025-08-02 [email protected] Roll Fuchsia Linux SDK from yFLr81lLXmSX7n11D... to wZuA8Dqty4scQkZuV... (flutter/flutter#173166)
2025-08-02 [email protected] Roll Skia from af6d6eb383a6 to 7ef888e30a28 (1 revision) (flutter/flutter#173157)
2025-08-02 [email protected] Add `innerRadius` to `RadioThemeData` (flutter/flutter#173120)
2025-08-02 [email protected] Roll Skia from 81dd6aa516b0 to af6d6eb383a6 (2 revisions) (flutter/flutter#173144)
2025-08-01 98614782+auto-submit[bot]@users.noreply.github.com Reverts "fix: get content hash for master on local engine branches (#173114)" (flutter/flutter#173145)
2025-08-01 [email protected] Roll Dart SDK from 6e1bb159c860 to 3f1307d72d6f (2 revisions) (flutter/flutter#173136)
2025-08-01 [email protected] Roll Skia from 9e2c59634961 to 81dd6aa516b0 (1 revision) (flutter/flutter#173135)
2025-08-01 [email protected] Reformat text.dart's code snippets (flutter/flutter#172976)
2025-08-01 [email protected] experiment with docs properties (flutter/flutter#173124)
2025-08-01 [email protected] fix: get content hash for master on local engine branches (flutter/flutter#173114)
2025-08-01 [email protected] Make sure that a RawAutocomplete doesn't crash in 0x0 environment (flutter/flutter#172812)
2025-08-01 [email protected] Add skia_ports_fontmgr_android_parser_sources (flutter/flutter#172979)
2025-08-01 [email protected] Roll Skia from 49e39cd3cf18 to 9e2c59634961 (10 revisions) (flutter/flutter#173125)
2025-08-01 [email protected] fix: 🐛 Add equality and hashCode implementations to ResizeImage (flutter/flutter#172643)
2025-08-01 [email protected] Roll Fuchsia Linux SDK from xo_bqOoFuBKFmgKxn... to yFLr81lLXmSX7n11D... (flutter/flutter#173117)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-packages
Please CC [email protected],[email protected] on the revert to ensure that a human
is aware of the problem.

To file a bug in Packages: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

...
danilozhang pushed a commit to danilozhang/flutter that referenced this pull request Aug 6, 2025
…tter#173014)

PipelineCacheDataPersist runs on a worker thread pool. The raster thread
may be executing rendering operations while PipelineCacheDataPersist is
reading the state of the pipeline cache.

PipelineCacheDataPersist calls vkGetPipelineCacheData to get the size
required for the cache data buffer and then calls it again to fill the
buffer. If the cache state changed between those two calls, then the
count of bytes written may be less than the size returned by the first
call. In that case, PipelineCacheDataPersist should only write the
portion of the buffer that was filled.

This PR also adds a mutex to PipelineCacheVK::PersistCacheToDisk.
PipelineLibraryVK could queue multiple PersistCacheToDisk tasks to
different worker pool threads. These tasks should not run concurrently.

See flutter#172624
SydneyBao pushed a commit to SydneyBao/flutter that referenced this pull request Aug 6, 2025
…tter#173014)

PipelineCacheDataPersist runs on a worker thread pool. The raster thread
may be executing rendering operations while PipelineCacheDataPersist is
reading the state of the pipeline cache.

PipelineCacheDataPersist calls vkGetPipelineCacheData to get the size
required for the cache data buffer and then calls it again to fill the
buffer. If the cache state changed between those two calls, then the
count of bytes written may be less than the size returned by the first
call. In that case, PipelineCacheDataPersist should only write the
portion of the buffer that was filled.

This PR also adds a mutex to PipelineCacheVK::PersistCacheToDisk.
PipelineLibraryVK could queue multiple PersistCacheToDisk tasks to
different worker pool threads. These tasks should not run concurrently.

See flutter#172624
ksokolovskyi pushed a commit to ksokolovskyi/flutter that referenced this pull request Aug 19, 2025
…tter#173014)

PipelineCacheDataPersist runs on a worker thread pool. The raster thread
may be executing rendering operations while PipelineCacheDataPersist is
reading the state of the pipeline cache.

PipelineCacheDataPersist calls vkGetPipelineCacheData to get the size
required for the cache data buffer and then calls it again to fill the
buffer. If the cache state changed between those two calls, then the
count of bytes written may be less than the size returned by the first
call. In that case, PipelineCacheDataPersist should only write the
portion of the buffer that was filled.

This PR also adds a mutex to PipelineCacheVK::PersistCacheToDisk.
PipelineLibraryVK could queue multiple PersistCacheToDisk tasks to
different worker pool threads. These tasks should not run concurrently.

See flutter#172624
mboetger pushed a commit to mboetger/flutter that referenced this pull request Sep 18, 2025
…tter#173014)

PipelineCacheDataPersist runs on a worker thread pool. The raster thread
may be executing rendering operations while PipelineCacheDataPersist is
reading the state of the pipeline cache.

PipelineCacheDataPersist calls vkGetPipelineCacheData to get the size
required for the cache data buffer and then calls it again to fill the
buffer. If the cache state changed between those two calls, then the
count of bytes written may be less than the size returned by the first
call. In that case, PipelineCacheDataPersist should only write the
portion of the buffer that was filled.

This PR also adds a mutex to PipelineCacheVK::PersistCacheToDisk.
PipelineLibraryVK could queue multiple PersistCacheToDisk tasks to
different worker pool threads. These tasks should not run concurrently.

See flutter#172624
korca0220 pushed a commit to korca0220/flutter that referenced this pull request Sep 22, 2025
…tter#173014)

PipelineCacheDataPersist runs on a worker thread pool. The raster thread
may be executing rendering operations while PipelineCacheDataPersist is
reading the state of the pipeline cache.

PipelineCacheDataPersist calls vkGetPipelineCacheData to get the size
required for the cache data buffer and then calls it again to fill the
buffer. If the cache state changed between those two calls, then the
count of bytes written may be less than the size returned by the first
call. In that case, PipelineCacheDataPersist should only write the
portion of the buffer that was filled.

This PR also adds a mutex to PipelineCacheVK::PersistCacheToDisk.
PipelineLibraryVK could queue multiple PersistCacheToDisk tasks to
different worker pool threads. These tasks should not run concurrently.

See flutter#172624
auto-submit bot pushed a commit that referenced this pull request Oct 7, 2025
…ache corruption (#176520)

## Stable Cherry Pick
Cherry-picks #173014 to stable to fix unhandled crash on Android Snapdragon 845 devices due to a corrupted cache, preventing apps from starting. 

Fixes #172624

Impacted Users: All users with Android Snapdragon 845
Impact Description: the app can crash on start because of a corrupted cache.
Workaround: None
Risk: Low
Test Coverage: A test is included that checks succesful reads of an incomplete vk cache 
Validation Steps: #172624 (comment)

This crash has unfortunately prevented our app from using Flutter 3.35 and even 3.32, missing out a whole new set of fixes from these last Flutter releases for quite some time now.

cc @jason-simmons (PR author) @chinmaygarde (PR reviewer) in case there is something I could have missed with the CP request and description
engine-flutter-autoroll added a commit to engine-flutter-autoroll/packages that referenced this pull request Nov 12, 2025
lucaantonelli pushed a commit to lucaantonelli/flutter that referenced this pull request Nov 21, 2025
…tter#173014)

PipelineCacheDataPersist runs on a worker thread pool. The raster thread
may be executing rendering operations while PipelineCacheDataPersist is
reading the state of the pipeline cache.

PipelineCacheDataPersist calls vkGetPipelineCacheData to get the size
required for the cache data buffer and then calls it again to fill the
buffer. If the cache state changed between those two calls, then the
count of bytes written may be less than the size returned by the first
call. In that case, PipelineCacheDataPersist should only write the
portion of the buffer that was filled.

This PR also adds a mutex to PipelineCacheVK::PersistCacheToDisk.
PipelineLibraryVK could queue multiple PersistCacheToDisk tasks to
different worker pool threads. These tasks should not run concurrently.

See flutter#172624
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

e: impeller Impeller rendering backend issues and features requests engine flutter/engine related. See also e: labels.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants