Better control of cached object lifecycles #364

@treuille

Description

Problem

Cached objects have the lifetime of the server itself, which is often more of a memory leak than a desired feature. In particular, I have observed several interrelated problems.

  1. In most cases, cached objects shouldn't persist past the current session (example).
  2. We've been asked for months for an option so that cache entries can expire after a certain time (example).
  3. When a cached object represents an OS resource, like an open database connection (example), we need to have a way of finalizing that object, e.g. closing that connection.
  4. We've also been asked to add numerical limits to the cache (I can't find an example right now.)

Solution

MVP

The simplest solution would be to add the following keyword options to
st.cache:

| Option | Default | Meaning |
| --- | --- | --- |
| `global` | `False` | Whether the object is cached across sessions. |
| `ttl` | `None` | The number of seconds to keep the object cached. |
| `max_entries` | `None` | The maximum number of entries to keep for that function. |
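To make the proposed semantics concrete, here is a minimal plain-Python sketch of a cache decorator with `ttl` and `max_entries` behavior (LRU eviction). This is an illustration of the intended semantics, not Streamlit's actual implementation:

```python
import time
from collections import OrderedDict

def cache(ttl=None, max_entries=None):
    """Sketch of the proposed ttl / max_entries semantics (not the real st.cache)."""
    def decorator(func):
        entries = OrderedDict()  # key -> (timestamp, value)

        def wrapper(*args):
            now = time.time()
            # Expire entries older than ttl seconds.
            if ttl is not None:
                for k in [k for k, (t, _) in entries.items() if now - t > ttl]:
                    del entries[k]
            if args in entries:
                entries.move_to_end(args)  # LRU bookkeeping
                return entries[args][1]
            value = func(*args)
            entries[args] = (now, value)
            # Evict the least recently used entry once over max_entries.
            if max_entries is not None and len(entries) > max_entries:
                entries.popitem(last=False)
            return value

        wrapper._entries = entries  # exposed for inspection in this sketch
        return wrapper
    return decorator
```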

Reference Implementation

A starting point for implementing these ideas is this gist. Note that the difference between these ideas and the gist is that we should also let cached items expire when they're no longer needed!

Why is max_entries None?

A basic goal of Streamlit is that the default options should "do the right thing." It's not clear what the "right thing" is in this case but I think there are two possible failure modes:

  1. If max_entries is a particular number, then the user might experience a sudden, discontinuous drop in performance as the cache starts evicting entries.
  2. But if max_entries is None, then the user might see a slow degradation in performance due to a continuous increase in the number of cached items.

I think that (1) is more mysterious than (2) and thus less desirable.

Next Step: Explicit Finalizing

The next step past the MVP would be to allow a function to be called on cache eviction:

| Option | Default | Meaning |
| --- | --- | --- |
| `finalize_func` | `None` | Function called on the object on cache eviction. |

as follows:

```python
from db import Connection  # <- making this example up

# close() is a method of Connection
@st.cache(finalize_func=Connection.close)
def get_db_connection():
    return Connection(...)
```

Next Step Part Deux: Fancy Finalizing

Sometimes the user might want to write their own finalizer after the st.cache'd function, which looks prettier. We could enable that as follows:

```python
from db import Connection  # <- making this example up

@st.cache
def get_db_connection():
    return Connection(...)

@get_db_connection.finalizer
def finalize_db_connection(conn):
    conn.close()
```

which would be equivalent to the previous code snippet.
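To show that the `.finalizer` registration hook is implementable, here is a plain-Python sketch. The `evict()` method is hypothetical, standing in for whatever eviction logic the real cache would use; nothing here is Streamlit's actual API:

```python
def cache_with_finalizer(func):
    """Sketch of a cached function exposing a .finalizer registration decorator."""
    state = {"finalize_func": None, "cached": None}

    def wrapper():
        if state["cached"] is None:
            state["cached"] = func()
        return state["cached"]

    def finalizer(f):
        state["finalize_func"] = f  # remembered, then called on eviction
        return f

    def evict():
        # Simulate cache eviction: run the finalizer, then drop the entry.
        if state["cached"] is not None and state["finalize_func"] is not None:
            state["finalize_func"](state["cached"])
        state["cached"] = None

    wrapper.finalizer = finalizer
    wrapper.evict = evict  # hypothetical hook for this sketch only
    return wrapper
```

Registering via `@get_db_connection.finalizer` then just stores the function, and the cache calls it with the cached object right before discarding the entry.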

Labels

- feature:cache: Related to `st.cache_data` and `st.cache_resource`
- type:enhancement: Requests for feature enhancements or new features
