Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Dec 18, 2020

Fixes #2305.

The reason this is happening is:

  • When we hash a function, we're saving its co_cellvars and co_freevars in a table, and later looking them up when we resolve references.
  • List comprehensions are their own code objects, nested inside the function's co_consts, with their own co_cellvars and co_freevars not listed in the outer function.
  • We're only saving the outermost function's co_cellvars and co_freevars, so when we try to resolve references found inside the listcomp, it can't find them in our table, and crashes.

What we need to do is save all references as we recursively traverse our "tree" of nested code objects. In pseudocode, it looks something like this:

def hash_code_object(code_object):
    push_vars(code_object)

    hash.update(...)
    for child in code_object.nested_objects:
        hash_code_object(child)

    pop_vars()  # pop all the vars we pushed at the beginning, restoring old values

For example, this code

@st.cache()
def create_raw_data():
    production = [
        [outer + inner for inner in range(3)]
        for outer in range(5)
    ]
    return production

would produce (roughly) the following object hierarchy:

create_raw_data (vars: nothing)
└─ outer list comprehension (line 3) (vars: `outer`)
   └─ inner list comprehension (line 4) (vars: `inner`)

So if we're hashing create_raw_data, our cell table state would look like this over time.

  • Start hashing create_raw_data: {}
  • Start hashing outer listcomp: {'outer': ...}
  • Start hashing inner listcomp: {'outer': ..., 'inner': ...}
  • Finish hashing inner listcomp: {'outer': ...}
  • Finish hashing outer listcomp: {}
  • Finish hashing create_raw_data: {}

I basically replaced Context.cells, which was previously a simple dict, with a custom class that achieves this. Also, I added a test.

(Note: I recommend reading the diff in split view, it's a lot more readable than unified.)

@ghost ghost self-requested a review December 18, 2020 19:05
@ghost
Copy link
Author

ghost commented Dec 21, 2020

@nthmost Looks like tests are finally passing!

@nthmost
Copy link
Contributor

nthmost commented Dec 23, 2020

Hey @bh-streamlit , I've been taking my time reading this code since there's some refactoring of the cacheing engine entailed in your fix.

My overall read is that it looks really good and I appreciate that you thought deeply about the root cause of the issue! I'm gonna do another once-over for details, but regardless -- and since the tests pass :) -- I think we're good to go here! Nice work!

Copy link
Contributor

@nthmost nthmost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I think i'm gonna start referring to our current cacheing engine as "The Stacks". 😆

@ghost
Copy link
Author

ghost commented Dec 23, 2020

Thanks!

@ghost ghost merged commit 7f8b812 into streamlit:develop Dec 23, 2020
@ghost ghost deleted the fix-caching-listcomp branch December 23, 2020 19:04
tconkling added a commit to tconkling/streamlit that referenced this pull request Jan 4, 2021
* develop:
  Bump vega from 5.17.1 to 5.17.3 in /frontend (streamlit#2541)
  Pick a random emoji on `st.set_page_config(emoji="random")` (streamlit#2020)
  Fix Jest warnings (streamlit#2523)
  Upgrade react-markdown (streamlit#2527)
  Upgrade react-hotkeys (streamlit#2525)
  Close streamlit#2495 (streamlit#2524)
  Remove unnecessary case statement (streamlit#2522)
  Bump @types/node from 12.19.9 to 14.14.16 in /frontend (streamlit#2526)
  Bump fetch-mock from 7.7.3 to 9.11.0 in /frontend (streamlit#2505)
  st.markdown now shows a link title (streamlit#2518)
  Bump @types/react-dom from 16.9.10 to 17.0.0 in /frontend (streamlit#2503)
  Fix caching list comprehensions (streamlit#2484)
  Add validation to st.slider ensuring `step` cannot be 0 (streamlit#2502)
  Ensure st.image works with UploadedFiles (streamlit#2512)
  Fix dataframe column sort (streamlit#2511)
  File uploader session check (streamlit#2498)
  Upgrade node-notifier to 8.0.1 or later (streamlit#2507)
  Fix st.number_input not using min_value as default value (streamlit#2499)
  Unblock Core patches, and add Marisa as a docs owner (streamlit#2501)
  Patch 0.73.1 (streamlit#2500)
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cache a function with an nested list comprehension

1 participant