
Can't cache a dataframe that contains tuples #1224

@mysterefrank

Summary

I got an internal hash error when I wrote a cached function that returns a dataframe containing tuples.

This is the error message:

InternalHashError: setting an array element with a sequence

Usually this means you found a Streamlit bug! If you think that's the case, please file a bug report here.

In the meantime, you can try bypassing this error by registering a custom hash function via the hash_funcs keyword in @st.cache(). For example:

@st.cache(hash_funcs={pandas.core.frame.DataFrame: my_hash_func})
def my_func(...):
    ...
Please see the hash_funcs documentation for more details.

Traceback:
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/ScriptRunner.py", line 314, in _run_script
exec(code, module.__dict__)
File "/Users/mysterefrank/surge_sim/vis/sample_data/streamlit_v1.py", line 42, in <module>
df = load_data(file_path)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/caching.py", line 464, in wrapped_func
return get_or_set_cache()
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/caching.py", line 457, in get_or_set_cache
hash_funcs=hash_funcs,
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/caching.py", line 294, in _write_to_cache
_write_to_mem_cache(key, value, allow_output_mutation, hash_funcs)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/caching.py", line 231, in _write_to_mem_cache
hash = get_hash(value, hash_funcs=hash_funcs)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/hashing.py", line 134, in get_hash
hasher.update(f, context)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/hashing.py", line 280, in update
self._update(self.hasher, obj, context)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/hashing.py", line 323, in _update
b = self.to_bytes(obj, context)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/hashing.py", line 307, in to_bytes
b = self._to_bytes(obj, context)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/hashing.py", line 500, in _to_bytes
raise InternalHashError(msg).with_traceback(e.__traceback__)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/streamlit/hashing.py", line 398, in _to_bytes
return pd.util.hash_pandas_object(obj).sum()
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/pandas/core/util/hashing.py", line 115, in hash_pandas_object
h = _combine_hash_arrays(hashes, num_items)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/pandas/core/util/hashing.py", line 33, in _combine_hash_arrays
first = next(arrays)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/pandas/core/util/hashing.py", line 104, in <genexpr>
hashes = (hash_array(series.values) for _, series in obj.iteritems())
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/pandas/core/util/hashing.py", line 289, in hash_array
return _hash_categorical(cat, encoding, hash_key)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/pandas/core/util/hashing.py", line 209, in _hash_categorical
categorize=False)
File "/Users/mysterefrank/miniconda3/lib/python3.6/site-packages/pandas/core/util/hashing.py", line 295, in hash_array
vals = hashing.hash_object_array(vals.astype(str).astype(object),

The error goes away when I remove the tuple columns from my df.
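For anyone hitting the same thing, here is a rough sketch of the kind of custom hash function the error message suggests. It is only an illustration under assumptions, not Streamlit's own fix: the name `hash_dataframe_with_tuples` is made up, and the idea is simply to turn every object-dtype cell into its `repr()` string before handing the frame to `pd.util.hash_pandas_object`, so pandas never has to hash the tuples themselves.

```python
import hashlib

import pandas as pd


def hash_dataframe_with_tuples(df: pd.DataFrame) -> str:
    """Hypothetical helper: hash a DataFrame whose object columns may hold tuples."""
    safe = df.copy()
    for col in safe.columns:
        # Only object columns can hold tuples; replace each cell with its repr()
        # string so pandas hashes plain strings. Assumes unique column names.
        if safe[col].dtype == object:
            safe[col] = safe[col].map(repr)
    # One uint64 hash per row (index included), folded into a single digest.
    row_hashes = pd.util.hash_pandas_object(safe, index=True)
    return hashlib.md5(row_hashes.to_numpy().tobytes()).hexdigest()


# Small made-up frame with a tuple column, similar in shape to the one in this report.
example_df = pd.DataFrame({"id": [1, 2], "pair": [(1, 2), (3, 4)]})
example_hash = hash_dataframe_with_tuples(example_df)
```

Following the pattern in the error message, this would presumably be registered as `@st.cache(hash_funcs={pd.DataFrame: hash_dataframe_with_tuples})`. I have not verified that against every Streamlit version. Note that `repr()` makes cells with the same printed form hash identically, which seems acceptable for a cache key but is a design trade-off.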

Metadata

Labels

feature:cache (related to `st.cache_data` and `st.cache_resource`), feature:st.dataframe (related to the `st.dataframe` element), type:bug (something isn't working as expected)
