Fix sliding window eviction crash in ModelDataSet#3389
Conversation
|
Performance benchmarks:
|
b3da951 to
4a3b3ff
Compare
|
@EwoutH Tests are passing now — updated the dict storage to (time, data). |
|
@codebreaker32 could you take a look at this? |
codebreaker32
left a comment
There was a problem hiding this comment.
Good catch and thanks for this PR @ShreyasN707 However I have some minor requests:
- Please remove "time" from the storage.metadata["columns"] as well, keeping it in the metadata columns will cause empty DataFrames to have an inconsistent schema.
- Could you add a specific test that explicitly guards against the ValueError crash?
I will give it a final look once you push the changes
01dddc0 to
d4cd210
Compare
6c05284 to
0985233
Compare
| storage.metadata["type"] = "custom" | ||
|
|
||
| # Update bookkeeping for evicted data | ||
| """Update bookkeeping for evicted data""" |
There was a problem hiding this comment.
Please don't use triple quotes for inline comments
|
|
||
| if not storage.blocks: | ||
| # Empty DataFrame with correct columns | ||
| """ Empty DataFrame with correct columns""" |
There was a problem hiding this comment.
Please don't use triple quotes for inline comments
| return self._convert_modelDataSet(storage) | ||
| case _: | ||
| # Fallback | ||
| """Fallback""" |
|
|
||
| def test_data_recorder_window_eviction_dict(): | ||
| """Test window eviction bookkeeping for dict data.""" | ||
| def test_modeldataset_window_eviction_crash(): |
There was a problem hiding this comment.
Why did you change the name?
| # Fill window | ||
| recorder.collect() | ||
| recorder.collect() | ||
| recorder.collect() # Should evict first |
There was a problem hiding this comment.
Looks like I removed a few comments unintentionally during conflict resolution. My bad I’ll correct that and update the PR.
|
I still couldn't see the test that explicitly guards against the |
I’ll recheck the branch it’s possible the test didn’t get pushed properly. I’ll confirm and fix it right away. |
d364219 to
32654d6
Compare
| if "time" not in columns: | ||
| columns = [*columns, "time"] |
There was a problem hiding this comment.
Yes adding "time" there was unnecessary; I'll remove it to keep the schema aligned with metadata-defined columns.
| """Convert model dict blocks to DataFrame.""" | ||
| if not storage.blocks: | ||
| return pd.DataFrame(columns=storage.metadata.get("columns", [])) | ||
| return pd.DataFrame(columns=[*storage.metadata.get("columns", []), "time"]) |
| def test_modeldataset_window_eviction_no_valueerror(): | ||
| """Explicit regression test for ValueError during sliding window eviction (#3378).""" |
There was a problem hiding this comment.
It seems repetitive to me. Isn't it similar to above test? I am fine with above tests only
There was a problem hiding this comment.
It overlapped with the existing dict eviction test. The only difference was an explicit try/except for the ValueError, but pytest already fails on uncaught exceptions, so it was redundant.
6683a8f to
389e1db
Compare
d1221d9 to
805384a
Compare
|
tests are failing, so that needs to be addressed before merging this PR. |
Fixes a crash in
ModelDataSetwhenwindow_sizeis enabled by standardizing how blocks are stored internally.Issue: #3378
When the sliding window eviction runs,
_evict_blocksexpects blocks in the format(time, data).However, dict-based data was being stored as a merged dictionary (
{"time": time, **data}), which caused aValueErrorduring unpacking and crashed the data collection process.Implementation
Unified the storage format so all blocks (including dict data) are stored as
(time, data)tuples.The merge into
{"time": time, **data}is now handled only when building the final DataFrame.Core change: