Conversation
chahank
left a comment
There was a problem hiding this comment.
Good changes, although some tests are failing now.
As a note, centroids are currently never supposed to be anything else than points, as the impact calculations cannot deal with other geometries in the centroids at the moment. I think it does not hurt to have the code ready.
peanutfun
left a comment
There was a problem hiding this comment.
Thank you for tackling this issue! I have some concerns regarding the readability and performance of the code, especially because the entire data needs to be copied. I also think the tests need to be adapted a bit. See my comments.
| store.close() | ||
| xycols = [] | ||
| wkbcols = [] | ||
| store = pd.HDFStore(file_name, mode=mode, complevel=9) |
There was a problem hiding this comment.
For zlib, it seems like looks like high compression levels only slightly reduce the file size while costing much performance. A lower value seems more advisable to me. See https://www.pytables.org/usersguide/optimization.html#compression-issues
| store = pd.HDFStore(file_name, mode=mode, complevel=9) | |
| store = pd.HDFStore(file_name, mode=mode, complevel=3) |
There was a problem hiding this comment.
I leave it as it is: in an arbitrary test, the cpu decrease was 0.4 seconds, about 15%, the size increase 2M, about 10%. From a $ point of view complevel 9 seems justified.
There was a problem hiding this comment.
15% decrease vs. 10% increase seems like an argument for a lower complevel, from my point of view. But I guess it's not that relevant. In case we run into some issues, we might consider making this a method kwarg in the future.
| for col in pandas_df.columns: | ||
| if str(pandas_df[col].dtype) == "geometry": |
There was a problem hiding this comment.
Make clear that you do not want to iterate over all columns:
| for col in pandas_df.columns: | |
| if str(pandas_df[col].dtype) == "geometry": | |
| for col in filter(lambda x: str(x.dtype) == "geometry", pandas_df.columns): |
(Suggestion won't work because the following code needs to be indented less)
There was a problem hiding this comment.
elegant suggestion - but I leave it as it is. it's more "climada style" like that.
There was a problem hiding this comment.
Then please add a comment, what you call "climada style" confused me quite a bit 😕
| for col in pandas_df.columns: | |
| if str(pandas_df[col].dtype) == "geometry": | |
| # Iterate over geometry columns (only) | |
| for col in pandas_df.columns: | |
| if str(pandas_df[col].dtype) == "geometry": |
climada/hazard/centroids/centr.py
Outdated
| crs = metadata.get("crs") | ||
| gdf = gpd.GeoDataFrame(store["centroids"], crs=crs) | ||
| gdf = gpd.GeoDataFrame(store["centroids"]) | ||
| for xycol in metadata.get("xy_columns", []): |
There was a problem hiding this comment.
Please also add a test with multiple xy_columns/wkb_columns to be stored and read.
| ) | ||
| centroids_w.write_hdf5(tmpfile) | ||
| centroids_r = Centroids.from_hdf5(tmpfile) | ||
| self.assertTrue(centroids_w == centroids_r) |
There was a problem hiding this comment.
| self.assertTrue(centroids_w == centroids_r) | |
| self.assertEqual(centroids_w, centroids_r) |
There was a problem hiding this comment.
(this was actually done in purpose - the idea was to make sure the overridden equality operator does what it ought to do, regardless of what exactly happens inside assertEqual, about which I have no clue)
Changes proposed in this PR:
This PR is a pragmatic workaround of the problem described in #1055
PR Author Checklist
develop)PR Reviewer Checklist