-
-
Notifications
You must be signed in to change notification settings - Fork 114
Optimize handling of wide datasets and Pandas indexes #1350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1350 +/- ##
==========================================
+ Coverage 87.39% 88.87% +1.47%
==========================================
Files 50 51 +1
Lines 7490 7479 -11
==========================================
+ Hits 6546 6647 +101
+ Misses 944 832 -112 ☔ View full report in Codecov by Sentry. |
|
Relevant to many things, but especially #1160 |
|
issue noted in meeting: "When printing the ndoverlay, the vdim is not a value label.. it's the label of the last element in the ndoverlay" |
|
With holoviz/holoviews#6262 this should now behave well. |
I confirm this is fixed, i.e. printing import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
df = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
value_label = 'value'
group_label = 'Variable'
charts = []
for column in df.columns:
kdims = ['index']
vdims = hv.Dimension(column, label=value_label)
chart = hv.Curve(df, kdims, vdims)
charts.append((column, chart))
out = hv.NdOverlay(charts, group_label, sort=False)
outnow displays: |
|
The test suite ran fine (https://github.com/holoviz/hvplot/actions/runs/9448341262) with the latest dev version of HoloViews that includes holoviz/holoviews#6262. |
hvplot/converter.py
Outdated
| ydim = hv.Dimension(c, label=self.value_label) | ||
| kdims, vdims = self._get_dimensions([x], [ydim]) | ||
| chart = element(data, kdims, vdims) | ||
| charts.append((c, chart.relabel(**self._relabel).redim(**self._redim))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe let's also skip relabel and redim if self._relabel and self._redim are empty respectively.
|
ready to merge? |
|
OP updated with a description of the changes made #1350 (comment). This is ready for review. |
|
Great work, all looked good to me and thanks for the new tests! Let's test this as much as possible, would love to see a new alpha release for that purpose. |
By avoiding calling
redimon each element of anNdOverlaywhich meant that every element had its own copy of the dataEdit: This PR bundles a few optimizations:
.redimis no longer called the elements of anNdOverlayso they can share the same data. This is demonstrated with the example below, thedatawrapped by each element of theNdOverlayis now the same object.326ee64: HoloViews added in Implement support for retaining Pandas index holoviews#6061 support for retaining
Pandasindexes, which means that hvPlot no longer needs to callreset_indexin a few cases, avoiding creating internal copies of the data. This change was made in 326ee64, by skipping somereset_indexcalls when the type of the dataset is a Pandas DataFrame (so far the Pandas Interface in HoloViews is the only one to have this support, so not Dask, not Geopandas, etc. which still require thereset_indexcalls). The preceding commits add various tests to make sure all thereset_indexcalls are at least called once in the unit tests. This commit adbddf7 updates the instantiation of aDatasetobject with an explicit list of kdims instead letting HoloViews infer them, as HoloViews when given a Pandas DataFrame doesn't include its indexes in the list of automatically inferred kdims.abb485a: Only redim and/or relabel a HoloViews element when needed, as suggested in Optimize handling of wide datasets and Pandas indexes #1350 (comment) since apparently these operations return a clone even when not given any argument.
I hope these optimizations will help/fix this issue #501.
edit: closes #1292