Poor performance of repr of large arrays, particularly jupyter repr #4789

@max-sixty

What happened:

The _repr_html_ method of large arrays is very slow: 4.78 s for a 100-million-value array. The general repr is also fairly slow at 1.87 s. Here's a quick example. I haven't yet investigated how much this depends on there being a MultiIndex.

What you expected to happen:

We should really focus on having good repr performance, given how essential it is to any REPL workflow.

Minimal Complete Verifiable Example:

In [10]: import xarray as xr
    ...: import numpy as np
    ...: import pandas as pd

In [11]: idx = pd.MultiIndex.from_product([range(10_000), range(10_000)])

In [12]: df = pd.DataFrame(range(100_000_000), index=idx)

In [13]: da = xr.DataArray(df)

In [14]: da
Out[14]:
<xarray.DataArray (dim_0: 100000000, dim_1: 1)>
array([[       0],
       [       1],
       [       2],
       ...,
       [99999997],
       [99999998],
       [99999999]])
Coordinates:
  * dim_0          (dim_0) MultiIndex
  - dim_0_level_0  (dim_0) int64 0 0 0 0 0 0 0 ... 9999 9999 9999 9999 9999 9999
  - dim_0_level_1  (dim_0) int64 0 1 2 3 4 5 6 ... 9994 9995 9996 9997 9998 9999
  * dim_1          (dim_1) int64 0


In [26]: %timeit repr(da)
1.87 s ± 7.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [27]: %timeit da._repr_html_()
4.78 s ± 1.8 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
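
For anyone who wants to reproduce these measurements outside IPython, here is a minimal sketch of a timing helper built on the standard-library timeit module. The name time_repr is hypothetical and not part of xarray, and the list used at the end is only a stand-in so the snippet runs without xarray installed.

```python
import timeit


def time_repr(obj, repeat=3):
    """Return the best wall-clock time in seconds for each repr flavour of obj.

    time_repr is a hypothetical helper for illustration, not an xarray API.
    """
    # Time the plain-text repr; take the minimum over a few single runs.
    timings = {"repr": min(timeit.repeat(lambda: repr(obj), number=1, repeat=repeat))}

    # Jupyter uses _repr_html_ when an object defines it, so time it separately.
    html = getattr(obj, "_repr_html_", None)
    if callable(html):
        timings["_repr_html_"] = min(timeit.repeat(html, number=1, repeat=repeat))
    return timings


# Stand-in object so the sketch runs without xarray installed.
sample = list(range(1_000))
print(time_repr(sample))
```

Calling time_repr(da) on the DataArray from the example above should report both timings. Building a second array of the same size on a flat index and comparing the two results would be one way to check how much the MultiIndex contributes.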

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.7 (default, Dec 30 2020, 10:13:08)
[Clang 12.0.0 (clang-1200.0.32.28)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None

xarray: 0.16.3.dev48+gbf0fe2ca
pandas: 1.1.3
numpy: 1.19.2
scipy: 1.5.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.30.0
distributed: None
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: installed
pint: 0.16.1
setuptools: 51.1.1
pip: 20.3.3
conda: None
pytest: 6.1.1
IPython: 7.19.0
sphinx: None
