
Extracting a time range from a cube is slow #4957

@bouweandela

Description


📰 Custom Issue

Extracting a time range as described in the documentation is quite slow when done for many cubes and/or for cubes with many time points. For a single cube with 10000 time points it already takes about 2 seconds on my computer, so subsetting a few hundred cubes becomes very slow.

Here is a script that demonstrates this:

import cf_units
import iris
import iris.coords
import iris.cube
import iris.time
import numpy as np

# Build a cube with 10000 daily time points starting at 1850-01-01.
time_units = cf_units.Unit('days since 1850-01-01', calendar='standard')
time = iris.coords.DimCoord(
    np.arange(10000, dtype=np.float64),
    standard_name='time',
    units=time_units,
)
cube = iris.cube.Cube(np.arange(10000, dtype=np.float32))
cube.add_dim_coord(time, 0)

# Constrain to all points in the years 1852 and 1853.
pdt1 = iris.time.PartialDateTime(year=1852)
pdt2 = iris.time.PartialDateTime(year=1854)
constraint = iris.Constraint(time=lambda cell: pdt1 <= cell.point < pdt2)

%timeit cube.extract(constraint)

Result:

1.83 s ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

From looking at the code in iris.coords, the slow behaviour appears to be caused by converting the time points to datetimes one at a time, once for each cell, instead of converting them all in a single call and then generating the cells.

Here is some code with timings:

%timeit time.units.num2date(time.points)
27.3 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

versus the per-point conversion:

%timeit list(time.units.num2date(p) for p in time.points)
1.53 s ± 29.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
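
For example, a minimal sketch of the idea, assuming a hypothetical helper (iter_cells_vectorised is not part of the Iris API and this is not the current iris.coords implementation): convert all points, and bounds if present, with a single num2date call, and only then build the cells.

from iris.coords import Cell

def iter_cells_vectorised(coord):
    # One vectorized num2date call for all points (and bounds),
    # followed by cheap per-point Cell construction.
    points = coord.units.num2date(coord.points)
    if coord.has_bounds():
        bounds = coord.units.num2date(coord.bounds)
    else:
        bounds = [None] * len(points)
    for point, bound in zip(points, bounds):
        yield Cell(point, None if bound is None else tuple(bound))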

If this is a change you would be interested in, I can make a pull request that updates the code to first convert all the time points and then generate the cells.
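
In the meantime, a possible workaround sketch for the example above (the dates, mask, and subset names are illustrative): do the vectorized conversion manually and index the cube with the matching positions instead of calling cube.extract.

# Convert all time points in a single call, then compare each datetime
# against the PartialDateTime bounds (cheap comparisons, no per-point
# num2date calls), and index the cube with the matching positions.
dates = time.units.num2date(time.points)
mask = np.array([pdt1 <= date < pdt2 for date in dates])
subset = cube[np.where(mask)[0]]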
