📰 Custom Issue
Extracting a time range as described in the documentation is quite slow if you want to do it for many cubes and/or for cubes with many time points. For a single cube with 10000 time points it already takes about 2 seconds on my computer, so subsetting a few hundred cubes becomes quite slow.
Here is a script that demonstrates this:
import cf_units
import iris.cube
import iris.coords
import iris.time
import numpy as np

# Build a cube with a 10000-point time coordinate.
time_units = cf_units.Unit('days since 1850-01-01', calendar='standard')
time = iris.coords.DimCoord(np.arange(10000, dtype=np.float64), standard_name='time', units=time_units)
cube = iris.cube.Cube(np.arange(10000, dtype=np.float32))
cube.add_dim_coord(time, 0)

# Select the years 1852-1853 with PartialDateTime, as described in the documentation.
pdt1 = iris.time.PartialDateTime(year=1852)
pdt2 = iris.time.PartialDateTime(year=1854)
constraint = iris.Constraint(time=lambda cell: pdt1 <= cell.point < pdt2)

%timeit cube.extract(constraint)
Result:
1.83 s ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
From looking at the code in iris.coords, it looks like the slow behaviour is caused by converting the time points to datetimes one at a time, once per cell, instead of converting them all in a single call and then generating the cells.
Here is some code with timings:
%timeit time.units.num2date(time.points)
27.3 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
and
%timeit list(time.units.num2date(p) for p in time.points)
1.53 s ± 29.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If there is interest in this, I can open a pull request that changes the code so it first converts all the time points and then generates the cells.
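Roughly, the idea would be something like the following. This is only a sketch continuing from the script above, not the actual iris.coords implementation; it just shows the bulk conversion followed by cell generation:

import numpy as np
import iris.coords

# Convert the whole points array in one vectorized call (~27 ms above) ...
datetimes = time.units.num2date(time.points)
# ... and only then build the cells from the already-converted values.
cells = [iris.coords.Cell(point=dt) for dt in datetimes]

# Evaluating the constraint's predicate against these cells is then cheap;
# the matching indices can be used to subset the cube.
mask = np.array([pdt1 <= cell.point < pdt2 for cell in cells])
fast_subcube = cube[np.where(mask)[0]]

Where exactly this would fit into iris.coords needs a closer look, but the single bulk num2date call is where the time savings come from.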