
Reduce the time taken to import cfdm #361

@davidhassell


Importing cfdm is quite slow: on my local computer it takes 1.4 seconds.

On another system I have tried (JASMIN), it can take between 6 and 10 seconds!
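Import times like these are easiest to compare when each measurement starts in a fresh interpreter, so that modules already imported by the current process do not hide the cost. The following is a minimal sketch that uses only the standard library. The helper name `import_time` is hypothetical and is not part of cfdm.

```python
import subprocess
import sys


def import_time(module_name: str) -> float:
    """Return the wall-clock time, in seconds, taken to import *module_name*.

    The import runs in a fresh interpreter, so earlier imports in the
    calling process do not hide the cost. Interpreter start-up itself is
    not included in the measurement.
    """
    code = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"import {module_name}\n"
        "print(time.perf_counter() - start)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # The elapsed time is the last line printed (the module itself may print).
    return float(result.stdout.strip().splitlines()[-1])
```

For example, `print(f"{import_time('cfdm'):.2f} s")`. Timings vary from run to run, so it is worth taking several measurements. To see which individual modules are responsible, CPython's `python -X importtime -c "import cfdm"` writes a per-module breakdown of import times to stderr.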

Having looked into this, I think there are two main culprits for the slow import:

  • Doc string rewriting
    • At import time, every doc string (there are currently 5569) is checked for every possible doc string substitution, and any substitutions that are found are applied
  • Importing external modules that themselves have a slow import
    • The main problems here are dask, scipy, s3fs, zarr, h5netcdf, uritools and netCDF4

These can be improved on:

  • Doc string rewriting
    • Only apply substitutions where necessary, rather than trying every possible substitution on every doc string. Only 3009 of the 5569 doc strings need rewriting, and each of those uses only a small number of the 110 possible substitutions.
  • Importing external modules that themselves have a slow import
    • Move the imports of dask, scipy, s3fs, zarr, h5netcdf and uritools from import time to run time. Many of these will never be imported at all, and when they are, the import time is usually negligible compared with the time taken by the operation being run.
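The targeted doc string rewriting could look something like the following minimal sketch. cfdm's actual rewriting machinery is not shown in this issue, so the `{{...}}` placeholder syntax, the plain dict of substitutions, and the function names `rewrite_naive` and `rewrite_targeted` are all assumptions for illustration. The naive version does work proportional to (number of doc strings × number of substitutions). The targeted version skips any doc string with no placeholder and otherwise looks up only the placeholders that actually occur.

```python
import re

# Assumed placeholder syntax: "{{...}}" with no braces inside.
_PLACEHOLDER = re.compile(r"\{\{[^{}]*\}\}")


def rewrite_naive(doc, substitutions):
    """Try every substitution on the doc string (the slow approach)."""
    if doc is None:
        return doc
    for key, value in substitutions.items():
        doc = doc.replace(key, value)
    return doc


def rewrite_targeted(doc, substitutions, max_passes=10):
    """Apply only the substitutions whose placeholders occur in *doc*."""
    if doc is None or "{{" not in doc:
        # Most doc strings contain no placeholders: skip them at once.
        return doc

    def _replace(match):
        key = match.group(0)
        # Unknown placeholders are left untouched.
        return substitutions.get(key, key)

    # Repeat so that placeholders introduced by a substitution are also
    # replaced; stop when nothing changes (or after max_passes passes).
    for _ in range(max_passes):
        new_doc = _PLACEHOLDER.sub(_replace, doc)
        if new_doc == doc:
            break
        doc = new_doc
    return doc
```

For example, with `subs = {"{{name}}": "cfdm"}`, both functions turn `"{{name}} is fast"` into `"cfdm is fast"`. However, a doc string with no `{{` is returned by `rewrite_targeted` after a single substring scan.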
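Moving an import from import time to run time usually just means placing the `import` statement inside the function that needs it. The following is a minimal sketch of that pattern. Because scipy or dask may not be installed, the standard-library `statistics` module stands in for a slow-to-import dependency, and `mean_of` is a hypothetical function, not part of cfdm.

```python
def mean_of(values):
    """Return the arithmetic mean of *values*.

    ``statistics`` stands in for a slow-to-import dependency such as scipy
    or dask. Because the import statement is inside the function, importing
    the enclosing module does not load ``statistics``; it is loaded the
    first time ``mean_of`` is called.
    """
    # Deferred (run-time) import: after the first call the module is cached
    # in sys.modules, so later calls are cheap.
    import statistics

    return statistics.mean(values)
```

The trade-off is that the first call pays the import cost instead of `import cfdm`. As noted above, this is usually small compared with the operation itself. If a lazily loaded object needs to be exposed as a module attribute, a module-level `__getattr__` (PEP 562) is an alternative.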

Results

With these two changes applied, my local import time falls to 0.2 seconds (from 1.4 seconds: a 7-fold speed-up). On JASMIN, it falls to between 1.5 and 2.5 seconds (from between 6 and 10 seconds).

I think these improvements are good enough for a PR ...

Labels: enhancement (New feature or request), performance (Relating to speed and memory performance)
