
Reading a DL-NIRSP ASDF file is very slow #500

@Cadair

Description


Loading a test file on my local system took 600 s with the profiler enabled, which is far too slow (it does take significantly less time without the profiler).

Here are some excerpts from the profile (the full profile is extremely large):

```
612.109 <module>  <ipython-input-10-8499de7ed805>:1
└─ 612.109 wrapper  functools.py:927
   └─ 612.109 _load_from_string  /home/stuart/Git/DKIST/dkist/dkist/dataset/loader.py:116
      └─ 612.109 _load_from_path  /home/stuart/Git/DKIST/dkist/dkist/dataset/loader.py:125
         ├─ 612.109 _load_from_asdf  /home/stuart/Git/DKIST/dkist/dkist/dataset/loader.py:158
         │  ├─ 611.866 open_asdf  asdf/_asdf.py:1622
         │  │  ├─ 611.861 AsdfFile._open_impl  asdf/_asdf.py:1006
         │  │  │  └─ 611.861 AsdfFile._open_asdf  asdf/_asdf.py:890
         │  │  │     ├─ 360.544 AsdfFile._validate  asdf/_asdf.py:670
         │  │  │     ├─ 114.634 tagged_tree_to_custom_tree  asdf/yamlutil.py:329
         │  │  │     ├─ 88.697 load_tree  asdf/yamlutil.py:373
         │  │  │     ├─ 39.880 find_references  asdf/reference.py:108
         │  │  │     ├─ 7.834 Manager.read  asdf/_block/manager.py:337
```

So most of the time (about 360 s of the 612 s total) is spent validating the file on read. This is followed by converting the tagged tree to high-level objects (about 115 s), with a good chunk also spent parsing the YAML (about 89 s) and finding all the references in the YAML (about 40 s).
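To put these numbers in proportion, here is a small sketch that turns the excerpted timings into shares of the total. The seconds are copied from the profile excerpt; `PHASES_S`, `TOTAL_S` and `shares` are illustrative names, and only the listed children of `AsdfFile._open_asdf` are included, so the rows do not add up to exactly 100%.

```python
# Timings (seconds) of the children of AsdfFile._open_asdf, copied from the
# profile excerpt; TOTAL_S is the time reported for the whole load call.
TOTAL_S = 612.109
PHASES_S = {
    "AsdfFile._validate": 360.544,
    "tagged_tree_to_custom_tree": 114.634,
    "load_tree": 88.697,
    "find_references": 39.880,
    "Manager.read": 7.834,
}


def shares(phases, total):
    """Return (name, seconds, percent of total) tuples, slowest first."""
    return sorted(
        ((name, secs, 100.0 * secs / total) for name, secs in phases.items()),
        key=lambda row: row[1],
        reverse=True,
    )


for name, secs, pct in shares(PHASES_S, TOTAL_S):
    print(f"{name:30s} {secs:8.3f} s  {pct:5.1f}%")
```

On these numbers validation alone accounts for roughly 59% of the load time, about three times the next most expensive step.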

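The obvious win would be to disable validation on read, but we should think more about the trade-off, since files that violate their schemas would then no longer be caught by schema validation when they are loaded.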
