Skip to content

Refactor to improve spec format speed#43712

Merged
tgamblin merged 28 commits intodevelopfrom
fast-format
Apr 23, 2024
Merged

Refactor to improve spec format speed#43712
tgamblin merged 28 commits intodevelopfrom
fast-format

Conversation

@tgamblin
Copy link
Copy Markdown
Member

@tgamblin tgamblin commented Apr 18, 2024

When looking at where we spend our time in solver setup, I noticed a fair bit of time is spent in Spec.format(), and Spec.format() is a pretty old, slow, convoluted method.

This PR does a number of things:

  • Consolidate most of what was being done manually with a character loop and several regexes into a single regex.
  • Precompile regexes where we keep them
  • Remove the transform= argument to Spec.format() which was only used in one place in the code (modules) to uppercase env var names, but added a lot of complexity
  • Avoid escaping and colorizing specs unless necessary
  • Refactor a lot of the colorization logic to avoid unnecessary object construction
  • Add type hints and remove some spots in the code where we were using nonexistent arguments to format().
  • Add trivial cases to __str__ in VariantMap and VersionList to avoid sorting
  • Avoid calling isinstance() in the main loop of Spec.format()
  • Don't bother constructing a string representation for the result of _prev_versionas it is only used for comparisons.

In my timings (on all the specs formatted in a solve of hdf5), this is over 2.67x faster than the original format(), and it seems to reduce setup time by around a second (for hdf5).

@tgamblin tgamblin requested review from alalazo and haampie April 18, 2024 07:04
@spackbot-app spackbot-app bot added commands core PR affects Spack core functionality modules stand-alone-tests Stand-alone (or smoke) tests for installed packages tests General test capability(ies) utilities versions labels Apr 18, 2024
@haampie
Copy link
Copy Markdown
Member

haampie commented Apr 18, 2024

Note that this does not memoize the result of Spec.format(), as @haampie had suggested to do

my suggestion was to memoize the result of parsing the format string, although not 100% clear if better.

Like parsing {name}-{version} and saving lambda s: f"{s.name}-{s.version}", but that sounds like codegen that might be tricky in python.

@tgamblin
Copy link
Copy Markdown
Member Author

If you want to try this out, I made a little benchmark here: https://github.com/tgamblin/spec-format-perf

@tgamblin
Copy link
Copy Markdown
Member Author

Ok that makes more sense. I think actually memoizing the format result during setup is probably going to be a bigger win.

This removes a bunch of old methods and makes the color formatting logic a lot clearer.

- [x] Remove `colorize_spec()`, which uses older, now incorrect colorization logic. It
      is older than the current `Spec.format()` implementation and it doesn't work with
      `@=`. The only usages in `cmd/info.py` can be replaced wtih `Spec.cformat()`.
- [x] Remove `_SEPARATORS` AND `COLOR_FORMATS`, which are no longer needed.
- [x] Simplify `Spec.format()` code to use more readable color constants directly.
- [x] Remove unused color constants.
- [x] Remove unused parameter for `write_attribute()`
- [x] Remove unused `Spec.colorized()` method
Further performance improvement by avoiding object construction

`match_to_ansi` was being constructed every time. We can avoid an unnecessary object
creation by not using a class here.
Optimize format speed by avoiding object construction.
We can get the regex engine to do more work and to avoid string comparisons.
Avoid a constructor call if it's not needed.
@tgamblin
Copy link
Copy Markdown
Member Author

@alalazo can you look at 30d85cb (spec format: remove unnecessary transform= argument) and make sure you agree?

@tgamblin
Copy link
Copy Markdown
Member Author

tgamblin commented Apr 21, 2024

As of now I'm getting:

  • 1.155e-4 s/format (this), vs
  • 2.655e-4 s/format (develop)

So this is ~2.3x faster at least on an m1 mac.

`transform=` adds a whole lot of extra complexity to `Spec.format()`, and the thing
it was originally intended for (the `decorator=` arg to `spack.cmd.display_specs()`) is
now handled entirely in `display_specs()`.

The only place it is still used is in the common modules code, and we can handle what
it's used for there with a simple `str.upper()`. If people want to format different
components of specs differently, they can call `format()` multiple times.
- [x] raise an exception for an unterminated escape character ('\')
- [x] format_path preserves leading '/' as '\' when on windows
- [x] trivial cases for VariantMap and VersionList
- [x] don't bother making a string in `_prev_version` (which is used only in comparisons)
@tgamblin
Copy link
Copy Markdown
Member Author

Ok now it's 2.59x faster. I think I have wrung all I can out of this. If we can avoid somehow constructing a StandardVersion in _prev_version, it might help a bit more, but I do not see an easy way to do that.

I think this is ready to go

@tgamblin tgamblin changed the title Refactor and improve spec format speed Refactor to improve spec format speed Apr 23, 2024
Copy link
Copy Markdown
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a few comments. This commit 0771aca LGTM

@tgamblin
Copy link
Copy Markdown
Member Author

As of now, this is ~2.67x faster than develop and cuts a second off the hdf5 concretization time for me on an m1.

@tgamblin tgamblin merged commit aa0825d into develop Apr 23, 2024
@tgamblin tgamblin deleted the fast-format branch April 23, 2024 17:52
tldahlgren pushed a commit to tldahlgren/spack that referenced this pull request Apr 23, 2024
When looking at where we spend our time in solver setup, I noticed a fair bit of time is spent
in `Spec.format()`, and `Spec.format()` is a pretty old, slow, convoluted method.

This PR does a number of things:
- [x] Consolidate most of what was being done manually with a character loop and several
      regexes into a single regex.
- [x] Precompile regexes where we keep them 
- [x] Remove the `transform=` argument to `Spec.format()` which was only used in one 
      place in the code (modules) to uppercase env var names, but added a lot of complexity
- [x] Avoid escaping and colorizing specs unless necessary
- [x] Refactor a lot of the colorization logic to avoid unnecessary object construction
- [x] Add type hints and remove some spots in the code where we were using nonexistent
      arguments to `format()`.
- [x] Add trivial cases to `__str__` in `VariantMap` and `VersionList` to avoid sorting
- [x] Avoid calling `isinstance()` in the main loop of `Spec.format()`
- [x] Don't bother constructing a `string` representation for the result of `_prev_version`
      as it is only used for comparisons.

In my timings (on all the specs formatted in a solve of `hdf5`), this is over 2.67x faster than the 
original `format()`, and it seems to reduce setup time by around a second (for `hdf5`).
stephenmsachs pushed a commit to stephenmsachs/spack that referenced this pull request Apr 29, 2024
When looking at where we spend our time in solver setup, I noticed a fair bit of time is spent
in `Spec.format()`, and `Spec.format()` is a pretty old, slow, convoluted method.

This PR does a number of things:
- [x] Consolidate most of what was being done manually with a character loop and several
      regexes into a single regex.
- [x] Precompile regexes where we keep them 
- [x] Remove the `transform=` argument to `Spec.format()` which was only used in one 
      place in the code (modules) to uppercase env var names, but added a lot of complexity
- [x] Avoid escaping and colorizing specs unless necessary
- [x] Refactor a lot of the colorization logic to avoid unnecessary object construction
- [x] Add type hints and remove some spots in the code where we were using nonexistent
      arguments to `format()`.
- [x] Add trivial cases to `__str__` in `VariantMap` and `VersionList` to avoid sorting
- [x] Avoid calling `isinstance()` in the main loop of `Spec.format()`
- [x] Don't bother constructing a `string` representation for the result of `_prev_version`
      as it is only used for comparisons.

In my timings (on all the specs formatted in a solve of `hdf5`), this is over 2.67x faster than the 
original `format()`, and it seems to reduce setup time by around a second (for `hdf5`).
teaguesterling pushed a commit to teaguesterling/spack that referenced this pull request Jun 15, 2024
When looking at where we spend our time in solver setup, I noticed a fair bit of time is spent
in `Spec.format()`, and `Spec.format()` is a pretty old, slow, convoluted method.

This PR does a number of things:
- [x] Consolidate most of what was being done manually with a character loop and several
      regexes into a single regex.
- [x] Precompile regexes where we keep them 
- [x] Remove the `transform=` argument to `Spec.format()` which was only used in one 
      place in the code (modules) to uppercase env var names, but added a lot of complexity
- [x] Avoid escaping and colorizing specs unless necessary
- [x] Refactor a lot of the colorization logic to avoid unnecessary object construction
- [x] Add type hints and remove some spots in the code where we were using nonexistent
      arguments to `format()`.
- [x] Add trivial cases to `__str__` in `VariantMap` and `VersionList` to avoid sorting
- [x] Avoid calling `isinstance()` in the main loop of `Spec.format()`
- [x] Don't bother constructing a `string` representation for the result of `_prev_version`
      as it is only used for comparisons.

In my timings (on all the specs formatted in a solve of `hdf5`), this is over 2.67x faster than the 
original `format()`, and it seems to reduce setup time by around a second (for `hdf5`).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

commands core PR affects Spack core functionality modules specs stand-alone-tests Stand-alone (or smoke) tests for installed packages tests General test capability(ies) utilities versions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants