Skip to content

Add a SEP to extend the spec syntax#5

Closed
alalazo wants to merge 2 commits intospack:mainfrom
alalazo:sep/extend_spec_syntax
Closed

Add a SEP to extend the spec syntax#5
alalazo wants to merge 2 commits intospack:mainfrom
alalazo:sep/extend_spec_syntax

Conversation

@alalazo
Copy link
Copy Markdown
Member

@alalazo alalazo commented Nov 24, 2021

Proposal to extend the spec syntax:

  • To allow specifying arbitrary key=value attributes on the edges
  • To allow parsing non-flat DAGs

This is needed to deal with use cases introduced by separate concretization of build dependencies and compiler as dependencies. Details in the SEP.

tgamblin pushed a commit to spack/spack that referenced this pull request Dec 7, 2022
## Motivation

Our parser grew to be quite complex, with a 2-state lexer and logic in the parser
that has up to 5 levels of nested conditionals. In the future, to turn compilers into
proper dependencies, we'll have to increase the complexity further as we foresee
the need to add:
1. Edge attributes
2. Spec nesting

to the spec syntax (see spack/seps#5 for an initial discussion of
those changes).  The main attempt here is thus to _simplify the existing code_ before
we start extending it later. We try to do that by adopting a different token granularity,
and by using more complex regexes for tokenization. This allow us to a have a "flatter"
encoding for the parser. i.e., it has fewer nested conditionals and a near-trivial lexer.

There are places, namely in `VERSION`, where we have to use negative lookahead judiciously
to avoid ambiguity.  Specifically, this parse is ambiguous without `(?!\s*=)` in `VERSION_RANGE`
and an extra final `\b` in `VERSION`:

```
@ 1.2.3     :        develop  # This is a version range 1.2.3:develop
@ 1.2.3     :        develop=foo  # This is a version range 1.2.3: followed by a key-value pair
```

## Differences with the previous parser

~There are currently 2 known differences with the previous parser, which have been added on purpose:~

- ~No spaces allowed after a sigil (e.g. `foo @ 1.2.3` is invalid while `foo @1.2.3` is valid)~
- ~`/<hash> @1.2.3` can be parsed as a concrete spec followed by an anonymous spec (before was invalid)~

~We can recover the previous behavior on both ones but, especially for the second one, it seems the current behavior in the PR is more consistent.~

The parser is currently 100% backward compatible.

## Error handling

Being based on more complex regexes, we can possibly improve error
handling by adding regexes for common issues and hint users on that.
I'll leave that for a following PR, but there's a stub for this approach in the PR.

## Performance

To be sure we don't add any performance penalty with this new encoding, I measured:
```console
$ spack python -m timeit -s "import spack.spec" -c "spack.spec.Spec(<spec>)"
```
for different specs on my machine:

* **Spack:** 0.20.0.dev0 (c9db4e5)
* **Python:** 3.8.10
* **Platform:** linux-ubuntu20.04-icelake
* **Concretizer:** clingo

results are:

| Spec          | develop       | this PR |
| ------------- | ------------- | ------- |
| `trilinos`  |  28.9 usec | 13.1 usec |
| `trilinos @1.2.10:1.4.20,2.0.1`  | 131 usec  | 120 usec |
| `trilinos %gcc`  | 44.9 usec  | 20.9 usec |
| `trilinos +foo`  | 44.1 usec  | 21.3 usec |
| `trilinos foo=bar`  | 59.5 usec  | 25.6 usec |
| `trilinos foo=bar ^ mpich foo=baz`  | 120 usec  | 82.1 usec |

so this new parser seems to be consistently faster than the previous one.

## Modifications

In this PR we just substituted the Spec parser, which means:
- [x] Deleted in `spec.py` the `SpecParser` and `SpecLexer` classes. deleted `spack/parse.py`
- [x] Added a new parser in `spack/parser.py`
- [x] Hooked the new parser in all the places the previous one was used
- [x] Adapted unit tests in `test/spec_syntax.py`


## Possible future improvements

Random thoughts while working on the PR:
- Currently we transform hashes and files into specs during parsing. I think
we might want to introduce an additional step and parse special objects like
a `FileSpec` etc. in-between parsing and concretization.
amd-toolchain-support pushed a commit to amd-toolchain-support/spack that referenced this pull request Feb 16, 2023
## Motivation

Our parser grew to be quite complex, with a 2-state lexer and logic in the parser
that has up to 5 levels of nested conditionals. In the future, to turn compilers into
proper dependencies, we'll have to increase the complexity further as we foresee
the need to add:
1. Edge attributes
2. Spec nesting

to the spec syntax (see spack/seps#5 for an initial discussion of
those changes).  The main attempt here is thus to _simplify the existing code_ before
we start extending it later. We try to do that by adopting a different token granularity,
and by using more complex regexes for tokenization. This allow us to a have a "flatter"
encoding for the parser. i.e., it has fewer nested conditionals and a near-trivial lexer.

There are places, namely in `VERSION`, where we have to use negative lookahead judiciously
to avoid ambiguity.  Specifically, this parse is ambiguous without `(?!\s*=)` in `VERSION_RANGE`
and an extra final `\b` in `VERSION`:

```
@ 1.2.3     :        develop  # This is a version range 1.2.3:develop
@ 1.2.3     :        develop=foo  # This is a version range 1.2.3: followed by a key-value pair
```

## Differences with the previous parser

~There are currently 2 known differences with the previous parser, which have been added on purpose:~

- ~No spaces allowed after a sigil (e.g. `foo @ 1.2.3` is invalid while `foo @1.2.3` is valid)~
- ~`/<hash> @1.2.3` can be parsed as a concrete spec followed by an anonymous spec (before was invalid)~

~We can recover the previous behavior on both ones but, especially for the second one, it seems the current behavior in the PR is more consistent.~

The parser is currently 100% backward compatible.

## Error handling

Being based on more complex regexes, we can possibly improve error
handling by adding regexes for common issues and hint users on that.
I'll leave that for a following PR, but there's a stub for this approach in the PR.

## Performance

To be sure we don't add any performance penalty with this new encoding, I measured:
```console
$ spack python -m timeit -s "import spack.spec" -c "spack.spec.Spec(<spec>)"
```
for different specs on my machine:

* **Spack:** 0.20.0.dev0 (c9db4e5)
* **Python:** 3.8.10
* **Platform:** linux-ubuntu20.04-icelake
* **Concretizer:** clingo

results are:

| Spec          | develop       | this PR |
| ------------- | ------------- | ------- |
| `trilinos`  |  28.9 usec | 13.1 usec |
| `trilinos @1.2.10:1.4.20,2.0.1`  | 131 usec  | 120 usec |
| `trilinos %gcc`  | 44.9 usec  | 20.9 usec |
| `trilinos +foo`  | 44.1 usec  | 21.3 usec |
| `trilinos foo=bar`  | 59.5 usec  | 25.6 usec |
| `trilinos foo=bar ^ mpich foo=baz`  | 120 usec  | 82.1 usec |

so this new parser seems to be consistently faster than the previous one.

## Modifications

In this PR we just substituted the Spec parser, which means:
- [x] Deleted in `spec.py` the `SpecParser` and `SpecLexer` classes. deleted `spack/parse.py`
- [x] Added a new parser in `spack/parser.py`
- [x] Hooked the new parser in all the places the previous one was used
- [x] Adapted unit tests in `test/spec_syntax.py`


## Possible future improvements

Random thoughts while working on the PR:
- Currently we transform hashes and files into specs during parsing. I think
we might want to introduce an additional step and parse special objects like
a `FileSpec` etc. in-between parsing and concretization.
@alalazo alalazo closed this Oct 7, 2025
@alalazo alalazo deleted the sep/extend_spec_syntax branch October 7, 2025 04:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant