Closed
Conversation
4 tasks
tgamblin
pushed a commit
to spack/spack
that referenced
this pull request
Dec 7, 2022
## Motivation Our parser grew to be quite complex, with a 2-state lexer and logic in the parser that has up to 5 levels of nested conditionals. In the future, to turn compilers into proper dependencies, we'll have to increase the complexity further as we foresee the need to add: 1. Edge attributes 2. Spec nesting to the spec syntax (see spack/seps#5 for an initial discussion of those changes). The main attempt here is thus to _simplify the existing code_ before we start extending it later. We try to do that by adopting a different token granularity, and by using more complex regexes for tokenization. This allow us to a have a "flatter" encoding for the parser. i.e., it has fewer nested conditionals and a near-trivial lexer. There are places, namely in `VERSION`, where we have to use negative lookahead judiciously to avoid ambiguity. Specifically, this parse is ambiguous without `(?!\s*=)` in `VERSION_RANGE` and an extra final `\b` in `VERSION`: ``` @ 1.2.3 : develop # This is a version range 1.2.3:develop @ 1.2.3 : develop=foo # This is a version range 1.2.3: followed by a key-value pair ``` ## Differences with the previous parser ~There are currently 2 known differences with the previous parser, which have been added on purpose:~ - ~No spaces allowed after a sigil (e.g. `foo @ 1.2.3` is invalid while `foo @1.2.3` is valid)~ - ~`/<hash> @1.2.3` can be parsed as a concrete spec followed by an anonymous spec (before was invalid)~ ~We can recover the previous behavior on both ones but, especially for the second one, it seems the current behavior in the PR is more consistent.~ The parser is currently 100% backward compatible. ## Error handling Being based on more complex regexes, we can possibly improve error handling by adding regexes for common issues and hint users on that. I'll leave that for a following PR, but there's a stub for this approach in the PR. ## Performance To be sure we don't add any performance penalty with this new encoding, I measured: ```console $ spack python -m timeit -s "import spack.spec" -c "spack.spec.Spec(<spec>)" ``` for different specs on my machine: * **Spack:** 0.20.0.dev0 (c9db4e5) * **Python:** 3.8.10 * **Platform:** linux-ubuntu20.04-icelake * **Concretizer:** clingo results are: | Spec | develop | this PR | | ------------- | ------------- | ------- | | `trilinos` | 28.9 usec | 13.1 usec | | `trilinos @1.2.10:1.4.20,2.0.1` | 131 usec | 120 usec | | `trilinos %gcc` | 44.9 usec | 20.9 usec | | `trilinos +foo` | 44.1 usec | 21.3 usec | | `trilinos foo=bar` | 59.5 usec | 25.6 usec | | `trilinos foo=bar ^ mpich foo=baz` | 120 usec | 82.1 usec | so this new parser seems to be consistently faster than the previous one. ## Modifications In this PR we just substituted the Spec parser, which means: - [x] Deleted in `spec.py` the `SpecParser` and `SpecLexer` classes. deleted `spack/parse.py` - [x] Added a new parser in `spack/parser.py` - [x] Hooked the new parser in all the places the previous one was used - [x] Adapted unit tests in `test/spec_syntax.py` ## Possible future improvements Random thoughts while working on the PR: - Currently we transform hashes and files into specs during parsing. I think we might want to introduce an additional step and parse special objects like a `FileSpec` etc. in-between parsing and concretization.
amd-toolchain-support
pushed a commit
to amd-toolchain-support/spack
that referenced
this pull request
Feb 16, 2023
## Motivation Our parser grew to be quite complex, with a 2-state lexer and logic in the parser that has up to 5 levels of nested conditionals. In the future, to turn compilers into proper dependencies, we'll have to increase the complexity further as we foresee the need to add: 1. Edge attributes 2. Spec nesting to the spec syntax (see spack/seps#5 for an initial discussion of those changes). The main attempt here is thus to _simplify the existing code_ before we start extending it later. We try to do that by adopting a different token granularity, and by using more complex regexes for tokenization. This allow us to a have a "flatter" encoding for the parser. i.e., it has fewer nested conditionals and a near-trivial lexer. There are places, namely in `VERSION`, where we have to use negative lookahead judiciously to avoid ambiguity. Specifically, this parse is ambiguous without `(?!\s*=)` in `VERSION_RANGE` and an extra final `\b` in `VERSION`: ``` @ 1.2.3 : develop # This is a version range 1.2.3:develop @ 1.2.3 : develop=foo # This is a version range 1.2.3: followed by a key-value pair ``` ## Differences with the previous parser ~There are currently 2 known differences with the previous parser, which have been added on purpose:~ - ~No spaces allowed after a sigil (e.g. `foo @ 1.2.3` is invalid while `foo @1.2.3` is valid)~ - ~`/<hash> @1.2.3` can be parsed as a concrete spec followed by an anonymous spec (before was invalid)~ ~We can recover the previous behavior on both ones but, especially for the second one, it seems the current behavior in the PR is more consistent.~ The parser is currently 100% backward compatible. ## Error handling Being based on more complex regexes, we can possibly improve error handling by adding regexes for common issues and hint users on that. I'll leave that for a following PR, but there's a stub for this approach in the PR. ## Performance To be sure we don't add any performance penalty with this new encoding, I measured: ```console $ spack python -m timeit -s "import spack.spec" -c "spack.spec.Spec(<spec>)" ``` for different specs on my machine: * **Spack:** 0.20.0.dev0 (c9db4e5) * **Python:** 3.8.10 * **Platform:** linux-ubuntu20.04-icelake * **Concretizer:** clingo results are: | Spec | develop | this PR | | ------------- | ------------- | ------- | | `trilinos` | 28.9 usec | 13.1 usec | | `trilinos @1.2.10:1.4.20,2.0.1` | 131 usec | 120 usec | | `trilinos %gcc` | 44.9 usec | 20.9 usec | | `trilinos +foo` | 44.1 usec | 21.3 usec | | `trilinos foo=bar` | 59.5 usec | 25.6 usec | | `trilinos foo=bar ^ mpich foo=baz` | 120 usec | 82.1 usec | so this new parser seems to be consistently faster than the previous one. ## Modifications In this PR we just substituted the Spec parser, which means: - [x] Deleted in `spec.py` the `SpecParser` and `SpecLexer` classes. deleted `spack/parse.py` - [x] Added a new parser in `spack/parser.py` - [x] Hooked the new parser in all the places the previous one was used - [x] Adapted unit tests in `test/spec_syntax.py` ## Possible future improvements Random thoughts while working on the PR: - Currently we transform hashes and files into specs during parsing. I think we might want to introduce an additional step and parse special objects like a `FileSpec` etc. in-between parsing and concretization.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proposal to extend the spec syntax:
key=valueattributes on the edgesThis is needed to deal with use cases introduced by separate concretization of build dependencies and compiler as dependencies. Details in the SEP.