remove shlex.split() preprocessing from the spec lexer/parser#33016
remove shlex.split() preprocessing from the spec lexer/parser#33016cosmicexplorer wants to merge 15 commits intospack:developfrom
Conversation
|
cc @psakievich @trws: would love your input on any further annoyances with spec parsing that I may have missed. I kinda punted on trying to interpret inner quotes, so any examples of that would be useful as well. My goal is to demonstrate that this approach can be extended to solve lingering UX issues with spec parsing. |
|
@trws noted that we still join specs from the command line with
I'll make test cases out of these. |
|
@cosmicexplorer I'm a little late at getting up to speed on all your recent work. Looks good so far from what I can tell. I see you've absorbed the It seemed to work with a single string (passed the tests for |
|
@psakievich thank you! I had actually removed the
There has definitely been an explosion of lexer modes in this PR (e.g. four modes for different types of single and double quoting), and it would probably improve readability and performance if we could slim those down somewhat, but I think I'm focusing on completing the functionality first (see Tom's spec inputs above). |
|
That makes sense. I'll try to give this a closer look later today and see if I can come up with any more suggestions. Personally, I need a quiet room and a security blanket close at hand when dealing with Spack's parser. 😁 |
psakievich
left a comment
There was a problem hiding this comment.
I really like these changes, but I am finding myself getting a little lost in the design. A couple of thoughts to improve readability and testability of the code here.
| __slots__ = "scanners", "original_mode", "mode", "switchbook" | ||
|
|
||
| def __init__(self, lexicon_and_mode_switches): | ||
| # type: (List[Tuple[str, _Lexicon, Dict[str, List[Any]]]]) -> None |
There was a problem hiding this comment.
Is there any reason to not make these commented type definitions into actual types? From a code readability stand point trying to follow the construction of this object, and the implementation in spec.py would be simplified if this conglomeration of types were members inside a class.
We could also test each phase individually in spec.py if we instantiated the objects outside the Lexer.
|
|
||
| try: | ||
| while self.next: | ||
| if self.accept(SINGLE_QUOTE): |
There was a problem hiding this comment.
I've always found this section of the code really difficult to follow because of all the nested if/else's.
I'd suggest any new additions should be functions so we get a better description of what is going on through the name. Maybe something like:
def starts_with_quotes(self, quote_type, specs):
# same code for single and double
def process_initial_quotes(self, specs):
# replace both new if statements with this function
starts_with_quotes(SINGLE_QUOTE, specs)
starts_with_quotes(DOUBLE_QUOTE, specs)|
Superseded by #34151 |
This change is on top of #29093; see the diff against it at https://github.com/cosmicexplorer/spack/compare/arbitrary-lexer-modes...remove-shlex-lex-preprocessing?expand=1.
Problem
See discussion at #29093 (comment). We currently call
shlex.split()before lexing spec strings, and this is the main reason why callingstr()on a spec instance often produces an unparseable string. We would like to be able to call e.g.Spec(str(Spec('a b="c d"')))and get the same spec. Unfortunately, ondevelopthat produces the following:Solution
shlex.split().shlex.split()in our spec parsing entry point, and move the work of interpreting quotes to the lexer.Result
As discussed in #29093 (comment), this change induces a minor performance regression (this PR even more so than #29093). The regression seems very slight, so I'd like to ignore it, but we'd need to get consensus that this change is enough of a UX improvement to justify that.
It also may be possible to optimize this further; I have not focused on optimizing this approach at all yet.