Environment/depfile: fix bug with Git hash versions#36642
Environment/depfile: fix bug with Git hash versions#36642scheibelp wants to merge 12 commits intospack:developfrom
Conversation
…rmed Makefiles generated by 'spack env depfile' (the target cannot include an un-escaped '=' character); for now just remove the version from these targets altogether
|
Alternatively, can you map all forbidden characters to
So for example (Edit: variables are not targets/prerequisites, also |
Yes: although IMO that is of limited use: the resulting string would be something like Do you think a suitable compromise would be to augment those comments with a clear search-able identifier (like "zlib:first7-characters-of-spec-hash") at the beginning of the comment which includes the formatted spec string? Or maybe this can just skip formatting the version for git-hash-based versions? |
Targets are user facing in tab completion, so putting a version there is helpful: This is also documented |
…; unfortunately the .adjacency list is directly rendered to template
…ed directly into the template: headache
… to create distinct name for test
… format function to account for lack of clarity on the allowed syntax of makefile targets
To be precise, the notion of "forbidden" characters is more appropriate for makefile variable names, which have a whitelist policy (see here). The main problem this is trying to solve is To be safe, I updated the logic to whitelist, but only for the component that is dynamically derived from the spec (i.e I don't format the whole target, just the piece of it that comes from the spec). I also added a test for this. |
|
whitelisting sounds good to me 👍 |
lib/spack/spack/cmd/env.py
Outdated
| if not isinstance(spec, spack.spec.Spec): | ||
| raise ValueError("Internal error: got non-spec object: {0}".format(spec)) | ||
| tgt = spec.format("{name}-{version}-{hash}") | ||
| return re.sub(r"[^A-Za-z0-9_.-]", "_", tgt) |
There was a problem hiding this comment.
move to constructor and compile regex once?
There was a problem hiding this comment.
There is no unified constructor so I created a singleton object for this. I think without the singleton I'd need to modify the MakeTargetVisitor.__init__ to add another attribute (which I'd be fine with).
In general, Python will cache patterns passed to re.compile so these changes feel excessive to me, but I don't think it overcomplicates things to reorganize this way.
|
|
||
| monkeypatch.setattr(spack.spec.Spec, "format", _contrived_spec_format) | ||
|
|
||
| # Disable jobserver so we can do a dry run. |
There was a problem hiding this comment.
You're not using dry run 🤔 tbh, dry run is the best way to validate the makefile, and this is an integration test already, so it might even be better than manually validating the generated output
There was a problem hiding this comment.
Dry run could be useful to do in addition to this, but I strongly feel that it is not enough: the problem I ran into involved a makefile which you could use (no syntax errors, and running make proceeds), but was entirely wrong (e.g. would perform installations of packages in the wrong order); I don't think a dry run could catch this. It could be useful for detecting other problems with generating the makefile though.
There was a problem hiding this comment.
Can you show an example? I don't entirely see how it would not result in make error 🤔
I see now (commented in the wrong thread before, sorry)
abc=def: prereq # ordinary definition of `abc`
target: abc=def # target-specific variable value for variable `abc` in target `target`Notice that a dry-run is more than "it runs", since it will print what it would do in order, which is better than parsing a Makefile. See other tests
There was a problem hiding this comment.
Is https://github.com/spack/spack/pull/36642/files#r1160315876 an answer to this? It is to me. The use case I worked from is enormous so I can't post the full file here but:
- It used a
hash=numberversion for a build dependency make"skipped" installation of that and proceeded directly to installing packages that depended on this build dependency
That's why I think a dry run would not expose the error: you have to actually execute the task.
There was a problem hiding this comment.
I would need to check but I think the makefile generated by this test should successfully execute a dry run without the changes outside of the unit-tests (I didn't check that as part of this PR because (a) the special characters were the source of the problem in the larger example and (b) I removed those as part of the PR).
There was a problem hiding this comment.
The minimal change is 2 insertions(+), 1 deletion(-), and could've been a oneliner if spack wasn't so insistent on needing a git prop in the mock package:
I want to be clear that I strongly recommend against doing anything like what this diff suggests: ultimately it does not matter what properties inside of specs give rise to special characters, only that they could be generated as part of using Spec functionality to generate a makefile target.
There is a better way to do what I did (and I'm happy to try to make that happen), but it is not this particular diff.
instead of verifying the makefile behavior I think it is more-appropriate to verify that no unexpected special characters are introduced.
Sure, but imo an actual unit test would be best then: concrete specs in, adjacency list out; verify nodes, their names, and edges. That requires a bit of refactoring.
My initial interpretation of what you said is to examine something like MakeTargetVisitor.adjacency_list, but really any test that examines entities before the final makefile will be fragile: you have to know which properties to check; if the makefile generation introduces an extra reference to the spec anywhere then that would be another potential source of unwanted special characters.
I could test _fmt_spec_make_target directly (I think this would be closer to what you consider a unit test vs. an integration test), although if that isn't called in some spot where it should be, the makefile will again end up with special characters.
I could refactor the env_depfile function so that it only ever interacts with a new class like MakefileSpec, which could be mocked by the tests to ensure that users never call str on it, and only get special characters if they call an explicit function like MakefileSpec.unsafe_format(). Overall something like.
def env_depfile():
roots = [s for _, s in env.concretized_specs()]
all_specs = list(traverse(roots)) # No visitor here: need to convert type
roots = [MakefileSpec(x) for x in roots]
all_specs = [MakefileSpec(x) for x in all_specs]
# At this point we have more control over how the makefile could pull details from the spec
generate_makefile(roots, all_specs)
does that seem better?
There was a problem hiding this comment.
If you have an integration test for git versions, then this
if the makefile generation introduces an extra reference to the spec anywhere then that would be another potential source of unwanted special characters.
is not a concern in the unit test.
If I would write it I would just make a small class (the model) that has properties that are passed to template.render (the view) and then unit test the model properties without dealing with the view at all. Also, no SpackCommand (the controller) would be necessary in that test.
There was a problem hiding this comment.
At the same time, I also don't care too much at this point 😆 as long as the bug fix gets in
There was a problem hiding this comment.
If I would write it I would just make a small class (the model) that has properties that are passed to template.render (the view)
I think you're talking about taking the dictionary passed to template.render and turning it into an object:
rendered = template.render(
{
"all_target": get_target("all"),
"env_target": get_target("env"),
Validating such an object would be complex: the targets in general can have several different special characters, and while I don't think = is among them (which makes it easy to look for that specifically as a problem), there are other special characters that can be generated as part of Spec.format which are used in targets (e.g. %). Therefore, this object would need to differentiate between the spec and non-spec portions of targets, i.e. have functions like:
class Depfile(object):
def add_spec_target(self, preamble: str, spec: Spec):
# the preamble can contain "%" or anything we want, and isn't validated
...
def add_target(self, target: str):
# Hopefully no one called str() on a Spec to get the target
....
I think my suggestion from #36642 (comment) would simplify the matter:
I could refactor the env_depfile function so that it only ever interacts with a new class like MakefileSpec (rather than Spec),
The MakefileSpecs (which are not full specs and so can be mocked without monkeypatching) can be fed into the functions more or less the same as before to generate the template input dictionary; I could split the logic to then stop before rendering the template (to me, creating the template does not make this an integration test, although from discussion here I gather you think it does).
There was a problem hiding this comment.
Validating such an object would be complex
I'm not sure I follow 🤔 I've implemented
If I would write it I would just make a small class (the model) that has properties that are passed to template.render (the view)
in #36911, and it's not complex to do data = MakefileModel(...).to_dict() and test correctness of data. You could mock MakefileModel's Spec -> str function to produce something distinct and verify that it pops up in data for entries that are targets, etc, so you know the format function is called in the right places.
| env( | ||
| "depfile", | ||
| "-o", | ||
| makefile, |
There was a problem hiding this comment.
You can save a few lines by dropping -o makefile and use what would be stdout (if not dry run)
| with open(makefile, "r") as f: | ||
| makefile_lines = f.readlines() | ||
|
|
||
| makefile_contents = "\n".join(makefile_lines) |
There was a problem hiding this comment.
| with open(makefile, "r") as f: | |
| makefile_lines = f.readlines() | |
| makefile_contents = "\n".join(makefile_lines) |
see above
| orig_format = spack.spec.Spec.format | ||
|
|
||
| def _contrived_spec_format(obj, format_str): | ||
| if obj.name == "dtlink1": | ||
| return "dtlink1-githash=version-spackhash" | ||
| else: | ||
| return orig_format(obj, format_str) | ||
|
|
||
| monkeypatch.setattr(spack.spec.Spec, "format", _contrived_spec_format) |
There was a problem hiding this comment.
Since you already have mock_packages, can't you actually use a normal git version with user specified spack version to the same effect? That would be less contrived
There was a problem hiding this comment.
I wanted to avoid patching the packages to achieve this condition: IMO the source of all errors like the one this PR aims to correct is specially-interpreted characters via Spec.format; therefore I argue that this is the least-contrived way to test the outcome (changing things about the versions etc. would silently render this test useless if spec.format happened to behave differently in some scenarios vs. others).
There was a problem hiding this comment.
Hm, it's a rule of thumb not to mock in integration tests, cause assumptions on the implementation make the test redundant on refactor. If you want an integration test for git versions, then just use a git version. If you want to test normalization of targets, then write a unit test. In both cases you don't need to use monkeypatch
There was a problem hiding this comment.
If you want to test normalization of targets
I don't want to normalize the entire target: the depfile uses special characters in targets for their intended effect. The issue is when those targets are partially defined using Spec.format, which introduces unexpected special characters.
If you want an integration test for git versions, then just use a git version.
I don't think git versions are the (real) problem. For a system that generates makefiles of spack installs and needs to create targets based on the Specs, each such target is composed of:
- Static components relevant to makefile processing
- Details extracted from the Spec itself
The latter is the only place that can surprise us. The current logic has no formalized abstraction for the latter and directly uses .format. I considered it appropriate to just alter Spec.format, but it would be more-robust to mock the Spec object entirely.
Hm, it's a rule of thumb not to mock in integration tests, cause assumptions on the implementation make the test redundant on refactor.
given the above, while I'd say it is strictly true that this mocking makes an assumption in the implementation, setting a git version is even more a violation of this rule (conditional changes to str() of version could render this test useless: why not target the point where the depfile logic directly interfaces with the Spec API?).
There was a problem hiding this comment.
I'm sorry for being so insistent, but I'd rather see monkeypatch dropped. The other point is that Spec.format is so commonly called, mocking it will likely break invariants elsewhere in the future.
Please just add an integration/regression test with depfile and git versions which is the problem you're solving in the PR (you can just adapt the test above the one you've added), and if you don't like integration tests, refactor so that you can unit test the adjacency list
…on module import while still only compiling the pattern once
If a spec contains a
hash=numberversion, this was resulting in malformed Makefiles generated byspack env depfile(the target cannot include an un-escaped '=' character); for now just remove the version from these targets altogether