Skip to content

Environment/depfile: fix bug with Git hash versions#36642

Closed
scheibelp wants to merge 12 commits intospack:developfrom
scheibelp:bugfix/depfile-hash-eq-ver
Closed

Environment/depfile: fix bug with Git hash versions#36642
scheibelp wants to merge 12 commits intospack:developfrom
scheibelp:bugfix/depfile-hash-eq-ver

Conversation

@scheibelp
Copy link
Copy Markdown
Member

If a spec contains a hash=number version, this was resulting in malformed Makefiles generated by spack env depfile (the target cannot include an un-escaped '=' character); for now just remove the version from these targets altogether

…rmed Makefiles generated by 'spack env depfile' (the target cannot include an un-escaped '=' character); for now just remove the version from these targets altogether
@spackbot-app spackbot-app bot added commands core PR affects Spack core functionality labels Apr 4, 2023
@haampie
Copy link
Copy Markdown
Member

haampie commented Apr 4, 2023

Alternatively, can you map all forbidden characters to _?

From https://www.gnu.org/software/make/manual/html_node/Using-Variables.html

A variable name may be any sequence of characters not containing ‘:’, ‘#’, ‘=’, or whitespace.

So for example forbidden_chars = re.compile("[:#=\s]"); sanitize = lambda s: forbidden_chars.sub("_", s)

(Edit: variables are not targets/prerequisites, also $% are not allowed... can you look up what the full list is?)

@scheibelp
Copy link
Copy Markdown
Member Author

Alternatively, can you map all forbidden characters to _?

Yes: although IMO that is of limited use: the resulting string would be something like packagename-longhash_number-spackhash which I consider hard to read (i.e. 72 characters of hashes). I figured since you already went through the effort of adding comments with the full spec, that this part of the identifier was unnecessary (the version is useful in spaces where the same package is being built multiple times, but I'm not sure a reader of the autogenerated makefile would want to be checking in that level of detail).

Do you think a suitable compromise would be to augment those comments with a clear search-able identifier (like "zlib:first7-characters-of-spec-hash") at the beginning of the comment which includes the formatted spec string? Or maybe this can just skip formatting the version for git-hash-based versions?

@haampie
Copy link
Copy Markdown
Member

haampie commented Apr 5, 2023

the version is useful in spaces where the same package is being built multiple times, but I'm not sure a reader of the autogenerated makefile would want to be checking in that level of detail).

Targets are user facing in tab completion, so putting a version there is helpful:

$ spack -e . env depfile -o Makefile --make-prefix x
$ make x/install/p<tab>
x/install/perl-5.36.0-t2y654th7mwtzwforeiiqcl7qgvmnazx
x/install/pkgconf-1.8.0-glgyj6nugckh4btiyi5bcwnisiofhgmn

This is also documented

…; unfortunately the .adjacency list is directly rendered to template
… format function to account for lack of clarity on the allowed syntax of makefile targets
@spackbot-app spackbot-app bot added the tests General test capability(ies) label Apr 5, 2023
@scheibelp
Copy link
Copy Markdown
Member Author

Alternatively, can you map all forbidden characters to _?

To be precise, the notion of "forbidden" characters is more appropriate for makefile variable names, which have a whitelist policy (see here).

The main problem this is trying to solve is Spec.format introducing characters which produce technically valid but meaningless makefiles. To that end, the goal would be to strip any character which has a special meaning in make.

To be safe, I updated the logic to whitelist, but only for the component that is dynamically derived from the spec (i.e I don't format the whole target, just the piece of it that comes from the spec).

I also added a test for this.

@scheibelp scheibelp requested a review from haampie April 5, 2023 22:25
@haampie
Copy link
Copy Markdown
Member

haampie commented Apr 6, 2023

whitelisting sounds good to me 👍

if not isinstance(spec, spack.spec.Spec):
raise ValueError("Internal error: got non-spec object: {0}".format(spec))
tgt = spec.format("{name}-{version}-{hash}")
return re.sub(r"[^A-Za-z0-9_.-]", "_", tgt)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to constructor and compile regex once?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no unified constructor so I created a singleton object for this. I think without the singleton I'd need to modify the MakeTargetVisitor.__init__ to add another attribute (which I'd be fine with).

In general, Python will cache patterns passed to re.compile so these changes feel excessive to me, but I don't think it overcomplicates things to reorganize this way.


monkeypatch.setattr(spack.spec.Spec, "format", _contrived_spec_format)

# Disable jobserver so we can do a dry run.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not using dry run 🤔 tbh, dry run is the best way to validate the makefile, and this is an integration test already, so it might even be better than manually validating the generated output

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dry run could be useful to do in addition to this, but I strongly feel that it is not enough: the problem I ran into involved a makefile which you could use (no syntax errors, and running make proceeds), but was entirely wrong (e.g. would perform installations of packages in the wrong order); I don't think a dry run could catch this. It could be useful for detecting other problems with generating the makefile though.

Copy link
Copy Markdown
Member

@haampie haampie Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show an example? I don't entirely see how it would not result in make error 🤔

I see now (commented in the wrong thread before, sorry)

abc=def: prereq # ordinary definition of `abc`
target: abc=def # target-specific variable value for variable `abc` in target `target`

Notice that a dry-run is more than "it runs", since it will print what it would do in order, which is better than parsing a Makefile. See other tests

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is https://github.com/spack/spack/pull/36642/files#r1160315876 an answer to this? It is to me. The use case I worked from is enormous so I can't post the full file here but:

  • It used a hash=number version for a build dependency
  • make "skipped" installation of that and proceeded directly to installing packages that depended on this build dependency

That's why I think a dry run would not expose the error: you have to actually execute the task.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would need to check but I think the makefile generated by this test should successfully execute a dry run without the changes outside of the unit-tests (I didn't check that as part of this PR because (a) the special characters were the source of the problem in the larger example and (b) I removed those as part of the PR).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The minimal change is 2 insertions(+), 1 deletion(-), and could've been a oneliner if spack wasn't so insistent on needing a git prop in the mock package:

I want to be clear that I strongly recommend against doing anything like what this diff suggests: ultimately it does not matter what properties inside of specs give rise to special characters, only that they could be generated as part of using Spec functionality to generate a makefile target.

There is a better way to do what I did (and I'm happy to try to make that happen), but it is not this particular diff.

instead of verifying the makefile behavior I think it is more-appropriate to verify that no unexpected special characters are introduced.

Sure, but imo an actual unit test would be best then: concrete specs in, adjacency list out; verify nodes, their names, and edges. That requires a bit of refactoring.

My initial interpretation of what you said is to examine something like MakeTargetVisitor.adjacency_list, but really any test that examines entities before the final makefile will be fragile: you have to know which properties to check; if the makefile generation introduces an extra reference to the spec anywhere then that would be another potential source of unwanted special characters.

I could test _fmt_spec_make_target directly (I think this would be closer to what you consider a unit test vs. an integration test), although if that isn't called in some spot where it should be, the makefile will again end up with special characters.

I could refactor the env_depfile function so that it only ever interacts with a new class like MakefileSpec, which could be mocked by the tests to ensure that users never call str on it, and only get special characters if they call an explicit function like MakefileSpec.unsafe_format(). Overall something like.

def env_depfile():
    roots = [s for _, s in env.concretized_specs()]
    all_specs = list(traverse(roots)) # No visitor here: need to convert type
    roots = [MakefileSpec(x) for x in roots]
    all_specs = [MakefileSpec(x) for x in all_specs]
    # At this point we have more control over how the makefile could pull details from the spec
    generate_makefile(roots, all_specs)

does that seem better?

Copy link
Copy Markdown
Member

@haampie haampie Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have an integration test for git versions, then this

if the makefile generation introduces an extra reference to the spec anywhere then that would be another potential source of unwanted special characters.

is not a concern in the unit test.

If I would write it I would just make a small class (the model) that has properties that are passed to template.render (the view) and then unit test the model properties without dealing with the view at all. Also, no SpackCommand (the controller) would be necessary in that test.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the same time, I also don't care too much at this point 😆 as long as the bug fix gets in

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I would write it I would just make a small class (the model) that has properties that are passed to template.render (the view)

I think you're talking about taking the dictionary passed to template.render and turning it into an object:

    rendered = template.render(
        {
            "all_target": get_target("all"),
            "env_target": get_target("env"),

Validating such an object would be complex: the targets in general can have several different special characters, and while I don't think = is among them (which makes it easy to look for that specifically as a problem), there are other special characters that can be generated as part of Spec.format which are used in targets (e.g. %). Therefore, this object would need to differentiate between the spec and non-spec portions of targets, i.e. have functions like:

class Depfile(object):
    def add_spec_target(self, preamble: str, spec: Spec):
        # the preamble can contain "%" or anything we want, and isn't validated
        ...

    def add_target(self, target: str):
        # Hopefully no one called str() on a Spec to get the target
        ....

I think my suggestion from #36642 (comment) would simplify the matter:

I could refactor the env_depfile function so that it only ever interacts with a new class like MakefileSpec (rather than Spec),

The MakefileSpecs (which are not full specs and so can be mocked without monkeypatching) can be fed into the functions more or less the same as before to generate the template input dictionary; I could split the logic to then stop before rendering the template (to me, creating the template does not make this an integration test, although from discussion here I gather you think it does).

Copy link
Copy Markdown
Member

@haampie haampie Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validating such an object would be complex

I'm not sure I follow 🤔 I've implemented

If I would write it I would just make a small class (the model) that has properties that are passed to template.render (the view)

in #36911, and it's not complex to do data = MakefileModel(...).to_dict() and test correctness of data. You could mock MakefileModel's Spec -> str function to produce something distinct and verify that it pops up in data for entries that are targets, etc, so you know the format function is called in the right places.

env(
"depfile",
"-o",
makefile,
Copy link
Copy Markdown
Member

@haampie haampie Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can save a few lines by dropping -o makefile and use what would be stdout (if not dry run)

Comment on lines +3151 to +3154
with open(makefile, "r") as f:
makefile_lines = f.readlines()

makefile_contents = "\n".join(makefile_lines)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with open(makefile, "r") as f:
makefile_lines = f.readlines()
makefile_contents = "\n".join(makefile_lines)

see above

Comment on lines +3130 to +3138
orig_format = spack.spec.Spec.format

def _contrived_spec_format(obj, format_str):
if obj.name == "dtlink1":
return "dtlink1-githash=version-spackhash"
else:
return orig_format(obj, format_str)

monkeypatch.setattr(spack.spec.Spec, "format", _contrived_spec_format)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you already have mock_packages, can't you actually use a normal git version with user specified spack version to the same effect? That would be less contrived

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid patching the packages to achieve this condition: IMO the source of all errors like the one this PR aims to correct is specially-interpreted characters via Spec.format; therefore I argue that this is the least-contrived way to test the outcome (changing things about the versions etc. would silently render this test useless if spec.format happened to behave differently in some scenarios vs. others).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it's a rule of thumb not to mock in integration tests, cause assumptions on the implementation make the test redundant on refactor. If you want an integration test for git versions, then just use a git version. If you want to test normalization of targets, then write a unit test. In both cases you don't need to use monkeypatch

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to test normalization of targets

I don't want to normalize the entire target: the depfile uses special characters in targets for their intended effect. The issue is when those targets are partially defined using Spec.format, which introduces unexpected special characters.

If you want an integration test for git versions, then just use a git version.

I don't think git versions are the (real) problem. For a system that generates makefiles of spack installs and needs to create targets based on the Specs, each such target is composed of:

  • Static components relevant to makefile processing
  • Details extracted from the Spec itself

The latter is the only place that can surprise us. The current logic has no formalized abstraction for the latter and directly uses .format. I considered it appropriate to just alter Spec.format, but it would be more-robust to mock the Spec object entirely.

Hm, it's a rule of thumb not to mock in integration tests, cause assumptions on the implementation make the test redundant on refactor.

given the above, while I'd say it is strictly true that this mocking makes an assumption in the implementation, setting a git version is even more a violation of this rule (conditional changes to str() of version could render this test useless: why not target the point where the depfile logic directly interfaces with the Spec API?).

Copy link
Copy Markdown
Member

@haampie haampie Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for being so insistent, but I'd rather see monkeypatch dropped. The other point is that Spec.format is so commonly called, mocking it will likely break invariants elsewhere in the future.

Please just add an integration/regression test with depfile and git versions which is the problem you're solving in the PR (you can just adapt the test above the one you've added), and if you don't like integration tests, refactor so that you can unit test the adjacency list

scheibelp added a commit to scheibelp/spack that referenced this pull request May 8, 2023
@scheibelp scheibelp closed this May 8, 2023
haampie pushed a commit to scheibelp/spack that referenced this pull request Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

commands core PR affects Spack core functionality tests General test capability(ies)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants