
Improved support for Go based packages#14061

Closed
hartzell wants to merge 18 commits into spack:develop from hartzell:feature/go-package-support

Conversation

@hartzell
Contributor

@hartzell hartzell commented Dec 9, 2019

TODO:


Applications that use Go should not download dependencies at build
time. This commit adds documentation and tooling to that end. It
also cleans up existing packages, continuing to support existing
versions where possible and removing them when necessary.

It:

  • Adds a new package directive, import_resources, which simplifies
    importing large numbers of resources. import_resources reads a
    JSON file containing an array of package definitions.

  • Adds GoPackage, which provides support for simple Go applications

    • adds GoPackage;
    • hooks it into spack create; and
    • documents it in the "Build Systems" section.
  • Documents, in the "Build Systems" documentation for GoPackage, how
    to handle:

    • easy applications that use Go modules and vendor their
      dependencies;
    • slightly more complicated applications that use Go modules but do
      not
      vendor their dependencies (use go mod download and
      modules2tuple to generate resources that are imported via
      import_resources); and
    • difficult applications that need to be handled on a one-off basis.
  • Adds/updates some existing packages to use the New World Order:

    • direnv -- Had a vendor dir, but it wasn't being used; the fix
      is slightly tricksy.
      • No change in supported versions.
    • docui -- A new package that uses GoPackage; it does not have
      vendored dependencies, so it declares resources via
      import_resources.
      • New application, no previously supported versions.
    • fzf -- The versions that we have use Glide and don't have a
      vendor dir; their Makefile populates vendor at install time.
      Newer releases (0.18.0, 0.19.0) use go.mod. This drops the older
      releases as unsupportable, adds 0.1[89].0, and cleans up the
      package.
      • Deletes versions '0.17.5', '0.17.4', '0.17.3', '0.17.1',
        '0.17.0-2', '0.17.0', '0.16.11', '0.16.10', '0.16.9', '0.16.8'.
      • Adds versions '0.19.0', '0.18.0'.
    • git-lfs -- It vendored its dependencies but they weren't being
      used. Set GO111MODULE and GOFLAGS so that they are.
      • No change in supported versions.
    • glow -- a new package that uses GoPackage; uses modules but
      does not vendor its dependencies.
      • New application, no previously supported versions.
    • go-md2man -- Has vendored its dependencies, but we weren't
      using them; switched to GoPackage and now we are.
      • No change in supported versions.
    • hugo -- Well mannered, uses modules and go build but no
      vendored dependencies; switched to GoPackage and
      import_resources.
      • No change in supported versions.
    • lazydocker -- A new package that uses GoPackage and has
      vendored dependencies.
      • New application, no previously supported versions.
    • lazygit -- A new package that uses GoPackage and has vendored
      dependencies.
      • New application, no previously supported versions.
    • modules2tuple -- For now, my fork of the FreeBSD tool, which
      can generate resource statements; it uses go.mod and has no
      dependencies.
      • When the dust settles my changes will be merged upstream and
        this package should be updated to point there.
      • New application, no previously supported versions.
    • rclone -- Well behaved, uses go.mod and vendors its
      dependencies; switched to GoPackage.
      • No change in supported versions.
    • singularity -- releases from 3.2.0 onward seem to be well behaved
      wrt modules. I've taken the liberty of dropping 3.1.1 because
      making it work is messy.
      • Deletes '3.1.1'.
    • skopeo -- Recent versions support modules and have vendored
      things. I've constrained go to @1.11 and disabled the older
      versions...
      • Deletes '0.1.37' and '0.1.36'.
    • umoci -- They have a vendor dir and @0.4.3: have go.mod, but
      they don't appear to use it in their build. This touches up their
      Makefile so that go runs in module mode and makes the install do
      the right thing. BUT, only for 0.4.3:.
      • Deletes '0.4.2', '0.4.1', and '0.4.0'.
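The import_resources mechanism described above can be pictured with a small, purely hypothetical sketch; the directive name comes from this PR, but the register callback and the field names are invented for illustration and are not Spack's actual implementation:

```python
import json


def import_resources(json_path, register):
    """Hypothetical sketch of the directive's job: read a JSON file
    holding an array of resource definitions and register each one,
    just as a long series of hand-written resource() calls would.

    `register` stands in for Spack's resource() directive; each JSON
    object's keys (name, url, sha256, an optional 'when' constraint,
    ...) are forwarded as keyword arguments.
    """
    with open(json_path) as f:
        definitions = json.load(f)  # a JSON array of objects
    for definition in definitions:
        register(**definition)
```

A package with dozens of Go dependencies then carries one JSON file instead of dozens of resource() stanzas.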

As applications adopt the Go modules system and become easier to deal
with, the community will need to decide how to deprecate and remove
support for old, non-conforming versions.

Closes #13023
Closes #13973

@adamjstewart adamjstewart added build-systems documentation Improvements or additions to documentation go labels Dec 9, 2019
@hartzell
Contributor Author

hartzell commented Dec 9, 2019

Looks like a False Negative test failure. Can I rerun them w/out pushing a meaningless commit?

@scheibelp
Member

Looks like a False Negative test failure. Can I rerun them w/out pushing a meaningless commit?

Closing/reopening will do that. Also, I (as well as other Spack core devs) can restart the tests individually so I have done that in this case to speed things along (generally though the CI doesn't take long to re-run from scratch and will probably happen faster than pinging a core dev).

Member

@adamjstewart adamjstewart left a comment


Left a few comments/suggestions. Overall, great work wrangling the Go build environment! My main question is why there are so many Go packages in Spack that don't use the GoPackage base class. Maybe we could do something like PerlPackage does where it supports both Makefile.PL and Build.PL build methods.

@adamjstewart
Member

One more question. Should we preface all Go packages with go-* like we do for Python/Perl/R?

@adamjstewart adamjstewart changed the title Improved for support for Go based packages Improved support for Go based packages Dec 10, 2019
@hartzell
Contributor Author

Should we preface all Go packages with go-*

I don't think so, any more than we have rust-ripgrep (hey, we don't have ripgrep!) or c-gcc or c-c++-fortran-openblas (just made that up...).

Given the way that Go works, there will never be any Go library packages (they'll all be resources defined within their consumers) and while it's interesting to call out the implementation language in a Hacker News link I don't see any point here.

@hartzell
Contributor Author

My main question is why there are so many Go packages in Spack that don't use the GoPackage base class.

Mostly because only the simplest of projects (e.g. build a single executable) can get away with it (e.g. lazygit).

The go build tool's capabilities sit somewhere between cc and pip/cpanm.

It's sophisticated enough (e.g. build tags, build constraints, go generate, and so on) that it's often all that's needed to build one or more executables.

But, like C/C++/Fortran/etc. projects, there are things like running markdown through gomd2man to build roff manpages, running git to get the current commit so that you can embed it in the executable (via args to go build), and so on, that are more easily done via shell commands and are traditionally organized with a Makefile. And, like anything involving a makefile, the details depend on whose traditions you grew up with.

Plus, some people Just Like Makefiles(TM); more seriously, there might be a go-centric way to build a project but the authors just reach for the tool they know.

I thought about dragging bits of MakefilePackage into GoPackage but it seems cleaner to just use the appropriate tool for the job.

Go's story has also been evolving. The language itself has a crazy-good backward compatibility promise and story, but the path of the build tooling has been bumpier. In the early days it did a really good job of building things the way they do inside Google (mono-repo centric, etc...) but wasn't really a good fit for the unix-y way of doing things. GOPATH is the canonical example here.

@hartzell
Contributor Author

My main question is why there are so many Go packages in Spack that don't use the GoPackage base class.

ps. There might be additional consolidation opportunities for applications that build with go, but I'd prefer to wait and a) see how projects adapt-to/adopt modules and b) get a handful of real use-cases for further generalization. Otherwise I think YAGNI.

@hartzell
Contributor Author

@adamjstewart -- any thoughts about the import_resources directive? It feels like the biggest change here.

@hartzell
Contributor Author

My main question is why there are so many Go packages in Spack that don't use the GoPackage base class.

pps. It seems like there should be a place in the doc tree to have a general discussion of issues related to writing package definitions for applications written in <language>, but I couldn't find one.

I guess that python/perl/... discussions fall easily into {Python,Perl}Package and the details of using C end up in the compilers section while the actual package definition falls into e.g. MakefilePackage or ... For now I've tucked the discussion into GoPackage, but I'm wondering if there's a need (although YAGNI...) for another docs section?

@adamjstewart
Member

@adamjstewart -- any thoughts about the import_resources directive? It feels like the biggest change here.

It's been years since we added a new directive. I see why it's useful here, so I have no objections to adding it. I'll let @scheibelp @becker33 @alalazo review that part.

@hartzell
Contributor Author

My last change to improve exception handling works for spack test directives but now that I see the Travis output, seems to have broken the daylights out of everything else. I'm digging into it.

@hartzell
Contributor Author

broken the daylights out of everything else.

Seems that having erroneous packages in the mock repository leads to bad things happening. I've asked for feedback/help on the Google group, input welcome...

- Don't assume order in lists; use set()s.
- Touch up ReST in GoPackage docs. Still need to add an "Important Files" section, address build_args, and add a section with links to additional reading.
- Avoid footshooting if/when a user overrides setup_build_environment and this ends up not being set.
- Not everything involved in a test needs a "test" prefix. Do what the other test repos do and use evocative names that begin with "import-resources".
- Produce more useful exception messages, in particular for bad JSON errors, and update tests to check for them.
@hartzell hartzell force-pushed the feature/go-package-support branch from a93a1f1 to 6ba19aa on January 7, 2020 00:20
@hartzell
Contributor Author

hartzell commented Jan 7, 2020

updated to pick up new singularity version and latest changes.

@scheibelp scheibelp self-assigned this Jan 25, 2020
@scheibelp


Member

@scheibelp scheibelp left a comment


I have some initial comments but I think I have a more fundamental question that would make sense to answer before you dive into addressing the individual comments (which most likely arises from my unfamiliarity with Go and its ecosystem):

I initially wondered why not use go get -d to fetch everything up-front. It looks like GoFetchStrategy already does go get -d, but doesn't handle the archiving of downloaded resources for offline installations; I assume that go get -d isn't repeatable in the sense that it's possible that different dependencies may be retrieved (e.g. if they are updated between two invocations of go get -d) so checksums would be useless, but this could be similar to git (which also doesn't use checksums).

The only package that appears to use GoFetchStrategy is the-platinum-searcher, so I'm curious if there are other issues with go get -d. If GoFetchStrategy were updated to handle archival, do you think it would be easier to make Go packages in Spack (vs. using the approach you have provided here)?

from spack import *


class ImportResourcesZero(Package):
Member


Is this to test the robustness of the system, or is there a pragmatic use for empty resource files? I suppose if someone integrated modules2tuple into an automatic pipeline, it could generate empty resource files?

Contributor Author


robustness

# json error message never says "JSON", be helpful...
raise ResourcesFileError(
    "Unable to load resources file, bad JSON: {0}".format(e))
except Exception as e:
Member


I'm thinking the only two exceptions that would be raised here are JSON issues, and SpackErrors (other types of errors would be bugs in Spack). Furthermore, the SpackErrors will already be formatted to our satisfaction, so I'm inclined to say there is no need to go out of our way to catch other exceptions other than the JSONDecodeError.

Contributor Author


So you're suggesting that I should delete the except Exception as e block? Anything else I need to do to pass the rest of the errors up (my python hat's over on the coatrack...)?
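A minimal sketch of the pattern scheibelp suggests (ResourcesFileError here is a stand-in, not the PR's actual code): catch only the JSON decode error, and let everything else propagate on its own; nothing extra is needed to "pass the rest of the errors up".

```python
import json


class ResourcesFileError(Exception):
    """Hypothetical stand-in for the error type used in the PR."""


def parse_resources(text):
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # json's own error message never says "JSON"; add context.
        raise ResourcesFileError(
            "Unable to load resources file, bad JSON: {0}".format(e))
    # Deliberately no bare `except Exception`: anything else is either
    # a Spack bug or an already well-formatted SpackError, and simply
    # propagates up the stack unchanged.
```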

version('2.0', 'hash2')
version('1.0', 'hash1')

import_resources("resources.json")
Member


I'm going to have to brainstorm more on how best to handle this:

Any test that ends up iterating over packages will fail because of this test package (because this is executed at package definition time). See https://travis-ci.org/spack/spack/jobs/633540326?utm_medium=notification&utm_source=github_status for an example.

Spack tests generally step around this issue by testing the underlying directive functions directly (see patch.py and variant.py test modules). You can also define an entirely separate mock repo (which would probably be useful for us to test other package failures).

Contributor Author


I'm going to have to brainstorm more on how best to handle this:

I'll wait and see what the storm brings.

version('0.18.0', sha256='5406d181785ea17b007544082b972ae004b62fb19cdb41f25e265ea3cc8c2d9d')

depends_on('[email protected]:')
import_resources('resources-0.19.0.json')
Member


Is there a reason you don't include a when constraint? I assume when='@0.19.0' would be correct. To be clear, it's not technically wrong to make this unqualified.

Contributor Author


The when constraints are included in the json definitions, which are then fed into the resource parser. I fixed the bad example in the GoPackage docs above.
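For illustration only (the entry shown and its field names are invented; the real schema is whatever this PR's parser defines), a resources file carrying per-entry when constraints might look something like:

```json
[
  {
    "name": "github.com/mattn/go-runewidth",
    "url": "https://github.com/mattn/go-runewidth/archive/v0.0.4.tar.gz",
    "sha256": "...",
    "when": "@0.19.0"
  }
]
```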

@hartzell
Contributor Author

hartzell commented Feb 1, 2020

I initially wondered why not use go get -d to fetch everything up-front. [...]

Summary

I've written up a couple responses to this and/but am having a tough time coming up with something brief/simple (cue obligatory Einstein quote). I can't seem to say less, but I can definitely say more, so if something's not clear, please ask....

This PR tries to meet Spack's goals (repeatability via signatures; caching and airgapped builds) using Spack's techniques and tooling. The things required to build a package have standard Spack signatures and are fetched, cached, and unpacked into the build tree using existing Spack machinery. It interacts with the Go build process via the vendor dir, which has a backward-compatibility promise. The only extension is support for bulk specification of resources, which isn't required but helps keep packages neat.

In contrast to using our existing fetch machinery, it's clearly possible to write an additional fetcher that uses go get to retrieve dependencies and write code to insert those artifacts into the build process (GoFetchStrategy and the-platinum-searcher provide an existence proof).

But go get currently and historically uses the GOPATH dir, and the structure of the contents of GOPATH (which is deprecated, see below) has no particular backward compatibility guarantee.

Furthermore, a go get approach would need to either eschew repeatable builds, trust the Go modules mechanism for repeatability/security (see below) or rely on Spack signatures in the package definition. The last would require adding/supporting code to create and specify those signatures, making it at least as complex/hard as this PR's approach.

I think that this is both "simple" and "easy", in Hickey's sense: while there may be many resources, they're discrete (hence "simple"), and the approach uses existing Spack tooling (hence "easy"). The one new bit of code, which automates resource definition creation, can share code with a sibling project (although if we have to Invent It Here, someone certainly can).

It's not magical or lazy but I think that it strikes the correct balance between the effort it requires from package authors and the effort (additional code, cognitive/support load) it requires from the Spack tooling.

It (and #13973) also opens the door to supporting other languages without explicitly building Spack packages for everything in their PyPI/CPAN equivalent. pypi-poet does/did something similar for Python applications in Homebrew (that ship has sailed in Spack, but it's still an interesting thought experiment, see also #8364). It might still work well for us with Lua, Haskell, Rust, or ....

Details

Some relevant Go factoids:

  • Historically Go organized "all" of your projects within your GOPATH; you'd pull the dependencies into GOPATH with go get ... and build with go build. Dependencies were not explicitly listed anywhere; they were figured out at build time. This approach doesn't align well with traditional project-based organization. Dependency versions weren't managed, so repeatable builds were hard.
  • Go's new system for managing dependencies, "modules", (read more here) tracks them in a project-level go.mod file and ensures repeatable builds via signatures in a go.sum file. It deprecates GOPATH. Like Spack, it provides repeatable builds via signatures on its dependencies.
  • There are two-ish places that the Go build process looks for its dependencies:
    • the place go get puts them (the actual location is a private implementation detail, currently within GOPATH, but that's deprecated, so....); and
    • within a vendor directory in the project.

Repeatability

A design for packaging Go-based applications in Spack needs to decide how to ensure that the things being used during the build are what they're expected to be. There seem to be three solutions:

  1. Punt, like we currently do for lua-rocks.
  2. Rely, transitively, on the repeatability promise of the Go modules system.
  3. Use Spack's existing tooling (signatures and resources).

The first choice isn't great, particularly given this PR as an existence proof of a better alternative.

The second choice requires no additional work of package authors. The standard GoPackage build step just becomes go build (which enforces the signatures in the go.sum file). This requires an additional conversation with our inner skeptics and our local sysadmin/security teams to explain why these applications are repeatable/trustworthy (just as we've had similar conversations justifying the lack of signatures for packages cloned from repositories via secure commit hashes).

The third choice (which I went with) is to find some manageable unit, calculate Spack signatures for it and use it in the build process. This requires additional work by the package author/maintainer. It can use existing Spack mechanisms. It doesn't require additional conversations with our skeptics (or coworkers). The unit to be managed is an open question (see below).

Caching

Spack generally supports "air gapped" builds and packaging for Go-based applications should too. While a system that grabbed the dependencies at build time might meet Spack's repeatability goal (see above), it wouldn't allow offline builds.

Spack has machinery for fetching and caching application source code and additional resources; this tooling is used by nearly all of the packaged applications. Bug fixes and improvements are localized and benefit all of the packages that rely on the standard machinery.

The Go module system also supports caching via "proxies" that satisfy the requests made by go build. Spack's Go support might rely on this but it would either be in addition to or on top of Spack's cache system. This is unattractive, particularly given that we can work with just Spack's tooling (e.g. this PR).

That leaves the question of what unit to work with, how to fetch it, and how to wire it into the build:

  1. (this PR) Describe each dependency at its upstream location, fetch it (using standard Spack machinery), calculate/check a signature on it (using standard Spack machinery), and wire it into the build (using standard ...).
  2. Write a new fetcher that uses cached bits or calls go get to fetch a thing (TBD), write new code to find that thing and splice it into the caching/signing machinery, and then jimmy it into GOPATH (but that's deprecated, so...).
  3. Write a new fetcher that uses cached bits or calls go mod vendor to pull the prerequisites into the vendor dir, write new code that caches/signs either each individual dependency or the entire dir, and then wire it into the build. This will only work for projects that use modules.

Etc...

While neither the structure of the things in GOPATH (deprecated, now what?) nor the vendor dir is "public", there is an explicit backward compatibility promise for the vendor dir.

See the parent RFC (#13023) for even more details, including a review of how other package managers support Go-based applications.

@hartzell
Contributor Author

hartzell commented Feb 1, 2020

ps. Thanks for the review, I'll follow up on the individual comments.

These are docs and consistency changes.  There are a couple more
comments to address (exception handling and testing).
@hartzell
Contributor Author

hartzell commented Feb 3, 2020

FWIW: here's a link to a discussion about deprecating GOPATH on the go-nuts mailing list.


The best approach to packaging a Go-based project for Spack depends on
the details of the project. ``GoPackage`` is a good fit for simple
situations but another package, e.g. ``MakefilePackage``, might be
Member


Would it generally be advantageous to combine the logic from GoPackage and MakefilePackage? Is the issue with doing this the possibility of MRO conflicts?

If yes/yes: I could think about how to combine GoPackage with other build systems.

Contributor Author


I don't think so, at least immediately. If something uses a Makefile, then adding a dependency on a go compiler, declaring resources via modules2tuple (or by hand if you're a masochist), and using MakefilePackage seems like a nice, simple strategy. I'm open to something more complicated but don't see any need so far. At some point down the road it might also make sense to make the go compiler "virtual", satisfiable via go or gccgo or tinygo or one of the LLVM-based compilers. But not until there's a real use case.

FWIW, here's the current (including this PR) breakdown of build techniques:

APP            STRATEGY
hugo           go build
rclone         go build
lazygit        go build
docui          go build
go-md2man      go build
lazydocker     go build
modules2tuple  go build
glow           go build
git-lfs        makefile
skopeo         makefile
umoci          makefile
singularity    makefile
fzf            makefile
direnv         makefile, w/ weird custom logic

@scheibelp
Member

[Relying transitively on the repeatability promise of the Go modules system] requires an additional conversation with our inner skeptics and our local sysadmin/security teams to explain why these applications are repeatable/trustworthy

Is go.mod support required transitively? Or can dependencies exclude go.mod files and thereby introduce nondeterminism into a build? If go.mod ensures determinism for all transitive dependencies, then I think it would be worth trusting it.

Responding more generally to #14061 (comment), there are two benefits of pursuing an automatic solution if it is available (even if it only applies to a subset of Go packages):

  • Users do less work to create Spack Go packages
  • Avoids problems that arise when users disregard the mechanisms introduced in this PR (e.g. using MakefilePackage without -mod=vendor, although #14061 (comment) might help with avoiding that)

It sounds like there is at least one automatic solution that handles repeatability and caching/offline-installation, although it requires go.mod support. It looks like go.mod is expected to be the way forward (i.e. how all packages are distributed), although I don't know e.g. what percentage of commonly-used packages still need to be updated to use go modules. If that sounds correct then I don’t think

This will only work for projects that use modules.

is a problem.

For packages without go.mod support, I think your approach is ideal; it is perhaps the only way to ensure determinism, in particular across differing Spack instances (one Spack instance could cache the dependencies for a Go package and create the same result for a user on that particular instance, but there is no mechanism to ensure that a user with another Spack instance would use the same Go dependencies). However, if an automatic go.mod-based solution is possible, then it may be worthwhile to generate go.mod files directly vs. resource.json files.

@hartzell
Contributor Author

hartzell commented Feb 4, 2020

Edit: I wish I had posted my second followup comment before this one, perhaps read it first.


[...] If go.mod ensures determinism for all transitive dependencies, then I think it would be worth trusting it. [...]

An application uses Go modules to transitively describe its dependencies, allowing it to be built deterministically.

Using Go modules is the officially supported mechanism for determinism. The details are messy and are considered private to the toolchain's implementation. The toolchain provides no support for integrating with external package managers.

This PR, in a nutshell, trusts the modules data and uses it to build a set of resources which provide determinism and caching, uses those resources to construct a "vendor" dir, and configures the toolchain to use that vendor dir.

Users do less work to create Spack Go packages

Perhaps Spack's packaging support should require the least possible effort on the part of package authors, but that's not the only interesting metric. The code base needs to be approachable and maintainable, the security/determinism guarantees need to be simple and easy to explain, etc.

One of the things that I really like about Go is its emphasis on simplicity (simplicity is complicated, and made easy), even when it requires writing more (albeit simpler) code. One classic example is Go's eschewing exception handling in favor of returning (and checking) explicit error values. This PR's implementation seems to be the simplest (explicit, use of stable interfaces, minimal additional code) of the options.

It sounds like there is at least one automatic solution that handles repeatability and caching/offline-installation, although it requires go.mod support.

I'm not sure what solution you're referring to, but please keep reading.

Go modules are the future; it's not worth adding Spack infrastructure for applications that don't use modules. They're one-offs or backwards-compatibility problems and should be handled as such.

There are two issues: how to use the Go modules system to provide trustworthy (deterministic) builds and how to cache the dependencies. Solutions include:

  • Don't bother (e.g. lua-rocks or the nextflow package).

  • Enforce the use of Go modules.

    This requires providing additional information to the toolchain, the details depend on the compiler version and if/how caching is managed. The simple presence of go.mod/go.sum is not sufficient. It requires vigilance during package reviews (there's nothing the Spack framework can do that a crafty/sloppy package author (or package Makefile) can't undo).

    1. We could leverage the vendoring support (Go modules supports vendoring, the vendor dir structure has a backward compatibility promise).

      Support requires ensuring modules are enabled and ensuring that vendor support is enabled.

      • For applications that have vendored their dependencies, this is simply a matter of setting the required environment variables (or command line switches) that tell the toolchain to use the vendor dir.

      • For those that haven't, we could use the go command and the modules data to build a vendor dir.

        1. This PR's implementation builds a vendor dir, uses it to define resources, and then uses Spack's standard resource machinery to repopulate it during the fetch stage. This has the advantages that it provides verifiable Spack signatures for each resource, caches them, and shares go-specific tooling with a sister project.

        2. An alternative implementation, which explicitly trusts Go's determinism guarantees, might build and cache the vendor dir en masse and copy it into place during the fetch stage. This would require building additional Spack tooling.

        3. A second alternative might create a single Spack resource for the entire vendor dir, minimizing the number of resources but still providing a caching and a signature. This would require building additional Spack tooling.

    2. We could peek/poke into the toolchain's workspace (go get -d).

      Support requires ensuring modules are enabled and (probably) pointing the toolchain at wherever we've deployed the cached dependencies.

      • For applications that have vendored their dependencies, as above.

      • For those that haven't, we could fetch and cache the dependencies by enforcing module mode and running go get -d, finally copying the results into the proper location.

        The location into which go get -d downloads the dependencies is a private implementation detail. Historically, it has been within the directory referred to by the GOPATH environment variable and the structure has been obvious (if not public). But, GOPATH is deprecated and it's unclear what the future holds.

    3. Finally, the toolchain supports fetching a module from a proxy server (in fact, it defaults to using a proxy run by Google). A local directory can be used as a proxy (a file:// URL). It's unclear how to populate one given a go.mod file.

      Support would require ensuring modules are enabled and pointing the toolchain at the proper proxy.
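
On the proxy option (3. above): the Go module proxy protocol case-encodes module paths in its on-disk layout so they stay unambiguous on case-insensitive filesystems. A minimal sketch of that escaping rule (the function name is mine):

```python
def escape_module_path(path):
    """Case-encode a module path the way the Go module proxy protocol
    does: each uppercase letter becomes '!' followed by its lowercase
    form, e.g. 'BurntSushi' -> '!burnt!sushi'."""
    return "".join("!" + c.lower() if c.isupper() else c for c in path)
```

A file:// proxy (or a populated module cache) would then serve files under `<root>/<escaped-path>/@v/<version>.zip` and friends.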

Do you have a solution that isn't one of the options that I've outlined?

Given the discussion above, how do you think an implementation should work, end-to-end?

@hartzell
Contributor Author

hartzell commented Feb 5, 2020

Perhaps I can try to summarize what I hear you suggesting/asking for.

Given two implementations:

  1. an implementation that

    • adds support for bulk resources
    • asks package authors to run modules2tuple for each release of the application and include the resource definitions in the package
    • supports Spack signatures on each resource/dependency
    • and reuses the existing fetchers, cachers, and resource deployment machinery;
  2. an implementation that

    • does not require the user to add resource definitions for dependencies
    • adds a new fetcher
    • uses existing caches (would it?)
    • adds logic to copy the cached dependencies into the build
    • and requires that users trust an additional determinism/security mechanism in addition to Spack signatures.

In both cases someone (the user or Spack) will need to ensure that proper flags/vars are set. PR reviewers will need to ensure that nothing slips through.

I prefer the first. I think that it's clean, simple, adds a single new general feature that might be useful for other languages (but proof, pudding, etc) and takes advantage of an existing tool.

I believe you're arguing for the second.


For what it's worth, one could build the second around the vendor directory support.

  • the fetcher would unpack the application dir
  • if the build dir didn't include a vendor dir and there was something cached, copy it into the build; if there wasn't anything cached then use go mod vendor to populate the vendor dir, cache it, and then run the build.
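
That fetch-time flow could be sketched as a small decision function (names and return values are hypothetical, just to pin down the order of preference):

```python
def vendor_action(upstream_has_vendor, cache_hit):
    """Decide how to populate vendor/ during fetch, per the flow
    above: prefer an upstream vendor dir, then the cache, and only
    then fall back to 'go mod vendor' (caching the result)."""
    if upstream_has_vendor:
        return "use upstream vendor dir"
    if cache_hit:
        return "copy cached vendor dir into build"
    return "run 'go mod vendor', cache the result, then build"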

A variation would be to build tooling that created a resource for the vendor directory, with signatures and etc.

@scheibelp
Member

Regarding #14061 (comment)

Do you have a solution that isn't one of the options that I've outlined?

No: I think you outlined what I was hoping for with:

We could leverage the vendoring support (Go modules supports vendoring, the vendor dir structure has a backward compatibility promise).
An alternative implementation, which explicitly trusts Go's determinism guarantees, might build and cache the vendor dir en masse and copy it into place during the fetch stage. This would require building additional Spack tooling.

On that note, is there a reason to doubt Go's determinism guarantees? Are there cases where it is officially nondeterministic? To me it seems like using modules2tuple places a similar trust in Go.

The simple presence of go.mod/go.sum is not sufficient. It requires vigilance during package reviews

Would go.mod + -mod=vendor be sufficient? If not, when is it not sufficient?

Regarding #14061 (comment)

I believe you're arguing for the second.

Yes. Or at least: I want to explore whether there are advantages to the second.

an implementation that
does not require the user to add resource definitions for dependencies
adds a new fetcher
uses existing caches (would it?)

It would: a GoFetchStrategy could be updated to specifically retrieve all the artifacts associated with a go.mod file. It would also be responsible for storing these artifacts in the cache, and furthermore later on it would be responsible for understanding that it doesn't need to retrieve an artifact that is mentioned in a go.mod because it is already in the cache.

If that functions correctly, then only two things have to be true for a Spack Go package to get reproducible and/or offline installs:

  • A go.mod file is present (either provided by the package or generated by the person who created the Spack package)
  • The "go build" invocation uses -mod=vendor

As more Go packages include a go.mod file, the amount of work required for the Spack package creator will be very small (in the alternative I am proposing, they don't have to use modules2tuple, although they may want to use it if it generates go.mod files and the Go package doesn't include one).

@hartzell
Contributor Author

hartzell commented Feb 6, 2020

NOTE: I've been loosely claiming that the JSON resources include a "signature". I just reviewed what I actually wrote in modules2tuple and it turns out that the resources refer to commits in a Git repository, which falls under our "we trust git commit hashes" promise.

@hartzell
Contributor Author

hartzell commented Feb 6, 2020

On that note, is there a reason to doubt Go's determinism guarantees? Are there cases where it is officially nondeterministic?

Not that I'm aware of. I have at least as much faith in it as I do in Spack's signatures, based simply on the number of eyes on it and the amount of money behind the effort.

I, personally, would be happy to depend on it. For the purposes of this PR, however:

  • I'm not aware of any other bit of Spack that guarantees repeatability by depending on an external promise.

    Relying on the immutability of VCS commit hashes is the closest analogue I can think of.

  • I can't speak to the difficulty of security reviews for the people at supercomputer centers. I'm not sure who can, nor am I sure who can weigh the trade-offs and choose a path (@tgamblin?).

To me it seems like using modules2tuple places a similar trust in Go

It doesn't. In this workflow the Spack package author runs go mod vendor, then runs modules2tuple -spack vendor/modules.txt, and commits the resulting resource definitions alongside the package.py.

go mod vendor is intended for use by Go developers that decide to include their dependencies in their project (notes here). I'm confident-ish that it does not use the signatures from go.sum but rather expects the application author to vet the vendor dir before committing it (as they would/should with the go.sum info).

go mod vendor, as a housekeeping detail, creates a vendor/modules.txt that summarizes the sources (repo, commit/branch/... id) of each dependency in the vendor dir.

modules2tuple uses that info to generate the resource definitions, which are/would then be committed to Spack.

The Spack package author could/should review the contents of the vendor dir they create and confirm that the modules.txt file is accurate.
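
As a rough illustration of what modules2tuple works from, a minimal parser for the module lines of vendor/modules.txt might look like this (a sketch, not the tool's actual implementation):

```python
def parse_modules_txt(text):
    """Extract (module, version) pairs from vendor/modules.txt.
    Module lines look like '# github.com/pkg/errors v0.9.1';
    '## explicit' annotations and plain package lines are skipped."""
    mods = []
    for line in text.splitlines():
        if line.startswith("# "):
            fields = line[2:].split()
            if len(fields) >= 2 and fields[1].startswith("v"):
                mods.append((fields[0], fields[1]))
    return mods

sample = """# github.com/pkg/errors v0.9.1
## explicit
github.com/pkg/errors
"""
```

Each (module, version) pair is what gets turned into a repo/commit resource definition for review and commit into Spack.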

I believe that this parallels the spack checksum case:

  • it's up to the Spack package author/updater to ensure that the item that has been checksum-ed is safe before committing its signature (or git commit id or ...) to Spack; and
  • it's up to the Spack user to ensure that those signatures are enforced (e.g. no --no-checksum or similar shenanigans).

In contrast, a GoFetchStrategy would do all of this dynamically without an opportunity for review.

All of that said, I'm OK with this approach. But, I doubt that I'm a good speaker for the more security-conscious members of the Spack community.

Would go.mod + -mod=vendor be sufficient? If not, when is it not sufficient?

We need to ensure the compiler's running in module mode and we need to invoke the use of the vendor dir. This can be achieved via environment variables and/or command line switches and varies slightly with compiler versions. I don't have the details at hand, but it's doable.
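
A sketch of the sort of settings involved (exact variables and defaults vary with the Go version; setting GOPROXY=off is my addition, to make any accidental download fail loudly):

```python
def go_vendor_env(env=None):
    """Build an environment that forces module mode and vendored
    builds: GO111MODULE=on forces module mode on older toolchains,
    GOFLAGS=-mod=vendor makes builds read vendor/ instead of the
    network, and GOPROXY=off turns attempted downloads into errors."""
    env = dict(env or {})
    env["GO111MODULE"] = "on"
    env["GOFLAGS"] = "-mod=vendor"
    env["GOPROXY"] = "off"
    return env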

My main point with this bit was that even if we set things up the way we need them, a sufficiently foolish package can screw it up.

This is true both for this PR's implementation and for one that uses a new GoFetchStrategy. It doesn't matter how much care we put into building the vendor dir if the go tool ends up fetching things itself.

As more Go packages include a go.mod file, the amount of work required for the Spack package creator will be very small [...]

That'd be great.

If the powers-that-be approve of this approach then it's the way to go.

[...] use modules2tuple, although they may want to use it if it generates go.mod files and the Go package doesn't include one).

modules2tuple requires that the application use Go modules.

If an application doesn't use Go modules, then there's no supported, systematic mechanism for deterministic builds.



@hartzell
Contributor Author

hartzell commented Feb 6, 2020

W.R.T. support in the go command for vendoring, this is from the vendoring section of the draft Go 1.14 Release Notes:

When the main module contains a top-level vendor directory and its go.mod file specifies go 1.14 or higher, the go command now defaults to -mod=vendor for operations that accept that flag. A new value for that flag, -mod=mod, causes the go command to instead load modules from the module cache (as when no vendor directory is present).

When -mod=vendor is set (explicitly or by default), the go command now verifies that the main module's vendor/modules.txt file is consistent with its go.mod file.
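
The quoted default-selection behavior can be summarized as (a sketch; version parsing is simplified and pre-release suffixes are ignored):

```python
def default_mod_flag(go_directive, has_vendor_dir):
    """Per the Go 1.14 notes quoted above: with a top-level vendor/
    directory and 'go 1.14' (or higher) in go.mod, the toolchain
    defaults to -mod=vendor; otherwise modules come from the cache."""
    major, minor = (int(x) for x in go_directive.split(".")[:2])
    if has_vendor_dir and (major, minor) >= (1, 14):
        return "-mod=vendor"
    return "-mod=mod"
```

This is relevant to the discussion above because, from 1.14 on, a package with a vendored tree needs less explicit configuration to stay offline.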

@adamjstewart
Member

@hartzell what's the status of this PR? Anything I can do to help get it merged?

@iarspider
Contributor

Thanks @hartzell for preparing this PR. I have a few concerns regarding scenario 1 "dependencies are resources":

  1. What about transitive dependencies (i.e. package go-foo depends on go-bar that depends on go-baz) - will they be handled correctly? I guess both go-bar and go-baz will be installed under go-foo's prefix, right?
  2. If two packages depend on the same third package, will that third one be installed twice? How about views - will they work correctly?

Sorry if asking something obvious.

@alalazo
Member

alalazo commented Aug 3, 2022

Closing since this has been stale for a long while. Feel free to reopen if you want to continue working on it.


Labels

build-systems, directives, documentation

Projects

None yet

5 participants