pkgsStatic.haskellPackages: fix TemplateHaskell #383165
wolfgangwalther wants to merge 3 commits into NixOS:haskell-updates
b5fec50 to
14e2355
This one is confusing.. it's the only place here, where we check build vs host. Everywhere else in common-hadrian, we only check host vs target.
Is this an oversight or on purpose? If on purpose, it probably deserves a comment.
@sternenseemann In light of #391975 and this PR here, could you give some input / your opinion on how we could proceed (or not!). I agree with "lack of a crystal clear path forward" - so let's discuss and make one. I can then work on that.
This is more in line with the other names. Also "ghc" could be "native" as well, when we're not cross. So "native" is not really a useful distinction to make. It's used to compile Setup.hs - so let's call it that way.
…build

Nesting pkgsStatic inside pkgsMusl results in:
- build platform: musl, shared libs
- host/target platform: musl, static libs

We consider this cross, because of the difference in shared vs static libs. However, GHC does not - it only looks at the system triple, which is identical. Thus GHC will not prefix its binaries and folders. We need to compensate for that by applying the targetPrefix only when **GHC thinks we are cross**, and not when we do.

This then leads to the next problem: Since we're adding both "ghc" and "systemGhc" to PATH and they are both without prefix... naturally either the build of Setup.hs or the main build will fail, because it uses the wrong compiler for the package-db. This leads to either trying to statically link Setup.hs, when only shared libs are available - or to trying to dynamically link the main package, when only static libs are available.

The simplest solution for this is to just use the host compiler for Setup.hs, as long as we can run the code it produces. With the canExecute check we only provide a single GHC whenever we're compiling "native cross".
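The mismatch described in this commit message can be illustrated with a small, self-contained Nix expression. This is only a sketch under stated assumptions: the platform attrsets are hypothetical, heavily simplified stand-ins for real Nixpkgs platforms, and `nixpkgsThinksCross`, `ghcThinksCross` and `targetPrefix` are illustrative names, not the code in this PR.

```nix
# Evaluate with: nix-instantiate --eval --strict sketch.nix
let
  # Hypothetical, heavily simplified stand-ins for Nixpkgs platform attrsets.
  muslShared = { config = "x86_64-unknown-linux-musl"; isStatic = false; };
  muslStatic = muslShared // { isStatic = true; };
  glibcShared = { config = "x86_64-unknown-linux-gnu"; isStatic = false; };

  # Nixpkgs' view: any difference between the two platforms means cross.
  nixpkgsThinksCross = host: target: host != target;

  # GHC's view as described in the commit message: only the triple counts.
  ghcThinksCross = host: target: host.config != target.config;

  # Prefix binaries only when GHC itself will do so.
  targetPrefix = host: target:
    if ghcThinksCross host target then "${target.config}-" else "";
in
{
  # pkgsMusl.pkgsStatic: cross for Nixpkgs, native for GHC, so no prefix.
  nativeCross = {
    nixpkgs = nixpkgsThinksCross muslShared muslStatic;
    ghc = ghcThinksCross muslShared muslStatic;
    prefix = targetPrefix muslShared muslStatic;
  };
  # glibc to static musl: both views agree that this is cross.
  trueCross = {
    nixpkgs = nixpkgsThinksCross glibcShared muslStatic;
    ghc = ghcThinksCross glibcShared muslStatic;
    prefix = targetPrefix glibcShared muslStatic;
  };
}
```

In the `nativeCross` case the two views disagree, which is exactly the situation where an unprefixed cross compiler ends up next to an unprefixed native one on PATH.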
14e2355 to
612de00
I have created a GHC ticket to expose an argument to Of course we probably won't be able to benefit from this for a while.
sternenseemann left a comment
I get the feeling this is the opposite direction of what we should be doing. This just adds onto the pile of unsatisfactory hacks which let Hadrian dictate its way of seeing things to us (just think of all the bits of logic in common-hadrian.nix that just try to reimplement what Hadrian decides on without our input). I think we should say that Nixpkgs has the right idea about cross and we should be aiming to make Hadrian see that.
I accept that there is a level of hackery involved with pkgsStatic and TemplateHaskell precisely because we need to execute host code on the build platform from Nixpkgs' point of view. In this sense this isn't that different from what used to happen, but this used to just work with GHC < 9.6 without adding a pile of logic that subverts normal Nixpkgs cross logic.
From my point of view, canExecute should never inform our view of cross or not: This should always be != and we shouldn't start rolling back things with un-prefixed cross compilers. Much rather, canExecute reveals extra things we happen to be able to do which don't fundamentally change things, e.g. we can run tests or under these circumstances GHC can do TH.
Overall I'm unsure what to do. My inclination is that we can fix Hadrian to a point that it would do what we want in this case without having to reimplement its installation logic ourselves, but (even though I did start working on it) I don't know how hard it would be. I'm also concerned that upstream is not very interested in reviewing and merging such patches. Also, maybe stable-haskell/ghc#34 will bail us out, but when?
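The distinction drawn in this comment, where inequality decides "cross or not" and canExecute only reveals extra capabilities, could be sketched roughly as follows. The platforms and the `canExecute` function here are hypothetical simplifications (same CPU and kernel), not the Nixpkgs implementation, and the attribute names are made up for illustration.

```nix
# Evaluate with: nix-instantiate --eval --strict sketch.nix
let
  # Hypothetical simplified platforms; real Nixpkgs platforms carry far more fields.
  mkPlatform = cpu: kernel: libc: isStatic: { inherit cpu kernel libc isStatic; };
  build = mkPlatform "x86_64" "linux" "musl" false;
  host = mkPlatform "x86_64" "linux" "musl" true;

  # Simplified stand-in for the canExecute check: same CPU and same kernel.
  canExecute = machine: other:
    machine.cpu == other.cpu && machine.kernel == other.kernel;
in
{
  # Cross or not is decided by inequality alone.
  isCross = build != host;
  # canExecute only unlocks extras; it never changes isCross.
  canRunTests = canExecute build host;
  canRunTemplateHaskellSplices = canExecute build host;
}
```

Under this reading, a dynamic musl to static musl build stays a cross build, while tests and Template Haskell are still possible because the build machine can run host code.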
  # If enabled, use -fPIC when compiling static libs.
- enableRelocatedStaticLibs ? stdenv.targetPlatform != stdenv.hostPlatform,
+ enableRelocatedStaticLibs ? !stdenv.targetPlatform.canExecute stdenv.hostPlatform,
Why would we disable -fPIC for dyn musl -> static musl?
I can't really tell - let me rephrase the question: Why are we disabling -fPIC when targetPlatform != hostPlatform? Aka, what is it about cross-compiling that requires us to disable this?
W.r.t. cross-compiling, I'm not quite sure, as I wasn't around for the change; maybe @Ericson2314 remembers.
In the case of pkgsStatic I assume it is crucial for the internal linker of the GHC rts to be able to load statically linked libraries (since that requires relocation). Also later, -fexternal-dynamic-refs was added to this conditional which is definitely required for the internal linker to load static libs, see 4884fcc and https://www.tweag.io/blog/2020-09-30-bazel-static-haskell/.
- || (stdenv.buildPlatform != stdenv.hostPlatform)
- || (stdenv.hostPlatform != stdenv.targetPlatform)
+ || (!stdenv.buildPlatform.canExecute stdenv.hostPlatform)
+ || (!stdenv.hostPlatform.canExecute stdenv.targetPlatform)
I think I would prefer to force hadrian to build terminfo if we say so like it used to be possible. I proposed a solution for this, but got no reaction from upstream: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/13932. (This is really the tragedy with Hadrian, a lot of regressions were never fixed and it is hard to get changes reviewed when you try to work on them yourself…)
  # TODO(@Ericson2314) Make unconditional
- targetPrefix = lib.optionalString (targetPlatform != hostPlatform) "${targetPlatform.config}-";
+ targetPrefix = lib.optionalString (
+ !hostPlatform.canExecute targetPlatform
Also here, technically, you'd compare system or config, but I don't think we should be encouraging this behavior from Hadrian. I'd much prefer to force targetPrefix.
I tried forcing targetPrefix for all compilers, native, native cross or fully cross, earlier, but that comes with a lot more challenges down the road, so I reverted these parts again.
  stdenv.mkDerivation (
  {
- pname = "${targetPrefix}ghc${variantSuffix}";
+ pname = "${targetPlatform.config}-ghc${variantSuffix}";
I think this is unnecessarily confusing, since it'll add the prefix to native compilers, which we do nowhere else.
I can keep the old condition here, too, no problem.
+ --replace-fail 'CrossCompiling=YES' \
+ 'CrossCompiling=NO' \
+ --replace-fail 'cross_compiling=yes' \
+ 'cross_compiling=no'
If we're going to do this, we should use https://gitlab.haskell.org/ghc/ghc/-/commit/81577fe7c1913c53608bf03e48f84507be904620.
  ${lib.escapeShellArgs hadrianSettings}
  )
  ''
+ # "native cross", e.g. pkgsStatic & co, should not be treated as "cross" by hadrian / GHC, otherwise
"should" is a little strong. This is arguably a bug in Hadrian. We do want to build a GHC cross compiler; the problem really is that Hadrian gates too much on the CrossCompiling boolean that doesn't have to be gated on it, i.e. a lot of logic that was more fine-grained and overridable individually via ghc.mk got folded into it.
+ # "native cross", e.g. pkgsStatic & co, should not be treated as "cross" by hadrian / GHC, otherwise
+ # the internal interpreter and iserve are not built, and Template Haskell will not be supported.
We already build a stage2 compiler for pkgsStatic and friends:
nixpkgs/pkgs/development/compilers/ghc/common-hadrian.nix
Lines 130 to 137 in 963a110
The logic around internal-interpreter is just broken in Hadrian and I haven't yet figured out how to fix it. I've started working on that at some point, but got discouraged since I couldn't even get relatively trivial changes to hadrian reviewed and merged.
  # TODO(@sternenseemann): there's no stage0:exe:haddock target by default,
  # so haddock isn't available for GHC cross-compilers. Can we fix that?
- hasHaddock = stdenv.hostPlatform == stdenv.targetPlatform;
+ hasHaddock = stdenv.hostPlatform.canExecute stdenv.targetPlatform;
If we are going to do this, we need to make this clearer since this is very confusing. Most of the old stdenv.hostPlatform == stdenv.targetPlatform basically mean “hadrian thinks we're cross compiling“. A lot of them are purely descriptive, e.g. in this case: There is no way to force hadrian to build haddock even if we wanted to.
We should probably add a binding somewhere that is called hadrianThinksCross or something like that. It'd probably also be good if we made the CrossCompiling=NO hack configurable, e.g. via a forceDisableCross or similar argument.
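A rough sketch of what such a binding and argument might look like, written as a standalone Nix expression. Everything here is hypothetical: `ghcCrossInfo` is an invented helper, the platforms are reduced to their triple, and the assumption that Hadrian compares triples only is taken from the discussion in this thread rather than verified against Hadrian.

```nix
# Evaluate with: nix-instantiate --eval --strict sketch.nix
let
  # Hypothetical helper; neither the name nor the argument exists in Nixpkgs.
  ghcCrossInfo =
    { hostPlatform, targetPlatform, forceDisableCross ? false }:
    let
      # Assumption: Hadrian compares triples only, unless the
      # CrossCompiling=NO hack is applied via forceDisableCross.
      hadrianThinksCross =
        (!forceDisableCross) && (hostPlatform.config != targetPlatform.config);
    in
    {
      inherit hadrianThinksCross;
      # Purely descriptive: Hadrian offers no haddock target for cross compilers.
      hasHaddock = !hadrianThinksCross;
    };

  glibc = { config = "x86_64-unknown-linux-gnu"; };
  musl = { config = "x86_64-unknown-linux-musl"; };
in
{
  default = ghcCrossInfo { hostPlatform = glibc; targetPlatform = musl; };
  forced = ghcCrossInfo {
    hostPlatform = glibc;
    targetPlatform = musl;
    forceDisableCross = true;
  };
}
```

Naming the condition once would make it visible which checks describe Hadrian's behavior and which express Nixpkgs' own notion of cross.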
  # Same as our GHC, unless we're cross, in which case it is native GHC with the
  # same version, or ghcjs, in which case its the ghc used to build ghcjs.
- nativeGhc = buildHaskellPackages.ghc;
+ setupGhc =
I'm against the setupGhc change. We shouldn't do strange hacks that deviate from the standard Nixpkgs way of doing things unnecessarily. Setup.hs should be compiled using the build->build compiler, but we're using the build->host one here.
We are violating the user's configuration here, i.e. build ought to be dynamic!
The setupGhc situation is already a problem without considering TemplateHaskell at all. Whenever you cross-compile to something that has the same targetPrefix, these will collide, no matter whether we don't prefix these two or always prefix both of them. This only doesn't happen because we compile from glibc to musl, but that's just accidental.
How do you suggest to solve this problem otherwise?
btw, this also has practical consequences wrt closure size at build time: When I build stuff for pkgsStatic.haskellPackages, for example in CI, I need to download two copies of GHC, which is really annoying given the size of that package. I really wished that Setup.hs would be built with the build->host compiler if at all possible.
While I appreciate you wanting to have a sane Hadrian build system, we also have to deal with reality. Even if it was possible to fix stuff upstream, this would still not help us with 9.6 and 9.8. To a certain degree, the job of Nixpkgs is to "take whatever upstream gives us and work with/around it". And this is what they gave us, so we have to accept that and deal with it. It's not pretty, but it's a fact. If we can upstream changes to make this work nicely down the road, that would be great. But this should not stop us from proceeding here and now. If we need to implement hacks to make this work, we need to do that - we can't just keep this issue open for 1 1/2 years now and wait for it to magically resolve itself. This will never happen. Maybe for GHC 9.14 or something, but not for all the existing hadrian-based GHC versions.
FTR the basic approach in this PR, no matter whether via upstream patch or manual replacement only works with the |
I am with @sternenseemann here. We shouldn't regress on conceptual clarity for expediency, it will bite us later. The right thing to do is stop using Hadrian. The approach in GCC NG and LLVM is the right one. Then we can correctly not conflate build and host while also not having massive closure sizes. And in the no custom setup case, we should not be building setup from scratch every time at all. Surely that will yield most of the perf gains right there.
While I sympathize with this, too, this is not a practical approach to a fix anytime soon. I think we really need to separate the discussions about "how to solve this correctly long-term" and "what's the least amount of hackery required to fix this now". Those can be different things. We can't just bury our head in the sand and ignore the fact that TemplateHaskell has been broken for |
Another option for better long-term solution would be to just get rid of But again, this is not a short term fix for a concrete regression that we currently have. |
It really is! I think the difficulty of doing this is vastly overrated. I am right now trying to wrangle GCC, which is far cruftier, with upstream maintainers none of us know, and it is still going well. I am also told (I think) that the main thing holding up stable-haskell/ghc#34 is some cabal-install issues. cabal2nix doesn't use cabal-install, so we simply won't be blocked. I think we can I estimate that with the above approach, we could have a PR that we land in 2 weeks. I think getting a prototype (not ready to merge) draft PR that nonetheless succeeds in building GHC, base, and friends in at least one (friendly) configuration could be done in 3 days. So no, I am not asking for us to wait 5 years, I am asking for a few days' sprint, a targeted effort.
That sounds great. I can't take initiative on this, because I lack the in-depth knowledge about GHC and how to build it, but I am ready to help where I can.
@wolfgangwalther Glad that sounds better to you! Are you on Matrix? It would be nice to talk to you and @sternenseemann about this a bit in real time.
I am not, no. I'm sure that if you can convince @sternenseemann of your approach it will more than work for me, too.
Well basically what I think would be ideal, if you have the time/energy, but don't think you have the knowledge for this, would be for @sternenseemann and/or me to advise you in a pair-programming way on how to do this: you "driving", us unblocking you, answering questions, and saying what needs to happen conceptually.
Also the "GCC NG" approach is instructive, in that if we make "GHC NG", we can land the incomplete prototype right away. It just will be longer before it becomes the default and the old approach will be removed. That can help make things less stressful --- even if we can only get through the 3 days, something does get merged.
As I said earlier, I don't think I have the necessary knowledge / experience to drive this.
@wolfgangwalther Oh by "drive[r]" I just meant the slang term from pair-programming that means "[person that does the] typing". The person that is not typing during the pair programming does the "initiative-taking", or "driving" in that sense. A good first step is just to create a GHC derivation that just builds the compiler itself. No base or anything else. GHC is now "reinstallable", which means that once a few missing bits of source code are symlinked in place, that should "just work" with very few hacks. I think doing that, with one/both of us looking at a screen share as you do it, should easily be within your ability :).
I guess what's missing for me to be even able to judge that for myself is some kind of bigger picture. "Get rid of Hadrian" is what convinces me immediately, because I had only bad experiences with Hadrian. But that doesn't say anything about the replacement, yet. And gcc / LLVM NG doesn't say anything either, if you're not familiar with that. I know there is some work going on in that area, but I have literally no idea about it. So without some kind of description of what the goal looks like, we can't make a judgement of whether we actually want the replacement. And without even a rough plan or outline of steps required, I certainly can't judge where I can help.
https://discourse.nixos.org/t/blog-post-compiler-bootstrapping-in-nixpkgs/66791 Hopefully this helps --- I have recently tried to write up exactly what the bigger picture is! :) Our LLVM packaging, GCC NG, and GHC NG all have these fundamental things in common:
I'm no expert in Hadrian but my impression is that there are two related but distinct things going on here. There is the inherent complexity of GHC's build system and there is complexity from Hadrian. I think when Hadrian was first written, the previous build system had got to a point where the technical debt was too much and a rewrite was needed (this is my impression from reading the Hadrian paper). Since then, a lot of effort has gone into reducing the inherent complexity of GHC's builds. One part of this is ensuring that all of the packages can be built with Cabal. In my mind, it's not about Hadrian vs Make. It's about reducing the complexity of GHC's build system to the point where it could be implemented in any system (+ Cabal) without too much trouble. And 🤞 we are there or almost there nowadays. So, hopefully something like what is suggested here is less scary than it sounds.
Yes. My view is that Nix is really adept at leading, not following this sort of thing. Because both:
So I think we can quickly get these done, and then we can work with other efforts like stable-haskell/ghc#34 and upstream efforts to contribute back in the other direction.
So summarizing:
Since I can't really tell how much has happened between 9.6 and 9.14 wrt building the components separately and how much the patches we need to do would apply all the way down to GHC 9.6, this raises the question: Is this path a solution for all Hadrian-based GHCs from 9.6 onwards or will we only be able to implement this for, let's say, GHC 9.14 going forward? If there is no reasonable expectation that this will work for GHC 9.6 as well, then we still need a short-term fix for these versions. That's because the gap between the default GHC version on From a practical point of view: For PostgREST, which relies on |
I think that would be the most reasonable thing to do. I don't have a comprehensive understanding of the whole build process, but I made a lot of changes to the bootstrapping for EG:
I think we can avoid patches almost entirely, possibly at the cost of fixing things in hackier ways. Any patching would be done for elegance, not to make things work at all.
I think it would be good to start with the latest GHC, but then work backwards. I'm a bit more optimistic that supporting multiple versions will not be so bad. For example, we already have GHC-version-specific overlays, and that's a fine way to deal with the way the runtime libs separation of concerns has changed over time. What was the first release where all CPP variables from the RTS came from the RTS configure script rather than the top-level configure script? I think we can at least get it working back to that version. I would like to do the "right thing" for GHCs going forward before any temporary stopgap for Template Haskell today, otherwise I think we'll just be procrastinating more on getting rid of Hadrian. It's important for me to get over the initial "put something together" hurdle so we get to the point of "iterating on the partially working thing in-tree", which is less cognitively taxing.
Do you mean https://gitlab.haskell.org/ghc/ghc/-/commit/7dfcab2f4bcb7206174ea48857df1883d05e97a2 and the commits before that? This commit seems to appear for the first time in GHC 9.10, but not GHC 9.8:
I understand that and share it, too. I'd just like to avoid investing time in this... only to have a solution that will still not solve the practical problem that I am actually interested in solving. That being said, if we can get this working for GHC 9.10, that could be enough (for me). I'd still like to hear @sternenseemann's opinion on the whole topic, though.
Yes, that's the one. Thank you for finding it!
Glad to hear it!
Yes, sounds good.
Closing in favor of #445672.
TLDR: Compiling Haskell in pkgsStatic only worked by accident, so far - because pkgsStatic also implied a different libc, so "true" cross. Given a "just static, same libc" setup, it failed - this is fixed here.

Nesting pkgsStatic inside pkgsMusl results in:
- build platform: musl, shared libs
- host/target platform: musl, static libs

We consider this cross, because of the difference in shared vs static libs. However, GHC does not - it only looks at the system triple, which is identical. Thus GHC will not prefix its binaries and folders. We need to compensate for that by applying the targetPrefix only when GHC thinks we are cross, and not when we do.

This then leads to the next problem: Since we're adding both ghc and systemGhc to PATH and they are both without prefix... naturally either the build of Setup.hs or the main build will fail, because it uses the wrong compiler for the package-db. This leads to either trying to statically link Setup.hs, when only shared libs are available - or to trying to dynamically link the main package, when only static libs are available.

The simplest solution for this is to just use the host compiler for Setup.hs, as long as we can run the code it produces. With the canExecute check we only provide a single GHC whenever we're compiling "native cross".

Important observation: Template Haskell in pkgsMusl.pkgsStatic with GHC 9.6+ works fine! (#275304)

Based on the above, I added one more commit to just tell Hadrian that, as long as we can execute the code we built, it should not treat the build as a "cross compilation". This makes TemplateHaskell work.
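The Setup.hs compiler choice described above could be sketched like this. It is a minimal illustration under assumptions: the platforms and `canExecute` are simplified stand-ins, `pickSetupGhc` is a hypothetical name, and the returned strings are placeholders for the actual GHC derivations.

```nix
# Evaluate with: nix-instantiate --eval --strict sketch.nix
let
  # Hypothetical simplified platforms.
  build = { cpu = "x86_64"; kernel = "linux"; isStatic = false; };
  staticHost = build // { isStatic = true; };
  armHost = build // { cpu = "aarch64"; };

  # Simplified stand-in for canExecute: same CPU and same kernel.
  canExecute = machine: other:
    machine.cpu == other.cpu && machine.kernel == other.kernel;

  # Placeholder strings instead of real GHC derivations.
  pickSetupGhc = { buildPlatform, hostPlatform }:
    if canExecute buildPlatform hostPlatform
    then "build->host ghc"   # one compiler for Setup.hs and the package
    else "build->build ghc"; # Setup.hs must run on the build machine
in
{
  nativeCross = pickSetupGhc { buildPlatform = build; hostPlatform = staticHost; };
  fullCross = pickSetupGhc { buildPlatform = build; hostPlatform = armHost; };
}
```

With this selection only one unprefixed GHC is on PATH in the "native cross" case, which avoids the package-db mix-up at the cost of compiling Setup.hs with a compiler other than the build->build one.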
Things done

- pkgsStatic.haskell.packages.native-bignum.ghc96.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghc98.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghc910.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghc912.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghcHEAD.th-orphans
- haskellPackages.th-orphans
- pkgsCross.x86_64-darwin.haskellPackages.hello ❌ (also fails on master)

- pkgsStatic.haskell.packages.native-bignum.ghc96.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghc98.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghc910.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghc912.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghcHEAD.th-orphans
- haskellPackages.th-orphans
- pkgsMusl.pkgsStatic.haskell.packages.native-bignum.ghc96.th-orphans ❌ (failing dependency)
- pkgsCross.gnu64.haskellPackages.hello
- pkgsCross.gnu64.pkgsStatic.haskell.packages.native-bignum.ghc96.hello

- pkgsStatic.haskell.packages.native-bignum.ghc96.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghc98.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghc910.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghc912.th-orphans
- pkgsStatic.haskell.packages.native-bignum.ghcHEAD.th-orphans
- haskellPackages.th-orphans
- pkgsMusl.pkgsStatic.haskell.packages.native-bignum.ghc96.th-orphans ❌ (failing dependency)
- pkgsCross.aarch64-multiplatform.haskellPackages.hello
- pkgsCross.aarch64-multiplatform.pkgsStatic.haskell.packages.native-bignum.ghc96.hello
- pkgsCross.x86_64-freebsd.haskellPackages.hello
- pkgsCross.x86_64-freebsd.pkgsStatic.haskell.packages.native-bignum.ghc96.hello ❌ (fails during configure due to FreeBSD stdenv pkgsStatic limitations, same on master)