Skip to content

Move docker to ci-scripts#1020

Merged
WardBrian merged 4 commits intomasterfrom
move-docker-to-ciscripts
Nov 3, 2021
Merged

Move docker to ci-scripts#1020
WardBrian merged 4 commits intomasterfrom
move-docker-to-ciscripts

Conversation

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor

@serban-nicusor-toptal serban-nicusor-toptal commented Nov 2, 2021

Submission Checklist

  • Run unit tests
  • Documentation
    • If a user-facing facing change was made, the documentation PR is here:
    • OR, no user-facing changes were made

Release notes

Move docker to ci-scripts repository.
Removed a comment from stanc.ml

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

@WardBrian you had a good hunch, seems it doesn't like something in armhf: https://jenkins.mc-stan.org/blue/organizations/jenkins/stanc3/detail/master/934/pipeline/90
Please see the PR in ci-scripts that comes together with this one -> stan-dev/ci-scripts#10

@serban-nicusor-toptal serban-nicusor-toptal changed the title Move docker to ciscripts Move docker to ci-scripts Nov 2, 2021
@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

Everything is the same except one thing, the image hash used:

For a failure:

stanorg/stanc3:multiarch@sha256:5ca86a028f80bbdeccd8fa868dc867fd54ca7ed0959ada97bab5766537ee51d7

For a success:

stanorg/stanc3:multiarch@sha256:a9cab3fe33e13e16d23a75d0a06b4648b963750371f852a9ace1e77d3ad6dcaf

Checking ...

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

So the failing hash is the build variant for ARM from what I understand in https://github.com/stan-dev/stanc3/blob/master/scripts/build_multiarch_stanc3.sh#L23
I ran locally docker run -it stanorg/stanc3:multiarch@sha256:5ca86a028f80bbdeccd8fa868dc867fd54ca7ed0959ada97bab5766537ee51d7 and dune is indeed not found.
When I run stanorg/stanc3:multiarch@sha256:a9cab3fe33e13e16d23a75d0a06b4648b963750371f852a9ace1e77d3ad6dcaf dune is found.
This may be an error in the build process and it corrupted the hash for arm ? I'm rebuilding the docker image from scratch

@WardBrian
Copy link
Copy Markdown
Member

Very odd. This is why we test

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

serban-nicusor-toptal commented Nov 2, 2021

I've rebuild the image ( on Ubuntu WSL ) but it doesn't have any change, trying now on Mac M1.
@andrjohns did you ever ran into this err or maybe have an idea where to look ?

@andrjohns
Copy link
Copy Markdown
Contributor

No this is new to me. I'm building the armhf image locally now, so will see if there are any differences

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

serban-nicusor-toptal commented Nov 2, 2021

Thanks! Quick question, are you on M1 Mac (ARM) CPU or amd64 ?

Edit: I've added you as a collaborator to stanorg/stanc3 in Docker Hub.

@andrjohns
Copy link
Copy Markdown
Contributor

amd64 for me

@andrjohns
Copy link
Copy Markdown
Contributor

Works for me locally, oddly enough. The dune not found error occurs if you haven't run eval $(opam env) first, but this is specified as part of the build_multiarch_stanc3 script

@WardBrian
Copy link
Copy Markdown
Member

I believe we can also remove the Dockerfile found in the root dir of the repo (I believe it is just a copy of docker/debian/Dockerfile)

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

I've tested the others and they work just fine ( also in ci ), but when I pull the specific hash it wants for the armhf build:

docker run -it stanorg/stanc3:multiarch@sha256:5ca86a028f80bbdeccd8fa868dc867fd54ca7ed0959ada97bab5766537ee51d7

and then run

eval $(opam env)
dune --version

I get bash: dune: command not found

Should we try to override the remote build with your local build?:
docker tag $LOCAL_TAG stanorg/stanc3:multiarch
docker push stanorg/stanc3:multiarch

Building on this M1 chip takes ages :(

@WardBrian
Copy link
Copy Markdown
Member

Could we set up a github action in the ci-scripts repo to build/push these images? Not sure how involved that would be/how much support GHA has for such a thing.

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

I had that in mind too, we can set it up but I would suggest using Jenkins as it's a resource-intensive workload. Lmk what you think, I'll create a PR with it in ci-scripts.

@WardBrian
Copy link
Copy Markdown
Member

It would be great to be able to trigger it without needing anyone to do it on their local machine, even if it's not going to happen that often (probably just now, when #1019 is ready, and then hopefully not for a long time). Definitely not a priority, and if @andrjohns could just push his working one now that would be great for the time being

@andrjohns
Copy link
Copy Markdown
Contributor

I don't think you can replace a single architecture when pushing a docker image, would need to rebuild the whole bunch. Did you want me to start that running?

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

If that wouldn't be too much of a bother @andrjohns many thanks!
I'll automate it so it's easier, a simple job that triggers when anyone changes the Dockerfile in a commit and pushes to the remote Docker Hub.

@andrjohns
Copy link
Copy Markdown
Contributor

Sounds good! Will let you know when it's all rebuilt and pushed

@WardBrian
Copy link
Copy Markdown
Member

@serban-nicusor-toptal what is Jenkinsfile-test-binaries? I noticed it still has references to the dockerfiles but I'm not sure if we need it?

@WardBrian
Copy link
Copy Markdown
Member

WardBrian commented Nov 2, 2021

https://jenkins.mc-stan.org/blue/organizations/jenkins/stanc3/detail/master/938/pipeline/86 worked after the latest docker hub push. Thanks @andrjohns !

edit: although it looks like odoc failed. Might just be because I started the CI on an intermediary step?

@rok-cesnovar
Copy link
Copy Markdown
Member

what is Jenkinsfile-test-binaries? I noticed it still has references to the dockerfiles but I'm not sure if we need it?

Those are used to make win/mac/linux binaries for branches, which is useful for making a test build you can share with others and they dobt need ocaml installed.

@WardBrian
Copy link
Copy Markdown
Member

@serban-nicusor-toptal I removed Dockerfile and Jenkins-test-binaries from the root directory as well. I'm re-running the latest main CI to see if that odoc failure is an issue but otherwise I think this is good?

@WardBrian
Copy link
Copy Markdown
Member

Ah, that is useful @rok-cesnovar. It will need to be updated to use the docker images rather than files, then

@WardBrian
Copy link
Copy Markdown
Member

Readded and copied over the changes from #1017. We should test after this is merged

@WardBrian
Copy link
Copy Markdown
Member

Seems like a different one of the non-x86 binaries failed to build. Very weird

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

serban-nicusor-toptal commented Nov 2, 2021

Thanks for looking into it!
I've pushed a commit which fixed this 'edge case'.

To explain this one here:
When the pipeline ran, the host had more than one container running at that time so this command ( docker ps -q ) returns two container IDs which would result in the Jenkins command to look like
docker run --volumes-from=8ac50e5d0f43 67e09085c973:rw stanorg/stanc3:multiarch@sha256:16fd05b701a61570f282eb47f79d18542f7b2e7ecded594a77487480c6e7960a
Where 67e09085c973:rw would be taken as the input for the docker image resulting in an error.
docker ps -aqf "ancestor=stanorg/stanc3:static" will filter and make sure we're mounting the base image volume.

I also saw this error but it looks like a false positive, is the checkout of the gh-pages wrong or is that a false-positive from github ?

Might just be because I started the CI on an intermediary step?

Maybe? I'll check it

@WardBrian
Copy link
Copy Markdown
Member

@serban-nicusor-toptal is this ready?

@serban-nicusor-toptal
Copy link
Copy Markdown
Contributor Author

I've committed a change that hopefully will fix the random failure with the gh-pages branch.
I've automated ci-scripts to build and push the images when there are any changes to them, see Jenkins.
I think this is ready now.

@WardBrian
Copy link
Copy Markdown
Member

Because we force-push to gh-pages anyway, if we really needed to I think it's reasonable to make those commands that keep failing something like git branch -d gh-pages || echo "Skipping deletion". But let's see if the problem persists first

@WardBrian WardBrian merged commit ca8b4dd into master Nov 3, 2021
@WardBrian WardBrian deleted the move-docker-to-ciscripts branch November 3, 2021 15:26
@WardBrian WardBrian mentioned this pull request Nov 11, 2021
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants