@MrXinWang
Member

@MrXinWang MrXinWang commented Jun 18, 2020

This commit enables the AArch64 Jenkins CI, which builds the project and runs the unit tests with the GNU toolchain.

Signed-off-by: Henry Wang [email protected]

@MrXinWang MrXinWang force-pushed the arm64_unit_tests_enable branch from b677051 to 6cf6835 Compare June 18, 2020 01:47
@MrXinWang
Member Author

MrXinWang commented Jun 18, 2020

@rbradford I think the Jenkins pipeline can be deployed to the AArch64 node now. I am not entirely sure, but I can see AArch64-related build and test logs. However, I think an issue with the musl toolchain on arm64 needs to be fixed first. I can reproduce it consistently in the container, on my development machine, and on the CI machine.

Error (only occurs when building with the musl toolchain):

 = note: /usr/bin/ld: /tmp/rustcE1X2Bs/liblibc-1170e40c1719a823.rlib(__stack_chk_fail.lo): undefined reference to symbol '__stack_chk_guard@@GLIBC_2.17'
          /usr/bin/ld: /lib/aarch64-linux-gnu/ld-linux-aarch64.so.1: error adding symbols: DSO missing from command line
          collect2: error: ld returned 1 exit status

This error happens at the last step, i.e. when linking the cloud-hypervisor binary. For the root cause, see: https://www.openwall.com/lists/musl/2014/11/05/3

@MrXinWang MrXinWang force-pushed the arm64_unit_tests_enable branch 2 times, most recently from bc37920 to 1f38391 Compare June 18, 2020 02:22
@MrXinWang MrXinWang marked this pull request as draft June 18, 2020 02:44
@MrXinWang
Member Author

MrXinWang commented Jun 18, 2020

I also noticed that the bionic-arm64 machine is very unstable. It seems to go offline constantly and then restart. Here is the error log shown when the bionic-arm64 machine is offline:

Connection was broken

java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)

@rbradford
Member

I also noticed that the bionic-arm64 machine is very unstable. It seems to go offline constantly and then restart. Here is the error log shown when the bionic-arm64 machine is offline:

Connection was broken

java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)

I think this is normal with the SSH launcher. It comes back when the node is needed.

@MrXinWang
Member Author

MrXinWang commented Jun 18, 2020

I think this is normal with the SSH launcher. It comes back when the node is needed.

If that is normal, good, but I am a little worried. Try this from your local development machine: run ssh root@<the AArch64 CI machine IP> and leave the session idle. After about an hour (or even less), the SSH connection is closed with client_loop: send disconnect: Broken pipe.

I checked the /etc/ssh/sshd_config file and ClientAliveInterval is indeed commented out. I hope I am just worrying too much, but this behaviour seems abnormal to me.

@rbradford
Member

Can you do something like: https://www.jenkins.io/blog/2019/07/05/jenkins-pipeline-stage-result-visualization-improvements/

try {
  sh('false')
} catch (ex) {
  unstable('Script failed!')
}

So that the Jenkins build doesn't fail. We need the Jenkins builds to be stable as part of our PR merging criteria.
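
The linked blog post also describes a shorthand for this pattern. A minimal sketch, assuming the Jenkins installation provides the warnError step (it ships with the Pipeline: Basic Steps plugin in recent versions); sh('false') is the same placeholder command as in the snippet above:

```groovy
// warnError catches a failure raised inside its block, marks both the
// build and the enclosing stage as UNSTABLE, and lets the pipeline continue.
warnError('Script failed!') {
  sh('false')  // placeholder for the step that is allowed to fail
}
```

Compared with the explicit try/catch, this form cannot accidentally swallow the error without recording it, because the UNSTABLE result is set by the step itself.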

@MrXinWang MrXinWang force-pushed the arm64_unit_tests_enable branch 3 times, most recently from 2fcce41 to 0f9af69 Compare June 28, 2020 07:41
This commit enables the AArch64 Jenkins CI with build and running
unit tests for GNU toolchain.

Signed-off-by: Henry Wang <[email protected]>
@MrXinWang MrXinWang force-pushed the arm64_unit_tests_enable branch from 0f9af69 to 9f20523 Compare June 28, 2020 09:39
cloud-hypervisor#1225
introduces a hypervisor abstraction crate, which breaks some of
the unit test cases on AArch64. This commit fixes related test
cases.

Signed-off-by: Henry Wang <[email protected]>
@MrXinWang MrXinWang force-pushed the arm64_unit_tests_enable branch from 9f20523 to 189be82 Compare June 28, 2020 09:43
@MrXinWang MrXinWang marked this pull request as ready for review June 28, 2020 09:50
@MrXinWang
Member Author

Hi @rbradford, I have struggled for some weeks to fix the musl toolchain on arm, but sadly none of the methods I tried worked. So, to get the CI online and unblock the enablement of the integration tests, I will first get the GNU toolchain working in this PR and propose a follow-up PR for the musl toolchain. I created rust-lang/rust#73493 to track this musl issue.

try {
  sh('false')
} catch (ex) {
  unstable('Script failed!')
}

So that the Jenkins build doesn't fail. We need the Jenkins builds to be stable as part of our PR

I am not sure whether I misunderstood, but in this case it seems that the unstable step is not needed?

@rbradford
Member

Hi @rbradford, I have struggled for some weeks to fix the musl toolchain on arm, but sadly none of the methods I tried worked. So, to get the CI online and unblock the enablement of the integration tests, I will first get the GNU toolchain working in this PR and propose a follow-up PR for the musl toolchain. I created rust-lang/rust#73493 to track this musl issue.

Okay with me.

try {
  sh('false')
} catch (ex) {
  unstable('Script failed!')
}

So that the Jenkins build doesn't fail. We need the Jenkins builds to be stable as part of our PR

I am not sure whether I misunderstood, but in this case it seems that the unstable step is not needed?

This was only in case you wanted to keep the musl rules in there and have them fail without blocking the build.
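
For illustration, the arrangement described here (GNU stages that block the build, musl stage that may fail without blocking) could look roughly like the scripted pipeline below. This is only a sketch under assumptions, not the Jenkinsfile from this PR: it assumes bionic-arm64 is usable as a node label, and the stage names and plain cargo invocations are hypothetical stand-ins for whatever build scripts the project actually uses.

```groovy
// Hypothetical sketch; node label, stage names and commands are assumptions.
node('bionic-arm64') {
  stage('Checkout') {
    checkout scm
  }

  // GNU toolchain: any failure here fails the build, as usual.
  stage('Build and unit tests (GNU)') {
    sh('cargo build --release')
    sh('cargo test')
  }

  // musl toolchain: a failure is downgraded to UNSTABLE so that the
  // known linker error does not turn the whole build red.
  stage('Build (musl, non-blocking)') {
    try {
      sh('cargo build --release --target aarch64-unknown-linux-musl')
    } catch (ex) {
      unstable('musl build failed')
    }
  }
}
```

With the musl rules dropped from the PR entirely, as was done here, the last stage is simply absent and no unstable handling is needed.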

@MrXinWang MrXinWang requested review from rbradford and sboeuf June 29, 2020 02:09
@rbradford rbradford merged commit d824d55 into cloud-hypervisor:master Jun 29, 2020
@MrXinWang MrXinWang mentioned this pull request Jul 8, 2020
8 tasks
@MrXinWang MrXinWang deleted the arm64_unit_tests_enable branch June 4, 2021 07:37