Skip to content

Conversation

@phip1611
Copy link
Member

@phip1611 phip1611 commented May 22, 2025

Split-out of the problematic part of #7085. The commits

  • misc: ch-remote: streamline #[source] and Error impl
  • misc: vmm/api: streamline #[source] and Error impl
    somehow introduce the problem.

I'm seeking help and guidance; I have no idea how to debug the failing test_api_http_pause_resume test efficiently. Failing Output Example

...
Error running ch-remote command: "target/x86_64-unknown-linux-gnu/release/ch-remote" "--api-socket=/tmp/ch4qye1L/cloud-hypervisor.sock" "resume"
stderr: Error opening HTTP socket: Connection refused (os error 111)

thread 'common_parallel::test_api_http_pause_resume' panicked at tests/integration.rs:377:9:
assertion failed: target_api.remote_command("resume", None)
...

Using bash ./scripts/dev_cli.sh tests --integration -- --test-filter test_api_http_pause_resume -- --show-output, I could reduce the feedback loop to around six minutes on my machine, which is not very pleasant to work with. Thanks for any help in advance!

Steps to Undraft

@phip1611 phip1611 requested a review from a team as a code owner May 22, 2025 08:11
@phip1611 phip1611 marked this pull request as draft May 22, 2025 08:11
@phip1611 phip1611 changed the title [WIP] treewide: streamline #[source] and Error (follow-up (2/2)) [WIP] [DEBUG HELP NEEDED] treewide: streamline #[source] and Error (follow-up (2/2)) May 22, 2025
ApiError(api_error) => write!(f, "{api_error}"),
}
}
#[error("Error from internal API: {0:?}")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, I'm not entirely sure what is going on, but it looks like the test succeeds if I use the Display impl instead of Debug here:

    #[error("Error from internal API: {0}")]

Copy link
Member Author

@phip1611 phip1611 May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I actually wanted to use the Display impl in the first place. Might have been a typo from my side

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot beliebe that the CI is actually green 😁 I'd feel uncomfortable if we merge this without

  • understanding the root cause
  • improve the debug experience for the next time

@rbradford

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have a chance to look at the code and try and reproduce manually - but I was wondering if there may have been a stray new line in the output and thus breaking the HTTP response?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing is that it could have been trying to do a stack unwind for the embedded debug error message which will then fail as the seccomp rules won't allow it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should ony be newlines with {0:#?} but not {0:?} 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahahaha you were right, it is a seccomp violation. I just verified this by passing --seccomp false to the CHV command line. So the debugging guide here is: Run the test without seccomp if it suspicially fails.

phip1611 added 2 commits May 22, 2025 15:26
This streamlines the Error implementation in the Cloud Hypervisor code
base to match the remaining parts so that everything follows the agreed
conventions. These are leftovers missed in the previous commits.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
This streamlines the Error implementation in the Cloud Hypervisor code
base to match the remaining parts so that everything follows the agreed
conventions. These are leftovers missed in the previous commits.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
@phip1611 phip1611 force-pushed the more-error-follow-up-2 branch from 03897ad to a41982a Compare May 22, 2025 13:29
@rbradford
Copy link
Member

@phip1611 Looking good?

@phip1611
Copy link
Member Author

phip1611 commented May 22, 2025

@phip1611 Looking good?

I'd love to see if we can improve the debuggability here while we're already working on it.

Discussion continues here.

@phip1611 phip1611 marked this pull request as ready for review May 22, 2025 14:59
@phip1611 phip1611 changed the title [WIP] [DEBUG HELP NEEDED] treewide: streamline #[source] and Error (follow-up (2/2)) treewide: streamline #[source] and Error (follow-up (2/2)) May 22, 2025
@rbradford rbradford added this pull request to the merge queue May 22, 2025
Merged via the queue into cloud-hypervisor:main with commit ab6e1bd May 22, 2025
39 checks passed
@phip1611 phip1611 deleted the more-error-follow-up-2 branch May 22, 2025 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants