
[vsock]cloud-hypervisor VM crashes when we don't send \n along with the connect <portnum> message and instead close the host client #6798

@harisubash


Summary

To connect to a vsock listener inside a cloud-hypervisor VM, a host client first connects to the Unix socket on the host side and sends a message in the format "CONNECT <port_number>\n". If the client sends the message without the trailing \n and is then killed with Ctrl+C (closing the host socket), the VM crashes.
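For reference, a minimal sketch of the handshake when it is done correctly. The function names are hypothetical and the reply format ("OK <number>") is taken from the host console output later in this report; treat it as an illustration, not as the reference client.

```python
import socket


def build_connect_request(port: int) -> bytes:
    """Format the handshake line; the trailing newline terminates the command."""
    return f"CONNECT {port}\n".encode("ascii")


def parse_connect_reply(reply: bytes) -> int:
    """Return the number that follows OK in the handshake reply."""
    fields = reply.decode("ascii").split()
    if len(fields) != 2 or fields[0] != "OK":
        raise ConnectionError(f"unexpected handshake reply: {reply!r}")
    return int(fields[1])


def connect_to_guest(socket_path: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Connect to the host-side Unix socket and complete the handshake."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(build_connect_request(port))
        # Raises ConnectionError if the reply is empty or not "OK <number>".
        parse_connect_reply(sock.recv(1024))
    except Exception:
        sock.close()
        raise
    sock.settimeout(None)
    return sock
```

With the setup in this report, the call would be `connect_to_guest("/tmp/vsock-P2.sock", 1234)`.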

Expected Result

The VM should not crash in this scenario, even if the connection attempt fails.
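To make the expectation concrete: a handshake parser can treat a request that never received its newline as a failed connection instead of a fatal error. The sketch below is a Python illustration only. It is not cloud-hypervisor's Rust code. `parse_connect_command` is a hypothetical name, and the case-insensitive match is an assumption based on the lowercase `connect 1234` that succeeds in the socat session later in this report.

```python
from typing import Optional


def parse_connect_command(buf: bytes) -> Optional[int]:
    """Return the requested port, or None if the command is incomplete or malformed."""
    newline = buf.find(b"\n")
    if newline == -1:
        # Peer has not sent (or will never send) the terminator: reject, do not slice.
        return None
    fields = buf[:newline].split()
    if len(fields) != 2 or fields[0].upper() != b"CONNECT":
        return None
    if not fields[1].isdigit():
        return None
    port = int(fields[1])
    # vsock ports are 32-bit unsigned integers.
    return port if port <= 0xFFFFFFFF else None
```

A caller that gets None when the peer hangs up would simply drop that one connection and keep serving the others.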

Consistency

Yes, the bug reproduces consistently with the steps listed below.

Version

# ./cloud-hypervisor --version
cloud-hypervisor v41.0

VM Configuration

# ./cloud-hypervisor --kernel ./linux-cloud-hypervisor/arch/x86/boot/compressed/vmlinux.bin -vv --log-file /tmp/cloud-hypervisor-P2.log --console off --serial tty --disk path=jammy-server-cloudimg-amd64-P2.raw --cmdline 'console=ttyS0 root=/dev/vda1 rw' --cpus boot=4 --memory size=2048M,shared=on --api-socket=/tmp/ch-socket-P2.sock --vsock cid=32,socket=/tmp/vsock-P2.sock

Host Machine Details

Architecture     = x86_64
Operating_System = Ubuntu 20.04.6 LTS
Kernel_Version   = 5.4.0-196-generic

Steps to reproduce the bug

Step 1: Launch the cloud-hypervisor VM

Step 2: Inside the VM, launch a vsock listener at port 1234

Step 3: From the host side, try connecting to the host unix socket and send a "CONNECT 1234\n" message to confirm that everything is working as expected.

Step 4: From the host side, connect again and send "CONNECT 1234" without the trailing "\n". Then kill the Python application with Ctrl+C.

Step 5: Check the VM
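Steps 2 and 3 rely on a guest-side listener (`vsock-listener.py`) whose source is not included in this report. A minimal sketch of what such a listener could look like, assuming Python 3.7+ on Linux inside the guest. `make_ack` and `serve_once` are hypothetical names, and the ACK prefix is copied from the host console output below.

```python
import socket

PORT = 1234  # port used in the steps above


def make_ack(payload: bytes) -> bytes:
    """Build the reply echoed back to the host client."""
    return b"VsockListener_ACK:" + payload.strip()


def serve_once(port: int = PORT) -> None:
    """Accept one vsock connection, print what was received, and acknowledge it."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as srv:
        srv.bind((socket.VMADDR_CID_ANY, port))
        srv.listen(1)
        print(f"VSOCK listener bound to port {port}")
        print("Waiting for connections...")
        conn, addr = srv.accept()
        with conn:
            print(f"Connection from {addr}")
            data = conn.recv(1024)
            if data:
                print(f"Received: {data.decode(errors='replace').strip()}")
                conn.sendall(make_ack(data))
```

Calling `serve_once()` in the guest before running the host-side steps should be enough to exercise the handshake.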

Logs observed

VM console output


# ./cloud-hypervisor --kernel ./linux-cloud-hypervisor/arch/x86/boot/compressed/vmlinux.bin -vv --log-file /tmp/cloud-hypervisor-P2.log --console off --serial tty --disk path=jammy-server-cloudimg-amd64-P2.raw --cmdline 'console=ttyS0 root=/dev/vda1 rw' --cpus boot=4 --memory size=2048M,shared=on --api-socket=/tmp/ch-socket-P2.sock --vsock cid=32,socket=/tmp/vsock-P2.sock


[    0.000000] Linux version 6.2.0+ (root@65) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1 SMP Mon Jul 29 19:52:59 PDT 2024
[    0.000000] Command line: console=ttyS0 root=/dev/vda1 rw
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] signal: max sigframe size: 1360
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007fffffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000e8000000-0x00000000f7ffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 3.2.0 present.
[    0.000000] DMI: Cloud Hypervisor cloud-hypervisor, BIOS 0
[    0.000000] Hypervisor detected: KVM
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000004] kvm-clock: using sched offset of 159660727 cycles
[    0.000020] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000040] tsc: Detected 2399.934 MHz processor
[    0.000408] last_pfn = 0x80000 max_arch_pfn = 0x400000000
[    0.000452] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
[    0.000688] found SMP MP-table at [mem 0x000f0090-0x000f009f]
[    0.000723] Using GB pages for direct mapping
[    0.000863] ACPI: Early table checksum verification disabled
[    0.000889] ACPI: RSDP 0x00000000000A0000 000024 (v02 CLOUDH)
[    0.000897] ACPI: XSDT 0x00000000000A1553 00003C (v01 CLOUDH CHXSDT	 00000001 RVAT 01000000)
[    0.000914] ACPI: FACP 0x00000000000A1381 000114 (v06 CLOUDH CHFACP	 00000001 RVAT 01000000)
[    0.000926] ACPI: DSDT 0x00000000000A0024 00135D (v06 CLOUDH CHDSDT	 00000001 RVAT 01000000)
[    0.000931] ACPI: APIC 0x00000000000A1495 000082 (v05 CLOUDH CHMADT	 00000001 RVAT 01000000)
[    0.000936] ACPI: MCFG 0x00000000000A1517 00003C (v01 CLOUDH CHMCFG	 00000001 RVAT 01000000)
[    0.000939] ACPI: Reserving FACP table memory at [mem 0xa1381-0xa1494]

...... Bootup logs

Ubuntu 22.04.4 LTS ubuntu ttyS0

ubuntu login: root
Password:
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 6.2.0+ x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:	   https://landscape.canonical.com
 * Support:	   https://ubuntu.com/pro

 System information as of Mon Oct 21 17:28:12 UTC 2024

  System load: 0.15		  Memory usage: 7%   Processes:       106
  Usage of /:  34.4% of 21.33GB   Swap usage:	0%   Users logged in: 0


Expanded Security Maintenance for Applications is not enabled.

0 updates can be applied immediately.

1 additional security update can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm


The list of available updates is more than a week old.
To check for new updates run: sudo apt update
Failed to connect to https://changelogs.ubuntu.com/meta-release-lts. Check your Internet connection or proxy settings


Last login: Mon Oct 21 17:22:47 UTC 2024 on ttyS0
root@ubuntu:~#
root@ubuntu:~# python3 vsock-listener.py
VSOCK listener bound to port 1234
Waiting for connections...
Connection from (2, 1073741824)
Received: hello

thread '_vsock1' panicked at virtio-devices/src/vsock/unix/muxer.rs:499:41:
slice index starts at 12 but ends at 10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


Host console output

# socat - UNIX-CLIENT:/tmp/vsock-P2.sock
connect 1234
OK 1073741824
hello
VsockListener_ACK:hello


# python3 /tmp/vsock-sender-without-newline.py /tmp/vsock-P2.sock
Vsock socket: /tmp/vsock-P2.sock
Connected to /tmp/vsock-P2.sock
Sent: CONNECT 1234
^CTraceback (most recent call last):
  File "/tmp/vsock-sender-without-newline.py", line 87, in <module>
    vsock = connect_and_request_port(socket_path, requested_port)
  File "/tmp/vsock-sender-without-newline.py", line 26, in connect_and_request_port
    response = sock.recv(1024)	# Receive the response
KeyboardInterrupt

Cloud-Hypervisor Verbose Logs output during crash

cloud-hypervisor: 69.717229s: <_vsock1> DEBUG:cloud-hypervisor-src/virtio-devices/src/vsock/device.rs:282 -- vsock: backend event
cloud-hypervisor: 69.717371s: <_vsock1> DEBUG:virtio-devices/src/vsock/unix/muxer.rs:310 -- vsock: muxer received kick
cloud-hypervisor: 69.717476s: <_vsock1> DEBUG:virtio-devices/src/vsock/unix/muxer.rs:385 -- vsock: muxer processing event: fd=131, event_set=Events(EPOLLIN | EPOLLHUP)
cloud-hypervisor: 69.717896s: <_vsock1> ERROR:cloud-hypervisor-src/virtio-devices/src/thread_helper.rs:50 -- _vsock1 thread panicked
cloud-hypervisor: 69.718059s: <vmm> INFO:vmm/src/lib.rs:1161 -- VM exit event
cloud-hypervisor: 69.718190s: <vmm> INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-rng
cloud-hypervisor: 69.718293s: <vmm> INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-block
cloud-hypervisor: 69.718394s: <vmm> INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-vsock
cloud-hypervisor: 69.722963s: <vmm> INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-rng
cloud-hypervisor: 69.723086s: <vmm> INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-block
cloud-hypervisor: 69.723187s: <vmm> INFO:virtio-devices/src/device.rs:334 -- Resuming virtio-vsock
cloud-hypervisor: 69.723419s: <serial-manager> INFO:vmm/src/serial_manager.rs:424 -- KILL_EVENT received, stopping epoll loop
cloud-hypervisor: 69.723665s: <__rng> INFO:virtio-devices/src/epoll_helper.rs:216 -- KILL_EVENT received, stopping epoll loop
cloud-hypervisor: 69.724106s: <_disk0_q0> INFO:virtio-devices/src/epoll_helper.rs:216 -- KILL_EVENT received, stopping epoll loop

Vsock client sample code that omits the newline (vsock-sender-without-newline.py):

import socket
import sys

def connect_and_request_port(socket_path, requested_port):
    """Connects to a UNIX socket, sends a port request, and processes the response.

    Args:
        socket_path: Path to the UNIX socket file.
        requested_port: The port number to request.

    Returns:
        The connected socket if the server replied with OK, None otherwise.
    """

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(socket_path)
        print(f"Connected to {socket_path}")

        # Deliberately no trailing "\n": this is the condition being reproduced
        message = f"CONNECT {requested_port}"
        sock.sendall(message.encode())  # Send the request
        print(f"Sent: {message}")

        response = sock.recv(1024)  # Receive the response
        if not response:
            print("No response received from server.")
            return None

        response_str = response.decode().strip()
        print(f"Received: {response_str}")

        if response_str.startswith("OK"):
            allocated_port = int(response_str.split(" ")[1])
            print(f"Allocated port: {allocated_port}")
            return sock
        else:
            print(f"Server did not allocate a port: {response_str}")
            return None
    except socket.error as e:
        print(f"Error connecting or communicating: {e}")
        return None


def send_message(sock, message):
    """Sends a message over a connected socket.

    Args:
        sock: The connected socket object.
        message: The message to send.
    """

    try:
        sock.sendall(message.encode())
        print(f"Sent: {message}")

        # Attempt to receive a response (with a timeout)
        sock.settimeout(5)  # Set a timeout of 5 seconds (adjust as needed)
        response = sock.recv(1024)

        # Remove timeout for further communication on this socket if needed
        sock.settimeout(None)

        if response:
            response_str = response.decode().strip()
            print(f"Received: {response_str}")
        else:
            print("No response received from VM.")
    except socket.timeout:
        print("Timeout waiting for response from VM.")
    except socket.error as e:
        print(f"Error sending or receiving message: {e}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python3 vsock-sender-without-newline.py <vsock-socket>")
        sys.exit(1)  # Exit with an error code

    socket_path = sys.argv[1]  # Get the socket_path
    print(f"Vsock socket: {socket_path}")

    requested_port = "1234"

    # Connect and get allocated port
    vsock = connect_and_request_port(socket_path, requested_port)

    if vsock:
        # Now you can use `send_message(vsock, message)` to send messages
        send_message(vsock, "Hello from the host!")
        # Close only when the handshake succeeded; vsock is None otherwise
        vsock.close()
