UEFI Based KVM Virtualization

Overview of Virtualization support

Neoverse reference platforms support virtualization through architectural support for the AArch64 Virtualization Host Extensions (VHE). The reference platform software stack uses the Linux Kernel-based Virtual Machine (KVM) as the hypervisor and the userspace program kvmtool as the virtual machine manager (VMM) to leverage this hardware feature. The Virtualization document describes how to validate virtualization on Neoverse reference platforms using a buildroot filesystem with Linux as the guest operating system. That setup helps in validating the architectural features, but it lacks firmware support for booting the platform. Booting a full-fledged Linux distribution such as Fedora or Ubuntu as the guest OS, with UEFI firmware and the GRUB bootloader, helps in validating more real-world virtualization use cases. This setup also provides support for ACPI table based platform resource control.

Objective

The purpose of validating virtualization with a Linux distribution is to prepare virtual machines (VMs) on a host system that can boot multiple guest operating systems running Linux distributions such as Ubuntu or Fedora with UEFI firmware support. The virtualized platform is prepared and launched using the KVM module of the host Linux kernel and kvmtool, which is a standalone userspace tool. kvmtool can boot either directly from a kernel or from firmware, in which case the firmware starts the bootloader that boots the Linux distribution. Firmware based booting allows ACPI tables to be included to communicate hardware information to the OS and to perform resource control. The firmware is built with the UEFI EDK2 ArmVirtKvmTool platform descriptor from the ArmVirtPkg EDK2 package. ArmVirtKvmTool relies on the DynamicTablesPkg EDK2 package to dynamically produce ACPI tables from the device tree blob (dtb). DynamicTablesPkg parses the hardware information from the dtb that kvmtool prepares for the spawned VMs.

The spawned virtual machine simulates the hardware required for the guest to run. This hardware support includes, but is not limited to:

  • Processor (vCPUs)

  • Interrupt controller (e.g. gic-v3, gic-v3-its)

  • Main memory or RAM

  • Timer (e.g. armv8/7-timer)

  • Flash memory (e.g. cfi-flash) required by UEFI firmware

  • UART controller (e.g. uart-16550) to set up console devices

  • Real time clock (e.g. motorola,mc146818)

  • Block and net devices for disk access and network support both of which are realised using virtio devices.

Note that for this validation all the virtio devices (block and net devices) use PCI as their underlying transport mechanism and are therefore enumerated as PCI endpoint devices.

Overview of ArmVirtKvmTool

The ArmVirtKvmTool firmware is specifically designed to initialize the hardware (h/w) that kvmtool describes using a device tree during the VM launch. ArmVirtKvmTool supports multiple libraries corresponding to the hardware devices emulated by kvmtool, e.g. flash memory, UART, RTC, timer, PCI and virtio devices. A few common devices that require initialization by the firmware are parsed through the flattened device tree (fdt) library. The firmware also makes use of the KvmtoolVirtMemInfoLib library to create a system memory map before doing the h/w initialization. The ArmVirtKvmTool platform descriptor is originally based on ArmVirtPkg and borrows various base libraries to implement the PrePi and DXE stage drivers.

EDK2 supports handling ACPI tables, which are passed to the OS after the firmware exits the BDS stage. However, because kvmtool provides h/w information as a dtb and not as ACPI tables, another EDK2 package, DynamicTablesPkg, is used to parse the dtb and generate the appropriate ACPI tables dynamically. ArmVirtKvmTool implements a configuration manager protocol that holds a platform information repository. The fdt hardware parser from DynamicTablesPkg collects all the platform details as Arm CmObjects and passes these objects to the table factory of DynamicTablesPkg. The table factory obtains a rich set of ACPI table generators from the main table manager and invokes each generator in sequence to create a table. The supported table generators include DBG2, FADT, GTDT, IORT, MADT, MCFG, PPTT, SPCR and many more.
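
One way to confirm from inside a booted guest that the generated tables reached the OS is to look at the tables the Linux kernel exposes under /sys/firmware/acpi/tables. The following is a minimal sketch, not part of the reference stack: the function name check_acpi_tables is hypothetical, and the list of signatures is an assumption based on the generators named above (FADT and MADT appear under their table signatures FACP and APIC). Not every table is necessarily produced for every VM configuration.

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected ACPI tables are exposed
# by the running kernel. The directory is a parameter (defaulting to
# /sys/firmware/acpi/tables) so the function can be tried on a mock tree.
check_acpi_tables() {
    dir="${1:-/sys/firmware/acpi/tables}"
    # FADT is stored under the signature FACP, MADT under APIC.
    for sig in DBG2 FACP GTDT IORT APIC MCFG PPTT SPCR; do
        if [ -e "$dir/$sig" ]; then
            echo "$sig present"
        else
            echo "$sig missing"
        fi
    done
}
```

In the guest, running check_acpi_tables without arguments prints one line per signature. A "missing" line is only a hint to look further, not proof of a firmware problem.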

It is equally important to align the firmware input with the environment that kvmtool creates with the help of KVM. Refer to the Virtualization document for more details on configuring kvmtool for the required virtual platform.

Build & Install

Download the software stack

Skip this section if the required sources have been downloaded and the host TAP interface has been set up.

To obtain the required sources for the platform, follow the steps listed on the Setup Workspace page (including setting up the host TAP interface). Ensure that the platform software is downloaded before proceeding with the steps listed below. Also, note the host machine requirements listed on that page, which are essential for building and executing the platform software stack.

Build the platform software

This section describes how to prepare the setup needed to validate UEFI firmware based booting of Linux distributions on virtual machines. The following software packages from the Neoverse reference platform software stack are needed for the validation:

  • ArmVirtKvmTool based firmware (built as part of UEFI build)

  • Kvmtool VMM

Skip this section if a Buildroot or Busybox build has already been performed for the platform software stack, because the ArmVirtKvmTool UEFI firmware and kvmtool binaries are built as part of it.

  • Build the UEFI firmware for the host and for the guest OS (ArmVirtKvmTool) by running the appropriate script from the software stack:

    ./build-scripts/build-test-uefi.sh -p <platform name> <command>
    

The supported command line options are listed below:

  • <platform name>

  • <command>

    • Supported commands are

      • clean

      • build

      • package

      • all (all of the three above)

Examples of the build command are

  • Command to clean, build and package the software stack needed for the UEFI firmware on RD-N2-Cfg1 platform:

    ./build-scripts/build-test-uefi.sh -p rdn2cfg1 all
    
  • Lastly, build the userspace VMM program kvmtool:

    ./build-scripts/build-kvmtool.sh -p <platform name> clean
    ./build-scripts/build-kvmtool.sh -p <platform name> build
    ./build-scripts/build-kvmtool.sh -p <platform name> package
    

For example, to build kvmtool for the rdn2cfg1 platform, use the commands below:

./build-scripts/build-kvmtool.sh -p rdn2cfg1 clean
./build-scripts/build-kvmtool.sh -p rdn2cfg1 build
./build-scripts/build-kvmtool.sh -p rdn2cfg1 package
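
The three kvmtool steps above always run in the same order, so they can be generated by a small wrapper. The following is a minimal sketch that only prints the commands. The function name kvmtool_build_steps is hypothetical and is not part of the build scripts.

```shell
#!/bin/sh
# Hypothetical helper: print the kvmtool build steps for one platform,
# in the order used above (clean, build, package).
kvmtool_build_steps() {
    platform="$1"
    for step in clean build package; do
        echo "./build-scripts/build-kvmtool.sh -p ${platform} ${step}"
    done
}

# Print the three commands for the RD-N2-Cfg1 platform.
kvmtool_build_steps rdn2cfg1
```

To execute the steps rather than print them, the output can be piped to `sh -e` from the top of the workspace, so that a failing step stops the sequence.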

Setup Satadisk Images

To use Linux distributions as the host and guest OS, create disk images by following the guidelines in the Distro Boot document. The host OS can be Ubuntu or Fedora, and multiple distributions can be used as guests. Remember, however, that the host disk image should be large enough to hold multiple guest disk images, e.g. a host image of ~32GiB and multiple Ubuntu/Fedora guest images of ~6GiB each. The guest disk images are used later to run the KVM session.
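
As a rough sizing aid, the number of guest images that fit on the host disk can be estimated with integer arithmetic. This is a sketch under stated assumptions: the function name guests_that_fit is hypothetical, and the 8GiB reserved for the host OS itself is an illustrative figure, not a value from the reference stack.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many guest images fit on the host disk.
# Arguments: host image size, space reserved for the host OS, guest image
# size (all in GiB). Integer division rounds down.
guests_that_fit() {
    host_gib="$1"
    reserved_gib="$2"
    guest_gib="$3"
    echo "$(( (host_gib - reserved_gib) / guest_gib ))"
}

# 32GiB host image, 8GiB assumed for the host OS, 6GiB per guest image.
guests_that_fit 32 8 6   # prints 4
```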

Note

For simplicity, distribution-specific setup instructions are given for an Ubuntu 22.04 host OS.

Booting the platform for validation

Boot Host OS

  • Boot the host satadisk image on the FVP with the network enabled, as described in Distro Boot. For example, to boot Ubuntu as the host OS, use the following command to start the distro boot from the ubuntu.satadisk image:

    ./distro.sh -p rdn2cfg1 -d /absolute/path/to/ubuntu.satadisk -n true
    
  • After booting the host OS, verify that KVM and virtualization support are enabled. Each Linux distribution has its own way of verifying this, but it can also be confirmed from the kernel boot logs.

    dmesg | grep -i "kvm"
    

    The above command prints the KVM related boot logs, which should be similar to the logs shown below:

    kvm [1]: IPA Size Limit: 48 bits
    kvm [1]: GICv4 support disabled
    kvm [1]: GICv3: no GICV resource entry
    kvm [1]: disabling GICv2 emulation
    kvm [1]: GIC system register CPU interface enabled
    kvm [1]: vgic interrupt IRQ1
    kvm [1]: VHE mode initialized successfully
    

    Also make sure that /dev/kvm exists. If either check fails, follow the steps in the sections below to resolve the issue.
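
The two checks above can be combined in one small function. The following is a minimal sketch: the function name kvm_host_ready is hypothetical, and it assumes that the kernel log contains the exact "VHE mode initialized successfully" line shown in the sample log. The wording of that message may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and check that
#   1. it contains the VHE initialization message from the sample log, and
#   2. the KVM device node exists.
# The device path is a parameter (defaulting to /dev/kvm) only so that the
# function can be tried on a machine without KVM.
kvm_host_ready() {
    dev="${1:-/dev/kvm}"
    if ! grep -q "VHE mode initialized successfully"; then
        echo "VHE message not found in kernel log"
        return 1
    fi
    if [ ! -e "$dev" ]; then
        echo "$dev does not exist"
        return 1
    fi
    echo "KVM host checks passed"
}
```

On the host, it would be used as `sudo dmesg | kvm_host_ready`.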

Network Support

  • Check whether the host OS has network access by running ping -c 5 8.8.8.8. If the ping fails because the network is unreachable, enable the network using the dhclient utility for DHCP discovery on the host OS:

    sudo dhclient -v
    
  • Check the available network interfaces on the host with the command below:

    ip link show
    

    Check whether the above command shows a virtual bridge virbr# already configured and running on the host. This virtual bridge is used to give network access to the guest OS.

  • If the KVM support or the virtual bridge cannot be found, try the commands below. For more details, refer to the Ubuntu KVM Installation guide to resolve any issues.

    sudo apt update
    sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils libfdt-dev -y
    
  • Now start the libvirtd service to enable communication between KVM and the libvirt APIs, and configure the system to start the service at every boot:

    sudo systemctl start libvirtd
    sudo systemctl enable libvirtd
    
  • Network access can be given to the guest OS by creating a bridge and a tap interface. Use the commands below to create the tap interface and add it to the virtual bridge virbr# listed by ip link show.

    sudo ip tuntap add dev tap0 mode tap user $(whoami)
    sudo ip link set tap0 master virbr# up
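
The bridge name can be picked out of the ip link show output instead of being read by eye. The following is a minimal sketch: the function names first_virbr and attach_tap0 are hypothetical, and the sketch assumes the bridge is named virbr followed by a number, as created by libvirt.

```shell
#!/bin/sh
# Hypothetical helper: read "ip link show" output on stdin and print the
# name of the first interface called virbr<N>. Interface lines look like
# "3: virbr0: <...> mtu 1500 ...", so the name is the second ": " field.
first_virbr() {
    awk -F': ' '$2 ~ /^virbr[0-9]+$/ { print $2; exit }'
}

# Hypothetical helper: create tap0 and attach it to the detected bridge,
# using the same two commands as above. Not called automatically.
attach_tap0() {
    bridge="$(ip link show | first_virbr)"
    if [ -z "$bridge" ]; then
        echo "no virbr bridge found"
        return 1
    fi
    sudo ip tuntap add dev tap0 mode tap user "$(whoami)"
    sudo ip link set tap0 master "$bridge" up
}
```

Calling attach_tap0 on the host performs the two steps with the detected bridge name substituted for virbr#.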
    

Now create a workspace for the virtualization test example.

mkdir -p ~/kvm-test/
cd ~/kvm-test/

Emulate Flash Memory

The ArmVirtKvmTool UEFI firmware needs a flash memory at boot time to store various objects. Create an empty, zero-filled flash memory file, which kvmtool presents as a flash device to the UEFI firmware and the guest OS.

dd if=/dev/zero of=efivar.img bs=128M count=1
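
When several guests are launched, one flash file per guest is a reasonable arrangement, so it can help to wrap the creation step and confirm the resulting size. The following is a minimal sketch: the function name make_flash_image is hypothetical, and the size is passed in MiB.

```shell
#!/bin/sh
# Hypothetical helper: create a zero-filled flash image of the requested
# size in MiB and confirm that the file has exactly that many bytes.
make_flash_image() {
    path="$1"
    size_mib="$2"
    # 1048576 bytes = 1MiB per block.
    dd if=/dev/zero of="$path" bs=1048576 count="$size_mib" 2>/dev/null
    # Arithmetic expansion strips any padding that wc adds around the count.
    actual=$(( $(wc -c < "$path") ))
    expected=$(( size_mib * 1024 * 1024 ))
    [ "$actual" -eq "$expected" ]
}
```

For example, `make_flash_image efivar.img 128` produces the same 134217728-byte file as the dd command above.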

Enable PCIe pass-through based device virtualization

As mentioned in the Virtualization document, PCIe pass-through (also called direct device assignment) allows a device to be assigned to a guest so that the guest runs the driver for the device without intervention from the hypervisor/host. This is one of the device virtualization techniques that provide near-host device performance. It is achieved with the help of the VFIO driver framework and IOMMU support. More about this can be read in the Linux VFIO documentation.

  • Neoverse reference platforms have a few smmu-test-engine devices, which are PCIe endpoint devices that can be used to demonstrate this feature. Use the verbose lspci command to check the status of these devices, for example those with PCI BDF IDs 08:00.0 and 08:00.1.

    sudo lspci -v
    sudo lspci -v -s 0000:08:00.1
    
  • Check whether the vfio_pci kernel module is already loaded.

    lsmod | grep -i "vfio"
    

    If it is not loaded, probe the kernel driver module manually:

    sudo modprobe vfio-pci
    
  • Unbind the PCI endpoint device from its current driver if the device is attached to its class driver. If no driver is attached, ignore the error produced by the command below.

    echo "0000:08:00.1" | sudo tee /sys/bus/pci/devices/0000\:08\:00.1/driver/unbind
    
  • Bind the device to the vfio-pci driver:

    echo "vfio-pci" | sudo tee /sys/bus/pci/devices/0000\:08\:00.1/driver_override
    echo "0000:08:00.1" | sudo tee /sys/bus/pci/drivers_probe
    
  • Confirm that the device has been attached to the vfio-pci driver:

    sudo lspci -v -s 0000:08:00.1 | grep -i "Kernel driver"
    
  • To use the device for direct assignment, all the devices that share an IOMMU group with it must be attached to the vfio-pci driver. Perform the unbinding and binding steps above for every endpoint device that shares the IOMMU group. List all the devices in that IOMMU group:

    ls /sys/bus/pci/drivers/vfio-pci/0000\:08\:00.1/iommu_group/devices/
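
Repeating the unbind and bind steps for every device in the group can be scripted. The following is a minimal sketch that only prints the commands for review. The function name vfio_group_cmds is hypothetical, and the sketch reads the group through /sys/bus/pci/devices/<BDF>/iommu_group, which is available whether or not the device is already bound to vfio-pci.

```shell
#!/bin/sh
# Hypothetical helper: print the unbind/bind commands for every device that
# shares an IOMMU group with the given BDF. The sysfs root is a parameter
# (defaulting to /sys) so the function can be exercised on a mock tree.
vfio_group_cmds() {
    bdf="$1"
    sysfs="${2:-/sys}"
    for dev in "$sysfs/bus/pci/devices/$bdf/iommu_group/devices"/*; do
        # Skip the unexpanded pattern when the group directory is empty.
        [ -e "$dev" ] || continue
        peer="$(basename "$dev")"
        echo "echo $peer | sudo tee $sysfs/bus/pci/devices/$peer/driver/unbind"
        echo "echo vfio-pci | sudo tee $sysfs/bus/pci/devices/$peer/driver_override"
        echo "echo $peer | sudo tee $sysfs/bus/pci/drivers_probe"
    done
}
```

Running `vfio_group_cmds 0000:08:00.1` on the host prints three commands per device in the group. As in the manual steps, an error from the unbind command can be ignored for a device that has no driver attached.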
    

Obtain the built binaries

  • Running the KVM session requires the ArmVirtKvmTool UEFI firmware, a guest disk image with a pre-installed Linux distribution, and the kvmtool binary, which were obtained in the Build & Install section. Copy these to the host OS over the network by running the commands below in the workspace directory kvm-test.

    rsync -Wa --progress user@server:absolute/path/to/guest-ubuntu.satadisk .
    rsync -Wa --progress user@server:TOP_DIR/output/<platform name>/components/css-common/KVMTOOL_EFI.bin .
    rsync -Wa --progress user@server:TOP_DIR/output/<platform name>/components/rdn2/lkvm .
    

Launch VMs with multiple Linux distributions

Finally, launch the virtual machine with a Linux distribution image as the guest OS. As mentioned in the Virtualization document, the ‘screen’ utility can be used to multiplex console outputs.

Note

To switch back to the host session, detach from the screen by pressing ctrl+a d.

Run the command below from the kvm-test workspace directory to start a KVM session with the ArmVirtKvmTool binary KVMTOOL_EFI.bin, the kvmtool binary lkvm, the flash image efivar.img, the guest distribution disk image guest-ubuntu.satadisk, the tap interface tap0 and the PCI device with requester ID (BDF) 0000:08:00.1 used for direct device assignment:

screen -md -S "virt0" sudo ./lkvm run -m 2048 -f KVMTOOL_EFI.bin -F efivar.img -d guest-ubuntu.satadisk -n tapif=tap0 --console serial --force-pci --vfio-pci 0000:08:00.1

  • The launched screens can be viewed from the target by using the following command:

    screen -ls
    
  • Jump to the screen using:

    screen -r virt0
    
  • The guest can be seen booting with logs as shown below:

    # lkvm run --firmware ./KVMTOOL_EFI.bin -m 2048 -c 4 --name guest-3882
    Info: Using IOMMU type 3 for VFIO container
    Info: 0000:08:00.1: assigned to device number 0x0 in group 3
    Info: flash file size (134217728 bytes) is not a power of two
    Info: only using first 16777216 bytes
    UEFI firmware (version  built at 14:51:31 on Apr  4 2022)
    
  • Notice the log lines about the PCIe device being set up using the Linux VFIO driver.

    Info: Using IOMMU type 3 for VFIO container
    Info: 0000:08:00.1: assigned to device number 0x0 in group 9
    
  • Once the guest has booted, check that the network is accessible and that the assigned PCI device is listed by lspci.

    # If network is unreachable use dhclient:
    sudo dhclient -v
    
    ping -c 2 8.8.8.8
    
    # Check the listed PCI devices
    lspci
    
    # Output of lspci
    00:00.0 Unassigned class [ff00]: ARM Device ff80
    
  • To shut down the guest, execute the following command:

    sudo poweroff
    

    When the guest shutdown completes, kvmtool prints a message indicating that the KVM session closed without errors.

    # KVM session ended normally.
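
To run several guests side by side, the launch command can be generated per guest. The following is a sketch under stated assumptions, not a tested multi-guest configuration. The function name guest_launch_cmd is hypothetical, and so are the per-guest names efivar-N.img, tapN, virtN and guest-fedora.satadisk. The sketch assumes each guest has its own flash file and its own tap interface, created in the same way as efivar.img and tap0. The --vfio-pci option is left out because the assigned device is expected to belong to one guest at a time.

```shell
#!/bin/sh
# Hypothetical helper: print the launch command for guest number N with the
# given disk image. Only lkvm options that appear in the command above are
# used; the flash, tap and screen names are derived from N.
guest_launch_cmd() {
    n="$1"
    disk="$2"
    echo "screen -md -S virt${n} sudo ./lkvm run -m 2048 -f KVMTOOL_EFI.bin -F efivar-${n}.img -d ${disk} -n tapif=tap${n} --console serial --force-pci"
}

# Print the commands for two guests with different distributions.
guest_launch_cmd 0 guest-ubuntu.satadisk
guest_launch_cmd 1 guest-fedora.satadisk
```

Each printed line can be reviewed and then run from the kvm-test directory. The sessions then appear as virt0 and virt1 in the output of screen -ls.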
    

Copyright (c) 2022-2023, Arm Limited. All rights reserved.