PCIe I/O virtualization

What is I/O virtualization?

I/O virtualization allows a common I/O resource to be shared between multiple running virtual machines so that resource usage and cost are optimized for a typical infrastructure use case. A few techniques used for I/O virtualization are:

  • Trap and emulate

  • Paravirtualization

  • PCI passthrough

This page describes the PCI passthrough technique, which is the most widely adopted technique for I/O virtualization.

PCIe pass-through based device virtualization

PCIe pass-through (also called direct device assignment) allows a device to be assigned to a guest such that the guest runs the driver for the device without intervention from the hypervisor or host. It is one of the device virtualization techniques, alongside paravirtualization.

PCIe pass-through is achieved using frameworks in the Linux kernel, such as VFIO, virtio, IOMMU, and PCI. An smmu-test-engine (smmute) device that is available on the platform is used as the test device for this virtualization technique. The smmu-test-engine is a PCIe exerciser that generates DMA workloads, and it uses arm-smmu-v3 to provide DMA isolation. The device is first probed in the host kernel and can then be assigned to a guest, where the smmu-test-engine driver in the guest kernel manages the device directly.
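
Before a device is assigned, it can help to see which host driver currently owns it and which IOMMU group it belongs to, since VFIO assigns devices by IOMMU group. The following is a minimal sketch that reads the standard sysfs links for a PCI device. The function name pci_dev_info and the SYSFS_ROOT override are hypothetical names used for illustration only; they are not part of the platform software.

```shell
#!/bin/sh
# Print the bound driver and IOMMU group of one PCI device.
# SYSFS_ROOT is a hypothetical override so the function can be tried against
# a fake directory tree; on a real host it defaults to /sys.
pci_dev_info() {
    bdf=$1
    dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/$bdf"
    [ -d "$dev" ] || { echo "$bdf: no such device" >&2; return 1; }

    # The 'driver' link exists only while a driver is bound to the device.
    driver=none
    if [ -L "$dev/driver" ]; then
        driver=$(basename "$(readlink "$dev/driver")")
    fi

    # The 'iommu_group' link exists only when the device sits behind an IOMMU.
    group=none
    if [ -L "$dev/iommu_group" ]; then
        group=$(basename "$(readlink "$dev/iommu_group")")
    fi

    echo "$bdf driver=$driver iommu_group=$group"
}
```

For example, running pci_dev_info 0000:08:00.1 on the host would be expected to report the smmute-pci driver before the device is rebound and vfio-pci afterwards, assuming the platform exposes these sysfs links.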

PCI pass-through using multiple guests and the smmu test engine:

  • Boot the platform by following the Buildroot guide, and then ensure that the smmu test engine devices are probed correctly. Use the lspci command to check for smmu test engine devices with the PCI BDF IDs 07:00.0, 07:00.3, 08:00.0 and 08:00.1.

    lspci
    
  • The verbose output of lspci shows that the last four devices, with the PCI BDF IDs mentioned above, are managed by the ‘smmute-pci’ kernel driver.

    lspci -v
    
  • Also check that the smmute-pci driver has probed the smmu test engine devices properly, and that a device entry exists for each of the four smmute devices.

    ls -l /dev/smmute*
    
  • Use one of the smmute devices (e.g. device 0000:08:00.1) to perform the PCI pass-through. Detach the PCIe device from its class driver and attach it to the vfio-pci driver, as also explained in the kernel documentation.

    echo 0000:08:00.1 > /sys/bus/pci/devices/0000:08:00.1/driver/unbind
    echo vfio-pci > /sys/bus/pci/devices/0000:08:00.1/driver_override
    echo 0000:08:00.1 > /sys/bus/pci/drivers_probe
    
  • The kernel and ramdisk images used to launch VMs are available in the second partition of the GRUB disk image, which is probed as /dev/vda2 in the host. Mount this partition to use the images.

    mount /dev/vda2 /mnt
    
  • This mounted partition can also be shared with a guest using the 9p virtual filesystem. A binary to run tests on the smmute device is also available in this partition, so after sharing the filesystem with a guest, tests can be run on the assigned smmute device to verify PCI pass-through.

  • Launch VMs using the lkvm tool, which supports the virtio-iommu and vfio drivers to allow PCI pass-through.

    screen -md -S "virt1" lkvm run -k /mnt/Image -i /mnt/ramdisk-buildroot.img --irqchip gicv3-its -c 2 -m 512 --9p /mnt,hostshare --console serial --params "console=ttyS0 --earlycon=uart,mmio,0x1000000 root=/dev/vda" --vfio-pci 0000:08:00.1;
    
  • Switch to the guest's screen session to view its boot-up logs. Use the following command to attach to a specific screen:

    screen -r virt1
    
  • After the guest boots up, mount the 9p filesystem at a mount point in the guest. For example, use the following commands to mount it at /tmp:

    mount -t 9p -o trans=virtio hostshare /tmp/
    cd /tmp
    
  • Check that the smmu test engine is probed in the guest. The device shows a different PCI BDF ID in the guest than in the host kernel.

    # lspci
    00:00.0 Unassigned class [ff00]: ARM Device ff80
    
    # ls -l /dev/smmute*
    crw-------    1 root     root      235,   0 Jan  1 00:00 /dev/smmute0
    
  • From the /tmp directory, which contains the ‘smmute’ binary, run the test.

    ./smmute -s 0x100 -n 10
    
  • Check that the MSI interrupts on the smmu test engine PCI device in the guest are triggered.

    cat /proc/interrupts
    
    • For example, after running a few iterations of the smmute test, the MSI interrupts on the PCI device would look like:

    #         CPU0       CPU1       CPU2       CPU3
    20:          1          0          0          0   ITS-MSI   0 Edge      0000:00:00.0
    21:          0          2          0          0   ITS-MSI   1 Edge      0000:00:00.0
    22:          0          0          1          0   ITS-MSI   2 Edge      0000:00:00.0
    23:          0          0          0          1   ITS-MSI   3 Edge      0000:00:00.0
    24:          1          0          0          0   ITS-MSI   4 Edge      0000:00:00.0
    25:          0          1          0          0   ITS-MSI   5 Edge      0000:00:00.0
    26:          0          0          1          0   ITS-MSI   6 Edge      0000:00:00.0
    27:          0          0          0          0   ITS-MSI   7 Edge      0000:00:00.0
    
  • Return to the host by detaching from the screen using ‘Ctrl-a d’, and launch another guest by repeating the above commands with a different screen name and device. For example:

    echo 0000:08:00.0 > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
    echo vfio-pci > /sys/bus/pci/devices/0000:08:00.0/driver_override
    echo 0000:08:00.0 > /sys/bus/pci/drivers_probe
    
    screen -md -S "virt2" lkvm run -k /mnt/Image -i /mnt/ramdisk-buildroot.img --irqchip gicv3-its -c 2 -m 512 --9p /mnt,hostshare --console serial --params "console=ttyS0 --earlycon=uart,mmio,0x1000000 root=/dev/vda" --vfio-pci 0000:08:00.0;
    
  • Run the test on the smmu test engine in this second guest by attaching to its screen, mounting the 9p filesystem and executing the ‘smmute’ binary. Check again in this guest that the MSI interrupts on the smmu test engine PCI device are triggered.

    cat /proc/interrupts
    
  • Return to the host by detaching from the screen using ‘Ctrl-a d’, and use the following command to list the guests that are managed by the lkvm tool.

    # lkvm list
    PID NAME                 STATE
    ------------------------------------
    309 guest-309            running
    276 guest-276            running
    
  • Power off the guests by switching to their respective screens and executing the command:

    poweroff
    
  • The guests shut down and the following message is displayed on the console.

    # KVM session ended normally.
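
The detach-and-attach step in the procedure above uses the same three sysfs writes for every device, so it can be wrapped in a small helper when several devices are assigned to guests. This is a sketch only: the function name bind_to_vfio and the SYSFS_ROOT override are hypothetical, and the writes mirror the unbind, driver_override and drivers_probe commands shown earlier.

```shell
#!/bin/sh
# Rebind one PCI device to vfio-pci using the same three sysfs writes as the
# manual procedure. SYSFS_ROOT is a hypothetical override for trying the
# function against a fake tree; on a real host it defaults to /sys.
bind_to_vfio() {
    bdf=$1
    pci="${SYSFS_ROOT:-/sys}/bus/pci"
    dev="$pci/devices/$bdf"
    [ -d "$dev" ] || { echo "$bdf: no such device" >&2; return 1; }

    # Release the device from its current driver, if it has one.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind" || return 1
    fi
    # Make vfio-pci the driver that is allowed to claim this device.
    echo vfio-pci > "$dev/driver_override" || return 1
    # Ask the PCI core to probe the device again so vfio-pci picks it up.
    echo "$bdf" > "$pci/drivers_probe" || return 1
}
```

With this helper, preparing the two devices used above would reduce to bind_to_vfio 0000:08:00.1 followed by bind_to_vfio 0000:08:00.0, run as root on the host.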
    
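Because each additional guest differs only in its screen name and assigned device, the launch command from the procedure above can be parameterized. The sketch below reuses the lkvm options exactly as they appear in that command. The function name launch_guest, the optional image directory argument and the DRY_RUN switch are hypothetical. In dry-run mode, which is the default in this sketch, the command is only printed, and the printed form drops the quoting around the --params value, so it is meant for inspection rather than copy and paste.

```shell
#!/bin/sh
# Build the guest launch command for a given screen name and PCI device.
# With DRY_RUN=1 (the default in this sketch) the command is printed only;
# with DRY_RUN=0 it is executed.
launch_guest() {
    name=$1
    bdf=$2
    img_dir=${3:-/mnt}

    # Reuse the positional parameters to hold the command line, so the
    # multi-word --params value stays a single argument when executed.
    set -- screen -md -S "$name" lkvm run \
        -k "$img_dir/Image" -i "$img_dir/ramdisk-buildroot.img" \
        --irqchip gicv3-its -c 2 -m 512 --9p "$img_dir,hostshare" \
        --console serial \
        --params "console=ttyS0 --earlycon=uart,mmio,0x1000000 root=/dev/vda" \
        --vfio-pci "$bdf"

    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, launch_guest virt2 0000:08:00.0 prints the command for the second guest, and DRY_RUN=0 launch_guest virt2 0000:08:00.0 would start it.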

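Checking that the MSI interrupts fired, as described in the procedure above, can also be done by summing the per-CPU counts for the device instead of reading the table by eye. The following sketch assumes the table layout shown in the sample output, where the last column of each MSI line is the device name. The function name msi_total is hypothetical.

```shell
#!/bin/sh
# Sum the per-CPU interrupt counts of every line whose last column matches
# the given device name. The /proc/interrupts table is read from stdin.
msi_total() {
    awk -v dev="$1" '
        NR == 1 {
            # Header row: count the CPUn columns.
            for (i = 1; i <= NF; i++) if ($i ~ /^CPU[0-9]+$/) ncpu++
            next
        }
        $NF == dev {
            # Fields 2 .. ncpu+1 hold the per-CPU counts for this interrupt.
            for (i = 2; i <= ncpu + 1; i++) total += $i
        }
        END { print total + 0 }
    '
}
```

In the guest, msi_total 0000:00:00.0 < /proc/interrupts prints the total number of MSI interrupts delivered for the device. For the sample table above the total is 8, and a value that grows after each run of the smmute test indicates that interrupts are being delivered.
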
Copyright (c) 2021-2023, Arm Limited. All rights reserved.