I'm trying to set up a very minimal aarch64 KVM-capable system.

My design requires a minimalist kernel with few drivers linked into the kernel image. My objective is to bring up a virtual machine running a bare-metal application as quickly as possible. The same hypervisor must later also be able to run a full-fledged Linux distribution.

When this aarch64 hypervisor starts a Linux VM with qemu -M virt,accel=kvm, the VM executes the bootloader and the kernel's EFI stub, but then hangs in the kernel's arch-specific initialization. To be more precise, when I inspect the hung system from the qemu monitor, I often find the PC around this position:

U-Boot 2021.01 (Aug 20 2021 - 10:50:56 +0200)

DRAM:  128 MiB
Flash: 128 MiB
In:    pl011@9000000
Out:   pl011@9000000
Err:   pl011@9000000
Net:   No ethernet found.
Hit any key to stop autoboot:  0 
Timer summary in microseconds (7 records):
       Mark    Elapsed  Stage
          0          0  reset
815,990,026815,990,026  board_init_f
816,418,047    428,021  board_init_r
818,760,551  2,342,504  id=64
818,763,069      2,518  main_loop

Accumulated time:
                10,017  dm_r
                53,553  dm_f
10189312 bytes read in 111 ms (87.5 MiB/s)
Scanning disk virtio-blk#31...
Found 1 disks
Missing RNG device for EFI_RNG_PROTOCOL
No EFI system partition
Booting /\boot\Image
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
QEMU 5.2.0 monitor - type 'help' for more information
(qemu) info registers 
 PC=ffffffc010010a34 X00=1de7ec7edbadc0de X01=ffffffc01084ca38
X02=ffffffc01084ca38 X03=1de7ec7edbadc0de X04=ffffffc0108f0120
X05=0000000000000348 X06=ffffffc010679000 X07=0000000000000000
X08=ffffffc010a079b8 X09=ffffffc01008ef80 X10=ffffffc0108ee398
X11=ffffffc010000000 X12=ffffffc01099a3c8 X13=0000000300000101
X14=0000000000000000 X15=0000000046df02b8 X16=0000000047f6d968
X17=0000000000000000 X18=0000000000000000 X19=ffffffc0109006c0
X20=1de7ec3eec328b16 X21=ffffffc0108f02b0 X22=ffffffc01073fd40
X23=00000000200001c5 X24=0000000046df0368 X25=0000000000000001
X26=0000000000000000 X27=0000000000000000 X28=ffffffc0109006c0
X29=ffffffc0108f0090 X30=ffffffc01065abf4  SP=ffffffc01084c2b0
PSTATE=400003c5 -Z-- EL1h     FPCR=00000000 FPSR=00000000
Q00=0000000000000000:0000000000000000 Q01=0000000000000000:0000000000000000
Q02=0000000000000000:0000000000000000 Q03=0000000000000000:0000000000000000
Q04=0000000000000000:0000000000000000 Q05=0000000000000000:0000000000000000
Q06=0000000000000000:0000000000000000 Q07=0000000000000000:0000000000000000
Q08=0000000000000000:0000000000000000 Q09=0000000000000000:0000000000000000
Q10=0000000000000000:0000000000000000 Q11=0000000000000000:0000000000000000
Q12=0000000000000000:0000000000000000 Q13=0000000000000000:0000000000000000
Q14=0000000000000000:0000000000000000 Q15=0000000000000000:0000000000000000
Q16=0000000000000000:0000000000000000 Q17=0000000000000000:0000000000000000
Q18=0000000000000000:0000000000000000 Q19=0000000000000000:0000000000000000
Q20=0000000000000000:0000000000000000 Q21=0000000000000000:0000000000000000
Q22=0000000000000000:0000000000000000 Q23=0000000000000000:0000000000000000
Q24=0000000000000000:0000000000000000 Q25=0000000000000000:0000000000000000
Q26=0000000000000000:0000000000000000 Q27=0000000000000000:0000000000000000
Q28=0000000000000000:0000000000000000 Q29=0000000000000000:0000000000000000
Q30=0000000000000000:0000000000000000 Q31=0000000000000000:0000000000000000
(qemu) x/10i 0xffffffc010010a34
0xffffffc010010a34:  8b2063ff      add sp, sp, x0
0xffffffc010010a38:  d53bd040      mrs x0, (unknown)
0xffffffc010010a3c:  cb2063e0      sub x0, sp, x0
0xffffffc010010a40:  f274cc1f      tst x0, #0xfffffffffffff000
0xffffffc010010a44:  54002ca1      b.ne #+0x594 (addr -0x3feffef028)
0xffffffc010010a48:  cb2063ff      sub sp, sp, x0
0xffffffc010010a4c:  d53bd060      mrs x0, (unknown)
0xffffffc010010a50:  140003fc      b #+0xff0 (addr -0x3feffee5c0)
0xffffffc010010a54:  d503201f      nop
0xffffffc010010a58:  d503201f      nop

https://elixir.bootlin.com/linux/v5.12.10/source/arch/arm64/kernel/entry.S#L113
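
For reference, the PSTATE value in the register dump above can be decoded by hand. The sketch below is only an illustration (the helper name decode_pstate is made up); it uses the architectural AArch64 layout of the saved process state: N/Z/C/V in bits 31..28, the D/A/I/F mask bits in bits 9..6, the exception level in bits 3..2 and the stack pointer selection in bit 0.

```python
def decode_pstate(pstate: int) -> dict:
    """Decode the AArch64 fields of a PSTATE/SPSR value (hypothetical helper)."""
    # Condition flags, printed the way the qemu monitor prints them.
    flags = "".join(
        name if pstate & (1 << bit) else "-"
        for name, bit in (("N", 31), ("Z", 30), ("C", 29), ("V", 28))
    )
    # Exception mask bits that are currently set.
    masked = "".join(
        name
        for name, bit in (("D", 9), ("A", 8), ("I", 7), ("F", 6))
        if pstate & (1 << bit)
    )
    el = (pstate >> 2) & 0x3               # M[3:2]: exception level
    sp_sel = "h" if pstate & 1 else "t"    # M[0]: SP_ELx (h) or SP_EL0 (t)
    return {"flags": flags, "masked": masked, "mode": f"EL{el}{sp_sel}"}


print(decode_pstate(0x400003C5))
# {'flags': '-Z--', 'masked': 'DAIF', 'mode': 'EL1h'}
```

So the guest is at EL1 on its own stack pointer with debug, SError, IRQ and FIQ all masked, which is the state the CPU is in right after taking an exception to EL1.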

The guest kernel is upstream 5.12, with no modifications in the arch initialization area. The only changes are a few printk() calls added in start_kernel() to provide timestamps, but the VM never reaches them. Also, the same hypervisor can run the same guest with the same qemu if I only remove the KVM acceleration.

I have a little experience with KVM, and less with KVM on aarch64. I feel like I am missing something in my hypervisor kernel. I removed many drivers from the kernel to get the quickest possible boot time, so there are no USB or network drivers, but I made sure the kernel has the full set of virtualization configuration options enabled.

As the system starts, /dev/kvm is there, and qemu seems to swallow my command line, including the KVM option, without complaining.

But then the host hangs.
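
The presence of /dev/kvm alone does not say much, so one more sanity check is to ask the device for its API version. This is a minimal sketch, assuming Python is available on the hypervisor image (a very minimal image may not have it); the function name is made up, while the ioctl number is KVM_GET_API_VERSION from <linux/kvm.h>. The stable KVM API reports version 12.

```python
import fcntl
import os

# _IO(KVMIO, 0x00) with KVMIO = 0xAE, as defined in <linux/kvm.h>.
KVM_GET_API_VERSION = 0xAE00


def kvm_api_version(path: str = "/dev/kvm"):
    """Return the KVM API version, or None if the device cannot be opened."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None  # missing node, no permission, or KVM not initialised
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
    finally:
        os.close(fd)


print(kvm_api_version())
```

A result of None means the device could not even be opened; this only shows that the KVM module answered, not that a guest will boot.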

I got these results on two different platforms, just to be sure it is not hardware dependent:

  • Pine64 with ATF and U-Boot (I'm aware the Allwinner BSP leaves the processor at EL1, so I implemented the boot chain using ATF)
  • Raspberry Pi 3

Currently, I have no clue where to investigate next. Any suggestion would be greatly appreciated.

To me, it seems to be just a problem with your KVM configuration. It doesn't seem to execute the kernel. Check the settings of your VM and make sure they emulate the machine your kernel expects. In any case, your question is not about programming, so it is off-topic here. - Giacomo Catenazzi

1 Answer

At last, I managed to get KVM working in the environment described above.

To summarize the issue:

  • Minimal Linux with KVM support
  • Minimal Linux system with its bootloader and kernel image to run in the virtualized context
  • Linux kernel slightly modified in start_kernel() by adding a few printk() calls to emit timestamps
  • The image is proven to work on qemu without the KVM extension

Testing with qemu and the KVM extension gives the following results.

qemu is started with:

qemu-system-aarch64 -machine virt -cpu host -enable-kvm -smp 1 -nographic -bios ./u-boot.bin -drive file=./rootfs.ext2,if=none,format=raw,id=hd0 -device virtio-blk-device,drive=hd0

The bootloader gets executed, control is handed to the Linux kernel, the kernel's EFI stub emits its messages, then the output stops and everything seems to hang.

Inspecting the execution at random times, I found the PC consistently inside arch/arm64/kernel/entry.S.

A quick inspection of this code led me to the wrong conclusion that it was part of a loop controlled by some (to me) exotic aarch64-specific control register.

So I posted my question on StackOverflow.

On a second look, I realized that no loop was involved and that the behavior was the result of a chain of exceptions.

Apparently, the printk() I had added as early as possible in the kernel ran before the kernel had correctly set up its stack.

This modification was the root of the problem.
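
As a side note, the two "mrs x0, (unknown)" lines in the monitor disassembly from the question can be decoded by hand, which would have pointed me in the right direction sooner. The sketch below is only an illustration (the helper name and the small lookup table are mine); it splits an A64 MRS opcode into its op0/op1/CRn/CRm/op2 fields, as laid out in the Arm architecture manual.

```python
# Only the two registers needed here; encodings are (op0, op1, CRn, CRm, op2).
KNOWN_SYSREGS = {
    (3, 3, 13, 0, 2): "TPIDR_EL0",
    (3, 3, 13, 0, 3): "TPIDRRO_EL0",
}


def decode_mrs(insn: int):
    """Return (sysreg fields, Rt) for an A64 MRS instruction, else None."""
    if (insn & 0xFFF00000) != 0xD5300000:  # fixed bits of MRS
        return None
    fields = (
        (insn >> 19) & 0x3,   # op0
        (insn >> 16) & 0x7,   # op1
        (insn >> 12) & 0xF,   # CRn
        (insn >> 8) & 0xF,    # CRm
        (insn >> 5) & 0x7,    # op2
    )
    return fields, insn & 0x1F  # Rt: destination register number


for opcode in (0xD53BD040, 0xD53BD060):
    fields, rt = decode_mrs(opcode)
    print(f"{opcode:08x}: mrs x{rt}, {KNOWN_SYSREGS.get(fields, fields)}")
# d53bd040: mrs x0, TPIDR_EL0
# d53bd060: mrs x0, TPIDRRO_EL0
```

As far as I can tell from the entry.S source linked in the question, these two registers are used as scratch space by the stack overflow check in the exception vector entry code, so the PC was sitting in exception entry and not in a polling loop.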

A few things I learned from this issue:

  • Inspecting the code at random times and finding it around the same instruction does not imply the code is in a loop
  • Code that appears to work in one environment is not necessarily correct
  • Emulation and virtualization aim to give the same results, but they are very unlikely to agree in 100% of cases. Treat them as different if you don't want to get stuck with false assumptions.

For more than half of the tests, I used the Pine64 board. The Pine64 I own is based on the Allwinner A64 SoC. There is not much documentation on this board and its SoC, and what is available is old. I had concerns about its boot sequence, because a document online states that KVM cannot work properly since a proprietary component of the boot chain releases the processor at EL1. This prompted me to dig into the Pine64 boot process.

What I learned on the Pine64 board and KVM:

  • The document refers to the early BSP provided by Allwinner. The component is NOT the boot ROM. Newer boot chain components based on ATF do NOT suffer from this problem. Refer to the boot log below for details
  • The Allwinner A64 SoC is built on the Cortex-A53 and, like any other ARMv8, has no design issue running KVM or any other virtualization tool.
  • On aarch64, the kernel can start at EL2 or EL1. But if you need virtualization with KVM, it must start at EL2, because this is the exception level where hypervisors (KVM) do their job.
  • U-Boot starts the kernel at EL2; there is a specific configuration option to make it start the kernel at EL1, which is NOT the default
  • The starting kernel installs the KVM hypervisor code and drops the CPU to EL1 before the start_kernel() function.

U-Boot SPL 2021.01 (Aug 31 2021 - 10:14:46 +0200)
DRAM: 512 MiB
Trying to boot from MMC1


U-Boot 2021.01 (Aug 31 2021 - 10:13:59 +0200) Allwinner Technology

EL = 2
CPU:   Allwinner A64 (SUN50I)
Model: Pine64
DRAM:  512 MiB
MMC:   mmc@1c0f000: 0
Loading Environment from FAT... *** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Net:   phy interface6
eth0: ethernet@1c30000
starting USB...
Bus usb@1c1a000: USB EHCI 1.00
Bus usb@1c1a400: USB OHCI 1.0
Bus usb@1c1b000: USB EHCI 1.00
Bus usb@1c1b400: USB OHCI 1.0
scanning bus usb@1c1a000 for devices... 1 USB Device(s) found
scanning bus usb@1c1a400 for devices... 1 USB Device(s) found
scanning bus usb@1c1b000 for devices... 1 USB Device(s) found
scanning bus usb@1c1b400 for devices... 1 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found
Hit any key to stop autoboot:  0
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found U-Boot script /boot.scr
270 bytes read in 2 ms (131.8 KiB/s)
## Executing script at 4fc00000
32086528 bytes read in 1541 ms (19.9 MiB/s)
27817 bytes read in 4 ms (6.6 MiB/s)
Moving Image from 0x40080000 to 0x40200000, end=42120000
## Flattened Device Tree blob at 4fa00000
   Booting using the fdt blob at 0x4fa00000
EHCI failed to shut down host controller.
   Loading Device Tree to 0000000049ff6000, end 0000000049fffca8 ... OK
EL = 2

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 5.12.19 (alessandro@x1) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.05.1) 9.4.0, GNU ld (GNU Binutils) 2.35.2) #2 SMP PREEMPT Mon Aug 30 20:06:45 CEST 2021
[    0.000000] EL = 1
[    0.000000] Machine model: Pine64
[    0.000000] efi: UEFI not found.
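
The "EL = N" lines in the log above are debug prints, not standard boot messages. On aarch64 a print of this kind would normally be derived from the CurrentEL system register, which holds the exception level in bits 3..2. How the value was obtained here is an assumption on my side; the sketch below (helper name made up) only shows the decoding of raw register values.

```python
def current_el(raw: int) -> int:
    """Extract the exception level from a raw CurrentEL register value."""
    return (raw >> 2) & 0x3  # CurrentEL.EL lives in bits [3:2]


# Raw values as they would be read at EL2 (U-Boot) and EL1 (running kernel).
for raw in (0x8, 0x4):
    print(f"CurrentEL={raw:#x} -> EL = {current_el(raw)}")
# CurrentEL=0x8 -> EL = 2
# CurrentEL=0x4 -> EL = 1
```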