My configuration

I'm trying to deploy a virtual machine in Azure using an Oracle Linux image (version 8.4, generation 2). I change the mount point of the Azure temporary disk (ephemeral0) to /mnt/resource. In addition, I create a swap file on the temporary disk. I'm using the following custom cloud-init script during deployment:

#cloud-config
datasource_list: [ Azure ]
mounts:
- [ ephemeral0, /mnt/resource, auto, "defaults,nofail,x-systemd.requires=cloud-init.service" ]
mount_default_fields: [ None, None, "auto", "defaults,nofail", "0", "2" ]
swap:
  filename: /mnt/resource/swap.img
  size: "auto" # or size in bytes
  max_size: 17179869184 # 16GB

On the first boot right after VM creation, everything (ephemeral0 and swap) works as expected. /var/log/cloud-init.log contains the following entries:

[root@test01 ~]# cat /var/log/cloud-init.log | grep swap
2021-08-16 08:19:29,814 - cc_mounts.py[DEBUG]: Attempting to determine the real name of swap
2021-08-16 08:19:29,814 - cc_mounts.py[DEBUG]: changed default device swap => None
2021-08-16 08:19:29,814 - cc_mounts.py[DEBUG]: Ignoring nonexistent default named mount swap
2021-08-16 08:19:29,815 - cc_mounts.py[DEBUG]: suggest 4096.0 MB swap for 7672.03125 MB memory with '17018.25390625 MB' disk given max=None [max=4254.5634765625 MB]'
2021-08-16 08:19:29,815 - cc_mounts.py[DEBUG]: Creating swapfile in '/mnt/resource/swap.img' on fstype 'xfs' using 'fallocate'
2021-08-16 08:19:29,815 - subp.py[DEBUG]: Running command ['fallocate', '-l', '4096M', '/mnt/resource/swap.img'] with allowed return codes [0] (shell=False, capture=True)
2021-08-16 08:19:29,849 - subp.py[DEBUG]: Running command ['mkswap', '/mnt/resource/swap.img'] with allowed return codes [0] (shell=False, capture=True)
2021-08-16 08:19:29,887 - util.py[DEBUG]: Setting up swap file took 0.072 seconds
2021-08-16 08:19:29,893 - cc_mounts.py[DEBUG]: Changes to fstab: ['+ /dev/disk/cloud/azure_resource-part1 /mnt/resource auto defaults,nofail,x-systemd.requires=cloud-init.service,comment=cloudconfig 0 2', '+ /mnt/resource/swap.img none swap sw,comment=cloudconfig 0 0']
2021-08-16 08:19:29,893 - subp.py[DEBUG]: Running command ['swapon', '-a'] with allowed return codes [0] (shell=False, capture=True)
2021-08-16 08:19:29,929 - cc_mounts.py[DEBUG]: Activate mounts: PASS:swapon -a

As you can see, cloud-init suggests a swap size of 4096 MB.

In /etc/fstab the following entries are added:

/dev/disk/cloud/azure_resource-part1    /mnt/resource   auto    defaults,nofail,x-systemd.requires=cloud-init.service,comment=cloudconfig       0       2
/mnt/resource/swap.img  none    swap    sw,comment=cloudconfig  0       0

swapon -s also shows that swap is configured correctly:

Filename                                Type            Size    Used    Priority
/mnt/resource/swap.img                  file            4194300 0       -2

The Problem

Now, if I deallocate the virtual machine and start it again, the temporary disk is deleted and recreated as expected. It is mounted again under /mnt/resource, but the swap file is no longer created. /var/log/cloud-init.log states:

2021-08-16 08:29:33,331 - cc_mounts.py[DEBUG]: Attempting to determine the real name of swap
2021-08-16 08:29:33,331 - cc_mounts.py[DEBUG]: changed default device swap => None
2021-08-16 08:29:33,331 - cc_mounts.py[DEBUG]: Ignoring nonexistent default named mount swap
2021-08-16 08:29:33,331 - util.py[DEBUG]: Reading from /proc/swaps (quiet=False)
2021-08-16 08:29:33,331 - util.py[DEBUG]: Read 37 bytes from /proc/swaps
2021-08-16 08:29:33,331 - cc_mounts.py[DEBUG]: swap file /mnt/resource/swap.img exists, but not in /proc/swaps
2021-08-16 08:29:33,332 - cc_mounts.py[DEBUG]: suggest 0.0 MB swap for 7672.03125 MB memory with '12889.515625 MB' disk given max=None [max=3222.37890625 MB]'
2021-08-16 08:29:33,332 - cc_mounts.py[DEBUG]: Not creating swap: suggested size was 0
2021-08-16 08:29:33,337 - cc_mounts.py[DEBUG]: Changes to fstab: ['- /mnt/resource/swap.img none swap sw,comment=cloudconfig 0 0']

As far as I understand, the cc_mounts module of cloud-init suggests a swap size of 0 MB because it determines that the temporary disk has only about 12 GB of space left. This seems wrong, since (a) the disk is empty after deallocation and (b) df -h reports about 15 GB available:

[root@test01 ~]# df -h /mnt/resource/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        16G   45M   15G   1% /mnt/resource
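
The two "suggest" log lines can be checked by hand. In both boots the bracketed max equals one quarter of the disk figure that cloud-init reports (17018.25 / 4 = 4254.56 MB and 12889.52 / 4 = 3222.38 MB). The sketch below reproduces those numbers. The 4096 MB target for this memory size and the rule that a capped size below half of memory is turned into 0 are assumptions about cc_mounts inferred from the log output, not confirmed behavior; suggest is a hypothetical helper name.

```shell
#!/bin/sh
# Recompute the values from the two "suggest ... swap" log lines.
# Grounded in the logs: max = disk / 4 in both boots.
# ASSUMPTION: target is 4096 MB for this memory size, capped at max,
# and a capped value below half of memory (and below 4096 MB) becomes 0.
suggest() {
    # $1 = memory in MB, $2 = disk figure in MB as logged by cloud-init
    awk -v mem="$1" -v disk="$2" 'BEGIN {
        max = disk / 4
        want = 4096
        size = (want < max) ? want : max
        if (size < mem / 2 && size < 4096) size = 0
        printf "max=%.2f MB suggested=%.1f MB\n", max, size
    }'
}

suggest 7672.03125 17018.25390625   # first boot
suggest 7672.03125 12889.515625     # after deallocate and start
```

Under these assumptions the second boot ends up at 0 because the cap (about 3222 MB) is below half of the 7672 MB of memory. The second log also says the swap file still existed when the disk was measured, which would account for roughly 4 GB of the missing free space.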

Am I missing something here? Can anybody explain why cloud-init behaves like this and how to create a swapfile properly for every reboot?

1 Answer

• The problem is caused by a misconfiguration in which both the Azure Linux Agent (waagent) and cloud-init try to configure the swap file. When cloud-init is responsible for provisioning, the swap file must be configured by cloud-init as well, so that only one agent (either cloud-init or waagent) handles it. The issue can be intermittent because it depends on when the waagent daemon starts.

• You can fix this by disabling resource disk formatting and swap configuration in the waagent configuration file, /etc/waagent.conf, and by ensuring that the Azure Linux Agent does not mount the ephemeral disk, since that should be handled by cloud-init. For this purpose, set the parameters as follows:

# vi /etc/waagent.conf

# Mount point for the resource disk
ResourceDisk.MountPoint=/mnt

# Create and use swapfile on resource disk
ResourceDisk.EnableSwap=n

# Size of the swapfile
ResourceDisk.SwapSizeMB=0
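
If you prefer not to edit the file by hand, the same settings could be applied with a small script. This is only a sketch: set_conf and the WAAGENT_CONF variable are hypothetical names introduced here, it assumes GNU sed (as shipped with Oracle Linux), and ResourceDisk.Format=n is included because disabling disk formatting is mentioned above even though it is not in the listed parameters.

```shell
#!/bin/sh
# Set KEY=VALUE in a waagent-style config file.
# Replaces an existing (possibly commented-out) line, otherwise appends.
set_conf() {
    key=$1; value=$2; file=$3
    if grep -q "^#\{0,1\}[[:space:]]*${key}=" "$file"; then
        sed -i "s|^#\{0,1\}[[:space:]]*${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# WAAGENT_CONF is a hypothetical override, useful for trying this on a copy.
CONF=${WAAGENT_CONF:-/etc/waagent.conf}
if [ -w "$CONF" ]; then
    set_conf ResourceDisk.Format n "$CONF"       # assumption: see note above
    set_conf ResourceDisk.EnableSwap n "$CONF"
    set_conf ResourceDisk.SwapSizeMB 0 "$CONF"
    # Afterwards restart the agent, e.g. "systemctl restart waagent"
    # (the unit is named walinuxagent on some distributions).
fi
```

Running it against a copy of the file first (WAAGENT_CONF=/tmp/waagent.conf) makes it easy to diff the result before touching the real configuration.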

• Restart the Azure Linux Agent and ensure that the VM is configured to create a swap file through cloud-init. Also, add the script below to '/var/lib/cloud/scripts/per-boot' and make it executable with the '# chmod +x create_swapfile.sh' command:

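#!/bin/sh
# Recreate the swap file if it is missing, e.g. after the
# resource disk was wiped by a deallocate/start cycle.
if [ ! -f '/mnt/swapfile' ]; then
    fallocate --length 2GiB /mnt/swapfile
    chmod 600 /mnt/swapfile
    mkswap /mnt/swapfile
fi

# Enable the swap file, whether it was just created or already present.
swapon /mnt/swapfile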

• Once done, stop and start the VM and check that swap is enabled. Also, compare the entries in /var/log/waagent.log and /var/log/cloud-init.log for the time of the reboot. To avoid this situation completely, deploy the VM with the swap configuration supplied as custom data during provisioning.

See the following documentation for more information:
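
One possible way to do the check after the stop/start cycle is to look for the swap file in /proc/swaps. The helper below is a sketch: swap_active is a hypothetical name, and /mnt/swapfile is the path used by the per-boot script above, so adjust it if your swap file lives elsewhere.

```shell
#!/bin/sh
# Return success if the given path is listed as active swap.
# $1 = swap file path, $2 = swaps table to read (defaults to /proc/swaps)
swap_active() {
    awk -v f="$1" 'NR > 1 && $1 == f { found = 1 } END { exit !found }' \
        "${2:-/proc/swaps}"
}

if swap_active /mnt/swapfile; then
    echo "swap is active on /mnt/swapfile"
else
    echo "swap is NOT active on /mnt/swapfile"
fi
```

The first line of /proc/swaps is a header, which is why the awk program skips it with NR > 1.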

https://docs.microsoft.com/en-us/troubleshoot/azure/virtual-machines/swap-file-not-recreated-linux-vm-restart

https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/update-linux-agent
