Tuesday, July 12, 2016

Live Migrating Virtual Machines in KVM Without Shared Storage (with scripts!)

For the last few weeks I've been redistributing our KVM guests among our host systems at the datacenters. Unfortunately, this process is slow and labor-intensive due to some legacy design (the state of which is steadily improving). One of the few remaining items on the improvements list is some big, fast shared storage. After several frustrating days of hunting down system custodians, coordinating maintenance windows, shutting down services, scp'ing files, and battling with fussy services upon reboot, I decided I had to come up with a better solution. This post will walk you through my process for live migrating a VM between two hosts on the same network using only local storage.

If you're interested, the libvirt migration page has a lot of really great information about how libvirt handles migrations. After a brief read of that page, a few more searches around the net, and a brief study of the man pages, I came up with the following command:

virsh migrate --live \
--persistent \
--undefinesource \
--copy-storage-all \
--verbose \
--desturi [destination] [vm_name]

The key to this command is the --copy-storage-all option, which copies the contents of the source image file to the destination. Before we can run the command, though, we need to create an empty image on the destination server for the data to be copied into, preferably in the same location as on the source. Since I use sparsely allocated qcow2 files, I will be using qemu-img with the "qcow2" format flag. If you are using something different, you should create the destination image the same way as you created the original source image. This file should be the same size as the source image:

qemu-img create -f qcow2 /vm_storage/[my_vm].img [size]
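If you don't remember what size the source image was created with, qemu-img info will report it. Here is a minimal sketch of pulling that number out. The helper name virtual_size_bytes is made up for illustration, and it assumes the human-readable output contains a line of the form "virtual size: ... (N bytes)". qemu-img create accepts a plain byte count as the size, so the result can be passed straight through.

```shell
#!/bin/bash
# Hypothetical helper: read `qemu-img info` output on stdin and print the
# virtual size in bytes, taken from the "(N bytes)" part of the line.
virtual_size_bytes() {
  sed -n 's/^virtual size:.*(\([0-9][0-9]*\) bytes).*$/\1/p'
}

# Intended use (bracketed names are placeholders, as in the commands above):
#   size=$(qemu-img info /vm_storage/[my_vm].img | virtual_size_bytes)
#   ssh [remote_host] "qemu-img create -f qcow2 /vm_storage/[my_vm].img ${size}"
```

On newer QEMU releases, qemu-img may refuse to open an image that a running VM has locked unless you pass -U (--force-share), so you may need to add that to the info call.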

Once you've created the file on the destination server, you are almost ready to kick off migration. First we need to make sure you can connect to the remote host. You can test this with virsh:

virsh -c qemu+ssh://[remote_host]/system list --all

If all goes well, you should see a list of the VMs on the remote host. You may, however, receive an error or an empty list (assuming it shouldn't be empty). If this happens, you probably need to add the following pkla file which allows the group your user is in to connect to the remote libvirt socket. I've provided the policykit file contents below:

[Remote libvirt SSH access]
Identity=unix-group:MY_GROUP
Action=org.libvirt.unix.manage
ResultAny=yes
ResultInactive=yes
ResultActive=yes

Change "MY_GROUP" to whatever group you want to use. The wheel group is a good choice if your user is a member. On a Red Hat/CentOS system, save this file as /etc/polkit-1/localauthority/50-local.d/50-libvirt-remote-access.pkla and try again.
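If you have more than a couple of hosts to set up, generating the file beats pasting it by hand. This is only a sketch: the function name pkla_rule is hypothetical, and it simply prints the same rule shown above with the group filled in.

```shell
#!/bin/bash
# Hypothetical helper: print the pkla rule for the given group on stdout.
pkla_rule() {
  local group="$1"
  cat <<EOF
[Remote libvirt SSH access]
Identity=unix-group:${group}
Action=org.libvirt.unix.manage
ResultAny=yes
ResultInactive=yes
ResultActive=yes
EOF
}

# Intended use on the host that needs the rule (writing under /etc needs root):
#   pkla_rule wheel | sudo tee /etc/polkit-1/localauthority/50-local.d/50-libvirt-remote-access.pkla
```

Printing to stdout rather than writing the file directly lets you review the rule before it lands in /etc.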

Once you're able to successfully run the test above, you can try to migrate your VM. I've provided an example below. Run this as the user who is a member of the group we added to the pkla file above. You should probably also run this inside of a tmux or screen session as it will take a while to complete:

virsh migrate --live --persistent --undefinesource --copy-storage-all \
--verbose --desturi qemu+ssh://[remote_host]/system [vm_name]

If all goes well, you should see a progress indicator that advances slowly. In my tests, a 100GB VM with 4GB of RAM under minimal load took about 15 minutes to transfer over a 1GbE network.
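As a sanity check on that number, you can work out the best case for your own link. The sketch below is back-of-the-envelope arithmetic only: the function name min_transfer_seconds is made up, it uses decimal units (1GB = 1000MB), it assumes the full disk size is sent, and it ignores protocol overhead and any memory pages that have to be re-sent.

```shell
#!/bin/bash
# Hypothetical helper: lower bound, in seconds, for copying SIZE_GB gigabytes
# over a LINK_MBPS megabit-per-second link (decimal units, no overhead).
min_transfer_seconds() {
  local size_gb="$1" link_mbps="$2"
  # GB -> megabits: x 1000 (MB per GB) x 8 (bits per byte)
  echo $(( size_gb * 1000 * 8 / link_mbps ))
}

min_transfer_seconds 100 1000   # prints 800 (about 13.3 minutes)
```

By that estimate, 15 minutes for 100GB means the copy ran fairly close to what a 1GbE link can carry.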

To save myself from typing all of this by hand, I wrapped the steps in a script. It should be run from the source VM host. It handles creating the img file on the destination host and does some sanity checking for you:

#!/bin/bash
# Change libvirt default URI to allow enforcement by the pkla file
export LIBVIRT_DEFAULT_URI=qemu:///system
vm="$1"
dest_host="$2"
storage="/directory/of/img/files"

# Are you root?
if [[ "$UID" == "0" ]]
then
  echo "You cannot run migrate as root"
  exit 1
fi

# Exit if not running in screen or tmux
# (tmux sets $TMUX and GNU screen sets $STY inside their sessions)
if [[ -z "$TMUX" ]] && [[ -z "$STY" ]]
then
  echo "You must run migrations in either screen or tmux. Aborting."
  exit 1
fi

# Confirmation
read -p "You are about to move ${vm} to ${dest_host}. Are you sure? y/N " -n 1 -r
echo
if [[ ! "$REPLY" =~ ^[Yy]$ ]]
then
  echo "Aborting."
  exit 1
fi

# Check if VM exists
if ! virsh list --all --name | grep -qxF -- "$vm"
then
  echo "$vm does not exist on this server. Aborting."
  exit 1
fi

# Check if remote destination exists
if ! host "$dest_host" > /dev/null
then
  echo "Unable to resolve $dest_host. Aborting."
  exit 1
fi

# Check that we are forwarding ssh agent
if [[ -z "${SSH_AUTH_SOCK}" ]]
then
  echo "Please exit and re-connect using ssh's -A flag"
  exit 1
fi

# Get VM size (this is ugly)
# TODO: Make this handle multiple disks. Right now it will only work if the
#       img file is named the same as the host and there is only one file
disk_size=$(virsh vol-info --pool default "${vm}".img | awk '/Capacity/ {print $2}' | awk -F'.' '{print $1}')

# Create remote disk
if ! ssh "${dest_host}" "[[ ! -f ${storage}/${vm}.img ]] && sudo qemu-img create -f qcow2 ${storage}/${vm}.img ${disk_size}G"
then
  echo "Unable to create image file ${storage}/${vm}.img on ${dest_host}. Confirm that this does not already exist and try again."
  exit 1
fi

# Migrate VM to new host
virsh migrate --live --persistent --undefinesource --copy-storage-all --verbose --desturi qemu+ssh://"${dest_host}"/system "${vm}"
if [[ $? -eq 0 ]]
then
  echo
  echo "Migration complete. A copy of the VM image file still resides on the old host inside of ${storage}."
  read -n 1 -p "Do you want to delete ${storage}/${vm}.img now? y/N " -r del
  if [ "$del" == "y" ]
  then
    rm -f "${storage}"/"${vm}".img
  fi
else
  echo "The migration does not appear to have completed cleanly. Cleaning up remote host and exiting."
  ssh "${dest_host}" "[[ -f ${storage}/${vm}.img ]] && sudo rm -f ${storage}/${vm}.img"
  exit 1
fi

exit 0


Invoke the script as follows:

script [vm_name] [destination_server]
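The TODO in the script notes that it only handles a single image file named after the VM. One possible starting point for lifting that limit is to ask libvirt which disks the guest actually has. The sketch below is not part of the script above: the helper name disk_image_paths is hypothetical, and it assumes the four-column Type/Device/Target/Source layout printed by virsh domblklist --details, with no spaces in the image paths.

```shell
#!/bin/bash
# Hypothetical helper: read `virsh domblklist <vm> --details` output on stdin
# and print the source path of every file-backed disk, one per line.
# CD-ROM drives and empty sources ("-") are skipped.
disk_image_paths() {
  awk '$1 == "file" && $2 == "disk" && $4 != "-" { print $4 }'
}

# Intended use on the source host:
#   virsh domblklist "$vm" --details | disk_image_paths
```

Each path it prints would then need its own image created on the destination host before the migration starts.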


Good luck!
