Citrix Hypervisor

Clustered pools

Clustering provides extra features that are required for resource pools that use GFS2 SRs. For more information about GFS2, see Configure storage.

A cluster is a pool of Citrix Hypervisor hosts that are more closely connected and coordinated than hosts in non-clustered pools. The hosts in the cluster maintain constant communication with each other on a selected network. All hosts in the cluster are aware of the state of every host in the cluster. This host coordination enables the cluster to control access to the contents of the GFS2 SR.

Note:

The clustering feature only benefits pools that contain a GFS2 SR. If your pool does not contain a GFS2 SR, do not enable clustering in your pool.

Quorum

Each host in a cluster must always be in communication with the majority of hosts in the cluster (including itself). This state is known as a host having quorum. If a host does not have quorum, that host self-fences.

The number of hosts that must be in communication to initially achieve quorum can differ from the number of hosts the cluster requires to remain quorate.

The following table summarizes this behavior. The value of n is the total number of hosts in the clustered pool.

                                    Hosts required to achieve quorum    Hosts required to remain quorate
  Odd number of hosts in the pool   (n+1)/2                             (n+1)/2
  Even number of hosts in the pool  (n/2)+1                             n/2

Odd-numbered pools

To achieve quorum in a pool with an odd number of hosts, (n+1)/2 hosts must be in communication: that is, a strict majority of the hosts in the cluster. This is also the minimum number of hosts that must remain contactable for the pool to remain quorate.

For example, in a 5-host clustered pool, 3 hosts must be contactable for the cluster to both become active and remain quorate [(5+1)/2 = 3].

Where possible, we recommend that you use an odd number of hosts in a clustered pool, because this ensures that hosts can always determine whether they have a quorate set.

Even-numbered pools

When an even-numbered clustered pool powers up from a cold start, (n/2)+1 hosts must be available before the hosts have quorum. After the hosts have quorum, the cluster becomes active.

However, an active even-numbered pool can remain quorate if the number of contactable hosts is at least n/2. As a result, it is possible for a running cluster with an even number of hosts to split exactly in half. The running cluster decides which half of the cluster self-fences and which half of the cluster has quorum. The half of the cluster that contains the node with the lowest ID that was seen as active before the cluster split remains active and the other half of the cluster self-fences.

For example, in a 4-host clustered pool, 3 hosts must be contactable for the cluster to become active [4/2 + 1 = 3]. After the cluster is active, to remain quorate, only 2 hosts must be contactable [4/2 = 2] and that set of hosts must include the host with the lowest node ID known to be active.

Self-fencing

If a host detects that it does not have quorum, it self-fences within a few seconds. When a host self-fences, it restarts immediately. All VMs running on the host are immediately stopped because the host does a hard shutdown. In a clustered pool that uses high availability, Citrix Hypervisor restarts the VMs according to their restart configuration on other pool members. The host that self-fenced restarts and attempts to rejoin the cluster.

If the number of live hosts in the cluster becomes less than the quorum value, all the remaining hosts lose quorum.

In an ideal scenario, your clustered pool always has more live hosts than are required for quorum and Citrix Hypervisor never fences. To make this scenario more likely, consider the following recommendations when setting up your clustered pool:

  • Ensure that you have good hardware redundancy.

  • Use a dedicated bonded network for the cluster network. Ensure that the bonded NICs are on the same L2 segment. For more information, see Networking.

  • Configure storage multipathing between the pool and the GFS2 SR. For more information, see Storage multipathing.

Create a clustered pool

Before you begin, ensure the following prerequisites are met:

  • All Citrix Hypervisor servers in the clustered pool must have at least 2 GiB of control domain memory.

    Depending on your environment, your hosts might require more control domain memory than this. If you have insufficient control domain memory on your hosts, your pool can experience network instability. Network instability can cause problems for a clustered pool with GFS2 SRs. For information about changing the amount of control domain memory and monitoring the memory behavior, see Memory usage. A quick way to check the current value from the xe CLI is sketched after this list.

  • All hosts in the cluster must use static IP addresses for the cluster network.

  • We recommend that you use clustering only in pools containing at least three hosts, because a pool of two hosts is vulnerable to the entire pool self-fencing.

  • If you have a firewall between the hosts in your pool, ensure that hosts can communicate on the cluster network using the following ports:
    • TCP: 8892, 8896, 21064
    • UDP: 5404, 5405

    For more information, see Communication ports used by Citrix technologies.

  • If you are clustering an existing pool, ensure that high availability is disabled. You can enable high availability again after clustering is enabled.

  • We strongly recommend that you use a bonded network for your clustered pool that is not used for any other traffic.
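
To check the current control domain memory on each host (the first prerequisite in this list), you can list the control domain VM records from any pool member. The memory-static-max value is reported in bytes; 2 GiB is 2147483648 bytes:

xe vm-list is-control-domain=true params=name-label,resident-on,memory-static-max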

If you prefer, you can set up clustering on your pool by using XenCenter. For more information, see the XenCenter product documentation.

To use the xe CLI to create a clustered pool:

  1. Create a bonded network to use as the clustering network.

    Note:

    We strongly recommend that you use a dedicated bonded network for your clustered pool. Do not use this network for any other traffic.

    On the Citrix Hypervisor server that you want to be the pool master, complete the following steps:

    1. Open a console on the Citrix Hypervisor server.

    2. Name your resource pool by using the following command:

      xe pool-param-set name-label="New Pool" uuid=<pool_uuid>
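
      If you do not already know the pool UUID, you can retrieve it first by using the following command:

      xe pool-list --minimal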
      
    3. Create a network for use with the bonded NIC by using the following command:

      xe network-create name-label=bond0
      

      The UUID of the new network is returned.

    4. Find the UUIDs of the PIFs to use in the bond by using the following command:

      xe pif-list
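
      If the unfiltered output is long, you can narrow it to a particular NIC on each host. The device name eth0 here is only an example; substitute the NICs that you plan to bond:

      xe pif-list device=eth0 params=uuid,host-name-label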
      
    5. Create your bonded network in active-active mode, active-passive mode, or LACP bond mode. Depending on the bond mode you want to use, complete one of the following actions:

      • To configure the bond in active-active mode (default), use the bond-create command to create the bond. Specify the newly created network UUID and the UUIDs of the PIFs to be bonded, separating the PIF UUIDs with commas:

         xe bond-create network-uuid=<network_uuid> \
              pif-uuids=<pif_uuid_1>,<pif_uuid_2>,<pif_uuid_3>,<pif_uuid_4>
        

        Type two UUIDs when you are bonding two NICs and four UUIDs when you are bonding four NICs. The UUID for the bond is returned after running the command.

      • To configure the bond in active-passive or LACP bond mode, use the same syntax, add the optional mode parameter, and specify lacp or active-backup:

         xe bond-create network-uuid=<network_uuid> \
              pif-uuids=<pif_uuid_1>,<pif_uuid_2>,<pif_uuid_3>,<pif_uuid_4> \
              mode=active-backup | lacp
        

    After you have created your bonded network on the pool master, when you join other Citrix Hypervisor servers to the pool, the network and bond information is automatically replicated to the joining server.

    For more information, see Networking.

  2. Create a resource pool of at least three Citrix Hypervisor servers.

    Repeat the following steps on each Citrix Hypervisor server that is not the pool master:

    1. Open a console on the Citrix Hypervisor server.
    2. Join the Citrix Hypervisor server to the pool on the pool master by using the following command:

      xe pool-join master-address=<master_address> master-username=<administrator_username> master-password=<password>
      

      The value of the master-address parameter must be set to the fully qualified domain name of the Citrix Hypervisor server that is the pool master. The password must be the administrator password set when the pool master was installed.

    For more information, see Hosts and resource pools.

  3. For every PIF that belongs to this network, set disallow-unplug=true.

    1. Find the UUIDs of the PIFs that belong to the network by using the following command:

      xe pif-list
      
    2. Run the following command on a Citrix Hypervisor server in your resource pool:

      xe pif-param-set disallow-unplug=true uuid=<pif_uuid>
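
      As a convenience, the following sketch sets disallow-unplug on every PIF of the cluster network in one pass. It assumes that <network_uuid> is the UUID of the bonded network that you created earlier and that you run it from a console on one pool member:

      for pif in $(xe pif-list network-uuid=<network_uuid> params=uuid --minimal | tr ',' ' '); do
          xe pif-param-set disallow-unplug=true uuid=$pif
      done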
      
  4. Enable clustering on your pool. Run the following command on a Citrix Hypervisor server in your resource pool:

    xe cluster-pool-create network-uuid=<network_uuid>
    

    Provide the UUID of the bonded network that you created in an earlier step.
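
    To confirm that clustering is enabled, you can list the cluster membership from the CLI. This assumes that your release exposes the cluster-host object class through xe; if the command is not available, check the pool properties in XenCenter instead:

    xe cluster-host-list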

Destroy a clustered pool

You can destroy the cluster in a clustered pool. After the cluster is destroyed, the pool continues to exist, but it is no longer clustered and can no longer use GFS2 SRs.

To destroy a clustered pool, run the following command:

xe cluster-pool-destroy cluster-uuid=<uuid>
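
You can retrieve the cluster UUID with the following command, assuming your release exposes the cluster object class through xe:

xe cluster-list --minimal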

Manage your clustered pool

When managing your clustered pool, the following practices can decrease the risk of the pool losing quorum.

Add or remove a host on a clustered pool

When adding or removing a host on a clustered pool, ensure that all the hosts in the cluster are online.

You can add or remove a host on a clustered pool by using XenCenter. For more information, see Add a Server to a Pool and Remove a Server From a Pool.

You can also add or remove a host on a clustered pool by using the xe CLI. For more information, see Add a host to a pool by using the xe CLI and Remove Citrix Hypervisor hosts from a resource pool.

Ensure that hosts are shut down cleanly

When a host is cleanly shut down, it is temporarily removed from the cluster until it is started again. While the host is shut down, it does not count toward the quorum value of the cluster. The host absence does not cause other hosts to lose quorum.

However, if a host is forcibly or unexpectedly shut down, it is not removed from the cluster before it goes offline. This host does count toward the quorum value of the cluster. Its shutdown can cause other hosts to lose quorum.

If it is necessary to shut down a host forcibly, first check how many live hosts are in the cluster. You can do this with the command corosync-quorumtool. In the command output, the number of live hosts is the value of Total votes: and the number of live hosts required to retain quorum is the value of Quorum:.
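
For example, to pull just these two values out of the output, run the following in the control domain of any live host in the cluster:

corosync-quorumtool | grep -E 'Total votes|Quorum'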

  • If the number of live hosts is the same as the number of hosts needed to remain quorate, do not forcibly shut down the host. Doing so causes the whole cluster to fence.

    Instead, attempt to recover other hosts and increase the live hosts number before forcibly shutting down the host.

  • If the number of live hosts is greater than the number of hosts needed to remain quorate, you can forcibly shut down the host. However, doing so makes the cluster more vulnerable to fully fencing if other hosts in the pool have issues.

Always try to restart a host that has been shut down as soon as possible to increase the resiliency of your cluster.

Use maintenance mode

Before doing something on a host that might cause that host to lose quorum, put the host into maintenance mode. When a host is in maintenance mode, running VMs are migrated off it to another host in the pool. Also, if that host was the pool master, that role is passed to a different host in the pool. If your actions cause a host in maintenance mode to self-fence, you don’t lose any VMs or lose your XenCenter connection to the pool.
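
You can put a host into maintenance mode from XenCenter. From the xe CLI, the rough equivalent is to disable the host and then evacuate its running VMs; this is a sketch of that sequence, not a replacement for the XenCenter workflow:

xe host-disable uuid=<host_uuid>
xe host-evacuate uuid=<host_uuid>

When your work on the host is complete, re-enable it with xe host-enable uuid=<host_uuid>.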

Hosts in maintenance mode still count towards the quorum value for the cluster.

You can only change the IP address of a host that is part of a clustered pool when that host is in maintenance mode. Changing the IP address of a host causes the host to leave the cluster. When the IP address has been successfully changed, the host rejoins the cluster. After the host rejoins the cluster, you can take it out of maintenance mode.

Recover hosts that have self-fenced or are offline

It is important to recover hosts that have self-fenced. While these cluster members are offline, they count towards the quorum number for the cluster and decrease the number of cluster members that are contactable. This situation increases the risk of a subsequent host failure causing the cluster to lose quorum and shut down completely.

Having offline hosts in your cluster also prevents you from performing certain actions. In a clustered pool, every member of the pool must agree to every change of pool membership before the change can be successful. If a cluster member is not contactable, Citrix Hypervisor prevents operations that change cluster membership (such as host add or host remove).

Mark hosts as unrecoverable

If one or more offline hosts cannot be recovered, you can tell the clustered pool to forget them. These hosts are permanently removed from the pool. After hosts are removed from the pool, they no longer count towards the quorum value.

To mark a host as unrecoverable, use the following command:

xe host-forget uuid=<host_uuid>
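
If you need to look up the UUID of the offline host first, you can filter the host list by its name label; <host_name> is a placeholder for the name the host had in the pool:

xe host-list name-label=<host_name> params=uuid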

Recover a forgotten host

After a clustered pool is told to forget a host, the host cannot be added back into the pool.

To rejoin the clustered pool, you must reinstall Citrix Hypervisor on the host so that it appears as a new host to the pool. You can then join the host to the clustered pool in the usual way.

Troubleshoot your clustered pool

If you encounter issues with your clustered pool, see Troubleshoot clustered pools.

Constraints

  • Clustered pools only support up to 16 hosts per pool.
  • To enable HA in a clustered pool, the heartbeat SR must be a GFS2 SR.
  • For cluster traffic, you must use a bonded network that uses at least two different network switches. Do not use this network for any other purposes.
  • Changing the IP address of the cluster network by using XenCenter requires clustering and GFS2 to be temporarily disabled.
  • Do not change the bonding of your clustering network while the cluster is live and has running VMs. This action can cause the cluster to fence.
  • If you have an IP address conflict (multiple hosts having the same IP address) on your clustering network involving at least one host with clustering enabled, the hosts do not fence. To fix this issue, resolve the IP address conflict.