XenServer

GFS2 clustered pools

GFS2 clustering provides extra features that are required for resource pools that use GFS2 SRs. For more information about GFS2, see Configure storage.

A GFS2 cluster is a pool of up to 16 XenServer® hosts that are more closely connected and coordinated than hosts in non-clustered pools. The hosts in the GFS2 cluster maintain constant communication with each other on a selected network. All hosts in the GFS2 cluster are aware of the state of every host in the GFS2 cluster. This host coordination enables the GFS2 cluster to control access to the contents of the GFS2 SR.

Note:

The GFS2 clustering feature only benefits pools that contain a GFS2 SR. If your pool does not contain a GFS2 SR, do not enable GFS2 clustering in your pool.

Quorum

Each host in a GFS2 cluster must always be in communication with the majority of hosts in the GFS2 cluster (including itself). This state is known as a host having quorum. If a host does not have quorum, that host self-fences.

The number of hosts that must be in communication to initially achieve quorum can be different to the number of hosts a GFS2 cluster requires to keep quorum.

The following table summarizes this behavior. The value of n is the total number of hosts in the GFS2 clustered pool.

                                     Hosts required to achieve quorum   Hosts required to remain quorate
Odd number of hosts in the pool      (n+1)/2                            (n+1)/2
Even number of hosts in the pool     (n/2)+1                            n/2
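
The thresholds in the table can be expressed as a pair of small shell functions. This is an illustrative sketch of the arithmetic only, not a XenServer tool; the function names are invented for this example:

```shell
#!/usr/bin/env bash
# Quorum thresholds from the table above, for a pool of n hosts.
quorum_to_achieve() {
    local n=$1
    if (( n % 2 == 1 )); then
        echo $(( (n + 1) / 2 ))   # odd pool: (n+1)/2
    else
        echo $(( n / 2 + 1 ))     # even pool: (n/2)+1
    fi
}
quorum_to_remain() {
    local n=$1
    if (( n % 2 == 1 )); then
        echo $(( (n + 1) / 2 ))   # odd pool: (n+1)/2
    else
        echo $(( n / 2 ))         # even pool: n/2
    fi
}
# A 5-host pool needs 3 hosts both to achieve and to keep quorum;
# a 4-host pool needs 3 to achieve quorum but only 2 to keep it.
echo "5 hosts: achieve=$(quorum_to_achieve 5) remain=$(quorum_to_remain 5)"
echo "4 hosts: achieve=$(quorum_to_achieve 4) remain=$(quorum_to_remain 4)"
```

Note that for every pool size the two thresholds differ only in even-numbered pools, which is why the odd/even distinction matters in the sections that follow.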

For a GFS2 clustered pool, you can check if the pool has quorum by querying the is-quorate parameter of the GFS2 cluster:

xe cluster-list params=is-quorate uuid=<cluster_id>

To see how many hosts in the GFS2 cluster are live, run the following command:

xe cluster-list params=live-hosts uuid=<cluster_id>

To see how many live hosts are required for the GFS2 cluster to achieve quorum, run the following command:

xe cluster-list params=quorum uuid=<cluster_id>

For the GFS2 cluster to form when it is first created, the number of live hosts must be greater than or equal to this value. After the cluster is active, the number of hosts required to remain quorate can differ from the value returned by this command, depending on whether the GFS2 cluster contains an odd or an even number of hosts.

Odd-numbered pools

To achieve quorum in an odd-numbered pool, (n+1)/2 hosts are required: half of one more than the total number of hosts in the GFS2 cluster. This is also the minimum number of hosts that must remain contactable for the pool to stay quorate.

For example, in a 5-host GFS2 clustered pool, 3 hosts must be contactable for the GFS2 cluster to both become active and remain quorate [(5+1)/2 = 3].

Where possible, we recommend using an odd number of hosts in a GFS2 clustered pool, because this ensures that hosts are always able to determine whether they have a quorate set.

Even-numbered pools

When an even-numbered GFS2 clustered pool powers up from a cold start, (n/2)+1 hosts must be available before the hosts have quorum. After the hosts have quorum, the GFS2 cluster becomes active.

However, an active even-numbered pool can remain quorate if the number of contactable hosts is at least n/2. As a result, it is possible for a running GFS2 cluster with an even number of hosts to split exactly in half. The running GFS2 cluster decides which half of the GFS2 cluster self-fences and which half of the GFS2 cluster has quorum. The half of the GFS2 cluster that contains the node with the lowest ID that was seen as active before the GFS2 cluster split remains active and the other half of the GFS2 cluster self-fences.

For example, in a 4-host GFS2 clustered pool, 3 hosts must be contactable for the GFS2 cluster to become active [4/2 + 1 = 3]. After the GFS2 cluster is active, to remain quorate, only 2 hosts must be contactable [4/2 = 2] and that set of hosts must include the host with the lowest node ID known to be active.
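
The tiebreak for an exact 50/50 split can be sketched in shell. The node IDs and the split below are illustrative, and the function is invented for this example; it is not part of the xe CLI:

```shell
#!/usr/bin/env bash
# Sketch of the 50/50 split tiebreak in an even-numbered cluster:
# the half containing the lowest node ID that was active before the
# split keeps quorum; the other half self-fences.
surviving_half() {
    local lowest_active=$1 half_a=$2 half_b=$3
    case " $half_a " in
        *" $lowest_active "*) echo "A" ;;
        *)                    echo "B" ;;
    esac
}
# A 4-host pool splits into {1,2} and {3,4}; node 1 is the lowest
# active ID, so the {1,2} half remains quorate.
surviving_half 1 "1 2" "3 4"   # prints A
```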

Self-fencing

If a host detects that it does not have quorum, it self-fences within a few seconds. When a host self-fences, it restarts immediately. All VMs running on the host are immediately stopped because the host does a hard shutdown. In a GFS2 clustered pool that uses high availability, XenServer restarts the VMs according to their restart configuration on other pool members. The host that self-fenced restarts and attempts to rejoin the GFS2 cluster.

If the number of live hosts in the GFS2 cluster becomes less than the quorum value, all the remaining hosts lose quorum.

In an ideal scenario, your GFS2 clustered pool always has more live hosts than are required for quorum and XenServer never fences. To make this scenario more likely, consider the following recommendations when setting up your GFS2 clustered pool:

  • Ensure that you have good hardware redundancy.

  • Use a dedicated bonded network for the GFS2 cluster network. Ensure that the bonded NICs are on the same L2 segment. For more information, see Networking.

  • Configure storage multipathing between the pool and the GFS2 SR. For more information, see Storage multipathing.

Create a GFS2 clustered pool

Before you begin, ensure the following prerequisites are met:

  • All XenServer hosts in the GFS2 clustered pool must have at least 2 GiB of control domain memory.

    Depending on your environment, your hosts might require more control domain memory than this. If you have insufficient control domain memory on your hosts, your pool can experience network instability. Network instability can cause problems for a GFS2 clustered pool with GFS2 SRs. For information about changing the amount of control domain memory and monitoring the memory behavior, see Memory usage.

  • All hosts in the GFS2 cluster must use static IP addresses for the GFS2 cluster network.

  • We recommend that you use GFS2 clustering only in pools containing at least three hosts, because two-host pools are prone to the entire pool self-fencing.

  • GFS2 clustered pools only support up to 16 hosts per pool.

  • If you have a firewall between the hosts in your pool, ensure that hosts can communicate on the GFS2 cluster network using the following ports:
    • TCP: 8892, 8896, 21064
    • UDP: 5404, 5405

    For more information, see Communication ports used by XenServer.

  • If you are adding GFS2 clustering to an existing pool, ensure that high availability is disabled. You can enable high availability again after GFS2 clustering is enabled.

  • We strongly recommend that you use a bonded network for your GFS2 clustered pool that is not used for any other traffic.
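
The firewall prerequisite above can be spot-checked from a host console. The following is a minimal sketch using bash's /dev/tcp redirection; the peer hostname is a placeholder, and the UDP ports (5404, 5405) cannot be verified this way:

```shell
#!/usr/bin/env bash
# Sketch: probe the TCP ports that GFS2 clustering uses between hosts.
# Relies on bash's /dev/tcp; "cluster-peer.example.com" is a placeholder
# for another host on the clustering network.
check_tcp_port() {
    local host=$1 port=$2
    if timeout 2 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}
for port in 8892 8896 21064; do
    echo "$port: $(check_tcp_port cluster-peer.example.com "$port")"
done
```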

If you prefer, you can set up GFS2 clustering on your pool by using XenCenter. For more information, see the XenCenter product documentation.

To use the xe CLI to create a GFS2 clustered pool:

  1. Create a bonded network to use as the GFS2 clustering network.

    Note:

    We strongly recommend that you use a dedicated bonded network for your GFS2 clustered pool. Do not use this network for any other traffic.

    On the XenServer host that you want to be the pool coordinator, complete the following steps:

    1. Open a console on the XenServer host.

    2. Create a network for use with the bonded NIC by using the following command:

      xe network-create name-label=bond0

      The UUID of the new network is returned.

    3. Find the UUIDs of the PIFs to use in the bond by using the following command:

      xe pif-list
    4. Create your bonded network in either active-active mode, active-passive mode, or LACP bond mode. Depending on the bond mode you want to use, complete one of the following actions:

      • To configure the bond in active-active mode (the default), use the bond-create command to create the bond. Specify the newly created network UUID and the UUIDs of the PIFs to be bonded, separating the PIF UUIDs with commas:

         xe bond-create network-uuid=<network_uuid> \
              pif-uuids=<pif_uuid_1>,<pif_uuid_2>,<pif_uuid_3>,<pif_uuid_4>
        

        Type two UUIDs when you are bonding two NICs and four UUIDs when you are bonding four NICs. The UUID for the bond is returned after running the command.

      • To configure the bond in active-passive or LACP bond mode, use the same syntax, add the optional mode parameter, and specify lacp or active-backup:

         xe bond-create network-uuid=<network_uuid> \
              pif-uuids=<pif_uuid_1>,<pif_uuid_2>,<pif_uuid_3>,<pif_uuid_4> \
              mode=<lacp | active-backup>
        

    After you have created your bonded network on the pool coordinator, the network and bond information is automatically replicated to each XenServer host that joins the pool.

    For more information, see Networking.

  2. Create a resource pool of at least three XenServer hosts.

    Repeat the following steps on each XenServer host that you want to join to the pool as a member (that is, every host except the pool coordinator):

    1. Open a console on the XenServer host.
    2. Join the XenServer host to the pool on the pool coordinator by using the following command:

      xe pool-join master-address=<master_address> master-username=<administrator_username> master-password=<password>


      The value of the master-address parameter must be set to the fully qualified domain name of the XenServer host that is the pool coordinator. The password must be the administrator password set when the pool coordinator was installed.

    For more information, see Hosts and resource pools.

  3. For every PIF that belongs to the bonded clustering network, set disallow-unplug=true.

    1. Find the UUIDs of the PIFs that belong to the network by using the following command:

      xe pif-list
    2. Set disallow-unplug=true on each of these PIFs by running the following command on a XenServer host in your resource pool, once for each PIF UUID:

      xe pif-param-set disallow-unplug=true uuid=<pif_uuid>
  4. Enable GFS2 clustering on your pool. Run the following command on a XenServer host in your resource pool:

    xe cluster-pool-create network-uuid=<network_uuid>

    Provide the UUID of the bonded network that you created in an earlier step.

Disable GFS2 clustering

You can disable GFS2 clustering. After you disable GFS2 clustering, the pool continues to exist, but is no longer GFS2 clustered and can no longer use GFS2 SRs.

To disable GFS2 clustering, run the following command:

xe cluster-pool-destroy cluster-uuid=<uuid>

Manage your GFS2 clustered pool

When managing your GFS2 clustered pool, the following practices can decrease the risk of the pool losing quorum.

Add or remove a host on a GFS2 clustered pool

When adding or removing a host on a GFS2 clustered pool, ensure that all the hosts in the GFS2 cluster are online.

You can add or remove a host on a GFS2 clustered pool by using XenCenter. For more information, see Add a Server to a Pool and Remove a Server From a Pool.

You can also add or remove a host on a GFS2 clustered pool by using the xe CLI. For more information, see Add a host to a pool by using the xe CLI and Remove XenServer hosts from a resource pool.

Ensure that hosts are shut down cleanly

When a host is cleanly shut down, it is temporarily removed from the GFS2 cluster until it is started again. While the host is shut down, it does not count toward the quorum of the GFS2 cluster. The host’s absence does not cause other hosts to lose quorum. For more information, see Shut down a XenServer host.

However, if a host is forcibly or unexpectedly shut down, it is not removed from the GFS2 cluster before it goes offline. This host does count toward the quorum value of the GFS2 cluster. Its shutdown can cause other hosts to lose quorum.

If it is necessary to shut down a host forcibly, first check how many live hosts are in the GFS2 cluster by running the corosync-quorumtool command on a cluster member. In the command output, the number of live hosts is the value of Total votes: and the number of live hosts required to retain quorum is the value of Quorum:.

  • If the number of live hosts is the same as the number of hosts needed to remain quorate, do not forcibly shut down the host. Doing so causes the whole GFS2 cluster to fence.

    Instead, attempt to recover other hosts and increase the live hosts number before forcibly shutting down the host.

  • If the number of live hosts is greater than the number of hosts needed to remain quorate, you can forcibly shut down the host. However, doing so makes the GFS2 cluster more vulnerable to fencing completely if other hosts in the pool have issues.

Always try to restart a shut-down host as soon as possible to restore the resiliency of your GFS2 cluster.
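
The check described above can be sketched as a short script. The sample corosync-quorumtool output below is illustrative, not captured from a real host, and the safety rule applied is the one stated in the bullets: a forced shutdown is safe only if the cluster stays at or above the quorum threshold with one fewer live host:

```shell
#!/usr/bin/env bash
# Sketch: parse "Total votes" and "Quorum" from corosync-quorumtool
# output and decide whether forcibly shutting down one host is safe.
# The sample output below is illustrative only.
sample='Votequorum information
----------------------
Expected votes:   4
Total votes:      3
Quorum:           3'

total=$(echo "$sample" | awk -F: '/^Total votes/ {gsub(/ /, "", $2); print $2}')
quorum=$(echo "$sample" | awk -F: '/^Quorum/ {gsub(/ /, "", $2); print $2}')

# Safe only if the cluster stays at or above quorum with one fewer host.
if (( total - 1 >= quorum )); then
    echo "safe to forcibly shut down one host"
else
    echo "unsafe: recover other hosts first"
fi
```

With the sample values (3 live hosts, quorum of 3), the script takes the unsafe branch, matching the first bullet above.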

Use maintenance mode

Before performing an action on a host that might cause that host to lose quorum, put the host into maintenance mode. When a host is in maintenance mode, its running VMs are migrated to another host in the pool. Also, if that host was the pool coordinator, that role passes to a different host in the pool. If your actions cause a host in maintenance mode to self-fence, you don't lose any VMs or your XenCenter® connection to the pool.

Hosts in maintenance mode still count towards the quorum value for the GFS2 cluster.

You can only change the IP address of a host that is part of a GFS2 clustered pool when that host is in maintenance mode. Changing the IP address of a host causes the host to leave the GFS2 cluster. When the IP address has been successfully changed, the host rejoins the GFS2 cluster. After the host rejoins the GFS2 cluster, you can take it out of maintenance mode.

Recover hosts that have self-fenced or are offline

It is important to recover hosts that have self-fenced. While these GFS2 cluster members are offline, they count towards the quorum number for the GFS2 cluster and decrease the number of GFS2 cluster members that are contactable. This situation increases the risk of a subsequent host failure causing the GFS2 cluster to lose quorum and shut down completely.

Having offline hosts in your GFS2 cluster also prevents you from performing certain actions. In a GFS2 clustered pool, every member of the pool must agree to every change of pool membership before the change can be successful. If a GFS2 cluster member is not contactable, XenServer prevents operations that change GFS2 cluster membership (such as host add or host remove).

Mark hosts as unrecoverable

If one or more offline hosts cannot be recovered, you can tell the GFS2 clustered pool to forget them. These hosts are permanently removed from the pool. After hosts are removed from the GFS2 clustered pool, they no longer count towards the quorum value.

To mark a host as unrecoverable, use the following command:

xe host-forget uuid=<host_uuid>

Recover a forgotten host

After a GFS2 clustered pool is told to forget a host, the host cannot be added back into the pool.

To rejoin the GFS2 clustered pool, you must reinstall XenServer on the host so that it appears as a new host to the pool. You can then join the host to the GFS2 clustered pool in the usual way.

Troubleshoot your GFS2 clustered pool

If you encounter issues with your GFS2 clustered pool, see Troubleshoot GFS2 clustered pools.

Constraints

  • GFS2 clustered pools only support up to 16 hosts per pool.
  • To enable HA on your GFS2 clustered pool, the heartbeat SR must be a GFS2 SR.
  • For GFS2 cluster traffic, we strongly recommend that you use a bonded network that uses at least two different network switches. Do not use this network for any other purposes.
  • Changing the IP address of the GFS2 cluster network by using XenCenter requires GFS2 clustering and all GFS2 SRs to be temporarily disabled.
  • Do not change the bonding of your GFS2 clustering network while the GFS2 cluster is live and has running VMs. This action can cause hosts in the GFS2 cluster to hard restart (fence).
  • If you have an IP address conflict (multiple hosts having the same IP address) on your GFS2 clustering network involving at least one host with GFS2 clustering enabled, the GFS2 cluster does not form correctly and the hosts are unable to fence when required. To fix this issue, resolve the IP address conflict.