Clustered pools
Clustering provides extra features that are required for resource pools that use GFS2 SRs. For more information about GFS2, see Configure storage.
A cluster is a pool of Citrix Hypervisor hosts that are more closely connected and coordinated than hosts in non-clustered pools. The hosts in the cluster maintain constant communication with each other on a selected network. All hosts in the cluster are aware of the state of every host in the cluster. This host coordination enables the cluster to control access to the contents of the GFS2 SR.
Quorum
Each host in a cluster must always be in communication with at least half of the hosts in the cluster (including itself). This state is known as the host having quorum.
The quorum value for a pool with an odd number of hosts is (n+1)/2, where n is the total number of hosts in the cluster. For a pool with an even number of hosts, the quorum value is n/2.
For an even-numbered pool, it is possible for the running cluster to split exactly in half. The running cluster decides which half of the cluster self-fences and which half of the cluster has quorum. When an even-numbered clustered pool powers up from a cold start, (n/2)+1 hosts must be available before the hosts have quorum. After the hosts have quorum, the cluster becomes active.
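For example, a clustered pool of five hosts has a quorum value of (5+1)/2 = 3, so each host must be able to contact at least three hosts, including itself. A clustered pool of four hosts has a quorum value of 4/2 = 2 while it is running, but requires (4/2)+1 = 3 hosts to be available before it becomes active from a cold start.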
If a host does not have quorum, that host self-fences.
Where possible, use an odd number of hosts in a clustered pool. An odd number of hosts ensures that the hosts can always determine whether they have a quorate set.
Self-fencing
If a host detects that it does not have quorum, it self-fences within a few seconds. When a host self-fences, it restarts immediately. All VMs running on the host are immediately stopped because the host does a hard shutdown. In a clustered pool that uses high availability, Citrix Hypervisor restarts the VMs on other pool members according to their restart configuration. The host that self-fenced restarts and attempts to rejoin the cluster.
If the number of live hosts in the cluster becomes less than the quorum value, all the remaining hosts lose quorum.
In an ideal scenario, your clustered pool always has more live hosts than are required for quorum and Citrix Hypervisor never fences. To make this scenario more likely, consider the following recommendations when setting up your clustered pool:
- Ensure that you have good hardware redundancy.
- Use a dedicated bonded network for the cluster network. Ensure that the bonded NICs are on the same L2 segment. For more information, see Networking.
- Configure storage multipathing between the pool and the GFS2 SR. For more information, see Storage multipathing.
- Configure high availability on the clustered pool (see the example after this list). In clustered pools, the heartbeat SR must be a GFS2 SR. For more information, see High availability.
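For example, once a GFS2 SR exists in the clustered pool, you can enable high availability with that SR as the heartbeat SR (a minimal sketch; the SR UUID placeholder is an assumption, substitute the UUID of your GFS2 SR):
xe pool-ha-enable heartbeat-sr-uuids=<gfs2_sr_uuid>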
Create a clustered pool
Before you begin, ensure the following prerequisites are met:
- All Citrix Hypervisor servers in the clustered pool must have at least 2 GiB of control domain memory.
- All hosts in the cluster must use static IP addresses for the cluster network.
- We recommend that you use clustering only in pools containing at least three hosts, because a pool of two hosts is sensitive to self-fencing of the entire pool.
- If you have a firewall between the hosts in your pool, ensure that hosts can communicate on the cluster network using the following ports:
  - TCP: 8892, 21064
  - UDP: 5404, 5405
  For more information, see Communication ports used by Citrix technologies.
- If you are clustering an existing pool, ensure that high availability is disabled (see the example after this list). You can enable high availability again after clustering is enabled.
- We strongly recommend that you use a bonded network for your clustered pool that is not used for any other traffic.
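For example, you can check whether high availability is currently enabled on an existing pool and disable it before enabling clustering (a minimal sketch; the pool UUID placeholder is an assumption):
xe pool-param-get uuid=<pool_uuid> param-name=ha-enabled
xe pool-ha-disable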
If you prefer, you can set up clustering on your pool by using XenCenter. For more information, see the XenCenter product documentation.
To use the xe CLI to create a clustered pool:
- Create a bonded network to use as the clustering network.
  Note: We strongly recommend that you use a dedicated bonded network for your clustered pool. Do not use this network for any other traffic.
  On the Citrix Hypervisor server that you want to be the pool master, complete the following steps:
  - Open a console on the Citrix Hypervisor server.
  - Name your resource pool by using the following command:
    xe pool-param-set name-label="New Pool" uuid=<pool_uuid>
  - Create a network for use with the bonded NIC by using the following command:
    xe network-create name-label=bond0
    The UUID of the new network is returned.
  - Find the UUIDs of the PIFs to use in the bond by using the following command:
    xe pif-list
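    If your hosts have many PIFs, you can narrow the output to the fields you need (a minimal sketch; the params filter is standard xe list syntax, and the field names shown are assumed to match your release):
    xe pif-list params=uuid,device,host-name-label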
  - Create your bonded network in either active-active mode, active-passive mode, or LACP bond mode. Depending on the bond mode you want to use, complete one of the following actions:
    - To configure the bond in active-active mode (default), use the bond-create command to create the bond. Using commas to separate the parameters, specify the newly created network UUID and the UUIDs of the PIFs to be bonded:
      xe bond-create network-uuid=<network_uuid> \
        pif-uuids=<pif_uuid_1>,<pif_uuid_2>,<pif_uuid_3>,<pif_uuid_4>
      Type two UUIDs when you are bonding two NICs and four UUIDs when you are bonding four NICs. The UUID for the bond is returned after running the command.
    - To configure the bond in active-passive or LACP bond mode, use the same syntax, add the optional mode parameter, and specify lacp or active-backup:
      xe bond-create network-uuid=<network_uuid> \
        pif-uuids=<pif_uuid_1>,<pif_uuid_2>,<pif_uuid_3>,<pif_uuid_4> \
        mode=balance-slb | active-backup | lacp
  After you have created your bonded network on the pool master, when you join other Citrix Hypervisor servers to the pool, the network and bond information is automatically replicated to the joining server.
  For more information, see Networking.
- Create a resource pool of at least three Citrix Hypervisor servers.
  Repeat the following steps on each Citrix Hypervisor server that is a (non-master) pool member:
  - Open a console on the Citrix Hypervisor server.
  - Join the Citrix Hypervisor server to the pool on the pool master by using the following command:
    xe pool-join master-address=<master_address> master-username=<administrators_username> master-password=<password>
    The value of the master-address parameter must be set to the fully qualified domain name of the Citrix Hypervisor server that is the pool master. The password must be the administrator password set when the pool master was installed.
    For more information, see Hosts and resource pools.
- For every PIF that belongs to this network, set disallow-unplug=true.
  - Find the UUIDs of the PIFs that belong to the network by using the following command (see the tip after these sub-steps):
    xe pif-list
  - Run the following command on a Citrix Hypervisor server in your resource pool:
    xe pif-param-set disallow-unplug=true uuid=<pif_uuid>
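  If you want to list only the PIFs on the clustering network, you can filter by the UUID of the bonded network that you created earlier (a minimal sketch; the filter fields are assumed to match your release):
  xe pif-list network-uuid=<network_uuid> params=uuid,host-name-label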
- Enable clustering on your pool. Run the following command on a Citrix Hypervisor server in your resource pool:
  xe cluster-pool-create network-uuid=<network_uuid>
  Provide the UUID of the bonded network that you created in an earlier step.
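  To confirm that each pool member has joined the cluster, you can list the cluster host objects, assuming that your version of the xe CLI exposes them through the standard list commands:
  xe cluster-host-list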
Destroy a clustered pool
You can destroy a clustered pool. The destroyed pool continues to exist as a resource pool, but it is no longer clustered and can no longer use GFS2 SRs.
To destroy a clustered pool, run the following command:
xe cluster-pool-destroy cluster-uuid=<uuid>
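If you do not know the cluster UUID, you can usually retrieve it by listing the cluster object, assuming that your version of the xe CLI exposes it through the standard list commands:
xe cluster-list params=uuid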
Manage your clustered pool
When managing your clustered pool, the following practices can decrease the risk of the pool losing quorum.
Ensure that hosts are shut down cleanly
When a host is cleanly shut down, it is temporarily removed from the cluster until it is started again. While the host is shut down, it does not count toward the quorum value of the cluster. The host absence does not cause other hosts to lose quorum.
However, if a host is forcibly or unexpectedly shut down, it is not removed from the cluster before it goes offline. This host does count toward the quorum value of the cluster. Its shutdown can cause other hosts to lose quorum.
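For example, one way to shut down a pool member cleanly from the xe CLI is to disable the host, evacuate its running VMs, and then shut it down (a minimal sketch; the host UUID placeholder is an assumption, and XenCenter offers an equivalent workflow):
xe host-disable uuid=<host_uuid>
xe host-evacuate uuid=<host_uuid>
xe host-shutdown uuid=<host_uuid>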
Use maintenance mode
Before doing something on a host that might cause that host to lose quorum, put the host into maintenance mode. When a host is in maintenance mode, running VMs are migrated off it to another host in the pool. Also, if that host was the pool master, that role is passed to a different host in the pool. If your actions cause a host in maintenance mode to self-fence, you don’t lose any VMs or lose your XenCenter connection to the pool.
Hosts in maintenance mode still count towards the quorum value for the cluster.
You can only change the IP address of a host that is part of a clustered pool when that host is in maintenance mode. Changing the IP address of a host causes the host to leave the cluster. When the IP address has been successfully changed, the host rejoins the cluster. After the host rejoins the cluster, you can take it out of maintenance mode.
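As a sketch of the change involved, a static IP address on a PIF is reconfigured with the pif-reconfigure-ip command; the placeholder values here are assumptions, and the host must already be in maintenance mode:
xe pif-reconfigure-ip uuid=<pif_uuid> mode=static IP=<new_ip> netmask=<netmask> gateway=<gateway> DNS=<dns>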
Recover hosts that have self-fenced or are offline
It is important to recover hosts that have self-fenced. While these cluster members are offline, they count towards the quorum number for the cluster and decrease the number of cluster members that are contactable. This situation increases the risk of a subsequent host failure causing the cluster to lose quorum and shut down completely.
Having offline hosts in your cluster also prevents you from performing certain actions. In a clustered pool, every member of the pool must agree to every change of pool membership before the change can be successful. If a cluster member is not contactable, Citrix Hypervisor prevents operations that change cluster membership (such as host add or host remove).
Mark hosts as dead
If one or more offline hosts cannot be recovered, you can mark them as dead to the cluster. Marking hosts as dead removes them permanently from the cluster. After hosts are marked as dead, they no longer count towards the quorum value.
Constraints
- Clustered pools support up to 16 hosts per pool.
- For cluster traffic, you must use a bonded network that uses at least two different network switches. Do not use this network for any other purposes.
- Changing the IP address of the cluster network by using XenCenter requires clustering and GFS2 to be temporarily disabled.
- Do not change the bonding of your clustering network while the cluster is live and has running VMs. This action can cause the cluster to fence.
- If you have an IP address conflict (multiple hosts having the same IP address) on your clustering network involving at least one host with clustering enabled, the hosts do not fence. To fix this issue, resolve the IP address conflict.