XenServer

Monitor CPU usage

The optimum number of vCPUs per pCPU on a host depends on your use case. During operation, ensure that you monitor the performance of your XenServer environment and adjust your configuration accordingly.

Terms

Several terms in this area are sometimes used interchangeably. This article uses the following terms and meanings:

  • CPU (physical CPU): The physical hardware attached to a processor socket.
  • Core: A physical processing unit, capable of one independent thread of execution, which contains all functional units required to support that execution.
  • Hyperthread: A physical processing unit, capable of one independent thread of execution, which shares some functional units with another hyperthread (also known as its “sibling thread”).
  • Logical CPU (pCPU): A unit capable of an independent thread of execution that includes a set of registers and an instruction pointer. In a system with hyperthreads enabled, this is a hyperthread. In other cases, it’s a core.
  • Host pCPUs: The total number of logical CPUs in the host.
  • vCPU (Virtual CPU): A virtualized logical CPU. This is a logical unit capable of an independent thread of execution, provided to VMs. In XenServer, vCPUs can “time-share” pCPUs, using a scheduler to determine which vCPU is running on which pCPU at any given time.
  • Guest vCPUs: The vCPUs that are presented to a guest operating system inside a VM.
  • Dom0 vCPUs: The vCPUs that are visible to the XenServer control domain (dom0).
  • Host total vCPUs: The sum of dom0 vCPUs and all the guest vCPUs in the host.
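The "host total vCPUs" definition is a simple sum. The following is a minimal sketch of that arithmetic. The function name host_total_vcpus and the example counts are hypothetical; you supply the dom0 vCPU count followed by one vCPU count per guest.

```shell
# Sketch: host total vCPUs = dom0 vCPUs + the sum of all guest vCPUs.
# Arguments: the dom0 vCPU count first, then one guest vCPU count per VM.
host_total_vcpus() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

# Hypothetical host: dom0 has 16 vCPUs; three VMs have 4, 8 and 2 vCPUs.
host_total_vcpus 16 4 8 2   # prints 30
```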

General behavior

The total number of vCPUs on a host is the number of vCPUs used by dom0 added to the total number of vCPUs assigned to all the VMs on the host. As you increase the number of vCPUs on a host, you can experience the following types of behavior:

  • When the total number of vCPUs on the host is less than or equal to the number of pCPUs on the host, the host always provides as much CPU as is requested by the VMs.

  • When the total number of vCPUs on the host is greater than the number of pCPUs on the host, the host shares the time of the host pCPUs among the VMs. This behavior does not generally affect the VMs, because their vCPUs are usually idle for some of the time and, in most cases, the host does not reach 100% pCPU usage.

  • When the total number of vCPUs on the host is greater than the number of pCPUs on the host and the host sometimes reaches 100% host pCPU usage, the vCPUs of the VMs don't receive as much pCPU time as they request during the spikes. Instead, during these spikes the VMs slow down, each receiving a share of the available pCPU time on the host.

  • When the total number of vCPUs on the host is greater than the number of pCPUs on the host and the host often reaches 100% host pCPU usage, the vCPUs of the VMs are continuously slowed down, each receiving a share of the available pCPU time on the host. If the VMs have real-time requirements, this situation is not ideal; you can address it by reducing the total number of vCPUs on the host.
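The four behaviors above depend on two comparisons: host total vCPUs against host pCPUs, and how often the host reaches 100% pCPU usage. The sketch below encodes that decision. The function name host_cpu_regime and the never/sometimes/often labels are hypothetical; you judge the peak frequency yourself from your monitoring data.

```shell
# Sketch: map host vCPU and pCPU counts, plus how often the host reaches
# 100% pCPU usage (never, sometimes or often), to the behaviors described above.
host_cpu_regime() {
    vcpus=$1 pcpus=$2 peaks=$3
    if [ "$vcpus" -le "$pcpus" ]; then
        echo "no overcommit: VMs always get the CPU they request"
        return
    fi
    case "$peaks" in
        never)     echo "overcommitted: pCPU time is shared, usually without visible impact" ;;
        sometimes) echo "overcommitted: VMs slow down during 100% spikes" ;;
        often)     echo "overcommitted: VMs are continuously slowed; consider reducing host vCPUs" ;;
        *)         echo "unknown peak frequency: $peaks" >&2; return 1 ;;
    esac
}

host_cpu_regime 30 32 often      # the vCPU count does not exceed the pCPU count
host_cpu_regime 48 32 sometimes  # overcommitted, with occasional spikes
```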

The optimum number of vCPUs on a host can depend on the VM users’ perception of the speed of their VMs, especially when the VMs have real-time requirements.

Getting information about your CPUs

To find the total number of pCPUs on your host, run the following command:

xe host-cpu-info --minimal

To find the total number of vCPUs (guest and dom0) currently on your host, run the following command:

xl vcpu-list | grep -v VCPU | wc -l
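The two counts are most useful together as a vCPU-to-pCPU ratio, where a value above 1.00 means that vCPUs time-share the pCPUs. The following is a minimal sketch; the function name vcpu_pcpu_ratio is hypothetical, and the commented line assumes the two commands above behave as described when run in dom0.

```shell
# Sketch: compute the host's vCPU-to-pCPU ratio from the two counts.
# Exits with status 1 (and prints nothing) if the pCPU count is not positive.
vcpu_pcpu_ratio() {
    awk -v v="$1" -v p="$2" 'BEGIN { if (p + 0 <= 0) exit 1; printf "%.2f\n", v / p }'
}

# On a XenServer host you might feed it the commands shown above:
#   vcpu_pcpu_ratio "$(xl vcpu-list | grep -v VCPU | wc -l)" "$(xe host-cpu-info --minimal)"
vcpu_pcpu_ratio 48 32   # prints 1.50
```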

Monitoring CPU usage with RRD metrics

XenServer provides RRD metrics that describe how the vCPUs on your VMs are performing.
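One way to read these metrics from dom0 is the xe CLI: `xe vm-data-source-list uuid=<vm-uuid>` lists a VM's data sources, and `xe vm-data-source-query` returns the current value of one of them. The sketch below loops over the runstate metrics described in this article. It assumes that the runstate data sources are available on your host (some data sources might first need enabling with `xe vm-data-source-record`) and that the function name show_runstate_metrics is hypothetical.

```shell
# Sketch: print the current value of each runstate metric for one VM.
# Assumes it runs in dom0 with the xe CLI available; you supply the VM UUID.
show_runstate_metrics() {
    vm_uuid=$1
    for ds in runstate_fullrun runstate_full_contention runstate_concurrency_hazard \
              runstate_partial_run runstate_partial_contention; do
        printf '%s: %s\n' "$ds" "$(xe vm-data-source-query uuid="$vm_uuid" data-source="$ds")"
    done
}

# Usage (hypothetical UUID):
#   show_runstate_metrics 01234567-89ab-cdef-0123-456789abcdef
```

Check on your own host whether the values are reported as percentages or as fractions before comparing them with the thresholds in the following sections.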

When host pCPU usage is 100%

When a host is reaching 100% of host pCPU usage, use these VM metrics to decide whether to move the VM to another host:

runstate_concurrency_hazard

  • runstate_concurrency_hazard > 0% indicates that sometimes, at least one vCPU is running while at least one other vCPU wants to run but can’t get pCPU time. If the vCPUs must coordinate, this behavior causes performance issues.

  • runstate_concurrency_hazard approaching 100% is a situation to avoid.

    Suggested actions:

    If there are performance issues, take one of the following actions:

    • Decrease the number of vCPUs in the VM.
    • Move the VM to another host.
    • Decrease the total number of vCPUs on the host by migrating other VMs or decreasing their number of vCPUs.

runstate_partial_contention

  • runstate_partial_contention > 0% indicates that sometimes at least one vCPU wants to run but can't get pCPU time while at least one other vCPU is blocked (either because it has nothing to do or because it's waiting for I/O to complete).

  • runstate_partial_contention approaching 100% is a situation to avoid.

    Suggested action:

    Check whether the back-end I/O storage servers are overloaded by looking at the back-end metrics provided by your storage vendor. If the storage servers are not overloaded and there are performance issues, take one of the following actions:

    • Decrease the number of vCPUs in the VM.
    • Move the VM to another host.
    • Decrease the total number of vCPUs on the host by migrating other VMs or decreasing their number of vCPUs.

runstate_full_contention

  • runstate_full_contention > 0% indicates that sometimes all the vCPUs want to run at the same time but none of them can get pCPU time.

  • runstate_full_contention approaching 100% is a situation to avoid.

    Suggested actions:

    If there are performance issues, take one of the following actions:

    • Decrease the number of vCPUs in the VM.
    • Move the VM to another host.
    • Decrease the total number of vCPUs on the host by migrating other VMs or decreasing their number of vCPUs.
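The guidance for the three contention metrics can be summarized in one check for a VM on a host that is at 100% pCPU usage. The following is a minimal sketch that takes the metrics as percentages (0-100) plus your own judgment of whether the storage back end is overloaded. The function name contention_triage and the NEAR_FULL cut-off of 90% are hypothetical; this article does not define a numeric value for "approaching 100%".

```shell
# Sketch: summarize the three contention metrics for one VM while the host is
# at 100% pCPU usage. Values are percentages (0-100). NEAR_FULL is a hypothetical
# cut-off for "approaching 100%"; the documentation does not define one.
NEAR_FULL=${NEAR_FULL:-90}

contention_triage() {
    hazard=$1 partial=$2 full=$3 storage_overloaded=$4   # storage_overloaded: yes or no
    awk -v h="$hazard" -v p="$partial" -v f="$full" \
        -v s="$storage_overloaded" -v near="$NEAR_FULL" '
    # Print a metric only when it is above zero, flagging values near 100%.
    function check(name, v) {
        if (v >= near)  print name ": " v "% (approaching 100%, avoid)"
        else if (v > 0) print name ": " v "%"
    }
    BEGIN {
        h += 0; p += 0; f += 0; near += 0   # force numeric comparisons
        check("runstate_concurrency_hazard", h)
        check("runstate_partial_contention", p)
        check("runstate_full_contention", f)
        if (p > 0 && s == "yes")
            print "partial contention: check the overloaded storage back end first"
        if (h > 0 || f > 0 || (p > 0 && s != "yes"))
            print "if performance suffers: reduce VM vCPUs, migrate the VM, or reduce host vCPUs"
        if (h <= 0 && p <= 0 && f <= 0)
            print "no contention"
    }'
}

contention_triage 12 0 95 no
```

For partial contention, the storage check comes first, as in the suggested action above; the other two metrics lead directly to the vCPU and migration actions.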

When host pCPU usage is less than 100%

If a host is not reaching 100% of host pCPU usage, use these VM metrics to decide whether a VM has the right number of vCPUs:

runstate_fullrun

  • runstate_fullrun = 0% indicates that the vCPUs are never being used all at the same time.

    Suggested action:

    Decrease the number of vCPUs in this VM.

  • 0% < runstate_fullrun < 100% indicates that the vCPUs are sometimes being used all at the same time.

  • runstate_fullrun = 100% indicates that the vCPUs are always being used all at the same time.

    Suggested action:

    You can increase the number of vCPUs in this VM until runstate_fullrun < 100%. Do not increase the number of vCPUs beyond that point, because extra vCPUs can increase the probability of concurrency hazard if the host reaches 100% pCPU usage.
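The three runstate_fullrun cases map directly to a sizing suggestion. The following is a minimal sketch, assuming that the metric is passed as a percentage (0-100) and that the host stays below 100% pCPU usage; the function name fullrun_advice is hypothetical.

```shell
# Sketch: turn runstate_fullrun (a percentage, 0-100) into the vCPU sizing
# suggestions above. Only meaningful while the host stays below 100% pCPU usage.
fullrun_advice() {
    awk -v v="$1" 'BEGIN {
        v += 0   # force a numeric comparison
        if (v <= 0)
            print "decrease vCPUs: they are never all in use at the same time"
        else if (v >= 100)
            print "you can add vCPUs, but stop once runstate_fullrun drops below 100%"
        else
            print "no change suggested: the vCPUs are sometimes all in use"
    }'
}

fullrun_advice 0
fullrun_advice 100
```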

runstate_partial_run

  • runstate_partial_run = 0% indicates that either all vCPUs are always being used (runstate_fullrun = 100%) or no vCPUs are being used (idle = 100%).

  • 0% < runstate_partial_run < 100% indicates that sometimes at least one vCPU is blocked, either because it has nothing to do or because it is waiting for I/O to complete.

  • runstate_partial_run = 100% indicates that there is always at least one vCPU that is blocked.

    Suggested action:

    Check whether the back-end I/O storage servers are overloaded. If they are not, the VM probably has too many vCPUs and you can decrease the number of vCPUs in this VM. Having too many vCPUs in a VM can increase the risk of the VM going into the concurrency hazard state when the host CPU usage reaches 100%.
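Only the runstate_partial_run = 100% case carries a suggested action, and that action depends on the storage check. The following is a minimal sketch of that decision; the function name partial_run_advice is hypothetical, and whether the storage back end is overloaded is your own judgment from your storage vendor's metrics.

```shell
# Sketch: apply the runstate_partial_run guidance. Arguments: the metric as a
# percentage (0-100) and whether the storage back end is overloaded (yes or no).
partial_run_advice() {
    value=$1 storage_overloaded=$2
    if awk -v v="$value" 'BEGIN { exit !(v + 0 < 100) }'; then
        echo "no change suggested"
    elif [ "$storage_overloaded" = yes ]; then
        echo "investigate the storage back end first"
    else
        echo "decrease vCPUs: at least one vCPU is always blocked"
    fi
}

partial_run_advice 100 no
```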
