XenServer

Monitor and manage your deployment

XenServer provides detailed monitoring of performance metrics, including CPU, memory, disk, network, C-state/P-state information, and storage. Where appropriate, these metrics are available on a per-host and a per-VM basis. You can access these metrics directly, or view them graphically in XenCenter or other third-party applications.

XenServer also provides system and performance alerts. Alerts are notifications that occur in response to selected system events, or when one of the following values goes over a specified threshold on a managed host, VM, or storage repository: CPU usage, network usage, memory usage, control domain memory usage, storage throughput, or VM disk usage. You can configure the alerts by using the xe CLI or XenCenter. To create notifications based on any of the available host or VM performance metrics, see Performance alerts.

Monitor XenServer performance

Customers can monitor the performance of their XenServer hosts and Virtual Machines (VMs) using the metrics exposed through Round Robin Databases (RRDs). These metrics can be queried over HTTP or through the rrd2csv tool. In addition, XenCenter uses this data to produce system performance graphs. For more information, see Analyze and visualize metrics.

The following tables list all of the available host and VM metrics.

Notes:

  • Latency over a period is defined as the average latency of operations during that period.
  • The availability and utility of certain metrics are SR and CPU dependent.
  • Performance metrics are not available for GFS2 SRs and disks on those SRs.

Available host metrics

Metric Name | Description | Condition | XenCenter Name
avgqu_sz_<sr-uuid-short> | Average I/O queue size (requests). | At least one plugged VBD in SR <sr> on the host | <sr> Queue Size
cpu<cpu>-C<cstate> | Time CPU <cpu> spent in C-state <cstate> in milliseconds. | C-state exists on CPU | CPU <cpu> C-state <cstate>
cpu<cpu>-P<pstate> | Time CPU <cpu> spent in P-state <pstate> in milliseconds. | P-state exists on CPU | CPU <cpu> P-state <pstate>
cpu<cpu> | Utilization of physical CPU <cpu> (fraction). Enabled by default. | CPU <cpu> exists | CPU <cpu>
cpu_avg | Mean utilization of physical CPUs (fraction). Enabled by default. | None | Average CPU
hostload | Host load per physical CPU, where load refers to the number of vCPUs in a running or runnable state. | None | Host CPU Load
inflight_<sr-uuid-short> | Number of I/O requests currently in flight. Enabled by default. | At least one plugged VBD in SR <sr> on the host | <sr> Inflight Requests
io_throughput_read_<sr-uuid-short> | Data read from the SR (MiB/s). | At least one plugged VBD in SR <sr> on the host | <sr> Read Throughput
io_throughput_write_<sr-uuid-short> | Data written to the SR (MiB/s). | At least one plugged VBD in SR <sr> on the host | <sr> Write Throughput
io_throughput_total_<sr-uuid-short> | All SR I/O (MiB/s). | At least one plugged VBD in SR <sr> on the host | <sr> Total Throughput
iops_read_<sr-uuid-short> | Read requests per second. | At least one plugged VBD in SR <sr> on the host | <sr> Read IOPS
iops_write_<sr-uuid-short> | Write requests per second. | At least one plugged VBD in SR <sr> on the host | <sr> Write IOPS
iops_total_<sr-uuid-short> | I/O requests per second. | At least one plugged VBD in SR <sr> on the host | <sr> Total IOPS
iowait_<sr-uuid-short> | Percentage of time spent waiting for I/O. | At least one plugged VBD in SR <sr> on the host | <sr> IO Wait
latency_<sr-uuid-short> | Average I/O latency (milliseconds). | At least one plugged VBD in SR <sr> on the host | <sr> Latency
loadavg | Domain0 load average. Enabled by default. | None | Control Domain Load
memory_free_kib | Total amount of free memory (KiB). Enabled by default. | None | Not present in XenCenter. Replaced by Used Memory.
(Not reported by the toolstack. Calculated by XenCenter.) | Total amount of used memory (KiB). Enabled by default. | None | Used Memory
memory_reclaimed | Host memory reclaimed by squeeze (B). | None | Reclaimed Memory
memory_reclaimed_max | Host memory available to reclaim with squeeze (B). | None | Potential Reclaimed Memory
memory_total_kib | Total amount of memory (KiB) in the host. Enabled by default. | None | Total Memory
network/latency | Interval in seconds between the last two heartbeats transmitted from the local host to all online hosts. Disabled by default. | HA enabled | Network Latency
statefile/<vdi_uuid>/latency | Turn-around time in seconds of the latest state file access from the local host. Disabled by default. | HA enabled | HA State File Latency
pif_<pif>_rx | Bytes per second received on physical interface <pif>. Enabled by default. | PIF exists | <XenCenter-pif-name> Receive (see note)
pif_<pif>_tx | Bytes per second sent on physical interface <pif>. Enabled by default. | PIF exists | <XenCenter-pif-name> Send (see note)
pif_<pif>_rx_errors | Receive errors per second on physical interface <pif>. Disabled by default. | PIF exists | <XenCenter-pif-name> Receive Errors (see note)
pif_<pif>_tx_errors | Transmit errors per second on physical interface <pif>. Disabled by default. | PIF exists | <XenCenter-pif-name> Send Errors (see note)
pif_aggr_rx | Bytes per second received on all physical interfaces. Enabled by default. | None | Total NIC Receive
pif_aggr_tx | Bytes per second sent on all physical interfaces. Enabled by default. | None | Total NIC Send
pvsaccelerator_evicted | Bytes per second evicted from the cache. | PVS-Accelerator enabled | PVS-Accelerator eviction rate
pvsaccelerator_read_hits | Reads per second served from the cache. | PVS-Accelerator enabled | PVS-Accelerator hit rate
pvsaccelerator_read_misses | Reads per second that cannot be served from the cache. | PVS-Accelerator enabled | PVS-Accelerator miss rate
pvsaccelerator_traffic_client_sent | Bytes per second sent by cached PVS clients. | PVS-Accelerator enabled | PVS-Accelerator observed network traffic from clients
pvsaccelerator_traffic_server_sent | Bytes per second sent by cached PVS servers. | PVS-Accelerator enabled | PVS-Accelerator observed network traffic from servers
pvsaccelerator_read_total | Reads per second observed by the cache. | PVS-Accelerator enabled | PVS-Accelerator observed read rate
pvsaccelerator_traffic_proxy_saved | Bytes per second sent by PVS-Accelerator instead of the PVS server. | PVS-Accelerator enabled | PVS-Accelerator saved network traffic
pvsaccelerator_space_utilization | Percentage of space used by PVS-Accelerator on this host, compared to the total size of the cache storage. | PVS-Accelerator enabled | PVS-Accelerator space utilization
sr_<sr>_cache_size | Size in bytes of the IntelliCache SR. Enabled by default. | IntelliCache enabled | IntelliCache Cache Size
sr_<sr>_cache_hits | Cache hits per second. Enabled by default. | IntelliCache enabled | IntelliCache Cache Hits
sr_<sr>_cache_misses | Cache misses per second. Enabled by default. | IntelliCache enabled | IntelliCache Cache Misses
xapi_allocation_kib | Memory (KiB) allocation done by the XAPI daemon. Enabled by default. | None | Agent Memory Allocation
xapi_free_memory_kib | Free memory (KiB) available to the XAPI daemon. Enabled by default. | None | Agent Memory Free
xapi_healthcheck/latency | Turn-around time in seconds of the latest XAPI status monitoring call on the local host. Disabled by default. | High availability enabled | XenServer High Availability Latency
xapi_live_memory_kib | Live memory (KiB) used by the XAPI daemon. Enabled by default. | None | Agent Memory Live
xapi_memory_usage_kib | Total memory (KiB) allocated and used by the XAPI daemon. Enabled by default. | None | Agent Memory Usage
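The per-SR host metrics share a naming pattern: a fixed prefix followed by <sr-uuid-short>. The following Python sketch builds those names for one SR, for example to pass to xe host-data-source-record. It assumes that <sr-uuid-short> is the first eight characters of the SR UUID, which is how the host alert elements later in this section describe the short form. The helper names and the sample UUID are hypothetical.

```python
"""Build per-SR host data-source names (illustrative sketch)."""

# Prefixes of the host metrics in the table that end in <sr-uuid-short>.
PER_SR_PREFIXES = (
    "avgqu_sz",
    "inflight",
    "io_throughput_read",
    "io_throughput_write",
    "io_throughput_total",
    "iops_read",
    "iops_write",
    "iops_total",
    "iowait",
    "latency",
)


def sr_uuid_short(sr_uuid: str) -> str:
    """Return the first eight characters of an SR UUID (assumed short form)."""
    if len(sr_uuid) < 8:
        raise ValueError(f"not an SR UUID: {sr_uuid!r}")
    return sr_uuid[:8]


def per_sr_metric_names(sr_uuid: str) -> list[str]:
    """Return the data-source names that track a single SR on a host."""
    short = sr_uuid_short(sr_uuid)
    return [f"{prefix}_{short}" for prefix in PER_SR_PREFIXES]


if __name__ == "__main__":
    # Hypothetical SR UUID, for illustration only.
    for name in per_sr_metric_names("0a1b2c3d-4e5f-6789-abcd-ef0123456789"):
        print(name)
```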

Available VM metrics

Metric Name | Description | Condition | XenCenter Name
cpu<cpu> | Utilization of vCPU <cpu> (fraction). Enabled by default. | vCPU <cpu> exists | CPU
memory | Memory currently allocated to the VM (bytes). Enabled by default. | None | Total Memory
memory_target | Target of the VM balloon driver (bytes). Enabled by default. | None | Memory target
memory_internal_free | Free memory as reported by the guest agent (KiB). Enabled by default. | None | Free Memory
runstate_fullrun | Fraction of time that all vCPUs are running. | None | vCPUs full run
runstate_full_contention | Fraction of time that all vCPUs are runnable (that is, waiting for CPU). | None | vCPUs full contention
runstate_concurrency_hazard | Fraction of time that some vCPUs are running and some are runnable. | None | vCPUs concurrency hazard
runstate_blocked | Fraction of time that all vCPUs are blocked or offline. | None | vCPUs idle
runstate_partial_run | Fraction of time that some vCPUs are running and some are blocked. | None | vCPUs partial run
runstate_partial_contention | Fraction of time that some vCPUs are runnable and some are blocked. | None | vCPUs partial contention
vbd_<vbd>_write | Writes to device <vbd> in bytes per second. Enabled by default. | VBD <vbd> exists | Disk <vbd> Write
vbd_<vbd>_read | Reads from device <vbd> in bytes per second. Enabled by default. | VBD <vbd> exists | Disk <vbd> Read
vbd_<vbd>_write_latency | Write latency of device <vbd> in microseconds. | VBD <vbd> exists | Disk <vbd> Write Latency
vbd_<vbd>_read_latency | Read latency of device <vbd> in microseconds. | VBD <vbd> exists | Disk <vbd> Read Latency
vbd_<vbd>_iops_read | Read requests per second. | At least one plugged VBD for non-ISO VDI on the host | Disk <vbd> Read IOPS
vbd_<vbd>_iops_write | Write requests per second. | At least one plugged VBD for non-ISO VDI on the host | Disk <vbd> Write IOPS
vbd_<vbd>_iops_total | I/O requests per second. | At least one plugged VBD for non-ISO VDI on the host | Disk <vbd> Total IOPS
vbd_<vbd>_iowait | Percentage of time spent waiting for I/O. | At least one plugged VBD for non-ISO VDI on the host | Disk <vbd> IO Wait
vbd_<vbd>_inflight | Number of I/O requests currently in flight. | At least one plugged VBD for non-ISO VDI on the host | Disk <vbd> Inflight Requests
vbd_<vbd>_avgqu_sz | Average I/O queue size. | At least one plugged VBD for non-ISO VDI on the host | Disk <vbd> Queue Size
vif_<vif>_rx | Bytes per second received on virtual interface <vif>. Enabled by default. | VIF <vif> exists | <vif> Receive
vif_<vif>_tx | Bytes per second transmitted on virtual interface <vif>. Enabled by default. | VIF <vif> exists | <vif> Send
vif_<vif>_rx_errors | Receive errors per second on virtual interface <vif>. Enabled by default. | VIF <vif> exists | <vif> Receive Errors
vif_<vif>_tx_errors | Transmit errors per second on virtual interface <vif>. Enabled by default. | VIF <vif> exists | <vif> Send Errors

Note:

The value of <XenCenter-pif-name> can be any of the following:

  • NIC <pif> - if <pif> contains pif_eth#, where # is 0–9
  • <pif> - if <pif> contains pif_eth#.## or pif_xenbr## or pif_bond##
  • <Internal> Network <pif> - if <pif> contains pif_xapi##, (note that <Internal> appears as is)
  • TAP <tap> - if <pif> contains pif_tap##
  • xapi Loopback - if <pif> contains pif_lo

Analyze and visualize metrics

The Performance tab in XenCenter provides real-time monitoring of performance statistics across resource pools in addition to graphical trending of virtual and physical machine performance. Graphs showing CPU, memory, network, and disk I/O are included on the Performance tab by default. You can add more metrics, change the appearance of the existing graphs, or create extra ones. For more information, see Configure performance graphs.

  • You can view up to 12 months of performance data and zoom in to take a closer look at activity spikes.

  • XenCenter can generate performance alerts when CPU, memory, network I/O, storage I/O, or disk I/O usage exceeds a specified threshold on a host, VM, or SR. For more information, see Alerts.

Note:

Install the XenServer VM Tools to see full VM performance data.

Configure performance graphs

To add a graph:

  1. On the Performance tab, click Actions and then New Graph. The New Graph dialog box is displayed.

  2. In the Name field, enter a name for the graph.

  3. From the list of Datasources, select the check boxes for the datasources you want to include in the graph.

  4. Click Save.

To edit an existing graph:

  1. Navigate to the Performance tab, and select the graph that you would like to modify.

  2. Right-click on the graph and select Actions, or click the Actions button. Then select Edit Graph.

  3. On the graph details window, make the necessary changes, and click OK.

Configure the graph type

Data on the performance graphs can be displayed as lines or as areas. To change the graph type:

  1. On the Tools menu, click Options and select Graphs.

  2. To view performance data as a line graph, click the Line graph option.

  3. To view performance data as an area graph, click the Area graph option.

  4. Click OK to save your changes.

Comprehensive details for configuring and viewing XenCenter performance graphs can be found in the XenCenter documentation in the section Monitoring System Performance.

Configure metrics

Note:

C-states and P-states are power management features of some processors. The range of states available depends on the physical capabilities of the host, as well as the power management configuration.

Both the host and VM data-source commands return the following for each metric:

  • A full description of the data source

  • The units applied to the metric

  • The range of possible values that may be used

For example:

    name_label: cpu0-C1
    name_description: Proportion of time CPU 0 spent in C-state 1
    enabled: true
    standard: true
    min: 0.000
    max: 1.000
    units: Percent
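When scripting against the data-source commands, it can help to turn records like the one above into dictionaries. The following Python sketch parses records in the `field: value` form shown, treating blank lines as record separators. Some xe versions may print an access annotation such as `( RO)` after each field name; the sketch strips such annotations if present, which is an assumption about the output format rather than documented behavior. The function name is hypothetical.

```python
import re

# Access annotations such as "( RO)" or "(MRW)" that xe may print after a
# field name; removed if present (assumption about the output format).
_ANNOTATION = re.compile(r"\s*\(\s*[A-Z]{2,3}\s*\)\s*$")


def parse_records(text: str) -> list[dict[str, str]]:
    """Parse blank-line-separated `field: value` records into dictionaries."""
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        if ":" not in line:
            continue  # ignore lines that are not field/value pairs
        field, value = line.split(":", 1)
        current[_ANNOTATION.sub("", field).strip()] = value.strip()
    if current:
        records.append(current)
    return records


SAMPLE = """\
name_label: cpu0-C1
name_description: Proportion of time CPU 0 spent in C-state 1
enabled: true
standard: true
min: 0.000
max: 1.000
units: Percent
"""

if __name__ == "__main__":
    record = parse_records(SAMPLE)[0]
    print(record["name_label"], float(record["min"]), float(record["max"]), record["units"])
```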

Enable a specific metric

Most metrics are enabled and collected by default. To enable a metric that is not collected, enter the following:

xe host-data-source-record data-source=metric_name host=hostname

Disable a specific metric

You might not want to collect certain metrics regularly. To disable a previously enabled metric, enter the following:

xe host-data-source-forget data-source=metric_name host=hostname

Display a list of currently enabled host metrics

To list the host metrics currently being collected, enter the following:

xe host-data-source-list host=hostname

Display a list of currently enabled VM metrics

To list the VM metrics currently being collected, enter the following:

xe vm-data-source-list vm=vm_name
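The commands above can be wrapped for use in scripts. The following Python sketch builds each xe invocation as an argument list and either runs it with subprocess or, with dry_run=True, returns the shell-quoted command line. The wrapper and its function names are hypothetical; only the xe subcommands and parameters come from this section.

```python
import shlex
import subprocess


def run_xe(*args: str, dry_run: bool = False) -> str:
    """Run `xe` with the given arguments, or return the command line if dry_run."""
    cmd = ["xe", *args]
    if dry_run:
        return shlex.join(cmd)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def enable_host_metric(host: str, metric: str, dry_run: bool = False) -> str:
    """Start recording a host metric (xe host-data-source-record)."""
    return run_xe("host-data-source-record", f"data-source={metric}", f"host={host}", dry_run=dry_run)


def disable_host_metric(host: str, metric: str, dry_run: bool = False) -> str:
    """Stop recording a host metric (xe host-data-source-forget)."""
    return run_xe("host-data-source-forget", f"data-source={metric}", f"host={host}", dry_run=dry_run)


def list_host_metrics(host: str, dry_run: bool = False) -> str:
    """List the host metrics currently being collected."""
    return run_xe("host-data-source-list", f"host={host}", dry_run=dry_run)


def list_vm_metrics(vm: str, dry_run: bool = False) -> str:
    """List the VM metrics currently being collected."""
    return run_xe("vm-data-source-list", f"vm={vm}", dry_run=dry_run)


if __name__ == "__main__":
    # Print the commands instead of running them ("host1" and "vm1" are hypothetical).
    print(enable_host_metric("host1", "network/latency", dry_run=True))
    print(list_vm_metrics("vm1", dry_run=True))
```

Passing an argument list to subprocess avoids shell quoting problems with names that contain spaces; the dry-run string shows the equivalent quoted command.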

Use RRDs

XenServer uses RRDs to store performance metrics. These RRDs consist of multiple Round Robin Archives (RRAs) in a fixed-size database.

Each archive in the database samples its particular metric on a specified granularity:

  • Every 5 seconds for the past 10 minutes
  • Every minute for the past two hours
  • Every hour for the past week
  • Every day for the past year
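The retention schedule above implies a fixed number of data points per archive. The following Python sketch does that arithmetic. It approximates "the past year" as 365 days, so the row counts are illustrative and might not match the exact sizes that XenServer allocates.

```python
from datetime import timedelta

# (sample interval, retention) for each archive in the list above.
# "The past year" is approximated as 365 days.
ARCHIVES = {
    "5-second": (timedelta(seconds=5), timedelta(minutes=10)),
    "1-minute": (timedelta(minutes=1), timedelta(hours=2)),
    "1-hour": (timedelta(hours=1), timedelta(weeks=1)),
    "1-day": (timedelta(days=1), timedelta(days=365)),
}


def rows(step: timedelta, retention: timedelta) -> int:
    """Number of data points needed to cover `retention` at one point per `step`."""
    return retention // step


if __name__ == "__main__":
    for label, (step, retention) in ARCHIVES.items():
        print(f"{label} archive: {rows(step, retention)} rows")
```

Under these assumptions the 5-second and 1-minute archives each hold 120 points, the hourly archive 168, and the daily archive 365, which is why a fixed-size database can cover a whole year.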

The sampling that takes place every five seconds records actual data points. The other RRAs use consolidation functions instead. The consolidation functions supported by XenServer are:

  • AVERAGE
  • MIN
  • MAX
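As an illustration of these consolidation functions, the following Python sketch folds 5-second samples into one-minute rows (12 samples per row) with AVERAGE, MIN, or MAX. It is a simplified model that ignores unknown values and time alignment, not XenServer's RRD implementation; the function name and sample data are hypothetical.

```python
from statistics import mean

# The three consolidation functions, modeled with the Python functions
# that compute the same statistic.
CONSOLIDATION_FUNCTIONS = {"AVERAGE": mean, "MIN": min, "MAX": max}


def consolidate(samples: list[float], cf: str, samples_per_row: int) -> list[float]:
    """Fold consecutive groups of samples into one value each.

    A trailing group with fewer than samples_per_row samples is dropped,
    because it does not yet cover a full row interval.
    """
    fn = CONSOLIDATION_FUNCTIONS[cf]
    return [
        fn(samples[start:start + samples_per_row])
        for start in range(0, len(samples) - samples_per_row + 1, samples_per_row)
    ]


if __name__ == "__main__":
    five_second_samples = [float(i) for i in range(24)]  # two minutes of made-up data
    for cf in CONSOLIDATION_FUNCTIONS:
        print(cf, consolidate(five_second_samples, cf, samples_per_row=12))
```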

RRDs exist for individual VMs (including dom0) and for the XenServer host. VM RRDs are stored on the host on which the VM runs, or on the pool coordinator when the VM is not running. Therefore, you must know the location of a VM to retrieve its performance data.

For detailed information on how to use XenServer RRDs, see the XenServer Software Development Kit Guide.

Analyze RRDs using HTTP

You can download RRDs over HTTP from the specified XenServer host by using the HTTP handlers registered at /host_rrd and /vm_rrd. Both addresses require authentication, either by HTTP authentication or by providing a valid management API session reference as a query argument. For example:

Download a Host RRD.

wget "http://server/host_rrd?session_id=OpaqueRef:<SESSION_HANDLE>"

Download a VM RRD.

wget "http://server/vm_rrd?session_id=OpaqueRef:<SESSION_HANDLE>&uuid=<VM_UUID>"

Both of these calls download XML in a format that can be imported into rrdtool for analysis, or parsed directly.
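The following Python sketch builds the two download URLs and lists the data sources and consolidation functions in a downloaded file. It assumes the XML follows rrdtool's dump layout, with <ds> elements (each holding a <name>) and <rra> elements (each holding a <cf>) directly under the root element; check this against your own download. The function names and the plain-HTTP scheme (taken from the wget examples) are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus, urlencode
from urllib.request import urlopen


def rrd_url(server, session_ref, vm_uuid=None):
    """Return the /host_rrd URL, or the /vm_rrd URL when vm_uuid is given."""
    params = {"session_id": session_ref}
    path = "host_rrd"
    if vm_uuid is not None:
        path = "vm_rrd"
        params["uuid"] = vm_uuid
    # Keep the colon in "OpaqueRef:..." readable; quote everything else.
    return f"http://{server}/{path}?" + urlencode(params, safe=":", quote_via=quote_plus)


def fetch_rrd(server, session_ref, vm_uuid=None, timeout=30):
    """Download the RRD XML as text (requires network access to the host)."""
    with urlopen(rrd_url(server, session_ref, vm_uuid), timeout=timeout) as response:
        return response.read().decode("utf-8")


def data_source_names(xml_text):
    """List the data-source names, assuming an rrdtool-style <rrd><ds><name> layout."""
    root = ET.fromstring(xml_text)
    return [ds.findtext("name", "").strip() for ds in root.findall("ds")]


def archive_functions(xml_text):
    """List the consolidation function of each archive (<rra><cf>)."""
    root = ET.fromstring(xml_text)
    return [rra.findtext("cf", "").strip() for rra in root.findall("rra")]
```

Building the query with urlencode also takes care of the `&` separator that makes the unquoted wget form unsafe in a shell.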

Analyze RRDs using rrd2csv

As an alternative to viewing performance metrics in XenCenter, you can use the rrd2csv tool to log RRDs in Comma-Separated Values (CSV) format. Man and help pages are provided. To display the rrd2csv man or help pages, run the following command:

man rrd2csv

Or

rrd2csv --help

Note:

Where multiple options are used, supply them individually. For example: to return both the UUID and the name-label associated with a VM or a host, call rrd2csv as shown below:

rrd2csv -u -n

The UUID returned is unique and suitable as a primary key; however, the name-label of an entity is not necessarily unique.

The rrd2csv help text (rrd2csv --help) is the definitive documentation for the tool.

Alerts

You can configure XenServer to generate alerts based on any of the available host or VM metrics. In addition, XenServer provides preconfigured alerts that trigger when hosts enter certain conditions and states. You can view these alerts using XenCenter or the xe CLI.

View alerts using XenCenter

You can view different types of alerts in XenCenter by clicking Notifications and then Alerts. The Alerts view displays various types of alerts, including Performance alerts, System alerts, and Software update alerts.

Performance alerts

Performance alerts can be generated when one of the following values exceeds a specified threshold on a managed host, VM, or storage repository (SR): CPU usage, network usage, memory usage, control domain memory usage, storage throughput, or VM disk usage.

By default, the alert repeat interval is 60 minutes; you can modify it if necessary. Alerts are displayed on the Alerts page in the Notifications area in XenCenter. You can also configure XenCenter to send an email for any specified performance alerts along with other serious system alerts.

Any customized alerts that are configured using the xe CLI are also displayed on the Alerts page in XenCenter.

Each alert has a corresponding priority/severity level. You can modify these levels and optionally choose to receive an email when the alert is triggered. The default alert priority/severity is set at 3.

Priority | Name | Description | Default Email Alert
1 | Critical | Act now or data may be permanently lost/corrupted. | Yes
2 | Major | Act now or some services may fail. | Yes
3 | Warning | Act now or a service may suffer. | Yes
4 | Minor | Notice that something just improved. | No
5 | Information | Day-to-day information (VM Start, Stop, Resume and so on) | No
? | Unknown | Unknown error | No

Configure performance alerts

  1. In the Resources pane, select the relevant host, VM, or SR, then click the General tab and then Properties.

  2. Click the Alerts tab. You can configure the following alerts:

    • CPU usage alerts for a host or VM: Check the Generate CPU usage alerts check box, then set the CPU usage and time threshold that trigger the alert.

    • Network usage alerts for a host or VM: Check the Generate network usage alerts check box, then set the network usage and time threshold that trigger the alert.

    • Memory usage alerts for a host: Check the Generate memory usage alerts check box, and then set the free memory and time threshold that trigger the alert.

    • Control domain memory usage alerts for a host: Check the Generate control domain memory usage alerts check box, and then set the control domain memory usage and time threshold that trigger the alert.

    • Disk usage alerts for a VM: Check the Generate disk usage alerts check box, then set the disk usage and time threshold that trigger the alert.

    • Storage throughput alerts for an SR: Check the Generate storage throughput alerts check box, then set the storage throughput and time threshold that trigger the alert.

      Note:

      Physical Block Devices (PBD) represent the interface between a specific XenServer host and an attached SR. When the total read/write SR throughput activity on a PBD exceeds the threshold you have specified, alerts are generated on the host connected to the PBD. Unlike other XenServer host alerts, this alert must be configured on the SR.

  3. To change the alert repeat interval, enter the number of minutes in the Alert repeat interval box. When an alert threshold has been reached and an alert generated, another alert is not generated until after the alert repeat interval has elapsed.

  4. Click OK to save your changes.

For comprehensive details on how to view, filter and configure severities for performance alerts, see Configuring Performance Alerts in the XenCenter documentation.

System alerts

The following table displays the system events/conditions that trigger an alert to be displayed on the Alerts page in XenCenter.

Name | Priority/Severity | Description
license_expires_soon | 2 | XenServer License agreement expires soon.
ha_statefile_lost | 2 | Lost contact with the high availability Storage Repository, act soon.
ha_heartbeat_approaching_timeout | 5 | High availability approaching timeout, host may reboot unless action is taken.
ha_statefile_approaching_timeout | 5 | High availability approaching timeout, host may reboot unless action is taken.
ha_xapi_healthcheck_approaching_timeout | 5 | High availability approaching timeout, host may reboot unless action is taken.
ha_network_bonding_error | 3 | Potential service loss. Loss of network that sends high availability heartbeat.
ha_pool_overcommitted | 3 | Potential service loss. High availability is unable to guarantee protection for configured VMs.
ha_pool_drop_in_plan_exists_for | 3 | High availability coverage has dropped, more likely to fail, no loss present yet.
ha_protected_vm_restart_failed | 2 | Service Loss. High availability was unable to restart a protected VM.
ha_host_failed | 3 | High availability detected that a host failed.
ha_host_was_fenced | 4 | High availability rebooted a host to protect against VM corruption.
redo_log_healthy | 4 | The XAPI redo log has recovered from a previous error.
redo_log_broken | 3 | The XAPI redo log has encountered an error.
ip_configured_pif_can_unplug | 3 | An IP configured NIC can be unplugged by XAPI when using high availability, possibly leading to high availability failure.
host_sync_data_failed | 3 | Failed to synchronize XenServer performance statistics.
host_clock_skew_detected | 3 | The host clock is not synchronized with other hosts in the pool.
host_clock_went_backwards | 1 | The host clock is corrupted.
pool_master_transition | 4 | A new host has been specified as pool coordinator.
pbd_plug_failed_on_server_start | 3 | The host failed to connect to storage at boot time.
auth_external_init_failed | 2 | The host failed to enable external AD authentication.
auth_external_pool_non_homogeneous | 2 | Hosts in a pool have different AD authentication configuration.
multipath_periodic_alert | 3 | A path to an SR has failed or recovered.
bond_status_changed | 3 | A link in a bond has disconnected or reconnected.

Software update alerts

  • XenCenter old: XenServer expects a newer version of XenCenter but can still connect to the current version
  • XenCenter out of date: XenCenter is too old to connect to XenServer
  • XenServer out of date: XenServer is an old version that the current XenCenter cannot connect to
  • License expired alert: XenServer license has expired
  • Missing IQN alert: XenServer uses iSCSI storage but the host IQN is blank
  • Duplicate IQN alert: XenServer uses iSCSI storage, and there are duplicate host IQNs

Configure performance alerts by using the xe CLI

Note:

Triggers for alerts are checked at a minimum interval of five minutes. This interval avoids placing excessive load on the system and avoids reporting false positives. Setting an alert repeat interval smaller than five minutes still results in alerts being generated at the five-minute minimum interval.

The perfmon performance monitoring tool runs once every five minutes and requests updates from XenServer, which are averages over one minute. These defaults can be changed in /etc/sysconfig/perfmon.

Every five minutes, perfmon reads updates of the performance variables on the host where it runs. These variables are separated into one group relating to the host itself, and a group for each VM running on that host. For each VM and host, perfmon reads the parameter other-config:perfmon and uses this string to determine which variables to monitor, and under which circumstances to generate a message.

The following example configures a VM “CPU usage” alert by writing an XML string into the parameter other-config:perfmon:

xe vm-param-set uuid=vm_uuid other-config:perfmon=\
'<config>
    <variable>
        <name value="cpu_usage"/>
        <alarm_trigger_level value="0.5"/>
    </variable>
</config>'

Note:

You can use multiple variable nodes.
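To avoid hand-editing XML with several variable nodes, you can generate the other-config:perfmon value. The following Python sketch builds the <config> document from a list of dictionaries, one per variable, using the element names documented in this section. The function name is hypothetical, and the sketch does not validate the values you pass.

```python
import shlex
import xml.etree.ElementTree as ET


def perfmon_config(variables):
    """Build an other-config:perfmon XML string.

    `variables` is a list of dicts, one per <variable> node, mapping element
    names (name, alarm_trigger_level, ...) to values.
    """
    config = ET.Element("config")
    for var in variables:
        if "name" not in var:
            raise ValueError("every variable needs a 'name' element")
        node = ET.SubElement(config, "variable")
        for element, value in var.items():
            # Each setting is an empty element carrying a value attribute.
            ET.SubElement(node, element, value=str(value))
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    xml = perfmon_config([{"name": "cpu_usage", "alarm_trigger_level": 0.5}])
    # vm_uuid is a placeholder, as in the xe example above.
    print("xe vm-param-set uuid=vm_uuid other-config:perfmon=" + shlex.quote(xml))
```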

After setting the new configuration, use the following command to refresh perfmon for each host:

xe host-call-plugin host=host_uuid plugin=perfmon fn=refresh

If you do not refresh perfmon, the new configuration takes effect only after a delay, because by default perfmon checks for new configuration every 30 minutes. This default can be changed in /etc/sysconfig/perfmon.

Valid VM elements

  • name: The name of the variable (no default). If the name value is cpu_usage, network_usage, or disk_usage, the rrd_regex and alarm_trigger_sense parameters are not required, because defaults are used for these values.

  • alarm_priority: The priority of the alerts generated (default 3).

  • alarm_trigger_level: The level of value that triggers an alert (no default).

  • alarm_trigger_sense: The value is high if alarm_trigger_level is a maximum value, or low if alarm_trigger_level is a minimum value (default high).

  • alarm_trigger_period: The number of seconds that values (above or below the alert threshold) can be received before an alert is sent (the default is 60).

  • alarm_auto_inhibit_period: The number of seconds that this alert is disabled after an alert is sent (the default is 3600).

  • consolidation_fn: Combines variables from rrd_updates into one value. For cpu_usage, the default is average; for fs_usage, the default is get_percent_fs_usage; for all others, the default is sum.

  • rrd_regex: Matches the names of variables from xe vm-data-source-list uuid=vm_uuid that are used to compute performance values. This parameter has defaults for the named variables:

    • cpu_usage
    • network_usage
    • disk_usage

If specified, the values of all items returned by xe vm-data-source-list whose names match the specified regular expression are consolidated using the method specified as the consolidation_fn.
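The following Python sketch illustrates how rrd_regex and consolidation_fn combine several data sources into one value. It models only the sum and average functions named above (get_percent_fs_usage is not modeled), and it requires the whole data-source name to match (re.fullmatch); whether perfmon anchors its matches this way is an assumption. The function name and sample readings are hypothetical.

```python
import re
from statistics import mean

# consolidation_fn values from this section that map onto simple functions.
CONSOLIDATION_FNS = {"sum": sum, "average": mean}


def consolidate_matching(values, rrd_regex, consolidation_fn="sum"):
    """Combine the values of every data source whose name matches rrd_regex.

    Returns None if no data source matches.
    """
    pattern = re.compile(rrd_regex)
    matched = [value for name, value in values.items() if pattern.fullmatch(name)]
    if not matched:
        return None
    return CONSOLIDATION_FNS[consolidation_fn](matched)


if __name__ == "__main__":
    # Hypothetical latest readings for one VM, keyed by data-source name.
    latest = {"cpu0": 0.25, "cpu1": 0.75, "vif_0_rx": 100.0, "vif_0_tx": 50.0}
    print(consolidate_matching(latest, r"cpu[0-9]+", "average"))
    print(consolidate_matching(latest, r"vif_[0-9]+_(rx|tx)"))
```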

Valid host elements

  • name: The name of the variable (no default).
  • alarm_priority: The priority of the alerts generated (default 3).
  • alarm_trigger_level: The level of value that triggers an alert (no default).
  • alarm_trigger_sense: The value is high if alarm_trigger_level is a maximum value, or low if alarm_trigger_level is a minimum value (default high).
  • alarm_trigger_period: The number of seconds that values (above or below the alert threshold) can be received before an alert is sent (default 60).
  • alarm_auto_inhibit_period: The number of seconds that the alert is disabled after an alert is sent (default 3600).
  • consolidation_fn: Combines variables from rrd_updates into one value (default sum or average, depending on the variable).
  • rrd_regex: A regular expression to match the names of variables returned by the xe host-data-source-list host=hostname command, used to compute the statistical value. This parameter has defaults for the following named variables:
    • cpu_usage
    • network_usage
    • memory_free_kib
    • sr_io_throughput_total_xxxxxxxx (where xxxxxxxx is the first eight characters of the SR UUID).
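The interaction between alarm_trigger_level, alarm_trigger_sense, alarm_trigger_period, and alarm_auto_inhibit_period can be modeled as follows. This Python sketch is a simplified reading of the element descriptions above, not perfmon's code: it assumes time-ordered samples, raises an alert once values stay on the wrong side of the threshold for at least alarm_trigger_period seconds, and then suppresses further alerts for alarm_auto_inhibit_period seconds. It ignores the five-minute check interval that perfmon actually uses. The function name is hypothetical.

```python
def alert_times(samples, level, sense="high", trigger_period=60, auto_inhibit_period=3600):
    """Return the timestamps at which an alert would be raised.

    samples: iterable of (timestamp_in_seconds, value) pairs in time order.
    """
    if sense not in ("high", "low"):
        raise ValueError("sense must be 'high' or 'low'")
    alerts = []
    bad_since = None                  # start of the current out-of-range run
    inhibited_until = float("-inf")   # no alerts are raised before this time
    for t, value in samples:
        out_of_range = value > level if sense == "high" else value < level
        if not out_of_range:
            bad_since = None          # the run is broken; start over
            continue
        if bad_since is None:
            bad_since = t
        if t - bad_since >= trigger_period and t >= inhibited_until:
            alerts.append(t)
            inhibited_until = t + auto_inhibit_period
    return alerts
```

For example, a value that stays above the threshold for two hours, sampled every minute, produces one alert after the first minute and another once the one-hour inhibit period ends.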

SR Throughput: Storage throughput alerts must be configured on the SR rather than the host. For example:

xe sr-param-set uuid=sr_uuid other-config:perfmon=\
'<config>
    <variable>
        <name value="sr_io_throughput_total_per_host"/>
        <alarm_trigger_level value="0.01"/>
    </variable>
</config>'

Generic example configuration

The following example shows a generic configuration:

<config>
    <variable>
        <name value="NAME_CHOSEN_BY_USER"/>
        <alarm_trigger_level value="THRESHOLD_LEVEL_FOR_ALERT"/>
        <alarm_trigger_period value="RAISE_ALERT_AFTER_THIS_MANY_SECONDS_OF_BAD_VALUES"/>
        <alarm_priority value="PRIORITY_LEVEL"/>
        <alarm_trigger_sense value="HIGH_OR_LOW"/>
        <alarm_auto_inhibit_period value="MINIMUM_TIME_BETWEEN_ALERT_FROM_THIS_MONITOR"/>
        <consolidation_fn value="FUNCTION_FOR_COMBINING_VALUES"/>
        <rrd_regex value="REGULAR_EXPRESSION_TO_CHOOSE_DATASOURCE_METRIC"/>
    </variable>

    <variable>
        ...
    </variable>

    ...
</config>
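Before writing a configuration like the generic example into other-config:perfmon, you might want to check it for typos. The following Python sketch flags unknown element names, missing elements, and invalid alarm_trigger_sense values. It treats name and alarm_trigger_level as required because the element lists above give them no default; that rule, and the function name, are assumptions rather than perfmon's own validation.

```python
import xml.etree.ElementTree as ET

# Element names documented for <variable> nodes.
KNOWN_ELEMENTS = {
    "name", "alarm_trigger_level", "alarm_trigger_period", "alarm_priority",
    "alarm_trigger_sense", "alarm_auto_inhibit_period", "consolidation_fn",
    "rrd_regex",
}
# Elements documented as having no default, treated here as required.
REQUIRED_ELEMENTS = ("name", "alarm_trigger_level")


def check_perfmon_config(xml_text):
    """Return a list of problems found in a perfmon XML string (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)  # raises ET.ParseError for malformed XML
    if root.tag != "config":
        problems.append("root element must be <config>")
    for index, variable in enumerate(root.findall("variable")):
        settings = {child.tag: child.get("value") for child in variable}
        for tag in settings:
            if tag not in KNOWN_ELEMENTS:
                problems.append(f"variable {index}: unknown element <{tag}>")
        for tag in REQUIRED_ELEMENTS:
            if not settings.get(tag):
                problems.append(f"variable {index}: missing {tag}")
        sense = settings.get("alarm_trigger_sense")
        if sense is not None and sense not in ("high", "low"):
            problems.append(f"variable {index}: alarm_trigger_sense must be high or low")
    return problems
```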

Configure email alerts

You can configure XenServer to send email notifications when XenServer hosts generate alerts. The mail-alarm utility in XenServer uses sSMTP to send these email notifications. You can enable basic email alerts by using XenCenter or the xe Command Line Interface (CLI). For further configuration of email alerts, you can modify the mail-alarm.conf configuration file.

Use an SMTP server that does not require authentication. Emails sent through SMTP servers that require authentication cannot be delivered.

Enable email alerts by using XenCenter

  1. In the Resources pane, right-click on a pool and select Properties.

  2. In the Properties window, select Email Options.

  3. Select the Send email alert notifications check box. Enter your preferred destination address for the notification emails and SMTP server details.

  4. Choose your preferred language from the Mail language list. The default language for performance alert emails is English.

Enable email alerts by using the xe CLI

To configure email alerts, specify your preferred destination address for the notification emails and SMTP server:

xe pool-param-set uuid=pool_uuid other-config:mail-destination=joe.bloggs@example.com
xe pool-param-set uuid=pool_uuid other-config:ssmtp-mailhub=smtp.example.com:<port>

XenServer automatically configures the sender address as noreply@<hostname>. However, you can set the sender address explicitly:

xe pool-param-set uuid=pool_uuid other-config:mail-sender=serveralerts@example.com

When you turn on email notifications, you receive an email notification when an alert with a priority of 3 or higher is generated. Therefore, the default minimum priority level is 3. You can change this default with the following command:

xe pool-param-set uuid=pool_uuid other-config:mail-min-priority=level
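The alert priority table numbers priorities so that 1 is the most severe, so "a priority of 3 or higher" covers the numeric values 1 to 3. This matches the table's Default Email Alert column. The following Python sketch encodes that reading of mail-min-priority. Treating unknown priorities as never emailed is an assumption, and the function name is hypothetical.

```python
# Priority names from the alert priority table; 1 is the most severe.
PRIORITY_NAMES = {1: "Critical", 2: "Major", 3: "Warning", 4: "Minor", 5: "Information"}


def sends_email(priority, mail_min_priority=3):
    """Return True if an alert of this priority would trigger an email.

    "Priority 3 or higher" means a numeric value from 1 to mail_min_priority.
    Unknown priorities (shown as "?") are treated as not emailed.
    """
    if priority not in PRIORITY_NAMES:
        return False
    return priority <= mail_min_priority
```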

Note:

Some SMTP servers only forward mail with addresses that use FQDNs. If you find that emails are not being forwarded, this might be the reason. In that case, you can set the server host name to the FQDN so that this address is used when connecting to your mail server.

To configure the language for the performance alert emails:

xe pool-param-set uuid=pool_uuid other-config:mail-language=ja-JP

The default language for performance alert emails is English.

Further configuration

To further configure the mail-alarm utility in XenServer, create an /etc/mail-alarm.conf file containing the following:

root=postmaster
authUser=<username>
authPass=<password>
mailhub=@MAILHUB@

/etc/mail-alarm.conf is a user-supplied template for sSMTP’s configuration file ssmtp.conf and is used for all alerts generated by XenServer hosts. It consists of keys where key=@KEY@ and @KEY@ is replaced by the corresponding value of ssmtp-key in pool.other_config. These values are then passed to ssmtp, allowing you to control aspects of the sSMTP configuration using values from pool.other_config. Note how @KEY@ (uppercase) corresponds to ssmtp-key (lowercase, prefixed by ssmtp-).

For example, if you set the SMTP server:

xe pool-param-set uuid=pool_uuid other-config:ssmtp-mailhub=smtp.example.com

and then add the following to your /etc/mail-alarm.conf file:

mailhub=@MAILHUB@

mailhub=@MAILHUB@ becomes mailhub=smtp.example.com.
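The substitution rule can be sketched in a few lines of Python: each @KEY@ is replaced by the value of ssmtp-key (lowercased) from pool.other_config. The sketch leaves placeholders without a matching ssmtp- key unchanged, which is an assumption; the actual mail-alarm behavior for missing keys might differ. The function name is hypothetical.

```python
import re

# @KEY@ placeholders: uppercase letters, digits, and underscores between @ signs.
_PLACEHOLDER = re.compile(r"@([A-Z0-9_]+)@")


def render_mail_alarm_conf(template, other_config):
    """Replace each @KEY@ with other_config["ssmtp-key"] (key lowercased).

    Placeholders without a matching ssmtp- key are left unchanged.
    """
    def substitute(match):
        key = "ssmtp-" + match.group(1).lower()
        return other_config.get(key, match.group(0))

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = "root=postmaster\nmailhub=@MAILHUB@\n"
    print(render_mail_alarm_conf(template, {"ssmtp-mailhub": "smtp.example.com"}))
```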

Each SMTP server can differ slightly in its setup and may require extra configuration. To further configure sSMTP, modify its configuration file ssmtp.conf. By storing relevant keys in the mail-alarm.conf file, you can use the values in pool.other_config to configure sSMTP. The following extract from the ssmtp.conf man page shows the correct syntax and available options:

NAME
    ssmtp.conf – ssmtp configuration file

DESCRIPTION
    ssmtp reads configuration data from /etc/ssmtp/ssmtp.conf. The file
    contains keyword-argument pairs, one per line. Lines starting with '#'
    and empty lines are interpreted as comments.

    The possible keywords and their meanings are as follows (both are
    case-insensitive):

    Root
        The user that gets all mail for userids less than 1000. If blank,
        address rewriting is disabled.

    Mailhub
        The host to send mail to, in the form host | IP_addr[:port]. The
        default port is 25.

    RewriteDomain
        The domain from which mail seems to come. For user authentication.

    Hostname
        The fully qualified name of the host. If not specified, the host
        is queried for its hostname.

    FromLineOverride
        Specifies whether the From header of an email, if any, may
        override the default domain. The default is "no".

    UseTLS
        Specifies whether ssmtp uses TLS to talk to the SMTP server.
        The default is "no".

    UseSTARTTLS
        Specifies whether ssmtp does an EHLO/STARTTLS before starting TLS
        negotiation. See RFC 2487.

    TLSCert
        The file name of an RSA certificate to use for TLS, if required.

    AuthUser
        The user name to use for SMTP AUTH. The default is blank, in
        which case SMTP AUTH is not used.

    AuthPass
        The password to use for SMTP AUTH.

    AuthMethod
        The authorization method to use. If unset, plain text is used.
        May also be set to "cram-md5".

Custom fields and tags

XenCenter supports the creation of tags and custom fields, which allow you to organize and quickly search for VMs, storage, and so on. For more information, see Monitoring System Performance.

Custom searches

XenCenter supports the creation of customized searches. Searches can be exported and imported, and the results of a search can be displayed in the navigation pane. For more information, see Monitoring System Performance.

Determine throughput of physical bus adapters

For FC, SAS, and iSCSI HBAs, you can determine the network throughput of your PBDs by using the following procedure.

  1. List the PBDs on a host.
  2. Determine which LUNs are routed over which PBDs.
  3. For each PBD and SR, list the VBDs that reference VDIs on the SR.
  4. For all active VBDs that are attached to VMs on the host, calculate the combined throughput.
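Step 4 amounts to grouping per-VBD throughput by SR and summing it. Because a host has one PBD per SR, a per-SR total on a given host corresponds to that host's PBD for the SR. The sketch below assumes you have already collected the following for each VBD: its SR, whether it is active and attached to a VM on this host, and its read and write rates (for example, from the per-VM disk metrics). The `VbdSample` record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class VbdSample:
    """Hypothetical record describing one VBD's recent throughput."""
    sr_uuid: str
    active_on_host: bool      # plugged and attached to a running VM on this host
    read_bytes_per_s: float
    write_bytes_per_s: float


def throughput_per_sr(samples: Iterable[VbdSample]) -> Dict[str, float]:
    """Sum read + write throughput of the active VBDs on each SR (bytes/s)."""
    totals: Dict[str, float] = defaultdict(float)
    for vbd in samples:
        if vbd.active_on_host:
            totals[vbd.sr_uuid] += vbd.read_bytes_per_s + vbd.write_bytes_per_s
    return dict(totals)


samples = [
    VbdSample("sr-a", True, 40e6, 10e6),
    VbdSample("sr-a", True, 5e6, 5e6),
    VbdSample("sr-a", False, 99e6, 99e6),  # not attached on this host: ignored
    VbdSample("sr-b", True, 1e6, 2e6),
]
print(throughput_per_sr(samples))
```

Compare each total against the bandwidth of the corresponding HBA or network link to judge whether the PBD is close to saturation.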

For iSCSI and NFS storage, check your network statistics to determine if there is a throughput bottleneck at the array, or whether the PBD is saturated.

Monitor host and dom0 resources with NRPE

Users with the Pool Admin role can use any third-party monitoring tool that supports the Nagios Remote Plugin Executor (NRPE) to monitor resources consumed by your XenServer host and by dom0 (the control domain of your host). For more information about dom0, see Memory usage.

You can use the following check plugins to monitor host and dom0 resources:

Metric | NRPE check name | Description | Default warning threshold | Default critical threshold | Performance data returned
Host CPU Load | check_host_load | Gets and checks the current load per physical CPU of the host, where load refers to the number of vCPUs in a running or runnable state. | 3 | 4 | Current system load of the host CPU (the average load of the host's physical CPUs).
Host CPU Usage (%) | check_host_cpu | Gets and checks the current average overall CPU usage of the host. | 80% | 90% | The percentage of host CPU that is currently free and the percentage that is in use.
Host Memory Usage (%) | check_host_memory | Gets and checks the current memory usage of the host. | 80% | 90% | The percentage of host memory that is currently free and the percentage that is in use.
Host vGPU Usage (%) | check_vgpu | Gets and checks the usage of all currently running NVIDIA vGPUs on the host. | 80% | 90% | The percentage of running vGPU capacity that is currently free and the percentage that is in use.
Host vGPU Memory Usage (%) | check_vgpu_memory | Gets and checks the memory usage (including shared memory and graphics memory) of all currently running NVIDIA vGPUs on the host. | 80% | 90% | The percentage of running vGPU memory (including shared memory and graphics memory) that is currently free and the percentage that is in use.
Dom0 CPU Load | check_load | Gets and checks the current system load average per CPU of dom0, where load refers to the number of processes in a running or runnable state. | 2.7,2.6,2.5 | 3.2,3.1,3 | CPU load averages over the last 1, 5, and 15 minutes.
Dom0 CPU Usage (%) | check_cpu | Gets and checks the current average overall CPU usage of dom0. | 80% | 90% | The average overall CPU usage of dom0 as a percentage.
Dom0 Memory Usage (%) | check_memory | Gets and checks the current memory usage of dom0. | 80% | 90% | The percentage of dom0 memory that is currently free and the percentage that is in use.
Dom0 Free Swap (%) | check_swap | Gets and checks the current swap usage of dom0. | 20% | 10% | The percentage of dom0 swap space that is currently free.
Dom0 Root Partition Free Space (%) | check_disk_root | Gets and checks the current root partition usage of dom0. | 20% | 10% | The percentage of the dom0 root partition that is currently free.
Dom0 Log Partition Free Space (%) | check_disk_log | Gets and checks the current log partition usage of dom0. | 20% | 10% | The percentage of the dom0 log partition that is currently free.
Toolstack Status | check_xapi | Gets and checks the status of the XenServer management toolstack (also known as XAPI). | N/A | N/A | XAPI elapsed uptime in seconds.
Multipath Status | check_multipath | Gets and checks the status of the storage paths. | N/A | N/A | The status of the storage paths. OK indicates that all paths are active. WARNING indicates that some paths have failed but more than one path is active. CRITICAL indicates that only one path is active or that all paths have failed. UNKNOWN indicates that host multipathing is disabled and that the status of the paths cannot be fetched.
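The check_multipath states in the table can be expressed as a small classifier. The sketch uses the standard Nagios plugin status codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The function and its inputs are hypothetical, and it is not the plugin's implementation. The sketch also makes one assumption: a single active path is treated as CRITICAL even if it is the only path configured.

```python
from typing import Optional

# Standard Nagios/NRPE plugin status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def multipath_status(multipath_enabled: bool,
                     active_paths: Optional[int] = None,
                     total_paths: Optional[int] = None) -> int:
    """Map storage path counts to the check_multipath states in the table."""
    if not multipath_enabled or active_paths is None or total_paths is None:
        return UNKNOWN   # multipathing disabled: path status cannot be fetched
    if active_paths <= 1:
        return CRITICAL  # only one path left, or all paths have failed
    if active_paths == total_paths:
        return OK        # every path is active
    return WARNING       # some paths failed, but more than one is still active
```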

NRPE is an on-premises service that runs in dom0 and listens on TCP port 5666 (by default) for check execution requests from a monitoring tool. When a request arrives, NRPE parses it, looks up the corresponding check command and its parameters in the configuration file, and then runs the check. The result of the check is sent back to the monitoring tool, which stores the results of past checks and provides a graph showing the historical performance data.
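Before you configure a monitoring tool, it can help to confirm that the NRPE port is reachable from the monitoring server. The sketch below only opens a TCP connection to port 5666, the default mentioned above. It does not speak the NRPE protocol or negotiate TLS. A successful connection therefore does not prove that the host's allowed_hosts setting accepts your server. The function name is hypothetical.

```python
import socket

NRPE_DEFAULT_PORT = 5666


def nrpe_port_reachable(host: str, port: int = NRPE_DEFAULT_PORT,
                        timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Example (hypothetical host address): nrpe_port_reachable("192.0.2.10")
```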

Prerequisites

To be able to use NRPE to monitor host and dom0 resources, the monitoring tool you are using must meet the following prerequisites:

  • The monitoring tool must be compatible with NRPE version 4.1.0.
  • To allow communication between NRPE and the monitoring tool, the monitoring tool must support TLS 1.2 with the ciphers ECDHE-RSA-AES256-GCM-SHA384 and ECDHE-RSA-AES128-GCM-SHA256 and the secp384r1 elliptic curve.
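If you write your own check client, the TLS prerequisites translate into a client context like the following sketch. It uses Python's standard `ssl` module to pin TLS 1.2, the two listed cipher suites, and the secp384r1 curve. The sketch does not implement the NRPE protocol itself. It also leaves certificate verification at Python's defaults, so you would still need to trust the host's certificate, for example with `load_verify_locations()`.

```python
import ssl

# Cipher suites and curve listed in the prerequisites.
NRPE_CIPHERS = "ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256"
NRPE_CURVE = "secp384r1"


def make_nrpe_client_context() -> ssl.SSLContext:
    """Build a client-side TLS 1.2 context restricted to the listed ciphers."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers(NRPE_CIPHERS)   # raises ssl.SSLError if none are available
    ctx.set_ecdh_curve(NRPE_CURVE)  # ECDHE key exchange on secp384r1
    return ctx
```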

Constraints

  • You can configure NRPE settings for an entire pool or for a standalone host that is not part of a pool. Currently, you cannot configure NRPE settings for an individual host in a pool.
  • If you add a host to a pool that already has NRPE enabled and configured, XenCenter does not automatically apply the pool’s NRPE settings to the new host. You must either reconfigure NRPE settings on the pool after adding the new host, or configure the new host with the same NRPE settings before adding it to the pool.

    Note:

    When reconfiguring NRPE settings on a pool after adding a new host, ensure the host is up and running.

  • If a host is removed from a pool that has NRPE enabled and configured, XenCenter does not alter the NRPE settings on the host or the pool.

Configure NRPE by using the xe CLI

You can configure NRPE by using the xe CLI or XenCenter. For more information on how to configure NRPE by using XenCenter, see Monitoring host and dom0 resources with NRPE.

After making configuration changes to NRPE, restart the NRPE service by using:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=restart

Enable NRPE

NRPE is disabled by default in XenServer. To enable NRPE on a host’s control domain (dom0), run the following commands in the xe CLI:

  1. Get the host UUID of the host that you want to monitor:

    xe host-list

  2. Enable NRPE on the host:

    xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=enable

    If the operation runs successfully, this command outputs Success. When XenServer restarts, NRPE starts automatically.
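To enable NRPE on several hosts, you could repeat step 2 for each host UUID. The following sketch scripts that loop around the xe commands shown above. It relies on `xe host-list --minimal`, which prints host UUIDs as a comma-separated list. The helper names are hypothetical. The `run` parameter lets you substitute a fake runner for testing. The same argument builder also covers other `fn` values, such as `set-config`, with `args:key=value` pairs.

```python
import subprocess
from typing import Callable, List


def run_xe(args: List[str], run: Callable = subprocess.run) -> str:
    """Run an xe command and return its trimmed standard output."""
    completed = run(["xe", *args], capture_output=True, text=True, check=True)
    return completed.stdout.strip()


def nrpe_plugin_args(host_uuid: str, fn: str, **settings: str) -> List[str]:
    """Build host-call-plugin arguments, for example fn=set-config args:debug=1."""
    args = ["host-call-plugin", f"host-uuid={host_uuid}", "plugin=nrpe", f"fn={fn}"]
    args += [f"args:{key}={value}" for key, value in settings.items()]
    return args


def enable_nrpe_everywhere(run: Callable = subprocess.run) -> List[str]:
    """Enable NRPE on every host returned by `xe host-list --minimal`."""
    host_list = run_xe(["host-list", "--minimal"], run)  # comma-separated UUIDs
    uuids = [uuid for uuid in host_list.split(",") if uuid]
    return [run_xe(nrpe_plugin_args(uuid, "enable"), run) for uuid in uuids]
```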

To stop, start, restart, or disable NRPE:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=<operation>

where operation is stop, start, restart, or disable.

Monitoring servers

The allowed_hosts setting is a comma-delimited list of the IP addresses or host names of the monitoring servers that are allowed to connect to the NRPE daemon. Network addresses with a bit mask (for example, 192.168.1.0/24) are also supported.
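To reason about whether a monitoring server is covered by an allowed_hosts value, you can model the matching with Python's `ipaddress` module. The sketch accepts single addresses and address/mask entries. As a simplifying assumption, it compares host-name entries literally, whereas the NRPE daemon itself may resolve names. The function names are hypothetical.

```python
import ipaddress
from typing import List


def parse_allowed_hosts(value: str) -> List[str]:
    """Split the comma-delimited allowed_hosts value into trimmed entries."""
    return [entry.strip() for entry in value.split(",") if entry.strip()]


def is_allowed(client: str, allowed_hosts: str) -> bool:
    """Check whether a monitoring server address matches an allowed_hosts entry."""
    for entry in parse_allowed_hosts(allowed_hosts):
        try:
            network = ipaddress.ip_network(entry, strict=False)  # IP or IP/mask
        except ValueError:
            # Not an address: treat the entry as a host name (literal match).
            if client.lower() == entry.lower():
                return True
            continue
        try:
            if ipaddress.ip_address(client) in network:
                return True
        except ValueError:
            pass  # client is a host name, so only host-name entries can match
    return False
```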

View the current list of monitoring servers:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=get-config args:allowed_hosts

Allow the monitoring tool to execute checks:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=set-config args:allowed_hosts=<IP address or hostname>

Query all NRPE settings:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=get-config

Configure multiple NRPE settings:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=set-config args:allowed_hosts=<IP address or hostname> args:ssl_logging=<SSL log level> args:debug=<debug log level>

Logs

Debug logging

By default, debug logging is disabled.

To check whether debug logging is enabled, run the following command:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=get-config args:debug

If debug: 0 is returned, debug logging is disabled.

To enable debug logging:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=set-config args:debug=1

SSL logging

By default, SSL logging is disabled:

ssl_logging=0x00

To check whether SSL logging is enabled, run the following command:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=get-config args:ssl_logging

To enable SSL logging:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=set-config args:ssl_logging=0x2f

Warning and critical thresholds

For some of these check plugins, you can set warning and critical threshold values so that an alert is generated when the value returned by a check plugin crosses a threshold. The warning threshold indicates a potential issue, and the critical threshold indicates a more serious issue that requires immediate attention. Although default values are set for the warning and critical thresholds, you can adjust them.
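The default thresholds in the table point in two directions. Usage checks (such as check_memory at 80% and 90%) alert when the value rises above a threshold. Free-space checks (such as check_swap at 20% and 10%) alert when the value drops below one. The sketch below models both directions with the Nagios status codes. The function names are hypothetical. Treating a value exactly at the threshold as not yet crossed is an assumption. The sketch also handles only single-value thresholds, not the three-value (1, 5, and 15 minute) thresholds of check_load.

```python
# Standard Nagios/NRPE plugin status codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_usage(value: float, warning: float, critical: float) -> int:
    """Usage checks (for example, check_memory at 80/90): higher is worse."""
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


def evaluate_free_space(free: float, warning: float, critical: float) -> int:
    """Free-space checks (for example, check_swap at 20/10): lower is worse."""
    if free < critical:
        return CRITICAL
    if free < warning:
        return WARNING
    return OK
```

With the custom check_memory thresholds w=75 and c=85, a dom0 memory usage of 80% would fall into WARNING under this model.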

To query the default warning and critical threshold values for all the checks, run the following xe CLI command, which returns a list of all the checks and their associated warning and critical thresholds:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=get-threshold

You can also query the threshold values for a specific check. For example, to get the warning and critical threshold values for the check_memory check plugin, run the following xe CLI command:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=get-threshold args:check_memory

You can also change the default value of a threshold. For example, to change the default threshold values for the check_memory check plugin, run the following xe CLI command:

xe host-call-plugin host-uuid=<host uuid> plugin=nrpe fn=set-threshold args:check_memory args:w=75 args:c=85