

Now that we have learned about the various tools we can use to troubleshoot vSphere Storage and tackled the common issues that appear when we are trying to connect to our datastores, it's time to look at another type of storage issue: contention. Storage contention is one of the most common sources of trouble inside a virtual environment and is almost always behind slowness and poor performance.

One of the biggest benefits of virtualization is consolidation: the ability to take multiple workloads and run them on a smaller number of systems, clustered with shared resources, and with one management interface. That said, as soon as we begin to share these resources, contention is sure to occur. This article will help with some of the common issues we face pertaining to storage contention and performance.

Identifying storage contention and performance issues

One of the biggest causes of poor storage performance is high I/O latency. Latency, in its simplest definition, is a measure of how long a single I/O request takes to complete from the standpoint of your virtualized applications. As we will find out later, vSphere breaks latency down into more detailed and precise values based on the individual components of the stack in order to aid us with troubleshooting.

But is storage latency always a bad thing? The answer to that is "it depends". Obviously, high latency is one of the least desirable characteristics of a storage device, but in terms of applications, it really depends on the type of workload we are running. Heavily utilized databases, for instance, are usually very sensitive to latency, demanding consistently low latency values and quickly exhibiting timeouts and degraded performance when latency climbs.

There are, however, other applications, usually throughput-driven, that are not as sensitive to latency and can tolerate higher latency thresholds. In all cases, we as vSphere administrators will always want to do our best to minimize storage latency, and we should be able to quickly identify latency-related issues.

As a vSphere administrator, we need to be able to monitor latency in our vSphere environment. This is where esxtop can be our number one tool. We will focus on three counters: DAVG/cmd, KAVG/cmd, and GAVG/cmd, all of which are explained in the following table:

  Counter     Description                                                Threshold
  DAVG/cmd    Average device latency per command: the time an I/O        25 ms
              spends in the driver, the HBA, and the storage array.
  KAVG/cmd    Average kernel latency per command: the time an I/O        2 ms
              spends queued inside the VMkernel.
  GAVG/cmd    Average guest latency per command: the total latency       25 ms
              seen by the guest (GAVG = DAVG + KAVG).

When looking at the thresholds outlined in the preceding table, we have to understand that they are recommendations rather than hard rules. Certainly, 25 ms of device latency isn't good, but it will affect our applications in different ways: sometimes badly, sometimes not at all.
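To make the thresholds concrete, here is a minimal sketch of how you might triage a set of latency readings against them. The function name and structure are illustrative, not part of esxtop; the threshold values are the rules of thumb from the table above.

    # A minimal sketch: classify esxtop latency readings against the
    # rule-of-thumb thresholds from the preceding table.
    THRESHOLDS_MS = {
        "DAVG/cmd": 25.0,  # device latency: driver, HBA, and array
        "KAVG/cmd": 2.0,   # kernel latency: time queued in the VMkernel
        "GAVG/cmd": 25.0,  # guest latency: DAVG + KAVG, as seen by the VM
    }

    def evaluate_latency(readings_ms):
        """Return the counters whose values exceed their thresholds."""
        return {
            counter: value
            for counter, value in readings_ms.items()
            if value > THRESHOLDS_MS.get(counter, float("inf"))
        }

    # Example: device latency dominates the guest latency here, pointing
    # at the array or fabric rather than at the VMkernel.
    print(evaluate_latency({"DAVG/cmd": 31.2, "KAVG/cmd": 0.4, "GAVG/cmd": 31.6}))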

The following sections will outline how we can view latency statistics as they pertain to disk adapters, disk devices, and virtual machines.

Disk adapter latency statistics

By activating the disk adapter display in esxtop, we are able to view our latency statistics as they relate to our HBAs and paths. This is helpful in troubleshooting as it allows us to determine whether the issue resides only on a single HBA or on a single path to our storage array.

Use the following steps to activate the disk adapter latency display:

  1. Start esxtop by executing the esxtop command.
  2. Press d to switch to the disk adapter display.
  3. Press f to select which columns you would like to display.
  4. Toggle the fields by pressing their corresponding letters. In order to view latency statistics effectively, we need to ensure that we have turned on Adapter Name (A), Path Name (B), and Overall Latency Stats (G) at the very least.
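The interactive display is ideal for live troubleshooting, but esxtop also has a batch mode (esxtop -b) that writes every enabled counter to CSV, which is handy for capturing adapter latency over a window of time. Below is a minimal sketch that drives it from Python on the ESXi host (or adapt it to call resxtop from a management system); the 2-second delay, 30-sample count, and output path are arbitrary examples.

    # A minimal sketch: capture roughly 60 seconds of esxtop data in batch
    # mode (30 samples at a 2-second delay) for offline analysis. The
    # output path is a hypothetical example.
    import subprocess

    with open("/tmp/esxtop-adapters.csv", "w") as outfile:
        subprocess.run(
            ["esxtop", "-b", "-d", "2", "-n", "30"],  # batch mode, delay, samples
            stdout=outfile,
            check=True,
        )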

esxtop counters are also available specifically for read and write latency, alongside the overall latency statistics. This can be useful when troubleshooting storage latency, as you may be experiencing far more write latency than read latency (or vice versa), which can help you isolate the problem to a particular storage component.

Disk device latency statistics

The disk device display is crucial when troubleshooting storage contention and latency issues, as it allows us to isolate any issues that may be occurring on a LUN-by-LUN basis.

Use the following steps to activate the disk device latency display:

  1. Start esxtop by executing the esxtop command.
  2. Press u to switch to the disk device display.
  3. Press f to select which columns you would like to display.
  4. Toggle the fields by pressing their corresponding letters. In order to view latency statistics effectively, we need to ensure that we have turned on Device Name (A) and Overall Latency Stats (I) at the very least.

By default, the Device column is not wide enough to display the full ID of each device. For troubleshooting, we will need the complete device ID. We can widen this column by pressing L and entering the desired width as an integer.
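If you would rather analyze device latency offline, the batch-mode capture shown earlier produces a CSV you can mine for the per-device latency columns. The sketch below matches columns loosely by substring because the exact batch-mode header names vary between builds; the file path, the matching string, and the 25 ms cutoff are assumptions to adapt to your environment.

    # A minimal sketch: scan an esxtop batch-mode CSV and report the worst
    # value seen in each latency column. Header names vary by build, so we
    # match loosely on "MilliSec/Command"; adjust the substring as needed.
    import csv

    def worst_latency(csv_path, substring="MilliSec/Command"):
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            # Indices of every latency column (device DAVG/KAVG/GAVG, etc.).
            cols = [i for i, name in enumerate(header) if substring in name]
            worst = {header[i]: 0.0 for i in cols}
            for row in reader:
                for i in cols:
                    try:
                        worst[header[i]] = max(worst[header[i]], float(row[i]))
                    except (ValueError, IndexError):
                        continue  # skip blank or short rows
            return worst

    # Flag anything that ever exceeded 25 ms of latency (rule of thumb).
    for column, value in worst_latency("/tmp/esxtop-devices.csv").items():
        if value > 25.0:
            print(f"{value:6.1f} ms  {column}")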

Virtual machine latency statistics

The latency statistics in the virtual machine display do not use the same column headers as the previous two views. Instead, they are displayed as LAT/rd and LAT/wr. These counters are measured in milliseconds and represent the average time it takes a read or write I/O issued by the virtual machine to complete. This is a great view for determining a couple of things. One, is it just one virtual machine that is experiencing latency? And two, is the latency observed mostly on reads or on writes?

Use the following steps to activate the virtual machine latency display:

  1. Start esxtop by executing the esxtop command.
  2. Press v to switch to the virtual machine disk display.
  3. Press f to select which columns you would like to display.
  4. Toggle the fields by pressing their corresponding letters. In order to view latency statistics effectively, we need to ensure that we have turned on VM Name (B), Read Latency Stats (G), and Write Latency Stats (H).
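Once you have per-VM read and write latency numbers, whether noted from this view or pulled from a batch-mode capture, answering the two questions above is a matter of comparing them. A minimal sketch, with made-up sample values:

    # A minimal sketch: given per-VM read/write latency in milliseconds,
    # find the outlier VM and whether its latency is concentrated on reads
    # or on writes. The VM names and numbers are made-up sample data.
    vm_latency_ms = {
        "web01": {"LAT/rd": 2.1, "LAT/wr": 3.0},
        "db01":  {"LAT/rd": 4.5, "LAT/wr": 38.7},
        "app01": {"LAT/rd": 1.8, "LAT/wr": 2.4},
    }

    worst_vm = max(vm_latency_ms, key=lambda vm: max(vm_latency_ms[vm].values()))
    stats = vm_latency_ms[worst_vm]
    side = "writes" if stats["LAT/wr"] > stats["LAT/rd"] else "reads"
    print(f"{worst_vm} shows the highest latency, concentrated on {side}: {stats}")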

Summary

Storage contention and performance problems are among the most common causes of slowness and outages within vSphere. Because of the number of software and hardware components involved in the vSphere storage stack, it can be hard to pinpoint exactly where the root cause of a storage contention issue lies. Using the tools, examples, features, and common causes explained in this article, we should be able to isolate issues, making it easier to troubleshoot and resolve problems.
