For those hoping to find a magical catch-all formula that will work in every scenario, you’ll have to keep looking. Every environment is unique, and even where similarities arise, your organization’s use case will most likely differ from another organization’s. Beyond your specific VM resource requirements, the hosts you install ESXi on will also vary; the hardware available to you will affect your consolidation ratio (the number of virtual machines you can fit on a single host). For example, if you have 10 servers that you want to virtualize, and you have determined each requires 4 GB of RAM, you could easily virtualize those 10 servers on a host with 48 GB of memory. However, if your host only has 16 GB of memory, you may need two or three hosts to achieve the required performance.
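The memory arithmetic in the example above can be sketched as follows; the numbers are hypothetical, and real sizing must also account for CPU, disk, network, and hypervisor overhead, which can push the host count up:

```shell
# Rough, memory-only sizing sketch: 10 VMs at 4 GB each.
vms=10
gb_per_vm=4
total=$((vms * gb_per_vm))                        # 40 GB of VM memory
for host_gb in 48 16; do
  hosts=$(( (total + host_gb - 1) / host_gb ))    # ceiling division
  echo "${host_gb} GB hosts: ${hosts} needed"
done
```

With 48 GB hosts, a single host covers the raw requirement; with 16 GB hosts, the same workload needs three, before any headroom is added.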
Another important aspect to consider is when to collect resource utilization statistics about your servers. Think about the requirements you have for a specific server; let’s use your finance department as an example. You can certainly collect resource statistics over a period of time in the middle of the month, and that might work just fine; however, the people in your finance department are more likely to utilize the system heavily during the first few days of the month as they work on their month-end processes. If you collect resource statistics on the 15th, you might miss a huge increase in resource utilization requirements, which could lead to the system not performing as expected once virtualized, and to unhappy users.
One last thing before we jump into some example statistics; you should collect these statistics over at least two periods for each server:
- First, during the normal business hours of your organization or the specific department, at a time when systems are likely to be heavily utilized
- Second, over an entire day or week, so you are aware of the impact of after-hours tasks such as backups and anti-virus scans on your environment
It’s important to have a strong understanding of the use cases for all the systems you will be virtualizing. If you run your tests in the middle of the month, you might miss the increase in load on systems that are heavily utilized only at month end, for example, accounting systems. The more information you collect, the better prepared you will be to determine your resource utilization requirements.
There are quite a few commercial tools available to help determine the specific resource requirements for your environment. In fact, if you have an active project and/or budget, check with your server and storage vendor as they can most likely provide tools to assess your environment over a period of time to help you collect this information. If you work with a VMware Partner or the VMware Professional Services Organization (PSO), you could also work with them to run a tool called VMware Capacity Planner. This tool is only available to partners who have passed the corresponding partner exams.
For purposes of this article, however, we will look at the statistics we can capture natively within an operating system, for example, using Performance Monitor on Windows and the sar command in Linux.
If you are an OS X user, you might be wondering why we are not touching OS X. This is because while Apple allows virtualizing OS X 10.5 and later, it is only supported on Apple hardware and is not an everyday use case. If your organization requires virtualizing OS X, ESXi 5.1 is supported on specific Mac Pro desktops with Intel Xeon 5600 series processors, and ESXi 5.0 is supported on Xserve with Xeon 5500 series processors. The current Apple license agreement allows virtualizing OS X 10.5 and later; of course, you should check the latest agreement to ensure you are adhering to its terms.
Monitoring common resource statistics
From a statistics perspective, there are four main types of resources you generally monitor: CPU, memory, disk, and network. Unless you have a very chatty application, network utilization is generally very low, but this doesn’t mean we won’t check on it; however, we probably won’t dedicate as much time to it as we do for CPU, memory, and disk.
As we think about the CPU and memory, we generally look at utilization in terms of percentages. When we look at example servers, you will see that having an accurate inventory of the physical server is important so we can properly gauge the virtual CPU and memory requirements when we virtualize. If a physical server has dual quad core CPUs and 16 GB of memory, it does not necessarily mean we want to provide the same amount of virtual resources.
Disk performance is where many people spend the least amount of time, and those people generally have the most headaches after they have virtualized. Disk performance is probably the most critical aspect to think about when you are planning your virtualization project. Most people only think of storage in terms of capacity, generally gigabytes (GB) or terabytes (TB). However, from a server perspective, we are mostly concerned with throughput and the number of input/output operations per second, otherwise known as IOPS. We break IOPS down into reads and writes per second, and then derive the read/write ratio by comparing one with the other. Understanding your I/O patterns will help you design your storage architecture to properly support all your applications. Storage design and understanding is an art and science by itself.
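The read/write breakdown described above is simple arithmetic; the counter values below are hypothetical, chosen only to illustrate the calculation:

```shell
# Hypothetical per-second disk counters for one server.
reads_per_sec=120
writes_per_sec=40
total_iops=$((reads_per_sec + writes_per_sec))    # total IOPS
ratio=$((reads_per_sec / writes_per_sec))         # read:write ratio
echo "${total_iops} IOPS total, read:write ratio ${ratio}:1"
```

A 3:1 read-heavy pattern and a 1:3 write-heavy pattern place very different demands on a storage array, particularly with parity RAID, so capture both numbers, not just the total.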
Let’s break this down into a practical example so we can see how to apply these concepts. In this example, we will look at two different types of servers that are likely to have different resource requirements: a Windows Active Directory domain controller and a CentOS Apache web server. In this scenario, let’s assume that each of these operating systems and applications is running on dedicated hardware; that is, they are not yet virtual machines.
The first step you should take, if you have not done so already, is to document the physical systems, their components, and other relevant information such as computer or DNS name, IP address(es), location, and so on. For larger environments, you may also want to document installed software, user groups or departments, and so on.
Collecting statistics on Windows
On Windows servers, your first step would be to start performance monitoring. Perform the following steps to do so:
Navigate to Start | Run and enter perfmon.
Once the Performance Monitor window opens, expand Monitoring Tools and click on Performance Monitor. Here, you could start adding various counters; however, as of Windows 2008/Windows 7, Performance Monitor includes Data Collector Sets.
Expand the Data Collector Sets folder and then the System folder; right-click on System Performance and select Start. Performance Monitor will start to collect key statistics about your system and its resource utilization.
When you are satisfied that you have collected an appropriate amount of data, click on System Performance and select Stop. Your reports will be saved into the Reports folder; navigate to Reports | System, click on the System Performance folder, and finally double-click on the report to view it.
In the following screenshot for our domain controller, you can see we were using 10 percent of the total CPU resources available, 54 percent of the memory, a low 18 IOPS, and 0 percent of the available network resources (this is not really uncommon; I have had busy application servers that barely break 2 percent).
Now let’s compare what we are utilizing with the actual physical resources available to the server. This server has two dual core processors (four total cores) running at 2 GHz per core (8 GHz total available), 4 GB of memory, two 200 GB SAS drives configured in a RAID 1, and a 1 Gbps network card.
Here, Performance Monitor shows averages, but you should also investigate peak usage. If you scroll down in the report, you will find a menu labeled CPU. Navigate to CPU | Process. Here you will see quite a bit of data, more than we have space to review in this article; however, if you scroll down, you will see a section called Processor User Time by CPU. Here, your mean (that is, average) column should match fairly closely the report overview provided for the total, but we also want to look at any spikes we may have encountered. As you can see, one core peaked at 35 percent utilization, noticeably more than the average suggested.
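The gap between mean and peak is easy to see with a toy series of per-interval CPU readings (the percentages below are made up for illustration); a quick awk one-liner computes both:

```shell
# Six hypothetical CPU utilization samples (percent); awk tracks the
# running sum for the mean and the largest value seen for the peak.
printf '8\n9\n35\n10\n7\n11\n' | awk '
  { sum += $1; if ($1 > max) max = $1 }
  END { printf "mean=%.1f%% max=%d%%\n", sum / NR, max }'
```

Sizing only to the mean here would hide a peak nearly three times higher, which is exactly the situation the report’s per-CPU maximum column is meant to expose.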
If we take the average CPU utilization at 10 percent of the total, we will theoretically require only 800 MHz of CPU power, something a single physical core could easily support. Memory is also at only half of what is available, so we could most likely reduce it to 3 GB and still have room for changes in operating conditions we might not have encountered during our collection window. Finally, at only 18 IOPS, we have plenty of performance left in the drives; even a 7,200 RPM SATA drive can provide around 80 IOPS.
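The 800 MHz figure comes straight from the inventory above (four cores at 2 GHz each is 8,000 MHz in total):

```shell
# Arithmetic behind the 800 MHz estimate: 10% average utilization
# of 8,000 MHz of total physical CPU capacity.
total_mhz=8000
avg_util_pct=10
required_mhz=$((total_mhz * avg_util_pct / 100))
echo "~${required_mhz} MHz of CPU required on average"
```

Remember to sanity-check this against the observed peak as well, since an average-only estimate can undersize a bursty workload.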
Collecting statistics on Linux
Now let’s look at the Linux web server to see how we can collect the same information using sar, part of the sysstat package, which can monitor resource utilization over time. This is similar to what you might get from top or iotop. The sysstat package can easily be added to your system by running yum install sysstat, as it is part of the base repository.
Once the sysstat package is installed, it will start collecting information about resource utilization every 10 minutes and keep this information for a period of seven days. To see the information, you just need to run the sar command; there are different options to display different sets of information, which we will look at next.
Here, we can see that our system is idle right now by viewing the %idle column.
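As a sketch of reading that column, the following runs awk over illustrative sar -u style output (these lines are not captured from this system); %idle is the last field on each row:

```shell
# Parse the %idle column (last field) from sample sar -u output.
awk 'NR > 1 { print "idle at " $1 " " $2 ": " $NF "%" }' <<'EOF'
12:00:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
12:10:01 AM  all   1.20   0.00     0.40     0.10    0.00  98.30
12:20:01 AM  all  52.75   0.00     3.10     0.25    0.00  43.90
EOF
```

A consistently high %idle, as in the first sample row, is the signature of a server that is a strong consolidation candidate.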
A simple way to generate some load on your system is to run dd if=/dev/zero of=/dev/null, which will spike a CPU core to 100 percent, so don’t do this on production systems! Let’s look at the output with some CPU load. In this example, you can see that the CPU was under load for about half of the 10-minute collection window.
One problem here is that unless a CPU spike, in this case to 100 percent, is sustained for most of a 10-minute window, we could miss it entirely using sar’s default interval. This is easily changed by editing /etc/cron.d/sysstat, which tells the system to collect every 10 minutes; during a collection window, an interval of one or two minutes may provide more valuable detail. In this example, you can see I am now logging at a five-minute interval instead of 10, so I will have a better chance of finding maximum CPU usage during my monitoring period.
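As a sketch, the relevant line in /etc/cron.d/sysstat looks something like the following on a CentOS system; the sa1 path and arguments vary by distribution and sysstat version, so treat this as illustrative rather than exact:

```
# /etc/cron.d/sysstat -- collect every 5 minutes instead of the default */10
*/5 * * * * root /usr/lib64/sa/sa1 1 1
```

After saving the change, new samples will appear at the shorter interval; existing data is unaffected.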
Now, we are not only concerned with the CPU, but we also want to see memory and disk utilization. To access those statistics, run sar with the following options:
- The sar -r command will show RAM (memory) statistics. At a basic level, the item we are concerned with here would be the percentage of memory used, which we could use to determine how much memory is actually being utilized.
- The sar -b command will show disk I/O. From a disk perspective, sar -b will tell us the total number of transactions per second (tps), read transactions per second (rtps), and write transactions per second (wtps).
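Tying this back to the read/write ratio discussed earlier, the rtps and wtps columns can be averaged across the collection window; the sample below is illustrative sar -b style output, not real data from this system:

```shell
# Average the rtps ($3) and wtps ($4) columns from sample sar -b output.
awk 'NR > 1 { r += $3; w += $4; n++ }
     END { printf "avg rtps=%.1f avg wtps=%.1f\n", r / n, w / n }' <<'EOF'
time      tps   rtps  wtps
12:10:01  45.0  30.0  15.0
12:20:01  55.0  40.0  15.0
EOF
```

These averages give you the server’s typical read/write mix, which feeds directly into the storage design decisions described in the disk performance section.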
As you can see, you are able to natively collect quite a bit of relevant data about resource utilization on your systems. However, without the help of a vendor or VMware PSO with access to VMware Capacity Planner, another commercial tool, or a good automation system, this can become difficult, though certainly not impossible, to do at large scale (hundreds or thousands of servers).