Monitoring and Troubleshooting Networking


This article by Muhammad Zeeshan Munir, author of the book VMware vSphere Troubleshooting, covers troubleshooting of vSphere distributed virtual switches, vSphere standard virtual switches, VLANs, uplinks, DNS, and routing, which are among the core issues a seasoned system engineer has to deal with on a daily basis. It gives you hands-on, step-by-step instructions to manage and monitor your network resources. The following topics will be covered in this article:

  • Different network troubleshooting commands
  • VLANs troubleshooting
  • Verification of physical trunks and VLAN configuration
  • Testing of VM connectivity
  • VMkernel interface troubleshooting
  • Configuration commands (vicfg-vmknic and esxcli network ip interface)
  • Use of Direct Console User Interface (DCUI) to verify configuration


Network troubleshooting commands

Some of the commands that can be used for network troubleshooting include net-dvs, esxcli network, vicfg-route, vicfg-vmknic, vicfg-dns, vicfg-nics, and vicfg-vswitch.

You can use the net-dvs command to troubleshoot VMware distributed switches (dvSwitches). The command shows all the information regarding the VMware distributed switch configuration. The net-dvs command reads the information from the /etc/vmware/dvsdata.db file and displays all the data in the console. A vSphere host updates its dvsdata.db file every five minutes.


  1. Connect to a vSphere host using PuTTY.
  2. Enter your user name and password when prompted.
  3. Type the following command in the CLI:
    net-dvs
  4. You will see something similar to the following screenshot:

In the preceding screenshot, you can see that the first line represents the UUID of a VMware distributed switch. The second line shows the maximum number of ports a distributed switch can have. The line com.vmware.common.alias = dvswitch-Network-Pools represents the name of a distributed switch. The next line, com.vmware.common.uplinkPorts: dvUplink1 to dvUplinkn, shows the uplink ports a distributed switch has. The distributed switch MTU is set to 1600, and you can see the information about CDP just below it. CDP information can be useful to troubleshoot connectivity issues.

You can see com.vmware.common.respools.list listing the networking resource pools, while com.vmware.common.host.uplinkPorts shows the port numbers assigned to the uplink ports. Further details about each uplink port then follow, listed by port number. You can also see the port statistics as displayed in the following screenshot. When you perform troubleshooting, these statistics help you to check the behavior of the distributed switch and its ports, and to confirm whether data packets are going in and out. As you can see in the following screenshot, all the metrics regarding packet drops are zero. If you find in your troubleshooting that packets are being dropped, you have a starting point for finding the root cause of the problem:

Unfortunately, the net-dvs command is very poorly documented, and it is usually hard to find useful references. Moreover, it is not supported by VMware. However, you can use it with the -h switch to display more options.

Repairing a dvsdata.db file

Sometimes, the dvsdata.db file of a vSphere host becomes corrupted and you face different types of distributed switch errors, for example, unable to create proxy DVS. In this case, the net-dvs command also fails with an error when you run it on the vSphere host: as mentioned earlier, it reads its data from the /etc/vmware/dvsdata.db file, so it fails when it is unable to read that file. Possible causes for the corruption of the dvsdata.db file are a network outage, or a vSphere host that was disconnected from vCenter and deleted while it still held the distributed switch information in its cache.

You can resolve this issue by restoring the dvsdata.db file by following these steps:

  1. Through PuTTY, connect to a functioning vSphere host in your infrastructure.
  2. Copy the dvsdata.db file from the vSphere host. The file can be found in /etc/vmware/dvsdata.db.
  3. Transfer the copied dvsdata.db file to the corrupted vSphere host and overwrite it.
  4. Restart your vSphere host.
  5. Once the vSphere host is up and running, use PuTTY to connect to it.
  6. Run the net-dvs command. The command should be executed successfully this time without any errors.
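
The steps above overwrite the corrupted file directly. A more cautious variant keeps a copy of the corrupted file first so that the change can be rolled back. The following is only a minimal sketch of that idea in Python; the helper name replace_with_backup and the backup naming scheme are assumptions made for illustration, not part of any VMware tool, and the demonstration runs on throwaway files rather than on /etc/vmware/dvsdata.db:

```python
import shutil
import tempfile
import time
from pathlib import Path


def replace_with_backup(good_copy: Path, target: Path) -> Path:
    """Back up `target`, then overwrite it with `good_copy`.

    Returns the path of the backup so the change can be rolled back.
    (Hypothetical helper; the backup file name is an arbitrary choice.)
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = target.with_name(f"{target.name}.{stamp}.bak")
    shutil.copy2(target, backup)     # keep the corrupted file for later analysis
    shutil.copy2(good_copy, target)  # overwrite with the known-good copy
    return backup


# Demonstration on temporary files only.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good-dvsdata.db"
    corrupted = Path(tmp) / "dvsdata.db"
    good.write_bytes(b"healthy")
    corrupted.write_bytes(b"corrupted")
    saved = replace_with_backup(good, corrupted)
    print(corrupted.read_bytes(), saved.read_bytes())
```

On a real host, the target would be the dvsdata.db path named in step 2, and the host would still need the restart described in step 4.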

ESXCLI network

The esxcli network command is a longtime friend of system administrators and support staff for troubleshooting network-related issues. We will use it to examine different network configurations and to troubleshoot problems. You can type esxcli network to quickly see a help reference and the different options that can be used with the command.

Let’s walk through some useful esxcli network troubleshooting commands. Type the following command into your vSphere CLI to list all the virtual machines and the networks they are on. You can see that the command returned World ID, virtual machine name, number of ports, and the network:

esxcli network vm list
World ID  Name                                                 Num Ports  Networks
--------  ---------------------------------------------------  ---------  ---------------
14323012  cluster08_(5fa21117-18f7-427c-84d1-c63922199e05)             1  dvportgroup-372

Now use the World ID of a virtual machine returned by the last command to list all the ports the virtual machine is currently using. You can see the virtual switch name, MAC address of the NIC, IP address, and uplink port ID:

esxcli network vm port list -w 14323012
   Port ID: 50331662
   vSwitch: dvSwitch-Network-Pools
   Portgroup: dvportgroup-372
   DVPort ID: 1063
   MAC Address: 00:50:56:01:00:7e
   IP Address: 0.0.0.0
   Team Uplink: all(2)
   Uplink Port ID: 0
   Active Filters:
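
When you script these checks, the port ID has to be pulled out of the output above before it can be passed to the next command. The following is a small sketch, assuming the output keeps the "Key: Value" layout shown here; parse_key_value_block is a hypothetical helper name, and the sample text is copied from the output above:

```python
def parse_key_value_block(text: str) -> dict:
    """Turn 'Key: Value' lines into a dictionary of strings."""
    result = {}
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


SAMPLE_PORT_LIST = """\
   Port ID: 50331662
   vSwitch: dvSwitch-Network-Pools
   Portgroup: dvportgroup-372
   DVPort ID: 1063
   MAC Address: 00:50:56:01:00:7e
   IP Address: 0.0.0.0
   Team Uplink: all(2)
   Uplink Port ID: 0
   Active Filters:
"""

port = parse_key_value_block(SAMPLE_PORT_LIST)
print(port["Port ID"], port["MAC Address"])
```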

Type the following command in the CLI to list the statistics of that port; pass the port ID returned by the previous command after the -p flag:

esxcli network port stats get -p 50331662
Packet statistics for port 50331662
   Packets received: 10787391024
   Packets sent: 7661812086
   Bytes received: 3048720170788
   Bytes sent: 154147668506
   Broadcast packets received: 17831672
   Broadcast packets sent: 309404
   Multicast packets received: 656
   Multicast packets sent: 52
   Unicast packets received: 10769558696
   Unicast packets sent: 7661502630
   Receive packets dropped: 92865923
   Transmit packets dropped: 0
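
The raw counters are easier to judge as a ratio. The following sketch parses counters in the layout shown above and expresses the receive drops as a percentage of the packets received. The helper names are hypothetical, and treating "dropped divided by received" as the drop rate is a simplifying assumption, since the output does not state whether dropped packets are included in the received count:

```python
def parse_counters(text: str) -> dict:
    """Collect 'Name: <integer>' lines into a dictionary of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def drop_percentage(dropped: int, total: int) -> float:
    """Return dropped as a percentage of total (0.0 when total is zero)."""
    return 100.0 * dropped / total if total else 0.0


# Values copied from the port statistics shown above.
SAMPLE_PORT_STATS = """\
Packet statistics for port 50331662
   Packets received: 10787391024
   Packets sent: 7661812086
   Receive packets dropped: 92865923
   Transmit packets dropped: 0
"""

stats = parse_counters(SAMPLE_PORT_STATS)
rx = drop_percentage(stats["Receive packets dropped"], stats["Packets received"])
tx = drop_percentage(stats["Transmit packets dropped"], stats["Packets sent"])
print(f"receive drops: {rx:.2f}%  transmit drops: {tx:.2f}%")
# receive drops: 0.86%  transmit drops: 0.00%
```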

Type the following command to list the complete statistics of a physical network adapter of the host, in this case vmnic0:

esxcli network nic stats get -n vmnic0
NIC statistics for vmnic0
   Packets received: 2969343419
   Packets sent: 155331621
   Bytes received: 2264469102098
   Bytes sent: 46007679331
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Total receive errors: 78507
   Receive length errors: 0
   Receive over errors: 22
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 78485
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0
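
With this many counters, it helps to filter out everything that is zero. The sketch below keeps only the error and drop counters with a non-zero value, assuming the "Name: value" layout shown above; nonzero_problem_counters is a hypothetical helper, and the sample text is copied from the output above:

```python
def nonzero_problem_counters(text: str) -> dict:
    """Return the error and drop counters whose value is greater than zero."""
    problems = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if not (sep and value.isdigit()):
            continue  # skip the title line and anything that is not a counter
        if ("error" in name or "dropped" in name) and int(value) > 0:
            problems[name] = int(value)
    return problems


SAMPLE_NIC_STATS = """\
NIC statistics for vmnic0
   Packets received: 2969343419
   Packets sent: 155331621
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Total receive errors: 78507
   Receive length errors: 0
   Receive over errors: 22
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 78485
   Receive missed errors: 0
   Total transmit errors: 0
"""

for name, value in nonzero_problem_counters(SAMPLE_NIC_STATS).items():
    print(f"{name}: {value}")
```

In this sample, the two non-zero detail counters (22 over errors and 78485 FIFO errors) add up to the total of 78507 receive errors.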

A complete reference for the esxcli network command can be found at https://goo.gl/9OMbVU.

All the vicfg-* commands are very helpful and easy to use, and I encourage you to learn them to make your life easier. Here are some of the vicfg-* commands relevant to network troubleshooting:

  • vicfg-route: Adds or removes IP routes and creates or deletes default IP gateways.
  • vicfg-vmknic: Performs different operations on the VMkernel NICs of vSphere hosts.
  • vicfg-dns: Manipulates DNS information.
  • vicfg-nics: Manipulates the physical NICs of vSphere hosts.
  • vicfg-vswitch: Creates, deletes, and modifies virtual switch information.

Troubleshooting uplinks

We will use the vicfg-nics command to manage the physical network adapters of vSphere hosts. The command can set the speed and duplex of an uplink adapter, and it displays the VMkernel name, driver information, and link state of each NIC.

Connect to your vMA appliance console and set up the target vSphere host:

vifptarget --set crimv3esx001.linxsol.com

List all the network cards available in the vSphere host. See the following screenshot for the output:

vicfg-nics -l

You can see that my vSphere host has five network cards from vmnic0 to vmnic5. You are able to see the PCI and driver information. The link state for all the network cards is up. You can also see two types of network card speeds: 1000 Mbps and 9000 Mbps. The output also shows the card name in the Description field, the MTU, and the MAC address of each network card. You can set up a network card to auto-negotiate as follows:

vicfg-nics --auto vmnic0

Now let’s set the speed of vmnic0 to 1000 and its duplex settings to full:

vicfg-nics --duplex full --speed 1000 vmnic0

Troubleshooting virtual switches

The next command we will discuss is vicfg-vswitch. It is a very powerful command that can be used for the day-to-day operations of a virtual switch. I will show you how to create and configure port groups and virtual switches.

In the vMA appliance, set the vSphere host whose virtual switches you want to inspect as the target:

vifptarget --set crimv3esx001.linxsol.com

Type the following command to list all the information about the switches the vSphere host has. You can see the command output in the screenshot that follows:

vicfg-vswitch -l

You can see that the vSphere host has one virtual switch and two virtual NICs carrying traffic for the management network and for vMotion. The virtual switch has 128 ports, and 7 of them are in use. There are two uplinks to the switch with the MTU set to 1500, while two VLANs are being used: one for the management network and one for the vMotion traffic. You can also see three distributed switches named OpenStack, dvSwitch-External-Networks, and dvSwitch-Network-Pools.

Prefixing the name of a distributed switch with dv is a common practice, and it helps you to easily recognize a distributed switch.

I will go through adding a new virtual switch:

vicfg-vswitch --add vSwitch002

This creates a virtual switch with 128 ports and an MTU of 1500. You can use the --mtu flag to specify a different MTU. Now add the uplink adapter vmnic0 to the newly created virtual switch vSwitch002:

vicfg-vswitch --link vmnic0 vSwitch002

To add a port group to the virtual switch, use the following command:

vicfg-vswitch --add-pg portgroup002 vSwitch002

Now add an uplink adapter to the port group:

vicfg-vswitch --add-pg-uplink vmnic0 --pg portgroup002 vSwitch002

We have discussed all the commands to create a virtual switch and its port groups and to add uplinks. Now we will see how to delete and edit the configuration of a virtual switch. An uplink NIC can be removed from a port group using the --del-pg-uplink flag (short form -N). Remove vmnic0 from portgroup002:

vicfg-vswitch --del-pg-uplink vmnic0 --pg portgroup002 vSwitch002

You can delete the recently created port group as follows:

vicfg-vswitch --del-pg portgroup002 vSwitch002

To delete a switch, you first need to remove its uplink adapter. Use the --unlink flag (short form -U), which unlinks the uplink from the switch:

vicfg-vswitch --unlink vmnic0 vSwitch002

You can delete a virtual switch using the --delete flag (short form -d). Here is how you do it:

vicfg-vswitch --delete vSwitch002
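
The order of these operations matters: the teardown is the creation sequence in reverse, with the uplink unlinked before the switch is deleted. The following sketch only builds the command strings in that order, using the flags shown in this section; it does not execute anything, and build_vswitch_commands is a hypothetical helper name:

```python
def build_vswitch_commands(switch: str, uplink: str, portgroup: str):
    """Return (create, teardown) lists of vicfg-vswitch command strings."""
    create = [
        f"vicfg-vswitch --add {switch}",
        f"vicfg-vswitch --link {uplink} {switch}",
        f"vicfg-vswitch --add-pg {portgroup} {switch}",
        f"vicfg-vswitch --add-pg-uplink {uplink} --pg {portgroup} {switch}",
    ]
    # Teardown undoes each creation step, last step first.
    teardown = [
        f"vicfg-vswitch --del-pg-uplink {uplink} --pg {portgroup} {switch}",
        f"vicfg-vswitch --del-pg {portgroup} {switch}",
        f"vicfg-vswitch --unlink {uplink} {switch}",
        f"vicfg-vswitch --delete {switch}",
    ]
    return create, teardown


create, teardown = build_vswitch_commands("vSwitch002", "vmnic0", "portgroup002")
for command in create + teardown:
    print(command)
```

Printing the commands before running them gives you a reviewable plan, which is useful when the same change has to be applied to several hosts.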

You can check the Cisco Discovery Protocol (CDP) settings by using the --get-cdp flag with the vicfg-vswitch command. The following command returns listen, which indicates that the vSphere host is configured to receive CDP information from the physical switch:

vicfg-vswitch --get-cdp vSwitch0
listen

You can set the CDP mode for the vSphere host to down, listen, advertise, or both. In listen mode, the vSphere host receives and displays the information sent by the Cisco switch port, but the Cisco device cannot see any information about the vSwitch. In advertise mode, the vSphere host does not collect information about the Cisco switch; instead, it publishes information about its vSwitch to the Cisco device. In both mode, the host listens and advertises at the same time. The following command sets vSwitch0 to both:

vicfg-vswitch --set-cdp both vSwitch0
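
The CDP modes can be summarized as two independent switches: whether the host listens, and whether it advertises. The following is an illustrative sketch of that mapping; the table and the describe_cdp helper are assumptions written for this article and are not read from a host:

```python
# mode -> (host receives CDP information, host sends CDP information)
CDP_MODES = {
    "down": (False, False),
    "listen": (True, False),
    "advertise": (False, True),
    "both": (True, True),
}


def describe_cdp(mode: str) -> str:
    """Describe in words what a CDP mode means for the vSphere host."""
    listens, advertises = CDP_MODES[mode]
    parts = []
    if listens:
        parts.append("receives information from the physical switch")
    if advertises:
        parts.append("publishes vSwitch information to the physical switch")
    return "; ".join(parts) if parts else "CDP is disabled"


for mode in CDP_MODES:
    print(f"{mode}: {describe_cdp(mode)}")
```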

Troubleshooting VLANs

Virtual LANs, or VLANs, are used to separate a physical switching segment into different logical switching segments in order to segregate the broadcast domains. VLANs not only provide network segmentation but also give us a method of effective network management. They also increase the overall network security, and nowadays they are very commonly used in infrastructure. If VLANs are not set up correctly, your vSphere host can lose connectivity, and you can face some very common problems where you are unable to ping or resolve host names anymore. Typical errors in this situation are Destination host unreachable and Connection failed. A Private VLAN (PVLAN) is an extended version of a VLAN that divides a logical broadcast domain into further segments and forms private groups. PVLANs are divided into primary and secondary PVLANs.

The primary PVLAN is the original VLAN that is divided into smaller segments. It hosts all the secondary PVLANs within it. Secondary PVLANs live within the primary PVLAN, and each secondary PVLAN is identified by the VLAN ID linked to it. Just like in ordinary VLANs, the packets that travel within secondary PVLANs are tagged with their associated IDs. The physical switch then recognizes whether the packets are tagged as isolated, community, or promiscuous.

As network troubleshooting involves taking care of many different aspects, one aspect you will come across in the troubleshooting cycle is troubleshooting VLANs. vSphere Enterprise Plus licensing is a requirement to connect a host using a virtual distributed switch and VLANs. You can see the three different network segments in the following screenshot. VLAN A connects all the virtual machines on different vSphere hosts; VLAN B is responsible for carrying management network traffic; and VLAN C is responsible for carrying vMotion-related traffic. In order to create PVLANs on your vSphere host, you also need the support of a physical switch:
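
When a PVLAN connectivity problem turns up, it helps to first check whether the two ports are supposed to reach each other at all. The sketch below encodes the usual PVLAN rules: promiscuous ports reach everything, community ports reach their own community and promiscuous ports, and isolated ports reach only promiscuous ports. It assumes both ports belong to the same primary PVLAN; the Port type, the can_communicate helper, and the secondary IDs 101 and 102 are hypothetical names and values used for illustration:

```python
from typing import NamedTuple, Optional


class Port(NamedTuple):
    kind: str                           # "promiscuous", "isolated", or "community"
    secondary_id: Optional[int] = None  # secondary PVLAN ID, used for community ports


def can_communicate(a: Port, b: Port) -> bool:
    """Return True if two ports in the same primary PVLAN may exchange traffic."""
    if a.kind == "promiscuous" or b.kind == "promiscuous":
        return True
    if a.kind == "community" and b.kind == "community":
        return a.secondary_id == b.secondary_id
    return False  # isolated ports reach nothing except promiscuous ports


gateway = Port("promiscuous")
web1 = Port("community", 101)
web2 = Port("community", 101)
db = Port("community", 102)
guest = Port("isolated")

print(can_communicate(web1, web2))   # True: same community
print(can_communicate(web1, db))     # False: different communities
print(can_communicate(guest, gateway))  # True: isolated to promiscuous
print(can_communicate(guest, guest))    # False: isolated to isolated
```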

For detailed information about the vSphere network, refer to the VMware official networking guide for vSphere 5.5 at http://goo.gl/SYySFL.

Verifying physical trunks and VLAN configuration

The first and most important step to troubleshooting your VLAN problem is to look into the VLAN configuration of your vSphere host. You should always start by verifying it. Let’s walk through how to verify the network configuration of the management network and VLAN configuration from the vSphere client:

  1. Open and log in to your vSphere client.
  2. Click on the vSphere host you are trying to troubleshoot.
  3. Click on the Configuration menu and choose Networking and then Properties of the switch you are troubleshooting.
  4. Choose the network you are troubleshooting from the list, and click on Edit.
  5. This will open a new window. Verify the VLAN ID for Management Network.
  6. Match the ID of the VLAN provided by your network administrator.

Verifying VLAN configuration from CLI

Following are the steps for verifying VLAN configuration from CLI:

  1. Log in to vSphere CLI. Type the following command in the console:
    esxcfg-vswitch -l
  2. Alternatively, in the vMA appliance, type the vicfg-vswitch command; the output is similar for both commands:
    vicfg-vswitch -l
  3. The output of the esxcfg-vswitch -l command is as follows:
    Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
    vSwitch0         128         7           128               1500    vmnic3,vmnic2
      PortGroup Name        VLAN ID  Used Ports  Uplinks
      vMotion               2231     1           vmnic3,vmnic2
      Management Network    2230     1           vmnic3,vmnic2 
    ---Omitted output---
  4. The output of the vicfg-vswitch -l command is as follows:
    Switch Name     Num Ports       Used Ports      Configured Ports    MTU     Uplinks
    vSwitch0        128             7               128                 1500    vmnic2,vmnic3
    
       PortGroup Name                VLAN ID   Used Ports      Uplinks
       vMotion                       2231      1               vmnic2,vmnic3
       Management Network            2230      1               vmnic3,vmnic2
    ---Omitted output---
  5. Match it with your network configuration. If the VLAN ID is incorrect or missing, you can add or edit it using the following command from the vSphere CLI:
    esxcfg-vswitch -v 2233 -p "Management Network" vSwitch0
  6. To add or edit the VLAN ID from the vMA appliance, use the following command:
    vicfg-vswitch --vlan 2233 --pg "Management Network" vSwitch0
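
The comparison in step 5 can be automated when many hosts have to be checked. The following sketch parses the port group rows of the listing shown above and reports every port group whose VLAN ID differs from the expected value. It assumes that columns are separated by at least two spaces, as in the sample; the helper names and the EXPECTED table are hypothetical:

```python
import re


def portgroup_vlans(output: str) -> dict:
    """Map port group name to VLAN ID from an esxcfg-vswitch -l style listing."""
    vlans = {}
    in_portgroups = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("PortGroup Name"):
            in_portgroups = True
            continue
        if not stripped or stripped.startswith(("Switch Name", "---")):
            in_portgroups = False
            continue
        if in_portgroups:
            # Two or more spaces separate columns; names may contain one space.
            fields = re.split(r"\s{2,}", stripped)
            if len(fields) >= 2 and fields[1].isdigit():
                vlans[fields[0]] = int(fields[1])
    return vlans


def vlan_mismatches(actual: dict, expected: dict) -> dict:
    """Return {port group: (actual VLAN or None, expected VLAN)} for differences."""
    return {
        name: (actual.get(name), want)
        for name, want in expected.items()
        if actual.get(name) != want
    }


SAMPLE_LISTING = """\
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         128         7           128               1500    vmnic3,vmnic2
  PortGroup Name        VLAN ID  Used Ports  Uplinks
  vMotion               2231     1           vmnic3,vmnic2
  Management Network    2230     1           vmnic3,vmnic2
"""

# Expected values as they would be provided by the network administrator.
EXPECTED = {"vMotion": 2231, "Management Network": 2233}

print(vlan_mismatches(portgroup_vlans(SAMPLE_LISTING), EXPECTED))
```

For the sample listing, only Management Network is reported, because the host has VLAN 2230 while 2233 is expected.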

Verifying VLANs from PowerCLI

Verifying information about VLANs from the PowerCLI is fairly simple. Type the following command into the console after connecting with vCenter using Connect-VIServer:

Get-VirtualPortGroup -VMHost crimv3esx001.linxsol.com | select Name, VirtualSwitch, VLanId

Name                  VirtualSwitch    VLanId
----                  -------------    ------
vMotion               vSwitch0         2231
Management Network    vSwitch0         2233

Verifying PVLANs and secondary PVLANs

When you have configured PVLANs or secondary PVLANs in your vSphere infrastructure, you may arrive at a situation where you need to troubleshoot them. This topic will provide you some tips to obtain and view information about PVLANs and secondary PVLANs, as follows:

  1. Log in to the vSphere client and click on Networking.
  2. Select a distributed switch and right-click on it.
  3. From the menu, choose Edit Settings and click on it. This will open the Distributed Switch Settings window.
  4. Click on the third tab named Private VLAN.
  5. In the section on the left named Primary private VLAN ID, verify the VLAN ID provided by your network engineer.
  6. You can verify the VLAN ID of the secondary PVLAN in the next section on the right.

Testing virtual machine connectivity

Whenever you are troubleshooting, virtual-machine-to-virtual-machine testing is very important because it narrows the problem domain to a smaller scope. Start by moving the virtual machines to a single vSphere host and test the network with basic commands, such as ping. If ping works on the same host, move the virtual machines to different hosts and test again. If ping fails once the virtual machines are on different hosts, the most likely cause is the configuration of a physical switch or a mismatched physical trunk configuration.
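
The test sequence can be written down as a small decision function. This is only a sketch of the reasoning; the diagnose helper and its wording are illustrative, and the advice for a failing same-host test is an assumption that goes slightly beyond what the text states:

```python
def diagnose(same_host_ping_ok: bool, cross_host_ping_ok: bool) -> str:
    """Suggest where to look next, based on two ping tests between two VMs."""
    if not same_host_ping_ok:
        # Assumption: traffic that never leaves the host points at the
        # virtual side (guest OS, virtual NIC, or port group settings).
        return "check the virtual machine and port group configuration on the host"
    if not cross_host_ping_ok:
        return "check the physical switch and the physical trunk configuration"
    return "virtual-machine-to-virtual-machine connectivity is working"


print(diagnose(same_host_ping_ok=True, cross_host_ping_ok=False))
```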

Troubleshooting VMkernel interfaces

In this section, we will see how to troubleshoot VMkernel interfaces:

  • Confirm VLAN tagging
  • Ping to check connectivity
  • vicfg-vmknic
  • esxcli network ip interface for local configuration
  • esxcli network ip interface list
  • esxcli network ip interface add or remove
  • esxcli network ip interface set
  • esxcli network ip interface ipv4 get

You should know how to use these commands to test if everything is working. You should be able to ping to ensure connectivity exists.

We will use the vicfg-vmknic command to configure vSphere VMkernel NICs. Let’s create a new VMkernel NIC in a vSphere host using the following steps:

  1. Log in to your VMware vSphere CLI.
  2. Type the following command to create a new VMkernel NIC:
    vicfg-vmknic -h crimv3esx001.linxsol.com --add --ip 10.2.0.10 -n 255.255.255.0 'portgroup01'

    You can enable vMotion using the vicfg-vmknic command as follows:

    vicfg-vmknic --enable-vmotion

    You will not be able to enable vMotion from ESXCLI. vMotion lets you migrate your running virtual machines with zero downtime.

  3. You can delete an existing VMkernel NIC as follows:
    vicfg-vmknic -h crimv3esx001.linxsol.com --delete 'portgroup01'
  4. Now type the following command to check which VMkernel NICs are available in the system:
    vicfg-vmknic -l
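
A mistyped IP address or netmask on a VMkernel NIC is a common cause of lost connectivity, so it is worth validating the values before you pass them to --ip and -n. The following sketch uses Python's standard ipaddress module; the helper names are hypothetical, and the addresses 10.2.0.1 and 10.3.0.1 are example gateways assumed for illustration:

```python
import ipaddress


def vmknic_network(ip: str, netmask: str) -> ipaddress.IPv4Network:
    """Return the network a VMkernel NIC belongs to.

    Raises ValueError if the address or the netmask is not valid.
    """
    return ipaddress.ip_interface(f"{ip}/{netmask}").network


def same_subnet(ip: str, netmask: str, other_ip: str) -> bool:
    """Check whether other_ip (for example, a gateway) is on the NIC's subnet."""
    return ipaddress.ip_address(other_ip) in vmknic_network(ip, netmask)


print(vmknic_network("10.2.0.10", "255.255.255.0"))           # 10.2.0.0/24
print(same_subnet("10.2.0.10", "255.255.255.0", "10.2.0.1"))  # True
print(same_subnet("10.2.0.10", "255.255.255.0", "10.3.0.1"))  # False
```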

Verifying configuration from DCUI

When you successfully install vSphere, the first yellow screen that you see is called the vSphere DCUI. The DCUI is a frontend management system that helps you perform some basic system administration tasks. It also offers the best way to troubleshoot some problems that may be difficult to troubleshoot through vMA, vCLI, or PowerCLI. Further, it is very useful when your host becomes unresponsive in vCenter or is not accessible from any of the management tools.

Some useful tasks that can be performed using the DCUI are as follows:

  • Configuring the Lockdown mode
  • Checking connectivity of Management Network by Ping
  • Configuring and restarting network settings
  • Restarting management agents
  • Viewing logs
  • Resetting vSphere configuration
  • Changing root password

Verifying network connectivity from DCUI

The vSphere host automatically assigns the first network card available to the system for the management network. Moreover, the default installation of the vSphere host does not let you set up VLAN tags until the VMkernel has been loaded. Verifying network connectivity from the DCUI is important but easy. To do so, follow these steps:

  1. Press F2 and enter your root user name and password. Press Enter to confirm.
  2. Use the cursor keys to go down to the Test Management Network option.
  3. Press Enter, and you will see a new screen. Here you can enter up to three IP addresses and the host name to be resolved.
  4. You can also type your gateway address on this screen to see if you are able to reach your gateway.
  5. In the host name field, you can enter your DNS server name to test if the name resolves successfully.
  6. Press Esc to go back and Esc again to log off from the vSphere DCUI.

Verifying management network from DCUI

You can also verify the settings of your management network from the DCUI.

  1. Press F2 and enter your root user name and password. Press Enter to confirm.
  2. Use the cursor keys to go down to the Configure Management Network option and press Enter.
  3. Press Enter again after selecting the first option, Network Adapters. On the next screen, you will see a list of all the network adapters your system has.
  4. It will show you the Device Name, Hardware Type, Label, MAC Address of the network card, and the status as Connected or Disconnected.
  5. From the given network cards, you can select or deselect any network card by pressing the spacebar on your keyboard.
  6. Press Esc to go back and Esc again to log off from the vSphere DCUI.

As you can see in the preceding screenshot, you can also configure the IP address and DNS settings for your vSphere host. You can also use DCUI to configure VLANs and DNS Suffix for your vSphere host.

Summary

In this article, we took a deep dive into the network troubleshooting commands and some of the tools for monitoring network performance.

The different platforms for executing these commands let you match the tool to the task. For example, to troubleshoot a single vSphere host, you may prefer esxcli, but for a group of vSphere hosts, you may want to automate the tasks with scripts from PowerCLI or from a vMA appliance.
