Building the Western Digital vSAN Performance Testbed
In my blog titled “Definitive Best Practices: Inside Western Digital’s vSAN Performance Testbed” I gave a brief introduction to the VMware vSAN performance testbed, describing why we had built it, and the evolution of the baseline performance numbers. Here I would like to explain further how the cluster was actually configured so you can easily (relatively speaking) emulate my work.
In this blog, I’ll discuss simple vSphere post-install tasks, network configuration details, and finally building the vSAN cluster. This will all be done on my baseline configuration.
I will not be covering in detail how to install ESXi, how to setup and deploy a vCenter Server, or how to configure a vSphere distributed switch. Instead, I will discuss the design principles and steps for properly configuring the vSAN performance testbed.
The following services should be available for ease of deployment:
Windows-based vCenter Server Version 6.5 Build 4564107 (or vCenter Server Appliance)
ESXi Version 6.5 Build 4564106
vSphere Post-Install Tasks
In my test environments, I usually follow some basic configuration standards. These are not all required (or recommended) for production clusters, but each configuration choice should be well thought out. As the Data Propulsion Labs™ (DPL™) is constantly deploying test clusters for one reason or another, I took a moment to script these post-install tasks in PowerCLI to save time. That said, take a look at VMware Host Profiles, as they can be a time saver as well.
All scripted parts assume that hosts have already been added to vCenter Server. My full script does the following things:
Adds hosts to the vCenter server based on original IP address
Performs post-install tasks
Inserts DHCP reservations and DNS records
Configures a temporary VMkernel adapter with a temporary IP address
Removes hosts and re-adds hosts to vCenter Server with a temporary IP address
Modifies and resets the primary VMkernel adapter
Removes hosts and re-adds hosts with Fully Qualified Domain Names (FQDNs)
As that is quite a list, I’ll leave the subject of scripting deployment tasks for another full blog. I’m including some PowerCLI excerpts in this blog to help wet the scripting wizard’s whistle.
Network Configuration
As not all my testing requires vSphere distributed switches, my script first configures basic networking.
I ensured the appropriate vmnic was selected for management network traffic. In this case, all traffic in the cluster will be run off the Mellanox ConnectX-3 Pro Host Channel Adapters, so I selected vmnic4.
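A minimal PowerCLI sketch of this step: moving the management standard switch onto vmnic4. The hostname and vSwitch name here are illustrative assumptions, not values from my environment.

```powershell
# Bind the management vSwitch to vmnic4 (the Mellanox port).
# Hostname and switch name are placeholders; adjust for your environment.
$vmHost = Get-VMHost -Name "esxi01.dpl.local"
Get-VirtualSwitch -VMHost $vmHost -Name "vSwitch0" |
    Set-VirtualSwitch -Nic "vmnic4" -Confirm:$false
```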
As I’m a minimalist and prefer to avoid potential networking headaches, I disabled IPv6 on the hosts (this requires a reboot).
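Disabling IPv6 can be scripted as well. This sketch sets the tcpip4 module parameter, which is one documented way to turn IPv6 off on ESXi; the hostname is a placeholder.

```powershell
# Disable IPv6 via the tcpip4 module parameter; takes effect after a reboot.
$vmHost = Get-VMHost -Name "esxi01.dpl.local"   # placeholder hostname
Get-VMHostModule -VMHost $vmHost -Name "tcpip4" |
    Set-VMHostModule -Options "ipv6=0"
Restart-VMHost -VMHost $vmHost -Confirm:$false  # reboot to apply the change
```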
I chose to implement a vSphere Distributed Switch (VDS) as it would allow finer-grained control over vSAN traffic, as well as ensuring performance during periods of contention via vSphere Network I/O Control (NIOC).
VDS Version 6.5.0
Health Check: Enabled
Because I was running all traffic through a single switch and implementing a VDS, I needed to deploy VLANs.
VLAN 3: vSAN
VLAN 4: vMotion
VLAN 152: Management and Virtual Machine traffic
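The VDS and the three VLAN-backed port groups above can be created in a few lines of PowerCLI. The port group names match those used later in this blog; the datacenter and switch names are assumptions for illustration.

```powershell
# Create the VDS and one port group per VLAN.
# Datacenter and switch names are assumptions; port group names match this blog.
$dc  = Get-Datacenter -Name "DPL"
$vds = New-VDSwitch -Name "DPL-VDS" -Location $dc -Version "6.5.0" -NumUplinkPorts 2
New-VDPortgroup -VDSwitch $vds -Name "DPL-Lab"     -VlanId 152
New-VDPortgroup -VDSwitch $vds -Name "DPL-VMotion" -VlanId 4
New-VDPortgroup -VDSwitch $vds -Name "VSAN"        -VlanId 3
```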
Eliminating a potential bottleneck with 56Gb Ethernet allowed me to simplify my network configuration. The Mellanox ConnectX-3 Pro adapters I used were dual-port. With only one adapter per host, I only had to configure 2 uplinks to my VDS.
Uplink 1: vmnic4
Uplink 2: vmnic1000402
Pay particular attention to the “Teaming and Failover” configuration of each port group. It is essential, when possible, to isolate the vSAN traffic on its own uplink port. The highlighted route in each screenshot below shows the active uplink per port group.
DPL-Lab: VLAN 152
Active Uplink: 1
Standby Uplink: 2
DPL-VMotion: VLAN 4
Active Uplink: 1
Standby Uplink: 2
VSAN: VLAN 3
Active Uplink: 2
Standby Uplink: 1
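The teaming and failover settings above can also be applied from PowerCLI. A hedged sketch, assuming the default VDS uplink port names ("Uplink 1"/"Uplink 2") and the port group names used in this blog:

```powershell
# Pin each port group to its active/standby uplink as listed above.
$vds = Get-VDSwitch -Name "DPL-VDS"   # switch name is an assumption
Get-VDPortgroup -VDSwitch $vds -Name "DPL-Lab","DPL-VMotion" |
    Get-VDUplinkTeamingPolicy |
    Set-VDUplinkTeamingPolicy -ActiveUplinkPort "Uplink 1" -StandbyUplinkPort "Uplink 2"
Get-VDPortgroup -VDSwitch $vds -Name "VSAN" |
    Get-VDUplinkTeamingPolicy |
    Set-VDUplinkTeamingPolicy -ActiveUplinkPort "Uplink 2" -StandbyUplinkPort "Uplink 1"
```

This keeps vSAN traffic isolated on its own uplink while management, VM, and vMotion traffic share the other, with failover in either direction.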
Building the vSAN Cluster
This part was relatively simple using the web client.
During cluster creation, I selected the “Turn ON” checkbox for Virtual SAN, and selected “Manual” for the “Add disks to storage” drop-down list.
I added hosts to the cluster through a simple drag and drop.
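For those scripting it instead, the same two steps look roughly like this in PowerCLI; cluster, datacenter, and host names are illustrative assumptions.

```powershell
# Create a vSAN-enabled cluster with manual disk claiming, then move hosts in
# (the scripted equivalent of drag and drop). Names are placeholders.
$dc      = Get-Datacenter -Name "DPL"
$cluster = New-Cluster -Name "vSAN-Perf" -Location $dc `
    -VsanEnabled -VsanDiskClaimMode Manual
Get-VMHost -Name "esxi01.dpl.local","esxi02.dpl.local" |
    Move-VMHost -Destination $cluster -Confirm:$false
```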
For disk groups in this first cluster, I used the 12Gb/s SAS SSDs from the HGST Ultrastar™ SS200 family, which have been certified for use in both the caching tier and the capacity tier within vSAN. This drive family is targeted at mixed-use and read-intensive applications, with high-endurance and high-capacity options. This particular SSD family also offers great price/performance metrics that align quite well with vSAN capabilities. I created two disk groups per node via Cluster -> Configure -> Virtual SAN -> Disk Management as follows:
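With manual disk claiming, disk groups can also be created per host from PowerCLI. The device canonical names below are placeholders; in an all-flash configuration like this one, the device you name in each role determines which SSDs serve as cache and which as capacity.

```powershell
# Claim one cache SSD and several capacity SSDs into a disk group.
# Canonical names are placeholders; list real devices with Get-VMHostDisk.
$vmHost = Get-VMHost -Name "esxi01.dpl.local"
New-VsanDiskGroup -VMHost $vmHost `
    -SsdCanonicalName "naa.5000cca000000001" `
    -DataDiskCanonicalName "naa.5000cca000000002","naa.5000cca000000003","naa.5000cca000000004"
```

Repeat per disk group, per host, until each node has its two disk groups.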
vSAN 6.2 introduced a great data reliability feature called software checksum: an end-to-end checksum within vSAN that uses CRC-32C. It is enabled by default, as it protects against possible issues in the underlying storage medium. If your application already ensures data integrity at this level, the feature can be disabled. To really push vSAN to the breaking point, I disabled object checksum in the Virtual SAN Default Storage Policy.
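I made the change in the web client, but the same effect can be sketched with the SPBM cmdlets. This hedged example creates a new policy with the vSAN checksum capability disabled rather than editing the default policy in place; the policy name is an assumption.

```powershell
# Build a storage policy rule that disables the vSAN object checksum,
# using the VSAN.checksumDisabled capability. Policy name is a placeholder.
$rule    = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.checksumDisabled") -Value $true
$ruleSet = New-SpbmRuleSet -AllOfRules $rule
New-SpbmStoragePolicy -Name "vSAN-NoChecksum" -AnyOfRuleSets $ruleSet
```

Assign the resulting policy to the test VMs (or their VMDKs) so the benchmark objects are created without checksum overhead.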
I hope that sharing these configuration settings, and the reasons I chose each one, helps you understand my performance testing methodology here at the DPL at Western Digital. It took many testing iterations and conversations, with experts here and at VMware, to arrive at this configuration. With any luck, this will help you define your own testbeds and understand my upcoming performance blogs. I look forward to hearing from you about these settings, or any that you’ve discovered yourself.
Jonathan Flynn started in the high-performance supercomputing industry and joined the storage industry in 2007 at Fusion-io (now Western Digital Corp.). His expertise in storage and networking infrastructure has helped him promote enterprise solid state storage. At Western Digital, Jonathan is a Technologist, promoting virtualization in Western Digital’s Data Propulsion Labs. Jonathan has a BS in Computer Science from the University of Utah and works in Salt Lake City.