In the Cloud

Fast Data at Lower Costs: Software-Defined Storage Using InfiniFlash™ System and IBM Spectrum Scale (GPFS)

Enterprises are increasingly taking advantage of the Internet of Things (IoT), in which embedded sensors across many devices collect massive amounts of data. As this data is collected, enterprises use business intelligence tools and Big Data analytics to derive business value from it. However, IT budgets and existing storage infrastructure can’t scale to meet the demands of data growth or the need for fast data – data that is ready to be analyzed and mined as close to real time as possible.

In this blog series we will explore the different options for building world-class data center storage infrastructure by combining software-defined storage solutions with InfiniFlash for unprecedented scaling capabilities and breakthrough economics.

InfiniFlash System

The InfiniFlash system from SanDisk® is a new high-performance storage platform that offers massive capacity and high density in a low-cost enclosure to address the demands of capacity workloads at scale. It delivers breakthrough economics for customers with big data storage requirements, particularly in software-defined storage infrastructures. InfiniFlash provides significant cost savings over traditional (monolithic) storage solutions, which often come with vendor lock-in and premium pricing.

IBM General Parallel File System (GPFS) with 256TB of InfiniFlash

In our first blog of this series we take a look at a system built using IBM General Parallel File System (GPFS) and the InfiniFlash IF100 with 256TB of flash for a high-speed clustered file system. IBM GPFS is one of the most mature and widely used software-defined storage solutions in the enterprise today and offers excellent performance when paired with all-flash enclosures. For our infrastructure we have installed GPFS Server 4.1.0.7 on a Dell R720 with 64GB of RAM and have directly connected it to the InfiniFlash system over four SAS cables as shown in the diagram below:
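For readers who want to build a similar configuration, provisioning a GPFS file system on the InfiniFlash LUNs follows the usual NSD workflow. The sketch below assumes an already-created and started GPFS cluster; the stanza file name, device paths, NSD names, server hostname, and file system name are all hypothetical placeholders, not the exact configuration used in our tests:

```shell
# Hypothetical NSD stanza file (nsd.stanza) describing two InfiniFlash LUNs
# presented to the NSD server over SAS. Paths and names are placeholders.
%nsd: device=/dev/sdb nsd=if_nsd1 servers=gpfs-nsd1 usage=dataAndMetadata failureGroup=1
%nsd: device=/dev/sdc nsd=if_nsd2 servers=gpfs-nsd1 usage=dataAndMetadata failureGroup=1

# Create the NSDs, create a file system on them (1MB block size to match the
# large sequential writes in our workload), and mount it on all nodes.
mmcrnsd -F nsd.stanza
mmcrfs ifgpfs -F nsd.stanza -B 1M -T /gpfs/if100
mmmount ifgpfs -a
```

The 1MB file system block size shown here is an assumption chosen to line up with the 1MB sequential-write portion of the workload described below; your own block size should be tuned to your dominant I/O size.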

Figure 1: GPFS NSD Server on Dell R720 connected to InfiniFlash system

Performance testing

Performance testing was conducted using fio (Flexible I/O Tester). Our initial tests simulated a particular customer’s workload that required high-bandwidth streaming writes combined with 100% random reads across the entire dataset. This models a workload that captures streaming sensor data while simultaneously allowing analysts to query and run data analysis across the entire dataset. We ran two separate tests: the first combining 64K 100% random reads with 1MB sequential writes, and the second combining 256K 100% random reads with 1MB sequential writes, as shown in the two graphs below.
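A mixed workload like this can be expressed as a fio job file with two concurrent jobs. The following is a minimal sketch of the 64K-read variant; the mount point, file sizes, job counts, and queue depths are illustrative assumptions, not the exact parameters of our test runs:

```ini
# mixed-workload.fio (hypothetical) - streaming ingest plus random analytics reads
[global]
directory=/gpfs/if100   # assumed GPFS mount point
ioengine=libaio
direct=1
time_based
runtime=600

[seq-write-ingest]
rw=write                # 1MB sequential writes simulate sensor-data capture
bs=1M
size=100G
numjobs=4

[rand-read-analytics]
rw=randread             # 64K random reads simulate analyst queries (use bs=256k for the second test)
bs=64k
size=100G
numjobs=16
iodepth=32
```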

Figure 2: Single Server GPFS Tests on InfiniFlash – 64K 100% random reads with 1MB sequential writes
Figure 3: Single Server GPFS Tests on InfiniFlash – 256K 100% random reads with 1MB sequential writes

For these initial tests we ran the workload simulation directly on the server nodes. Next we configured two GPFS Clients and connected them to the GPFS NSD Server over InfiniBand (IB) as shown in the diagram below.

Figure 4: Single GPFS NSD Server with two GPFS Clients

Having connected the clients to the GPFS NSD Server through a Mellanox IB switch, we then ran identical workloads, first using just a single client, and then using two clients to evaluate how well the solution scaled. As you can see in the charts below, the bandwidth results roughly doubled when adding the second client.
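One convenient way to drive identical workloads from one and then two clients is fio’s built-in client/server mode. A sketch of that approach is below; the hostnames and job file name are placeholder assumptions:

```shell
# Start the fio backend on each GPFS client (hostnames are placeholders):
fio --server          # run on client1, then also on client2

# From a control node, run the same job file against one client, then both:
fio --client=client1 mixed-workload.fio
fio --client=client1 --client=client2 mixed-workload.fio
```

In client/server mode fio reports per-client results as well as an aggregate across all clients, which makes this kind of scaling comparison straightforward.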

Figure 5: Single GPFS Client w/ Single NSD Server – 64K 100% random reads with 1MB sequential writes

Figure 6: Single GPFS Client w/ Single GPFS NSD Server – 256K 100% random reads with 1MB sequential writes
Figure 7: Two GPFS Clients w/ Single NSD Server – 64K 100% random reads with 1MB sequential writes

Figure 8: Two GPFS Clients w/ Single NSD Server – 256K 100% random reads with 1MB sequential writes

Conclusion

Our testing shows that IBM GPFS running on the InfiniFlash IF100 with 256TB of flash provides an extremely scalable solution that delivers maximum performance for workloads combining large amounts of sequential streaming data with fast random data access. This storage solution gives enterprises unprecedented scaling at lower costs, enabling them to extract more insight from the data collected by attached and embedded devices.

Christopher Howard
Chris has over twenty years of experience in enterprise software and hardware technologies with a focus on providing and implementing leading-edge solutions. At Western Digital, Chris serves as the Chief Technologist for US Federal, promoting and implementing SanDisk® data center storage solutions at US Government agencies. Previously, Chris spent eight years at IBM as a Senior IT Specialist and served on IBM’s certification board for fellow IT Specialists. Chris graduated magna cum laude from Virginia Polytechnic Institute and State University in 1993 with a B.S. in Electrical Engineering.