Data Warehouse (DW) in the Era of Flash: Massive 90-95TB DW in 4U
In the previous post in this series, I wrote about two 45-55TB data warehouse systems that fit entirely in a 2U or 4U server. In this blog post we’ll look at two more Fast Track systems, this time rated for 90-95TB, in 4-socket, 4U servers.
The largest SQL Server 2012 Data Warehouse Fast Track Reference Architecture (FTRA) I’ve seen, published in July 2013, was FT-rated for 85TB. It used 328 HDDs and required 24U of rack space.
In this post I will describe two servers, FT-rated for 90TB and 95TB, that take just 4U.
These are massive systems, larger than most companies need. IDC reports that the amount of data doubles every two years, so yesterday’s 20TB database is roughly today’s 45TB database, and tomorrow’s 90TB database. What seems massive today may soon be routine, so it pays to stay aware of what’s possible.
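The IDC rule of thumb can be turned into a quick projection. This is a minimal Python sketch that assumes steady exponential growth with a two-year doubling period; the function names are hypothetical, and real data growth is rarely this smooth.

```python
import math


def projected_size_tb(current_tb: float, years: float, doubling_years: float = 2.0) -> float:
    """Project a database's size assuming it doubles every `doubling_years` years."""
    return current_tb * 2 ** (years / doubling_years)


def years_to_reach(current_tb: float, target_tb: float, doubling_years: float = 2.0) -> float:
    """Years for current_tb to grow to target_tb under the same doubling assumption."""
    return doubling_years * math.log2(target_tb / current_tb)


print(projected_size_tb(45, 2))          # 90.0
print(round(years_to_reach(20, 90), 2))  # 4.34
```

Under this assumption a 45TB database reaches 90TB in two years, while a 20TB database takes a little over four years.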
HP ProLiant DL580 Gen8
The first system I’d like to look at is based on the HP ProLiant DL580 Gen8 server, FT-rated for a 90TB data warehouse. (Microsoft’s DWFT certification appears on page 4.) This is a 4-socket server that takes 4U of rack space.
This HP system stores the data warehouse on six HP 5.2TB FH/HL (full-height, half-length) Light Endurance (LE) PCIe Workload Accelerators, and stores the log and staging files on two HP 1.3TB HH/HL (half-height, half-length) Light Endurance (LE) PCIe Workload Accelerators. Both are HP’s rebranding of SanDisk® Fusion ioMemory PX600 series cards.
The “measured throughput” is 366 queries/hour/TB using SQL Server’s row store, or 2,721 queries/hour/TB using columnstore.
Multiplying these per-TB rates by the 90TB rated capacity gives 32,940 queries/hour using SQL Server’s row store features and 244,890 queries/hour using SQL Server’s in-memory columnstore features. CPU utilization is 92% using row store and 96% using columnstore.
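The arithmetic above is simple, but it is easy to mix up per-TB and total figures. A minimal Python sketch, assuming the published per-TB rates scale linearly to the full rated capacity (which is how the totals above are derived); `total_queries_per_hour` is a hypothetical helper name, not part of any Fast Track tooling.

```python
def total_queries_per_hour(qph_per_tb: float, rated_tb: float) -> float:
    """Scale a per-TB Fast Track throughput figure up to the system's rated capacity."""
    return qph_per_tb * rated_tb


# HP ProLiant DL580 Gen8 figures quoted above (FT-rated for 90TB).
HP_RATED_TB = 90
hp_row = total_queries_per_hour(366, HP_RATED_TB)
hp_col = total_queries_per_hour(2_721, HP_RATED_TB)

print(f"row store:   {hp_row:,.0f} queries/hour")  # 32,940
print(f"columnstore: {hp_col:,.0f} queries/hour")  # 244,890
```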
Lenovo System x3850 X6
The second system is based on the Lenovo System x3850 X6, FT-rated for a 95TB data warehouse. (Microsoft’s DWFT certification appears on page 4.) This is a 4-socket server that takes 4U of rack space, and Lenovo classifies it as an Advanced data warehouse system, with a capacity range of >40TB.
This system stores the data warehouse and tempdb on six IBM 5200GB Enterprise io3 Flash Adapters, which are Lenovo’s branding of the SanDisk Fusion ioMemory PX600-5200.
The “measured throughput” is 433 queries/hour/TB using SQL Server’s row store, or 3,417 queries/hour/TB using columnstore.
Multiplying these per-TB rates by the 95TB rated capacity gives 41,135 queries/hour using SQL Server’s row store features and 324,615 queries/hour using SQL Server’s in-memory columnstore features. CPU utilization is 73% using row store and 92% using columnstore.
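The same scaling applied to the Lenovo figures, as a small Python sketch; the variable names are illustrative only. Because both workloads are multiplied by the same 95TB, the columnstore-to-row-store ratio comes out the same whether it is computed from the per-TB or the total figures.

```python
# Lenovo System x3850 X6 figures quoted above (FT-rated for 95TB).
RATED_TB = 95
per_tb = {"row store": 433, "columnstore": 3_417}  # measured queries/hour/TB

# Scale each per-TB rate up to the full rated capacity.
totals = {mode: qph * RATED_TB for mode, qph in per_tb.items()}
for mode, total in totals.items():
    print(f"{mode}: {total:,} queries/hour")

# Columnstore throughput relative to row store on the same hardware.
ratio = totals["columnstore"] / totals["row store"]
print(f"columnstore vs row store: {ratio:.1f}x")  # 7.9x
```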
Points of comparison
These two systems are similar in physical size, main data storage (six 5.2TB ioMemory cards each) and capacity, but they differ in core count.
The HP system uses six 5.2TB cards and two 1.3TB cards, with four 15-core processors for a total of 60 cores.
The Lenovo system uses six 5.2TB cards (and 4 SSDs for log files), with four 18-core processors for a total of 72 cores.
The Lenovo system’s 12 additional cores (20% more than the HP system’s 60) likely account for much of its higher measured throughput (queries/hour/TB).
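One rough way to check the core-count explanation is to normalize each system’s per-TB throughput by its core count. This Python sketch uses only the figures quoted above; the `FastTrackSystem` class is a hypothetical container, and dividing by cores ignores any other differences between the two configurations, so treat the result as indicative only.

```python
from dataclasses import dataclass


@dataclass
class FastTrackSystem:
    name: str
    cores: int
    row_qph_per_tb: int  # measured row store throughput, queries/hour/TB
    col_qph_per_tb: int  # measured columnstore throughput, queries/hour/TB


systems = [
    FastTrackSystem("HP ProLiant DL580 Gen8", 60, 366, 2_721),
    FastTrackSystem("Lenovo System x3850 X6", 72, 433, 3_417),
]

for s in systems:
    # Per-core share of the per-TB rating: a crude normalization for core count.
    print(f"{s.name}: row store {s.row_qph_per_tb / s.cores:.2f}, "
          f"columnstore {s.col_qph_per_tb / s.cores:.2f} queries/hour/TB per core")
```

This prints about 6.10 vs 6.01 for row store and 45.35 vs 47.46 for columnstore. The near-identical row store figures fit the idea that core count drives most of the row store gap; the columnstore figures diverge a little more, so other factors may contribute there.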
What are the Benefits of These Systems?
Like the 20-28TB and 45-55TB systems I described in my previous posts, these two systems deliver the same important benefits:
- Balanced. These systems are pre-configured and validated to be balanced data warehouse systems, avoiding over- or under-spending on any one area of the system; each is sized just right for its capacity.
- Complete. These are complete solutions, ordered off the server vendors’ price lists. They include rebranded SanDisk flash, which is covered by the server vendor’s warranty and support services, and SQL Server 2014 Enterprise Edition for a complete set of business intelligence tools.
- Performance. The modern servers and CPUs take full advantage of the SanDisk flash storage to deliver the performance that lets your users get the most from this data warehouse system. More users and more queries mean more business insights and better business results.
- Simple and Compact. Everything you need is installed in a single 4U server. No additional infrastructure is required to support external storage.
- Reliable and Economical. Using SanDisk flash instead of HDDs improves system reliability: SanDisk flash has up to 10,000 times fewer uncorrectable bit errors than HDDs. It also requires much less power to operate and generates far less heat, so you save on cooling costs.
Either of these systems is a great data warehouse solution for customers needing a 90-95TB data warehouse.
Data Warehouse in the Era of Flash – Better, Stronger, Faster
In my initial post in this series, I discussed how using flash for data warehouse storage eliminates the need to optimize the data layout on disk (a hold-over from HDDs) every time you import new data.
That means you can seamlessly import sales data daily or hourly, and ensure your staff can leverage the latest data as they make the decisions that guide your business. It also frees your DBAs to do more valuable things than repeatedly defragmenting HDDs or replacing broken ones.
Data Warehouse in the era of flash is better, stronger and faster. Let our team show you how to take advantage of it.
Learn more about flash and data warehousing in our new Data Warehouse Content Hub: datawarehouse.sandiskoneblog.wpengine.com. If you’re attending SQL PASS Summit, make sure to visit SanDisk and see our demos in booth #222. I look forward to seeing you there!
If you have any questions, feel free to reach out to me in the comments below or write to me directly at Peter.Plamondon@sandiskoneblog.wpengine.com