Data Warehouse in the Era of Flash – Better, Stronger, Faster

Data Warehouse in the Era of Flash – Better, Stronger, Faster

The historical data each company collects, contains insights that can improve business outcomes – that is, if you can sift them out of the data. Ten years ago, Walmart analyzed the terabytes of shopper history data they’d collected and found that ahead of a hurricane, sales of strawberry Pop-Tarts spiked to 7x above normal. With that insight, Walmart sent additional toaster pastries to stores in the path of a hurricane and boosted their sales.

Today, many companies have terabytes of historical data on their own customers, but find it difficult to extract business value from that data. Why? Complexity and cost. What’s important to understand when seeking simple and lower-cost solutions, is that the storage system of a data warehouse impacts your ability to extract the business insights you’re looking for.

Storage Data Blocks

Your data warehouse likely holds historical data about customers, the products they ordered, when they placed their order, etc. When an employee enters a query, e.g. customers who ordered a product in the last quarter of the fiscal year, the data warehouse software will read a great deal of data. Traditional hard disk drive (HDD)-based storage systems are optimized to store the data in large, contiguous blocks so the HDDs can deliver that data most efficiently.

When you update the data warehouse with new data, perhaps last week’s sales data, that disrupts the carefully optimized data layout on HDDs, and the data warehouse performance drops. After one or a few updates, the loss of performance is unacceptable, and the data layout on the HDDs must be optimized again to put the data back into large, contiguous blocks. This is called defragmentation.

As noted in Microsoft’s Fast Track Data Warehouse Reference Guide (FTDW Ref Guide):

Storage System: Managing fragmentation is crucial to system performance over time for FTDW reference architectures that place primary database storage on hard disk drives (HDD). For this reason, a detailed storage and file system configuration is specified.

The Trouble With Defragmentation

If you’ve ever “defragmented” the hard drive on your PC, you know that’s not a fast process. The more data, the longer it takes to optimize, and that’s true for a data warehouse as well. The data warehouse may be off-line during this optimization step, or you’ll experience significantly reduced performance, depending on which version of SQL Server you’re using.

If you want to see which products customers bought last week, you need to import last week’s sales data into your data warehouse – and likely take the data warehouse off-line entirely, or experience sluggish response, while the data warehouse is re-optimized to full performance.

Would you like to see what customers bought yesterday, or an hour ago? That means daily or hourly imports – and even more frequent optimization.

There’s got to be a better way! And there is:

Flash Storage to the Rescue

The situation described above, of downtime for re-optimization, is done to accommodate the characteristics of HDDs, not your needs or desired workflow. And the good news is that it does not apply to flash storage.

As noted in the FTDW Ref Guide:

“Recommendations for page fragmentation management … can be ignored, because logical database fragmentation does not impact solid state I/O performance.”

That means you can import yesterday’s sales data, or the data from an hour ago, and immediately use that data to drive your decisions, with no downtime for re-optimization.

Using flash also accelerates your data warehouse performance, so more employees can ask more questions each day, and get more (and better) business insights to drive your business results.

A Quick Word about What Makes Flash Cost-Effective

SSDs are more than 100x faster than hard drives. For workloads requiring high IOPS or random access, a few SSDs deliver better performance than a rack of hard drives. That means less hardware to buy, less infrastructure to manage, less rack space, less maintenance and electricity and simplified management. SSDs also require much less electricity and cooling than their spinning counterparts, and all these benefits adds up to deliver big savings of both costs and time.

Learn about The Accelerating Economics of Flash and the Retreat of Hard Disk Drives in Brian Cox’s blog post.

Real World Examples

TekSouth, a business intelligence and software development company that serves federal, commercial and local clients, updated the US Air Force Commander’s Resource Integration System (CRIS), the Authoritative Data Source for financial management of unclassified appropriated historical data, which receives updates from 250 data feeds.

TekSouth’s system combines SQL Server 2008 R2 with SanDisk® Fusion ioMemory. By using Fusion ioMemory they can now support daily updates, with 3x the number of ad-hoc queries from 3x more concurrent users than their previous system. And the story gets even better. Fusion ioMemory allowed TekSouth to remove 27 arrays with over 400 HDDs, resulting in a 16:1 footprint consolidation and reduction of power and cooling to 1/16th what it had been.

Sounds compelling? You can learn more about TekSouth’s experience in this case study.

A Cost-Effective Data Warehouse Is About Balance

To deploy a cost-effective data warehouse, you need balance – the right amount of CPU, system memory, storage and networking. Not too much or you’ve over-spent; not too little or you’ve created a bottleneck. How can you be confident you’re deploying a balanced system? Microsoft has a few suggestions:

SQL Server Data Warehouse Fast Track (DWFT) program is designed to provide customers standard and proven system architectures optimized for a range of enterprise data warehousing needs. DWFT is a joint effort between Microsoft and many hardware partners. The goal is to help enterprise customers deploy Data Warehouse solutions with recommended hardware configuration appropriate for the requirements of the workload with reduced risk, cost and complexity.*

The DWFT program publishes Microsoft-validated Reference Architectures for tested systems, and SanDisk and our OEM partners have a range of validated systems to meet the needs of most customers, with simple, compact, cost-effective data warehoused solutions. 

Want to learn more? Take a look at some of our latest posts on data warehouse.

I’ll be continuing a series of blogs on data warehouse and flash. Here are some of the topics I’ll review in my upcoming posts:

  • Data warehouse systems in 2U and 4U servers, Fast Track-certified for data warehouses from 20TB to 95TB using SanDisk flash
  • Recent TPC-H world record systems that deliver remarkable performance and cost-per-query, using SanDisk flash.

*Source: Data Warehouse Fast Track Reference Guide for SQL Server 2014

Related Stories

What is the 3-2-1 Backup Strategy?