Columnstore indexes are a powerful memory-optimized column-oriented index intended for data warehouse workloads—but far too few architects know about them or their benefits. Columnstore indexes routinely improve performance by 10x, 100x (such as in the demo below), and even 1,000x—rendering results in seconds that classic row store data structures required minutes to satisfy. Microsoft recommends columnstore indexes as the default choice for data warehouses—and I concur. In this blog I provide:
- A compelling video demo of the power of columnstore
- A brief introduction to columnstore indexes for data warehousing
- Better together: Columnstore on SanDisk® flash (TPC-H world records and Microsoft Data Warehouse Reference Architectures)
- Next steps
The Power of Columnstore
I work in a division of SanDisk called the Data Propulsion Lab (DPL). There we do what I call “flipping the /faster bit”, and columnstore does exactly that for SQL Server data warehouses.
So how much faster is it? See for yourself by viewing the following video. And if you find it compelling that columnstore improves warehouse query performance times from over two minutes down to two seconds, then I invite you to read the rest of the post.
Introduction to Columnstore Indexes
Columnstore indexes were introduced in SQL Server 2012. Even though as stated Microsoft recommends columnstore indexes as the default choice for data warehouses, they remain largely unknown.
Transparent to the app, columnstore has been shown to render queries at sub-second response times which previously took many minutes or longer. In fact, columnstore indexes can increase SQL Server data warehouse query performance by one, two, or even three orders of magnitude.
In SQL Server 2012 some adoption blockers exist, yet columnstore is nonetheless a game changer for many apps.
In SQL Server 2014, columnstore was wholly re-architected and improved, potential blockers have been largely removed, and it is profoundly changing the way we interact with our data.
In SQL Server 2016, Microsoft has continued to enhance this remarkable feature.
How It Works
If we’re looking for a subset of columns from one or a few rows, given the right indexes, SQL Server can do a superlative job of providing an answer. Yet if a query by design needs to hit lots of rows—reporting, aggregations, grouping, scans, etc.—SQL Server has never had a good mechanism—until columnstore.
Contrasted with traditional row stores in which data is physically stored row-by-row, columnstore stores values for all rows for a given column. A given data page will store values only for the given column to which that page is allocated. Storage is pivoted relative to the row stores we’ve all grown up with. Columnstore literally turns row storage on its head—and this isn’t the only thing that’s different.
Better Together: Columnstore on Flash
SanDisk flash optimizes this fantastic technology—“better together”. Our technology is integral to TPC‑H world records such as these recent exercises with our partner, Lenovo: Fusion ioMemory Sets Two TPC-H World Records with Lenovo and Microsoft
SanDisk flash is the storage of choice for many configurations, including these seven Data Warehouse Fast Track Reference Architectures (DWFTRA) certified by Microsoft in collaboration with HP and Lenovo:
Table 1. List of publicly available Microsoft Data Warehouse Fast Track Reference Architecture Certifications
OEM | Server | Rack Units | DB Size (TB) | Measured Throughput (Queries/Hr/TB) | Microsoft Reference Architecture Certification | Date |
---|---|---|---|---|---|---|
HP | DL380 G8 | 2U | 28 | 202 | #2014-003* | 9/3/2014 |
HP | DL380 G8 | 2U | 45 | 198 | #2014-003* | 9/3/2014 |
HP | DL580 G8 | 4U | 90 | 366 | #2014-002 | 9/3/2014 |
Lenovo | x3650 M5 | 2U | 20 | 265 | #2014-009 | 10/15/2014 |
Lenovo | x3850 X6 | 4U | 55 | 360 | #2014-023 | 3/10/2015 |
Lenovo | x3850 X6 | 4U | 60 | 371 | #2014-001 | 8/11/2014 |
Lenovo | x3850 X6 | 4U | 95 | 433 | #2014-028 | 5/5/2015 |
*#2014-003 reflects two certifications issued on the same day and available in the same document.
One of the most remarkable characteristics of these ready-to-order affordable solutions is that they pack tens of terabytes of usable data into a single 2U or 4U server. Contrast this to conventional storage solutions requiring literally hundreds of spindles—spindles typically hosted by enterprise SANs costing millions for hardware and licensing. (See this 99sec clip from Lenovo demonstrating 100TB of raw SanDisk flash in their 2U x3650 M5.)
Next Steps
Stay tuned for additional information on SQL Server columnstore on SanDisk flash, including real-life performance examples and additional insight using our solutions. We’ll review the architecture, as well as the challenges, workarounds, travails, and big wins at customer sites. See whether columnstore can change your users’ world and why columnstore may be a compelling reason to upgrade.
Columnstore indexes flip the DW /faster bit, manifesting my motto, “Change the world or go home!”