Back in my days as a database administrator (DBA), I remember benchmarking database applications with concurrent users, often up to 1,000 and in extreme cases up to 10,000. These days, the number of users who flock to websites such as Facebook or Amazon, and the amount of time they spend there, dramatically exceeds anything I handled as a DBA. As social web platforms and services reached such massive scale and user bases, they prompted serious discussions about the scalability of relational databases, the need for ACID (Atomicity, Consistency, Isolation and Durability) compliance in particular, and the cost of supporting it in social media applications. In response, new types of databases have emerged to address this problem, popularly referred to as “NoSQL”.
These databases are designed to store huge amounts of data on a distributed system architecture. Google, Facebook and Amazon pioneered the engineering work to address the limitations of relational databases with NoSQL, and the approach was later adopted by a wider community to address similar challenges in enterprise database applications. Oracle NoSQL belongs to this family of databases.
Rubber Meets Road: Flash Storage for NoSQL Databases
Performance is one of the key criteria when organizations deploy NoSQL databases for applications that need high-throughput event processing, click-through data processing and similar workloads. For NoSQL applications, storage becomes the primary performance bottleneck if data is served from traditional hard disk drives (HDDs). As a result, deployments often resort to adding memory to the servers, but this drives the total cost of ownership (TCO) too high.
To fill this price-performance gap, NoSQL databases can be complemented with faster solid state drives (SSDs) to meet these application performance requirements. To prove this, we set up the industry-standard Yahoo! Cloud Serving Benchmark (YCSB) for Oracle NoSQL and evaluated how SanDisk® CloudSpeed SATA SSDs performed compared to hard disk drives.
Intro to Oracle NoSQL:
Oracle NoSQL is based on a key-value data model that supports all CRUD (Create, Read, Update, and Delete) operations. Like other NoSQL databases, Oracle NoSQL supports data partitioning through sharding and high availability through replication nodes. The NoSQL database driver is aware of the database's network and storage topology and can therefore provide optimized data access to clients. Depending on application scalability and latency requirements, it offers flexibility in choosing a consistency model, ranging from ‘No consistency’ to ‘Absolute consistency’.
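To make the key-value, sharding and replication ideas concrete, here is a minimal Python sketch. It is a toy in-memory model, not the Oracle NoSQL API: the class name `ShardedKVStore`, its methods and the choice of MD5 hashing are all assumptions made for illustration, and writes reach every replica synchronously, which a real store would not necessarily do.

```python
import hashlib


class ShardedKVStore:
    """Toy in-memory key-value store (hypothetical; not the Oracle NoSQL API)."""

    def __init__(self, num_shards=3, replicas_per_shard=2):
        # Each shard is a list of replica dicts holding the same keys.
        self.shards = [
            [dict() for _ in range(replicas_per_shard)]
            for _ in range(num_shards)
        ]

    def _shard_for(self, key):
        # Hash the key so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        # Create or update: write the value to every replica of the shard.
        for replica in self._shard_for(key):
            replica[key] = value

    def get(self, key, replica_index=0):
        # Read from one replica; returns None if the key is absent.
        return self._shard_for(key)[replica_index].get(key)

    def delete(self, key):
        # Remove the key from every replica; report whether it existed.
        existed = False
        for replica in self._shard_for(key):
            if key in replica:
                del replica[key]
                existed = True
        return existed


if __name__ == "__main__":
    store = ShardedKVStore()
    store.put("user:42", "Alice")    # create
    store.put("user:42", "Alicia")   # update
    print(store.get("user:42"))      # read
    store.delete("user:42")          # delete
```

Because every write in this sketch lands on all replicas before `put` returns, a read from any replica sees the latest value. Relaxing that, so replicas may lag behind, is the kind of trade-off the consistency options above let an application make.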
Why SanDisk CloudSpeed SATA SSD
The SanDisk CloudSpeed SATA SSD product family offers a full portfolio of SSDs, with options ranging from entry level to enterprise-grade high performance. CloudSpeed SATA SSDs are optimized for read-intensive, mixed-use and write-intensive application workloads in enterprise and cloud computing environments, such as NoSQL applications. Over a 6Gb/s SATA interface, the drives provide data transfer rates of up to 450/400 MB/s sequential read/write and up to 80K/25K IOPS random read/write at under 2 milliseconds of latency. This matters for NoSQL applications that must offer a high-performance platform at a reasonable cost. In addition, CloudSpeed drives are protected by SanDisk’s Guardian Technology platform, which improves endurance over time and helps prevent data loss or corruption, so businesses can achieve the best return on investment (ROI) from their purchase.
Testing CloudSpeed SSD Performance:
We used the industry-standard Yahoo! Cloud Serving Benchmark with the default Oracle NoSQL configuration to evaluate the performance benefits of SanDisk SSDs. Standard YCSB parameters were employed for testing. As shown in Figure 1, the same workloads were run on both SSDs and HDDs by pointing the Oracle NoSQL KVStore at the appropriate disks. Throughput and latency results were captured for analysis and reporting. (An upcoming white paper will provide a detailed description of the testing environment setup and configuration.)
Figure 1: YCSB Oracle NoSQL Testing with SanDisk CloudSpeed SSDs
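For readers unfamiliar with how such a benchmark measures throughput and latency, the following Python sketch shows the general idea. It is not YCSB and not our test harness: the function `run_workload`, its parameters and the dict-backed store are hypothetical, the 50/50 read/update split reflects my understanding of YCSB's core Workload A, and keys are picked uniformly for simplicity.

```python
import random
import time


def run_workload(store, record_count=1000, operation_count=10000,
                 read_proportion=0.5, seed=0):
    """Run a read/update mix against a dict-like store and time each operation."""
    rng = random.Random(seed)

    # Load phase: insert the records before any measurement starts.
    for i in range(record_count):
        store[f"user{i}"] = "x" * 100

    latencies = []
    reads = updates = 0
    start = time.perf_counter()
    for _ in range(operation_count):
        # Uniform key choice is a simplification made for this sketch.
        key = f"user{rng.randrange(record_count)}"
        op_start = time.perf_counter()
        if rng.random() < read_proportion:
            _ = store[key]            # read
            reads += 1
        else:
            store[key] = "y" * 100    # update
            updates += 1
        latencies.append(time.perf_counter() - op_start)
    elapsed = time.perf_counter() - start

    latencies.sort()
    return {
        "reads": reads,
        "updates": updates,
        "throughput_ops_per_sec":
            operation_count / elapsed if elapsed > 0 else float("inf"),
        "avg_latency_us": sum(latencies) / len(latencies) * 1e6,
        "p95_latency_us": latencies[int(0.95 * (len(latencies) - 1))] * 1e6,
    }


if __name__ == "__main__":
    print(run_workload({}))
```

Running the same loop twice, once against a store whose data lives on SSD and once against one on HDD, yields the kind of side-by-side throughput and latency figures reported in the results section.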
Testing Results – Throughput and Latency for Various Data Set Sizes:
I’d like to highlight a couple of key points from the test results shown in Figures 2 and 3 (for those seeking a complete analysis, stay tuned as we publish the Oracle NoSQL white paper in the coming weeks, in time for Oracle OpenWorld):
For Workload A (Update Heavy), as shown in Figure 2, SanDisk CloudSpeed SSDs provided a 23X performance advantage over HDDs for the 32GB data set (which fits in DRAM). When the data set size increased to 128GB (exceeding DRAM), the advantage grew to 31X that of HDDs.
SSDs also delivered this performance with very low latency, as shown in Figure 3. This held for both the small and the large data set, whereas HDD latency increased as the data set size grew.
Figure 2: Workload A: SSD vs HDD Performance Throughput Comparison
Figure 3: Workload A: SSD vs HDD Latency Comparison
Organizations that need to scale Oracle NoSQL for real-time access and high performance will see great benefits from running their applications on SSDs. Our testing shows that SanDisk CloudSpeed SSDs can provide higher transaction throughput, lower latency and better scaling for Oracle NoSQL across the data set sizes we tested.
Visit SanDisk at Oracle OpenWorld to learn more about how SanDisk is expanding the possibilities of storage. See our list of activities and speaking sessions on our Oracle OpenWorld landing page. If you’d like to learn more about database administration with SSDs or have questions, contact me at Prasad.Venkatachar@sandiskoneblog.wpengine.com, or join the conversation on Twitter with @SanDiskDataCtr. I look forward to seeing you at Oracle OpenWorld!
Prasad has extensive experience in IT services, presales and performance benchmarking. At SanDisk, he is responsible for building solutions, reference architectures, deployment guides and best practices for both relational and NoSQL databases using the ESS portfolio of SSD products. Prior to SanDisk, he was a solution architect at Hewlett Packard and part of the HP worldwide presales team. During this time he architected, designed and deployed database solutions for HP customers in the retail, banking, telecom and insurance industries. He is certified on Oracle database releases from 8i to 12c and is also certified on the IBM DB2 relational database, ITIL V3 and VMware virtualization products. His other areas of expertise include SAP HANA and NoSQL databases. He received a Bachelor of Engineering in Electronics and Communication from UVCE Bangalore, India, and an MBA from Symbiosis Institute, India.