Turn Big Data into Fast Data Using Oracle NoSQL and Fusion ioMemory

I will be heading out to Oracle OpenWorld 2015 next week together with SanDisk® experts to show how flash memory is enabling customers to slash costs and turn their databases into productivity powerhouses. Among the demos in SanDisk booth #811, we will be showing Oracle Big Data NoSQL and Oracle Database 12c acceleration with flash storage. I will also be speaking in the booth theater about the benefits of Oracle database consolidation using SanDisk flash storage solutions – so if you’re at the event, please make sure to stop by!

One of the latest trends in Big Data is a focus not just on the volume of data, but on how fast that data is analyzed. The rise of Apache Spark for real-time analytics is a great example of the growing importance of fast data. Similarly, we are witnessing a growing landscape of NoSQL databases that deliver faster data storage and updates for web-scale applications. Oracle NoSQL belongs to the family of high-performance NoSQL databases built on the key-value model.

NoSQL in the Enterprise

Graph on Data Collection for Big Data
Captured from ‘Decoding Big Data for Business Growth’ infographic by Single Grain

Even though NoSQL grew out of the needs of hyperscale and Web 2.0 companies, NoSQL databases are increasingly important for enterprises. I recommend reading this Forrester paper on why NoSQL has gained strong momentum and what to consider when evaluating NoSQL options.

The volume of data generated from multiple data sources calls for a horizontally scalable system that can scale out across multiple servers and process both structured and unstructured data. This data needs to be captured at the same speed at which it is generated (from blogs, text files, videos, images, etc.). Even more important for businesses, these NoSQL database systems need to be cost-effective so they can scale appropriately and deliver a good return on investment. Technology giants like Amazon, Facebook, and Google have proven that NoSQL can be adopted cost-effectively and successfully in their data centers – so how can enterprises best benefit from the insights of these database systems?

Oracle NoSQL Testing with Fusion ioMemory Application Accelerators

Last year I worked on benchmarking Oracle NoSQL using SanDisk CloudSpeed SSDs. You can read more about the results here on the IT Blog, as well as on Oracle’s website. This year, I continued this work, focusing on optimized Oracle NoSQL cluster deployment for high performance and scalability. I’d like to share some of this work in this blog post.

Fusion ioMemory PCIe application accelerators from SanDisk deliver extreme performance to enable a cost-effective, fast Big Data platform for NoSQL implementations. The Fusion ioMemory portfolio offers many industry-leading solutions. For our testing we used the SX350 line, which is the third generation of the world-renowned Fusion ioMemory platform and a cost-effective solution for accelerating read-intensive workloads. The SX350 is available in capacities ranging from 1.25TB to 6.4TB of addressable, persistent flash. It provides random read/write performance of up to 345K/585K IOPS, with 79-microsecond read latency and 15-microsecond write latency.

Let us evaluate the performance benefits of using Fusion ioMemory for the Oracle NoSQL database.

Figure 2: Oracle NoSQL cluster with 2 shards and Admin console Topology viewer report

YCSB Testing Configuration – Flash Application Accelerators vs. HDDs

The Yahoo! Cloud Serving Benchmark (YCSB) is the standard benchmarking tool for evaluating NoSQL database systems. I used this tool to assess the performance and scalability benefits of Fusion ioMemory compared to traditional hard disk drives for Oracle NoSQL workloads. The KV store, the Oracle NoSQL data storage container, was initially configured with traditional spinning disks and later switched to Fusion ioMemory. An XFS file system was created to host the Oracle NoSQL data store.

The Oracle NoSQL cluster was set up on three Lenovo System x3650 servers. Each of these servers has 16 Intel Xeon E5-2690 cores and 128 GB of RAM. A 10 GbE network interconnect was used for communication among the Oracle NoSQL cluster nodes and the YCSB client. The YCSB client was configured to run on a dedicated client server with a system configuration similar to that of the Oracle NoSQL servers. The YCSB test consists of loading the dataset into the Oracle NoSQL cluster and executing workloads with various read/write ratios.

The following workload types have been used in our testing:

  • Workload A: Write-heavy, 50% write / 50% read
  • Workload B: Read-heavy, 5% write / 95% read
  • Workload C: Read-only, 100% read

Each of the above workloads can be executed using the following distribution types:

  • Uniform: All database records are uniformly accessed.
  • Zipfian: A few records in the database are accessed more often than other records.

The YCSB default data size is 1 KB records (10 fields, 100 bytes each, plus key).
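To make the workload mixes and access distributions described above more concrete, here is a minimal Python sketch. It is not YCSB's actual implementation: the function names are hypothetical, and the skew value of 0.99 is only an illustrative assumption. It simply shows how a client could pick an operation type from a read/write ratio and pick a record under a uniform or Zipfian distribution.

```python
import random

# Read/write mixes as listed in this post (fractions of all operations).
WORKLOADS = {
    "A": {"write": 0.50, "read": 0.50},
    "B": {"write": 0.05, "read": 0.95},
    "C": {"write": 0.00, "read": 1.00},
}


def choose_operation(workload, rng):
    """Return 'write' or 'read' according to the workload's mix."""
    return "write" if rng.random() < WORKLOADS[workload]["write"] else "read"


def zipfian_weights(record_count, theta=0.99):
    """Unnormalized Zipfian weights: the record at rank r gets 1 / r**theta.

    theta is an assumed, illustrative skew value.
    """
    return [1.0 / (rank ** theta) for rank in range(1, record_count + 1)]


def choose_keys(record_count, n, distribution, rng):
    """Draw n record indexes under the chosen access distribution."""
    if distribution == "uniform":
        # Every record is equally likely to be accessed.
        return [rng.randrange(record_count) for _ in range(n)]
    if distribution == "zipfian":
        # A few low-index records receive most of the accesses.
        weights = zipfian_weights(record_count)
        return rng.choices(range(record_count), weights=weights, k=n)
    raise ValueError(f"unknown distribution: {distribution}")


if __name__ == "__main__":
    rng = random.Random(1)
    ops = [choose_operation("B", rng) for _ in range(10)]
    keys = choose_keys(1000, 10, "zipfian", rng)
    print(list(zip(ops, keys)))
```

With the Zipfian option, a small set of "hot" records dominates the requests, which is why caching effects tend to differ between the two distributions.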

The figure below shows the YCSB configuration setup for the three-node Oracle NoSQL cluster.

Figure 3: YCSB test configuration setup for Oracle NoSQL Cluster benchmark with Fusion ioMemory devices

Test Results

You can see our test results in the graphs below:

Figure 4: Workload B (Mixed Workloads) Throughput and Latency Chart
Figure 5: Workload C (Read-Only Workloads) Throughput and Latency Chart
Figure 6: Oracle NoSQL, 2 and 3 shard scalability results

As you can see from Figures 4 and 5, Fusion ioMemory provides excellent performance benefits for both mixed and read-only workloads. Figure 6 shows the scalability benefits of moving from a 2-shard to a 3-shard configuration. Replication nodes in an Oracle NoSQL cluster are organized into shards. A shard contains a single replication node called the master node, which is responsible for performing database writes, while one or more replicas of the database are used for read-only operations.
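The shard layout described above can be sketched in a few lines of Python. This is a simplified illustration, not Oracle NoSQL's actual routing code: the `Shard` class, the `shard_for_key` helper, the node names, and the hash-modulo placement are all assumptions made for the example. Following the description in this post, writes go to the shard's master node and reads are spread across its replicas.

```python
import hashlib
import itertools


class Shard:
    """Hypothetical model of a shard: one master plus read replicas."""

    def __init__(self, name, master, replicas):
        self.name = name
        self.master = master
        self.replicas = list(replicas)
        # Rotate through replicas so reads are spread evenly.
        self._read_cycle = itertools.cycle(self.replicas)

    def node_for(self, operation):
        """Return the node that should serve the given operation."""
        if operation == "write":
            return self.master
        if operation == "read":
            return next(self._read_cycle)
        raise ValueError(f"unknown operation: {operation}")


def shard_for_key(key, shards):
    """Map a key to a shard with a stable hash (assumed placement rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


if __name__ == "__main__":
    # Node names are placeholders for illustration only.
    shards = [
        Shard("shard-1", "rn1", ["rn2", "rn3"]),
        Shard("shard-2", "rn4", ["rn5", "rn6"]),
    ]
    shard = shard_for_key("user:42", shards)
    print(shard.name, shard.node_for("write"), shard.node_for("read"))
```

Adding a third shard in this model increases the number of masters that can accept writes in parallel, which is the intuition behind the scalability comparison in Figure 6.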

Test Results And The Cost Benefit Analysis

I would like to highlight a few key points based on the test results and the cost benefit analysis:

  1. Mixed Workload: Fusion ioMemory delivers a 19x to 48x performance benefit for Oracle NoSQL mixed workload operations, with a 16x to 45x reduction in latency, for larger data sets ranging from 256 GB to 1 TB. This improved performance is beneficial for applications such as photo tagging, where adding a tag is a write operation, while the rest of the operations are read intensive.
  2. Read-Only Workloads: Fusion ioMemory provided a 17x to 48x performance advantage over hard drives for read-only workloads, with latency improvements of 30x to 100x. Caching user profiles in web applications is an example of this type of workload, and these performance benefits can greatly accelerate retrieving user profiles.
  3. Shard environment: As we scaled Oracle NoSQL from a 2-shard to a 3-shard environment with Fusion ioMemory, we saw a 45% to 79% improvement in throughput and a 35% latency reduction for dataset sizes of 256 GB to 1 TB.
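For readers who want to reproduce this kind of comparison on their own runs, the short Python sketch below shows how "Nx" factors and percentage changes are typically derived from raw throughput and latency numbers. The helper names are hypothetical, and the inputs in the usage section are placeholders, not measurements from these tests.

```python
def speedup_factor(baseline, improved):
    """Throughput gain expressed as a multiple of the baseline ('Nx')."""
    return improved / baseline


def latency_reduction_factor(baseline_latency, improved_latency):
    """Latency gain expressed as a multiple ('Nx lower latency')."""
    return baseline_latency / improved_latency


def percent_improvement(baseline, improved):
    """Relative increase over the baseline, in percent (e.g. throughput)."""
    return (improved - baseline) / baseline * 100.0


def percent_reduction(baseline, improved):
    """Relative decrease from the baseline, in percent (e.g. latency)."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    # Placeholder inputs for illustration only; substitute your own results.
    hdd_ops_per_sec, flash_ops_per_sec = 1_000.0, 20_000.0
    two_shard_latency_ms, three_shard_latency_ms = 2.0, 1.5

    print(speedup_factor(hdd_ops_per_sec, flash_ops_per_sec))
    print(percent_reduction(two_shard_latency_ms, three_shard_latency_ms))
```

Keeping the two styles separate helps avoid mixing them up: a factor compares two configurations as a ratio, while a percentage expresses the change relative to the baseline.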

Conclusion

Enterprises are increasingly adopting NoSQL databases in various industry verticals such as healthcare (patient sensors), automotive (auto sensors), utilities (smart meter analysis), etc. It is important to architect these NoSQL database systems to deliver maximum performance and scalability in the most cost-effective manner. Using Oracle NoSQL with Fusion ioMemory application accelerators gives businesses the benefits of fast Big Data. This combined solution enables cost-effective consolidation by requiring fewer servers in a NoSQL cluster, and delivers higher performance and throughput than a far larger number of servers using large numbers of hard disk drives.

To learn more about SanDisk and Oracle solutions, join us at Oracle OpenWorld in booth #811. Learn more about the Oracle solutions we’ll be showcasing here.
