
Which Hadoop Architecture is Right for Me?

Hadoop technology just turned 10. During my career, I was fortunate to work at Yahoo, where Hadoop was born and grew from a technical paper into the first operational enterprise data platform. I built the very first research engineering team from the ground up, standing up the first operational grid clusters running Hadoop and handling workloads on the order of petabytes of data. I was also part of the team that open-sourced Apache Hadoop.

Over the years, Hadoop has gained tremendous momentum, giving birth to many distributions with wide adoption across enterprises. It has become a fully integrated part of the de facto Big Data platform stack: robust, reliable, scalable, and enterprise grade. However, as progress marches on, some components of the traditional architecture are showing their age, and a new approach to Hadoop architecture is emerging.

Shared vs. Distributed HDFS

HDFS has served as the primary storage system for Hadoop. It is a distributed file system that provides high-performance access to data across Hadoop clusters, and it has become the distributed file system of choice for many enterprises managing large pools of Big Data and enabling Big Data analytics applications.
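To make that role concrete, here is a minimal sketch (an illustrative example, not material from the webinar) of how an application writes and reads a file through Hadoop's standard FileSystem API; the NameNode address and file path are placeholders you would replace with your own.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // Placeholder NameNode address; point this at your own cluster.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/hello.txt");

            // Write a small file; HDFS replicates its blocks across DataNodes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back through the same FileSystem abstraction.
            try (FSDataInputStream in = fs.open(path)) {
                byte[] buf = new byte[32];
                int n = in.read(buf);
                System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
    }
}
```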

But the nature of progress is continuous evolution: more compelling systems emerge, with better architectures and better storage.

So what’s the right Hadoop architecture for your Big Data analytics – shared or distributed?

What’s Right for Me?

In a recent webinar, I compared and contrasted the two current approaches. The original HDFS approach utilizes storage co-located with the compute servers. An emerging alternative relies on dedicated storage resources shared by the compute cluster.
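To illustrate the distinction (a hypothetical sketch, not material from the webinar), Hadoop's FileSystem abstraction lets the same application code address either a co-located HDFS cluster or a dedicated shared store simply by changing the target URI; the endpoints below are placeholders, and the s3a:// scheme assumes the hadoop-aws connector is on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class StorageTargets {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Traditional layout: HDFS DataNodes run on the same servers as the
        // compute tasks, so many block reads are node-local.
        FileSystem colocated =
                FileSystem.get(URI.create("hdfs://namenode.example.com:8020"), conf);

        // Shared-storage layout: compute nodes address a dedicated storage tier
        // over the network (here via an S3A-compatible endpoint; hypothetical bucket).
        FileSystem shared =
                FileSystem.get(URI.create("s3a://analytics-bucket"), conf);

        // The application code is identical either way; only the URI changes.
        System.out.println(colocated.exists(new Path("/data/events")));
        System.out.println(shared.exists(new Path("/data/events")));
    }
}
```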

I wanted to provide definitive guidelines to planners and architects in order to help them identify the best solutions for their needs when implementing Hadoop.

You can stream the webinar, on-demand, for free. Feel free to reach out in the comments below with your questions.

WEBINAR: Shared or Distributed HDFS – What’s Right for Me?
Stream it here


Janet George
Janet is a technical leader with more than 15 years of experience in Big Data platforms, machine learning, distributed computing, compilers and Artificial Intelligence. At Western Digital, Janet builds global core competencies, shaping, driving and implementing the Big Data platform, products and technologies from the ground up, using advanced analytics and pattern matching with semiconductor manufacturing data. Previously, Janet served as managing director/chief scientist/Big Data expert at Accenture Technology Labs, responsible for Big Data platforms, machine learning, cognitive computing and open innovation. She has also served as head of Yahoo Labs/Research Engineering, inventing next-generation platforms, cloud infrastructures and machine learning for Big Data, and has held roles at eBay and Apple, among others. Janet holds a Bachelor's degree and an advanced Master's degree with distinction in Computer Science and Mathematics, with a thesis focus on Artificial Intelligence.