Hadoop technology just turned 10. During my career, I was fortunate to work at Yahoo, where Hadoop was born, and to see it grow from a technical paper into the first operational enterprise data platform. I built the first research engineering team from the ground up to stand up the first operational grid clusters running Hadoop, handling workloads on the order of petabytes of data. I was also part of the team that open-sourced Apache Hadoop.
Over the years, Hadoop has gained tremendous momentum, spawning many distributions with wide adoption across enterprises. It has become fully integrated into the de facto Big Data platform stack: robust, reliable, scalable, and enterprise grade. However, as progress marches on, some components of the traditional architecture are showing their age, and a new approach to Hadoop architecture is emerging.
HDFS has served as the primary storage system for Hadoop. It is a distributed file system that provides high-performance access to data across Hadoop clusters. It has become the distributed file system of choice for many enterprises managing large pools of Big Data and enabling Big Data analytics applications.
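To make the storage model concrete, here is a rough, illustrative sketch of how HDFS splits a file into blocks and replicates them across the cluster. It assumes the common defaults of 128 MB blocks and a replication factor of 3; real clusters tune these via `dfs.blocksize` and `dfs.replication`, and the function name here is my own, not a Hadoop API.

```python
# Illustrative sketch of HDFS block and replication math (not a Hadoop API).
# Assumes common defaults: 128 MB blocks, replication factor 3.
import math

def hdfs_footprint(file_bytes, block_bytes=128 * 1024**2, replication=3):
    """Return (num_blocks, raw_bytes_stored) for a file of file_bytes."""
    blocks = math.ceil(file_bytes / block_bytes)
    return blocks, file_bytes * replication

# A 1 GB file becomes 8 blocks and occupies 3 GB of raw cluster storage.
blocks, raw = hdfs_footprint(1024**3)
print(blocks, raw / 1024**3)  # 8 3.0
```

The triple-replication default is why a cluster's raw capacity must be roughly three times the logical data it holds, a cost that shared-storage alternatives address differently.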
But the nature of progress is continuous evolution: more compelling systems emerge, with better architectures and better storage.
So what’s the right Hadoop architecture for your Big Data analytics – shared or distributed?
What’s Right for Me?
In a recent webinar, I compared and contrasted the two current approaches. The original HDFS approach co-locates storage with the compute servers. An emerging alternative relies on dedicated storage resources shared by the compute cluster.
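The key idea behind co-located storage is data locality: the scheduler prefers to run a task on a node that already holds a replica of its input block, so data is read from local disk rather than over the network. The sketch below illustrates that preference; the function and the policy are simplified illustrations, not Hadoop's actual scheduler.

```python
# Illustrative sketch (not Hadoop's real scheduler) of data-locality-aware
# task placement in a co-located compute/storage architecture.

def place_task(block_replicas, free_nodes):
    """Pick a node for a task.

    block_replicas: set of nodes holding a replica of the input block.
    free_nodes: ordered list of nodes with spare task slots.
    Returns (chosen_node, is_local).
    """
    local = [n for n in free_nodes if n in block_replicas]
    if local:
        return local[0], True   # node-local: read the block from local disk
    return free_nodes[0], False  # no local option: remote read over network

# Replicas live on nodes 1, 4, 7; nodes 2 and 4 have free slots.
print(place_task({1, 4, 7}, [2, 4]))  # (4, True) -> node-local execution
```

With dedicated shared storage, every read is effectively remote, so the design trades locality-aware scheduling for a storage fabric fast enough that locality no longer matters.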
I wanted to provide definitive guidelines to planners and architects in order to help them identify the best solutions for their needs when implementing Hadoop.
You can stream the webinar, on-demand, for free. Feel free to reach out in the comments below with your questions.
WEBINAR: Shared or Distributed HDFS – What’s Right for Me? Stream it here
Janet George is a Fellow and Chief Data Scientist at Western Digital. She is a technical leader with more than 15 years of experience in Big Data platforms, machine learning, distributed computing, compilers, and Artificial Intelligence. Previously, she served as managing director, chief scientist, and Big Data expert at Accenture Technology Labs. She has also served as head of Yahoo Labs Research Engineering, inventing next-generation platforms, cloud infrastructures, and machine learning for Big Data, and has held roles at eBay and Apple. Janet holds a Bachelor's and an advanced Master's degree with distinction in Computer Science and Mathematics, with a thesis focus on AI and ML.