When Western Digital started its Fourth Industrial Revolution (4IR) efforts seven years ago, no one was eager to let machine learning take over factory floors.
Manufacturing millions of storage units is a complex operation. Hard disk drives have hundreds of components and design parameters many times smaller than the width of human DNA. One small mistake can lead to thousands of defunct units or bring production to a halt.
Letting algorithms take over the production line seemed risky. So, the company decided to start by tackling testing processes, an operation that takes place only after manufacturing and assembly are complete.
Testing times
Storage devices undergo exhaustive testing and stringent validation processes. Hard drives may undergo several weeks of testing. Jackie Jung, Western Digital’s global operations VP of strategy and chief of staff leading the company’s 4IR efforts, carefully examined the operation and asked, “Can we predict more so that we can test more efficiently?”
“Testing processes needed a rethinking”
Together with a group of domain experts, Jung began investigating different variables that could help predict hard drive health.
A few months later, the team had advanced algorithms crunching through more than 2,000 parameters that could reliably point to a hard drive’s health condition in real time. This allowed testing and optimization processes to be tailored for each drive produced.
It took another few months for the team to carefully integrate the algorithm-based enhancement into high volume manufacturing. Once in production, testing efficiency improved by a remarkable 15%.
It was a data science home run and the beginning of a prodigious transformation of Western Digital’s manufacturing, supply chain, and operations. But on this journey, the company found itself up against another testing challenge.
Hard data
Enterprise hard drives are where the ginormous data of the digital economy lives (think search engines, social apps, and the massive data of cloud-based services). Cloud giants maintain so much data that they buy hard drives in the thousands, tens of thousands, and some even hundreds of thousands.
“The world’s largest cloud companies can deploy a million drives”
Any quality issue for these titans is both a potential disaster to their services and a gargantuan headache. Imagine having to service 30,000 data center components. It’s not outlandish; it’s the brave world of hyperscale.
To ensure utmost reliability, the testing and qualification of enterprise hard drives is rigorous, exhaustive, and extraordinarily long.
The sea of testers
Qualifying an enterprise-grade hard drive can take up to four weeks. With Western Digital’s manufacturing site in Prachinburi, Thailand, producing more than 100,000 enterprise hard drives every day, the factory depends on a huge number of testing slots to keep operations from bottlenecking.
Altogether, there are more than 1 million slots today, and the number will increase rapidly. These slots are sprawled across automated machines the size of a bus, some with more than 12,000 slots.
If the slots were stacked on top of each other, they’d reach more than 40 miles high, far into the Earth’s mesosphere (where meteors burn up).
Keeping this army of sophisticated machines as productive as possible is no small challenge. Even if only 0.5% of tester slots report an error, that’s still over 6,000 slots that need attention every day.
A web of connected assets
Peter Pang knows every crevice in the belly of the HDD test slot beast. As the director of automation and test systems engineering at Western Digital, he leads a team of engineers responsible for everything from developing the test routines to working with hundreds of technicians to keep systems running.
He also knew that the daily maintenance of these testers was labor-intensive and slowing things down. Whenever a slot would shut down due to a failure, a technician would frequently need to grab a 12-foot ladder, climb up, remove the slot, and take it to a lab to investigate the issue.
So, Pang teamed up with manufacturing engineers, analytics teams, and IT. Together, the cross-functional team turned the ocean of standalone testers into a web of connected assets.
“We had to link standalone testers from different vendors and generations spanning 15 years”
Testers can now provide real-time data about hardware health and operational conditions. Slots can be addressed from a remote station without technicians having to leave their desks.
Machine learning algorithms have also learned how to pinpoint root causes and automatically perform self-healing operations.
The amount of data collected every second by this bustling operation is staggering, and the outcome is impressive: a 6% improvement in hardware utilization, equivalent to 78,000 slots.
“Being a smart factory is not about how much data is collected,” said Pang, “it’s about the quality of data and understanding what to do with it.”
Data science cavaliers
Since their first success, Jung and Western Digital’s senior director of data science, George Ng, have been at the heart of Western Digital’s Herculean transformation. The company has upskilled thousands of its factory workers, multiplied productivity, and reduced emissions through data-driven logistics.
AI, analytics, and automation have been a game changer. They have helped improve hard drive reliability and increased enterprise SSD quality eightfold. Altogether, the successful implementation of 4IR technologies saves Western Digital hundreds of millions of dollars every year.
These achievements have not gone unnoticed. The company’s HDD factory in Thailand and its flash manufacturing site in Malaysia were recognized by the World Economic Forum’s Global Lighthouse Network.
This network of world leaders in the adoption of 4IR technologies looks at more than cutting-edge technologies. It’s about how shifting models of operation can make for better agility and resilience, and help deliver on the promise companies have to people and the environment.
“Innovation isn’t just about technology,” said Ng. “It’s a purpose.”