The Hadoop Ecosystem: Putting the Pieces Together
Jonathan Seidman
Everybody's talking about Hadoop and Big Data, and many companies are exploring how Hadoop can optimize their data management and processing, as well as address the challenges of ever-growing data volumes. Unfortunately, there's still a lack of understanding of how Hadoop can be leveraged, not to mention how the tools in the Hadoop ecosystem can be combined to implement data-processing pipelines.

This class seeks to provide clarity by first discussing typical real-world use cases in which Hadoop is helping companies address challenges and derive tangible value. We'll then dive deeper into specific tools in the Hadoop ecosystem, such as Hive, Pig, Oozie, Flume, Sqoop, and Mahout. More importantly, we'll walk through example architectures showing how these tools can be used together to build processing pipelines that implement some of these use cases. Since Hadoop isn't a panacea, we'll also discuss criteria for determining when Hadoop is a suitable fit and when it isn't, along with suggestions for getting started with a Hadoop pilot project.

Level: Intermediate