Introduction to MapReduce
Sriram Mohan
MapReduce is a programming framework introduced by Google in the early 2000s. It targets problems that must process very large datasets. Rather than devising an algorithm that works on the entire dataset at once, the MapReduce framework processes several chunks of the dataset in parallel during the Map phase, then combines the intermediate results, grouped by key, during the Reduce phase. MapReduce also exploits data locality: it runs processing on or near the nodes that store the data, which reduces the amount of data sent over the network. In this class, we will cover:
  • An introduction to the MapReduce paradigm in Hadoop
  • A code walkthrough in Java. This part covers everything from building your first MapReduce application in Java to writing custom MapReduce applications that perform complex sorts and joins.
  • How does MapReduce work in Hadoop? This part explains the internal workings of MapReduce (Shuffle-Sort), as well as job scheduling and failure handling in classic MapReduce.
  • Use cases for MapReduce. Which scenarios fit the MapReduce paradigm?
Level: Intermediate
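To make the Map, shuffle and Reduce phases described above concrete before the Hadoop walkthrough, here is a minimal plain-Java sketch of the classic word-count example. It does not use the Hadoop API. The class and method names (`WordCountSketch`, `map`, `reduce`, `wordCount`) are hypothetical, a parallel stream is only a rough stand-in for many worker nodes, and a `TreeMap` imitates the way Hadoop sorts intermediate keys during the shuffle.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Map phase (hypothetical helper): turn one chunk of input into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase (hypothetical helper): combine all values emitted for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> chunks) {
        // Map every chunk independently; parallelStream is a rough stand-in for worker nodes.
        // Shuffle: group the intermediate pairs by key into a sorted TreeMap.
        Map<String, List<Integer>> grouped = chunks.parallelStream()
                .flatMap(chunk -> map(chunk).stream())
                .collect(Collectors.groupingBy(Map.Entry::getKey, TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // Reduce each key's list of values to a single count.
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("the quick brown fox", "the lazy dog", "The fox");
        // Prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(wordCount(chunks));
    }
}
```

In Hadoop itself, the map and reduce steps are written as `Mapper` and `Reducer` subclasses, and the framework (not the programmer) performs the shuffle and sort between them.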