Hadoop by Example
This class demonstrates the most commonly used MapReduce design patterns and the kinds of problems they solve, with attention to performance and scalability.
The class gives a general overview of the problems that can be solved with MapReduce, and of scalability and performance tuning for clusters of different sizes. The techniques described here apply to all Hadoop distributions.
The following technical problems will be covered:
• “Hello world!” of the MapReduce universe—a word count example
• Map-only MapReduce jobs and their use for ETL-type jobs
• Global sorting techniques
• Sequence files and their use in MapReduce jobs
• Map files and their use in MapReduce jobs
• Reduce-side join and its advantages and limitations
• Map-side join and its advantages and limitations
Each technique comes with a code example that can be used as a template. No prior knowledge of the topic is required, but some Java knowledge is recommended.
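To give a sense of the word count item in the list above, here is a minimal sketch of the map, shuffle, and reduce phases written in plain Java. It is an illustration only and is not the class's own template: it does not use the Hadoop API, it runs in a single process, and the names `WordCountSketch`, `map`, `shuffle`, `reduce`, and `run` are hypothetical. The tokenization rule (lowercase, split on non-word characters) is also an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the word count pattern.
// Assumption: no Hadoop classes are used; all names here are hypothetical.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run the three phases in sequence over the input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(run(List.of("hello world", "Hello Hadoop")));
    }
}
```

In a real Hadoop job the framework performs the shuffle across the cluster, and the map and reduce steps live in separate mapper and reducer classes. The sketch keeps all three phases visible in one place to show the data flow.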
Level: Intermediate