Hadoop by Example
This class demonstrates the most commonly used Map/Reduce design patterns and the problems each one solves, with attention to performance and scalability.
The class gives a general overview of the problems that Map/Reduce can solve, and of scalability and performance tuning for clusters of different sizes. The techniques described here can be used on all Hadoop distributions.
The following technical problems will be covered:
• The “Hello world!” of the Map/Reduce universe: a word count example
• Map-only Map/Reduce jobs and their use in ETL-type jobs
• Global sorting techniques
• Sequence files and their use in Map/Reduce jobs
• Map files and their use in Map/Reduce jobs
• Reduce-side join and its advantages and limitations
• Map-side join and its advantages and limitations
Each technique comes with a code example that can be used as a template. No prior knowledge of Map/Reduce is required, but some familiarity with Java is recommended.
Level: Intermediate