Hive, Pig, Cascading and Codd: A Crash Course in Map/Reduce Relational Languages via an Appeal to History
This tutorial will teach you how to implement the relational operators, as defined by famed computer scientist E.F. Codd, in Hive, Pig and Cascading side by side. You, the developer, will focus on the abstract concepts of the relational algebra in order to learn all three languages at once. Theory can sometimes be dry, but in this case revisiting Codd’s original intentions, and his seminal papers, can accelerate our learning of these new Big Data relational languages:
• What are the Relational Algebra operators?
- Cross, union, intersect and divide
• What is the Pig, Hive and Cascading syntax for these operators?
• How do high-level languages and libraries like Pig and Hive compile these operators to MapReduce?
• Why does the syntax of Pig, Hive and Cascading differ, and what is each trying to emphasize or deemphasize?
• Running Exercises: Practice HiveQL, Pig and Cascading/Cascalog/Scalding concepts as they are introduced.
• When to use Pig, Hive or Cascading over the others
• Code and examples of each language
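As a rough illustration of the four operators named in the list above, here is a minimal sketch in plain Python rather than in Hive, Pig or Cascading. It assumes relations are modeled as sets of tuples with attribute positions fixed by convention; the function names, relation names and sample rows are hypothetical and chosen only for illustration.

```python
# Relations are modeled as sets of tuples (an assumption made for this sketch).

def cross(r, s):
    """Cartesian product: each tuple of r concatenated with each tuple of s."""
    return {a + b for a in r for b in s}

def union(r, s):
    """Tuples that appear in r, in s, or in both."""
    return r | s

def intersect(r, s):
    """Tuples that appear in both r and s."""
    return r & s

def divide(r, s):
    """Division for a binary relation r of (x, y) tuples and a unary relation s of (y,) tuples.

    Returns the (x,) tuples that are paired in r with every y found in s.
    """
    candidates = {(x,) for (x, _y) in r}
    return {(x,) for (x,) in candidates if all((x, y) in r for (y,) in s)}

# Hypothetical sample data: who has studied which language.
studied = {("ann", "pig"), ("ann", "hive"), ("bob", "pig")}
languages = {("pig",), ("hive",)}

# Division answers "who has studied every language?"
studied_all = divide(studied, languages)
```

Division is the least familiar of the four: it can be read as the inverse of the cross product, so crossing `studied_all` with `languages` yields only rows that already exist in `studied`.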
The following prerequisites ensure that you will gain the maximum benefit from the class:
• Programming experience: This is a developer’s course. We will write Hive, Pig and Cascading/Scalding/Cascalog applications. Prior programming experience is recommended.
• Linux shell experience: Basic Linux shell (bash) commands will be used extensively. Some prior experience is recommended.
• Experience with SQL databases: SQL experience is helpful for learning these languages, but not essential.
The main format of this tutorial will be follow-along, although all code will be provided in case you want to code along. You may log into remote EMR instances to build, test and run the applications if you wish, but you will do so on your own account, and the instructor may not be able to help you with debugging. You will also be provided with all the exercise software, so you can view it on your laptop if desired.
Level: Advanced