Introduction to Apache Pig, Parts I & II
Jeffrey Breen
This two-part class provides an intensive introduction to Pig for data transformations. You will learn how to use Pig and its easy-to-learn scripting language to manage data sets in Hadoop clusters. The specific topics of the 120-minute class will be calibrated to your needs, but we will generally cover what Pig is and why you would use it, the basic concepts of data structures in Pig, and the basic language constructs in Pig. We'll also create basic Pig scripts.

Prerequisites: This class will be taught in a Linux environment, using the Pig command-line interface (CLI). Please come prepared with the following:
•  Linux shell experience; the ability to log into Linux servers and use basic Linux shell (bash) commands is required
•  Basic experience connecting to an Amazon EC2/EMR cluster via SSH
•  Windows users should be familiar with Cygwin and PuTTY
•  Basic knowledge of vi is helpful but not required

Also, bring your laptop with the following software installed in advance:
•  PuTTY (Windows only): You will log into a remote cluster for this class. Mac OS X and Linux environments include ssh (secure shell) support; Windows users will need to install PuTTY. Download from here.
•  A text editor: An editor suitable for editing source code, such as SQL queries. On Windows, WordPad (but not Word) or Notepad++ (but not Notepad) is suitable.

Level: Overview