This two-part class provides an intensive introduction to Pig for data transformations. You will learn how to use Pig to manage data sets in Hadoop clusters, using an easy-to-learn scripting language. The specific topics of the 120-minute class will be calibrated to your needs, but we will generally cover:
- What is Pig and why would I use it?
- Understanding the basic concepts of data structures in Pig
- Understanding the basic language constructs in Pig. We'll also create basic Pig scripts.
Prerequisites: This class will be taught in a Linux environment, using the Pig command-line interface (CLI). Please come prepared with the following:
- Linux shell experience; the ability to log into Linux servers and use basic Linux shell (bash) commands is required
- Basic experience connecting to an Amazon EC2/EMR cluster via SSH
- Windows users should be familiar with Cygwin and PuTTY
- A basic knowledge of vi is helpful but not required
Also, bring your laptop with the following software installed in advance:
- PuTTY (Windows only): You will log into a remote cluster for this class. Mac OS X and Linux environments include SSH (Secure Shell) support. Windows users will need to install PuTTY. Download putty.zip from the PuTTY download page.
- A text editor: An editor suitable for editing source code, such as Pig scripts. On Windows, either WordPad (but not Word) or Notepad++ (but not Notepad) is suitable.