Hadoop Data Warehousing with Hive
Dean Wampler
In this hands-on tutorial, you’ll learn how to use Hive for Hadoop-based data warehousing. You’ll also learn some tricks of the trade and how to handle known issues.

We’ll spend most of the tutorial working through a series of hands-on exercises with actual Hive queries, so you can learn by doing. We’ll cover all the main features of Hive’s query language, HiveQL, and how Hive works with data in Hadoop. We’ll also contrast Hive with relational and non-relational database options.
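To give a flavor of those exercises, here is a minimal HiveQL sketch; the table, columns, and file path are illustrative, not the actual exercise data.

    -- Define a simple managed table; Hive stores its data as files in HDFS.
    CREATE TABLE IF NOT EXISTS employees (
      name   STRING,
      salary FLOAT,
      office STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    -- Load a tab-delimited file, then run an aggregation query over it.
    LOAD DATA LOCAL INPATH '/tmp/employees.tsv' INTO TABLE employees;

    SELECT office, AVG(salary) FROM employees GROUP BY office;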

Hive is very flexible about table schemas, file formats, and where the files are stored. We’ll discuss real-world scenarios for the different options. We’ll also briefly examine how you can write Java user-defined functions (UDFs) and other plugins that extend Hive to handle data formats that aren’t supported natively.
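As one example of that flexibility, the sketch below defines an external table over files that already sit in HDFS and registers a Java UDF packaged in a jar; the paths, class name, and function name here are hypothetical.

    -- An external table: Hive reads the files in place and leaves them
    -- on disk when the table is dropped.
    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
      ts      STRING,
      user_id STRING,
      url     STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/web_logs';

    -- Register a custom Java UDF at query time (jar and class are hypothetical).
    ADD JAR /tmp/my-hive-udfs.jar;
    CREATE TEMPORARY FUNCTION domain_of AS 'com.example.hive.udf.DomainOf';

    SELECT domain_of(url), COUNT(*) FROM web_logs GROUP BY domain_of(url);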

You’ll learn Hive’s place in the Hadoop ecosystem, including how it compares to other available tools. We’ll discuss data organization and configuration topics that ensure the best performance and ease of use in production environments.
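A recurring example of such data organization is partitioning, sketched here with an illustrative table: each partition maps to an HDFS subdirectory, so queries that filter on the partition column read far less data.

    -- Partition by date so a query over one day touches only that directory.
    CREATE TABLE IF NOT EXISTS page_views (
      user_id STRING,
      url     STRING
    )
    PARTITIONED BY (view_date STRING);

    -- Partitions are added (or loaded) explicitly.
    ALTER TABLE page_views ADD PARTITION (view_date = '2013-01-15');

    -- Only the matching partition directory is scanned.
    SELECT COUNT(*) FROM page_views WHERE view_date = '2013-01-15';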

Side notes: This tutorial is suitable for beginning data analysts and software developers. Bring your laptop with a suitable secure shell (ssh) client already installed, such as PuTTY for Windows; macOS and Linux systems include ssh by default.

Prerequisite knowledge: Some prior SQL experience is assumed.

Level: Overview