Analyzing Big Data with Hive
Shrikanth Shankar
Apache Hive is designed to let users apply their SQL skills to Big Data analysis, but it is still a relatively new data warehouse infrastructure built on Hadoop and MapReduce. In this tutorial, you will see how Hive can be tuned to perform better than you might expect.


We will begin with a brief overview of Big Data and Apache Hive, including Hive's pros and cons, and then focus on the key differences between Hive and traditional data warehouses built on relational databases. This introduction provides the foundation you need to understand the key strategies covered in the operational segment.

During the hands-on portion of the tutorial, we will cover a variety of techniques to improve Hive performance and simplify its use. Operational topics may include data modeling in Hive; Hive Query Language constructs, features, and syntax; the Hive execution model using MapReduce; and advanced optimization. Best practices for core operations will also be discussed and demonstrated, and there will be an opportunity for Q&A. The tutorial will conclude with recommendations and insight on the future of Hive, including emerging tools such as Apache Tez.

Level: Intermediate