This tutorial teaches you how to solve Big Data problems with Hadoop in a fast, scalable, and cost-effective way. It is designed for technical staff and managers who are evaluating Hadoop as a solution to data-scalability problems.
We will start with Hadoop basics and discuss best practices for using Hadoop in enterprises that deal with large datasets. We will look into the data problems you are currently facing and potential use cases for Hadoop in your infrastructure. The presentation covers the Hadoop architecture and its two main components: the Hadoop Distributed File System (HDFS) and Map/Reduce. We will present case studies of how other enterprises are using Hadoop, and look into what it takes to get Hadoop up and running in your environment.
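To make the division of labor concrete, the Map/Reduce programming model itself can be sketched in a few lines of plain Python. This is only a conceptual illustration of the classic word-count example, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`) are ours, and the shuffle step that groups values by key is something the Hadoop framework performs for you between the map and reduce phases.

```python
from collections import defaultdict

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key (done by the framework)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key -- here, sum the counts."""
    return (key, sum(values))

# In Hadoop, each document would be a record read from HDFS and the
# mappers/reducers would run in parallel across the cluster.
documents = ["big data problems", "big clusters solve big problems"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
# counts["big"] == 3, counts["problems"] == 2
```

The key design point is that mappers and reducers are pure functions over key/value pairs, which is what lets Hadoop distribute them across thousands of nodes without the programmer writing any coordination code.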
Two case studies will cover near-real-time data-processing scenarios and Hadoop implementations for large clusters (2,000 to 4,000 nodes). The near-real-time case study can serve as guidance for building a near-real-time processing infrastructure. All components used in the architecture are open source under the Apache license and provide a cost-effective solution to Big Data problems.
By attending this tutorial, you will:
- Understand Hadoop's main components and architecture
- Be comfortable working with the Hadoop Distributed File System
- Understand the Map/Reduce abstraction and how it works
- Understand the components of a Map/Reduce job
- Know best practices for using Hadoop in the enterprise
During the tutorial, we will demonstrate real-life code for basic Map/Reduce jobs and for working with HDFS.
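As a preview of how a basic job can be written without Java, here is a hedged sketch of a word-count mapper and reducer in the style of Hadoop Streaming, which feeds records to any executable over stdin/stdout. The tab-separated `key\tvalue` line format is Hadoop Streaming's convention; the helper names are ours, and the reducer relies on Hadoop having already sorted its input by key.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Streaming mapper: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(lines):
    """Streaming reducer: input arrives sorted by key; sum counts per word."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # Run as 'script.py map' or 'script.py reduce'; Hadoop Streaming pipes
    # input splits (and sorted intermediate data) through stdin.
    stage = mapper if (len(sys.argv) < 2 or sys.argv[1] == "map") else reducer
    for out in stage(sys.stdin):
        print(out)
```

On a real cluster these two stages would be submitted with the Hadoop Streaming jar (roughly, `hadoop jar hadoop-streaming.jar -mapper 'script.py map' -reducer 'script.py reduce' -input ... -output ...`; the exact jar path depends on your installation), with HDFS supplying the input splits and storing the output.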