Taming Elephants, Bees and Pigs – The Big Data Circus
Ashish Thusoo
This class will discuss the reasons and motivations behind the Big Data revolution and how it has evolved from previous data-processing technologies. For example, Hadoop's technical advantages and its emphasis on scale over raw performance are primarily driven by the growth in the variety of data sources.

Drawing on real-world experience at Facebook, the instructor will talk about some of the key challenges of scale and how these technologies evolved out of necessity: starting with Hadoop, expanding with SQL on top, and adding MicroStrategy and business intelligence layers. This class will cover specific issues and solutions at Facebook, such as latency gaps in the infrastructure that were solved by caching results in the MySQL tier, and the investment made to build low-latency query engines on HDFS. This will lead to a discussion of business demands and the technical responses to them.

Finally, we will discuss the future of Big Data and how the technology continues to become simpler and more accessible to all. We can now address noise in the data and use the cloud to simplify the choice of what to use and what not to use by hiding these technologies behind a comprehensive data platform.

Attend this class to learn about the issues that were encountered in the trenches at Facebook. These issues were first addressed by the trailblazers and can now be handled even by smaller, leaner companies.

Level: Intermediate