Big Data Science: Extracting Truth from Large, Multi-structured Data, Parts I & II
Come to this class for an overview of Big Data Science, from planning to execution. We will start by covering the processes and best practices for doing Data Science successfully in a business setting. We will then review a use case and walk through its modeling stages, from exploratory to confirmatory. The use case includes source code in R and Pig, with the goal of showing how to parallelize analysis in R over Hadoop.
The use case will highlight the advantages (and difficulties) of working with multi-structured data through text analysis, linear models, and more. A recurring theme will be the importance of triangulating on the truth when problems are intractable.
The following prerequisites ensure that you will gain the maximum benefit from the class:
• Programming experience: Big Data is still the Wild West of technologies, and programming skills are required to wrangle and analyze Big Data.
• Analysis experience: Although this is not a statistics course, understanding the principles of analysis and research methods will help you reapply the lessons.
Level: Advanced