The Dark Sides of Predictive Modeling: Tricks and Pitfalls
Claudia Perlich
This class will teach advanced strategies to improve model performance, including multi-state modeling and advanced feature engineering. In addition, we will look at some of the most common pitfalls: over-fitting and leakage.

With the growing set of tools available for data storage, management, consolidation and preprocessing, the art of exploratory data analysis is getting lost. Too often we try to make sense of data that has been “cleaned” beyond recognition. It makes a major difference whether missing values have been removed or replaced by something else.
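To make the point about missing values concrete, here is a minimal sketch in Python. The income figures and the helper names `drop_missing` and `fill_missing` are invented for illustration and are not taken from the class material. It shows how the same column gives different summaries depending on whether missing entries were dropped, replaced by zero, or replaced by the mean.

```python
from statistics import mean

# Hypothetical income readings (in thousands); None marks a missing value.
incomes = [30.0, 35.0, None, 40.0, None, 95.0]


def drop_missing(values):
    """Return only the observed values, discarding missing entries."""
    return [v for v in values if v is not None]


def fill_missing(values, fill):
    """Return a copy of values with every missing entry replaced by fill."""
    return [fill if v is None else v for v in values]


dropped = drop_missing(incomes)
zero_filled = fill_missing(incomes, 0.0)
mean_filled = fill_missing(incomes, mean(dropped))

# A missing-value indicator preserves what the "cleaning" step would hide.
was_missing = [v is None for v in incomes]

for name, column in [("dropped", dropped),
                     ("zero-filled", zero_filled),
                     ("mean-filled", mean_filled)]:
    print(f"{name:12s} n={len(column)} mean={mean(column):.2f} min={min(column):.1f}")
```

An analyst who receives only the zero-filled column cannot tell a true zero from a missing reading, which is why keeping an indicator such as `was_missing` alongside the filled values is often worth the extra column.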
In the end, data analysts work with data that is not what they think it is: they are unaware of sampling biases or of information “from the future” that should never be allowed into the model. As a result, models initially look very good and then fail when used in reality. We will also cover some strategies for managing really large amounts of data, including strategic sampling and grouping examples.
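One simple guard against information “from the future” is to build every feature only from events that happened before the moment the prediction would be made. The sketch below assumes a hypothetical event log and a hypothetical `count_events` helper; it is an illustration of the idea, not the method taught in the class.

```python
from datetime import date

# Hypothetical event log: (customer_id, event_date, event_type).
events = [
    ("a", date(2024, 1, 5), "visit"),
    ("a", date(2024, 2, 10), "purchase"),
    ("a", date(2024, 3, 2), "refund"),    # occurs after the prediction date
    ("b", date(2024, 1, 20), "visit"),
    ("b", date(2024, 3, 15), "purchase"), # occurs after the prediction date
]


def count_events(log, cutoff=None):
    """Count events per customer, keeping only those strictly before cutoff.

    With cutoff=None every event is counted, which is where leakage creeps in.
    """
    counts = {}
    for customer, when, _event_type in log:
        if cutoff is not None and when >= cutoff:
            continue  # this event would not be known at prediction time
        counts[customer] = counts.get(customer, 0) + 1
    return counts


prediction_date = date(2024, 3, 1)

leaky = count_events(events)                         # uses the whole log
safe = count_events(events, cutoff=prediction_date)  # uses only the past

print("leaky features:", leaky)
print("safe features: ", safe)
```

A feature built the leaky way can look highly predictive in a retrospective evaluation simply because it encodes the outcome, and the model then fails once it is scored on data where those later events do not yet exist.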

Prerequisites: You should have some experience in predictive modeling.

Level: Advanced