Intro to Machine Learning: A Crash Course, Parts I and II

Paco Nathan

This two-part, 120-minute class provides a crash-course introduction to Machine Learning. We'll start by defining the terminology and comparing ML with the related fields of statistical inference and optimization theory, then review some history of ML, from early neural nets onward. Next, we'll consider a process for feature engineering, with emphasis on tools for data prep and visualization, plus how to grapple with dimensionality reduction.

The remainder of the class is divided into three parts. Representation surveys useful algorithms, including probabilistic data structures, text analytics, and NLP, along with practical issues to consider. Evaluation examines why some methods work better than others for a given use case, covering overfitting, bias, and the use of quantitative measures. Optimization covers methods for improving on a good thing, including how to move from graph theory to sparse matrices, ensemble models, and a look at ML competition platforms. We'll conclude with suggestions for where to continue your studies.

Prerequisites: some familiarity with programming, probability, statistics, linear algebra, and calculus. We will be programming in R and Python, along with some Hadoop and Spark.

Note: This class is part lecture and part hands-on; you are required to bring a laptop.

Level: Intermediate