Programming with Scalding and Algebird
This is a hands-on coding tutorial. We will code up a few Scalding programs in different domains: portfolio optimization, healthcare, cosine similarity and random forests. While Scalding looks like a thin Scala API atop Cascading, this appearance is deceptive. The power of Scala, combined with Scalding's mapping, grouping and joining primitives and the Algebird abstract algebra library, allows for a whole new level of flexibility with Big Data. Matrix operations in Scalding are powered by Algebird, and by using large-dimension matrices as a primitive, we can tackle problems in diverse domains that apply linear algebra to very large datasets in batch mode.
Note: You are expected to have installed Scala, Scalding and Algebird on your laptop before the tutorial commences.
Access the slides and Scala code here: https://github.com/krishnanraman/bigdata
Level: Advanced