Analyzing Tweets with HBase, Parts I and II
This two-hour class covers how to use the Twitter API to download tweets, model them in HBase, and then run natural-language processing (NLP) against them. We will first cover the architectural fundamentals of HBase, including log-structured merge trees, data models, memstores, HFiles and Bloom filters. Next, we will load tweets into HBase. Finally, we will explore some of the more interesting analyses that NLP makes possible on the stored tweets. All code for this class is publicly released under a Creative Commons license.
Level: Intermediate
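As a rough illustration of what modeling tweets in HBase can involve, the sketch below builds a row key and a set of cells for one tweet. It is not taken from the class materials: the key layout (user ID followed by a reversed timestamp), the column family name `t`, and the helper names `tweet_row_key` and `tweet_cells` are all assumptions chosen for illustration. The sketch uses only the Python standard library and does not talk to an HBase cluster.

```python
import struct

# Largest signed 64-bit value; used to reverse timestamps (an assumed convention).
MAX_INT64 = 2**63 - 1


def tweet_row_key(user_id: int, created_at_ms: int) -> bytes:
    """Build a hypothetical 16-byte row key: user ID, then reversed timestamp.

    HBase sorts rows by the raw bytes of the key. Packing both fields as
    big-endian integers keeps one user's tweets contiguous, and subtracting
    the timestamp from MAX_INT64 makes newer tweets sort first.
    Both inputs are assumed to be non-negative.
    """
    reversed_ts = MAX_INT64 - created_at_ms
    return struct.pack(">qq", user_id, reversed_ts)


def tweet_cells(text: str, lang: str) -> dict:
    """Map a tweet's fields to column qualifiers in an assumed family 't'."""
    return {
        b"t:text": text.encode("utf-8"),
        b"t:lang": lang.encode("utf-8"),
    }


if __name__ == "__main__":
    key = tweet_row_key(user_id=42, created_at_ms=1_700_000_000_000)
    cells = tweet_cells("Learning HBase today", "en")
    print(key.hex(), cells)
```

With a layout like this, a scan that starts at a user's 8-byte prefix would return that user's most recent tweets first. Other key designs (for example, hashing the user ID to spread write load) trade that locality for more even distribution across regions.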