Organizing Big Data from the Internet
While many of us already make extensive use of Big Data, arguably the biggest and most useful data set is the public Internet itself. Methods for harnessing this data can be cumbersome, but with the right framework in mind, they don't have to be. This class will present a framework for connecting to public data sets, organizing discovered information into a structured format, and using this structured data to identify entities and resolve them against those in your own data set. The result is that your original data set is enriched by the Internet, with more entries and richer data on each entry.
This class will provide a clear view of the steps and implementation needed to orchestrate data feeds for a category of interest, and will show how to use big data processes to organize incoming data and resolve it against existing data sets. Using the public Internet as a data source to fill in sparse data makes big data more complete and actionable.
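As a rough illustration of the resolve-and-enrich step described above, here is a minimal Python sketch. It is not the framework the class presents. The record fields (name, city, website, phone), the sample records, and the helper names normalize and enrich are all hypothetical, and it assumes that two records refer to the same entity when their normalized names match exactly.

```python
import re

def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace to build a match key."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())

def enrich(existing, discovered):
    """Merge discovered records into existing ones, matching on normalized name.

    Assumption: an exact match on the normalized name means "same entity".
    """
    merged = {normalize(r["name"]): dict(r) for r in existing}
    for record in discovered:
        key = normalize(record["name"])
        if key in merged:
            # Resolved to a known entity: fill in only missing or empty fields.
            for field, value in record.items():
                if merged[key].get(field) in (None, ""):
                    merged[key][field] = value
        else:
            # Unresolved: treat it as a new entry in the data set.
            merged[key] = dict(record)
    return list(merged.values())

# Hypothetical sample data: one sparse existing record, two discovered records.
existing = [{"name": "Acme Corp.", "city": "Austin", "website": None}]
discovered = [
    {"name": "ACME Corp", "website": "https://acme.example", "phone": "555-0100"},
    {"name": "Globex, Inc.", "city": "Springfield"},
]
result = enrich(existing, discovered)
```

In this sketch the existing entry gains a website and a phone number, and one new entry is added, which mirrors the "more entries, and richer data on each entry" outcome. Exact matching on a cleaned-up name is the simplest possible rule; a production pipeline would likely need fuzzy matching and additional identifying fields.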
Level: Overview