From Big Data to Smart Data
Kurt Cagle
Tools for Big Data management typically concentrate on the volume and velocity sides of the equation but give short shrift to variety and veracity, even though these often determine the value of the processed data. A quiet revolution is taking place on the fringes of the Big Data movement as Hadoop and other Big Data tools are increasingly used to make data contextual: resources that know not only about themselves (self-awareness) but also how they relate to the rest of the world (external awareness).

This class looks at the collision of Big Data and Semantic Technologies, showing how Hadoop and similar tools can be used with RDF triple stores and SPARQL to describe, relate, and infer the information that you need. SPARQL can be thought of as SQL for the Web: it provides a way to query distributed and federated data systems, to discover the inherent data models of complex information, and to use those models to infer new information. In this class, you will dive into semantic technologies (RDF, Turtle, OWL, SPARQL) to see how such data works and what benefits it provides. The class then looks at MapReduce solutions both for ingesting RDF content and for using that content to drive inferencing.
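To make this concrete, here is a minimal sketch of loading Turtle data and querying it with SPARQL. It assumes Python with the rdflib library, which is not specified by the class; the example.org URIs and the ex:knows property are illustrative only.

    from rdflib import Graph

    # A few RDF triples serialized as Turtle: Alice knows Bob, Bob knows Carol.
    turtle_data = """
    @prefix ex: <http://example.org/> .

    ex:alice ex:knows ex:bob .
    ex:bob   ex:knows ex:carol .
    """

    # Ingest the Turtle into an in-memory triple store.
    g = Graph()
    g.parse(data=turtle_data, format="turtle")

    # SPARQL with a property path: everyone Alice knows, directly or via one intermediary.
    query = """
    PREFIX ex: <http://example.org/>
    SELECT ?person WHERE {
      ex:alice ex:knows/ex:knows? ?person .
    }
    """

    for row in g.query(query):
        print(row.person)   # prints ex:bob and ex:carol

The property path in the query is a small example of the kind of inferencing the class builds toward: surfacing a relationship (Alice is connected to Carol) that is never stated as an explicit triple.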

Note: This class requires an understanding of MapReduce principles, data modeling, NoSQL databases, and common industry interchange formats and protocols (XML, JSON, RESTful architectures). Knowledge of XQuery or XSLT 2.0 is useful but not required. Because of the volume of material to cover, the class will be delivered as a lecture, but you will be able to test against a live database.

Level: Advanced