How to Integrate Structured and Unstructured Data with Avro
This class demonstrates how to use the Apache Avro Java serialization libraries with Hadoop frameworks to speed up the integration of large volumes of unstructured data.
The class will teach you about the Avro framework. You will learn about:
• Avro schema definitions and data types
• Avro record creation
• How to create Avro schemas programmatically
• How to sort records by setting a property in the schema
• How to read records from a file
• Hadoop MapReduce integration with Avro data
• Advantages of using Avro data over flat files or map files
• Specifics of integrating Avro with Mapper and Reducer code
• Writing MapReduce job output in Avro format
• Cascading MapReduce jobs
• MapReduce for converting flat files into Avro data
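As a rough illustration of the first few topics, the sketch below builds a schema programmatically, creates records, and sorts them according to the `order` property set on each schema field. It assumes Avro 1.7.5 or later on the classpath, and it uses `SchemaBuilder`, `GenericRecordBuilder`, and `GenericData.compare`. The `PageView` schema, its fields, and the sample values are hypothetical and do not come from the class materials.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class AvroSortSketch {

    // Builds a hypothetical "PageView" schema in code instead of loading a .avsc file.
    // Each field's sort order is stored in the schema itself:
    // "url" sorts ascending, "timestamp" descending, and "userAgent" is ignored.
    static Schema buildSchema() {
        return SchemaBuilder.record("PageView").namespace("example.avro")
            .fields()
            .name("url").orderAscending().type().stringType().noDefault()
            .name("timestamp").orderDescending().type().longType().noDefault()
            .name("userAgent").orderIgnore().type().stringType().stringDefault("")
            .endRecord();
    }

    // Creates one generic record that conforms to the schema.
    static GenericRecord pageView(Schema schema, String url, long ts, String agent) {
        return new GenericRecordBuilder(schema)
            .set("url", url)
            .set("timestamp", ts)
            .set("userAgent", agent)
            .build();
    }

    // Sorts a copy of the records; GenericData.compare applies each field's
    // order (ascending, descending, ignore) as declared in the schema.
    static List<GenericRecord> sorted(Schema schema, List<GenericRecord> records) {
        List<GenericRecord> copy = new ArrayList<>(records);
        copy.sort((a, b) -> GenericData.get().compare(a, b, schema));
        return copy;
    }

    public static void main(String[] args) {
        Schema schema = buildSchema();
        List<GenericRecord> views = List.of(
            pageView(schema, "/b", 100L, "curl"),
            pageView(schema, "/a", 100L, "firefox"),
            pageView(schema, "/a", 200L, "chrome"));
        for (GenericRecord r : sorted(schema, views)) {
            System.out.println(r.get("url") + " " + r.get("timestamp"));
        }
    }
}
```

Running `main` should print `/a 200`, `/a 100`, `/b 100`: URLs ascend, and timestamps descend within the same URL. Because the sort order lives in the schema rather than in custom comparator code, the same declaration should also govern how Avro-keyed MapReduce output is ordered, though the exact behavior depends on the Avro MapReduce API in use.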
We’ll use two real-life examples to demonstrate the advantages of using Avro over regular files.
Level: Advanced