A Raw Data Journal: A History of Everything
Chris Merrick
Every time you run an UPDATE on your database, you lose information: the previous values are overwritten. If you want to analyze how your data has changed over time, or undo a mistake, you’re out of luck. Several next-generation database systems have been designed from the ground up to solve this problem by emphasizing data versioning, but you can get the same benefit without living on the bleeding edge. In fact, you can start today with your existing database, without impacting performance, changing your application, or even going down for maintenance.

This class will introduce the concept of a "Raw Data Journal," which is a log of every state your data has ever been in, and share how you can start aggregating this journal in HDFS directly from your MySQL or PostgreSQL database. We'll then work through some examples of how to use the journal for analytics and disaster recovery.
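
To make the idea concrete, here is a minimal in-memory sketch of a journal in Python. It is only an illustration of the concept, not the approach presented in the class: the names RawDataJournal, record, state_at, and history are hypothetical, the timestamps are simple integers, and nothing here reads from MySQL, PostgreSQL, or HDFS. Each change is appended rather than applied in place, so any past state can be rebuilt by replaying entries up to a chosen point.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class JournalEntry:
    """One recorded state of one row (hypothetical structure)."""
    ts: int                         # logical timestamp of the change
    key: str                        # primary key of the affected row
    row: Optional[Dict[str, Any]]   # full row after the change; None marks a delete


class RawDataJournal:
    """Append-only log of every state each row has been in (illustrative only)."""

    def __init__(self) -> None:
        self._entries: List[JournalEntry] = []

    def record(self, ts: int, key: str, row: Optional[Dict[str, Any]]) -> None:
        # Append instead of overwriting, so earlier states are never lost.
        snapshot = None if row is None else dict(row)
        self._entries.append(JournalEntry(ts, key, snapshot))

    def state_at(self, ts: int) -> Dict[str, Dict[str, Any]]:
        # Replay entries in timestamp order up to and including ts.
        state: Dict[str, Dict[str, Any]] = {}
        for entry in sorted(self._entries, key=lambda e: e.ts):
            if entry.ts > ts:
                break
            if entry.row is None:
                state.pop(entry.key, None)
            else:
                state[entry.key] = entry.row
        return state

    def history(self, key: str) -> List[JournalEntry]:
        # Every recorded state of a single row, in the order it was journaled.
        return [e for e in self._entries if e.key == key]


journal = RawDataJournal()
journal.record(1, "user:1", {"plan": "free"})   # INSERT
journal.record(2, "user:1", {"plan": "pro"})    # UPDATE
journal.record(3, "user:1", None)               # DELETE

before_delete = journal.state_at(2)             # the table as it looked at ts=2
```

In this sketch, state_at covers the disaster-recovery case (rebuild the data as it was before a mistake) and history covers the analytics case (see how one row changed over time).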

Level: Intermediate