How to Fit a Petabyte in Apache HBase
There are many ways to load a petabyte of data into HBase, and this class will show you the best approaches! We will first review the solutions that are commonly adopted instinctively and show why they fail. This will help you understand the practicalities of HBase's architecture.
The first technique is better schema design: how to create keys that won't inflate the size of your data set. The second is better management of data loading, through pre-splitting the regions and tuning the cluster for that type of workload.
Finally, we'll discuss bulk loading as the best way to populate the database. The code used in the demonstrations will be available on GitHub.
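To give a flavor of the pre-splitting idea mentioned above, here is a minimal sketch in Python. It is not the session's demonstration code. It assumes row keys start with a uniformly distributed, fixed-length hexadecimal prefix (for example, the leading characters of a hash), and the function name `hex_split_points` and its parameters are hypothetical, chosen only for illustration. It computes evenly spaced split keys that could be supplied when a table is created, so that the load is spread across regions from the start.

```python
from typing import List


def hex_split_points(num_regions: int, prefix_len: int = 8) -> List[str]:
    """Return num_regions - 1 evenly spaced split keys over a hex keyspace.

    Assumes row keys begin with a uniformly distributed hex prefix of
    prefix_len characters (an illustrative assumption, not a requirement
    of HBase itself).
    """
    if num_regions < 2:
        # A single region needs no split points.
        return []
    keyspace = 16 ** prefix_len
    return [
        # Zero-pad so every split key has the same length as the prefix.
        format(i * keyspace // num_regions, "0{}x".format(prefix_len))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Four regions over a two-character hex prefix.
    print(hex_split_points(4, prefix_len=2))  # ['40', '80', 'c0']
```

With sequential or otherwise skewed keys, evenly spaced split points like these would not balance the load, so the key design and the split strategy have to be chosen together.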
Level: Advanced