Advanced Schema Design Using HBase
Ravi Veeramachaneni
The first thing to consider when selecting a database is the characteristics of the data you are looking to leverage. If the data has a simple tabular structure with a set of rows and columns, like a spreadsheet, then the relational model is sufficient. On the other hand, data that represents geospatial features, engineering parts, or similar domains tends to be more complex. It may have multiple levels of nesting, and the complete data model can be complicated. In these cases, one should consider NoSQL databases as an option. The next big questions to ask are: "What is the volatility of the data model?" and "Is the data model likely to change and evolve, or is it most likely going to stay the same?" In other words, will some flexibility be needed in the future?

HBase is a distributed, column-oriented data store for processing large amounts of data in a scalable and cost-effective way. Most developers, designers, and architects have extensive experience with relational databases and SQL, which makes the transition to NoSQL challenging and often confusing. Because HBase runs on top of Hadoop, the core components of Hadoop (HDFS and MapReduce) add further complexity to the transition to this new paradigm of data processing.

This class will help you understand the features HBase offers for schema design, as well as the shortcomings of the technology, so that you can address them early in your design. The focus will be on sharing knowledge and real-world experiences to help you understand the full spectrum of technical and business challenges. The class will highlight recommendations and lessons learned in schema design and in transitioning to the new platform. At the end of the class, you will be in a better position to make the right choices on schema design using HBase.

Level: Advanced