Rob Johnson and Bradley C. Kuszmaul
This tutorial will explore data structures and algorithms for big databases. The topics include:
- Data structures including B-trees, Log Structured Merge Trees, Streaming B-trees, and Fractal Trees.
- Bloom filters and Bloom filter alternatives designed for SSDs.
- Index design, including covering indexes.
- Getting good performance in memory.
- Cache efficiency including both cache-aware and cache-oblivious data structures and algorithms.
These algorithms and data structures are used both in NoSQL implementations such as HBase, LevelDB, MongoDB, and TokuMX, and in SQL-oriented implementations such as MySQl and TokuDB. We will explain the underlying data structures and algorithms that drive big databases. There will also be an emphasis on write-optimized data structures, such as Log Structured Merge Trees or Fractal Trees.
This tutorial includes explaining and analyzing data structures, so it might not be aimed at someone who hates seeing O(N log N); however, the content will be accessible so that anyone who can tolerate some math will benefit from attending.