Bigtable in a Nutshell
Intro
This post is part of a series of posts with notes as I’m studying for Google’s Professional Data Engineer Certification.
This particular post covers Bigtable in a nutshell.
Disclaimer
Please read this disclaimer.
Bigtable
- HBase-compatible (supports the Apache HBase API)
- Requires configuration and management of nodes
- Designing a good row key (Bigtable's only index) is critical to avoiding hot-spots, where some nodes handle significantly more traffic than others
- Rows are sorted lexicographically by row key
- Generally a long, compound key (e.g. {id}#{source}#{timestamp})
- Order of a compound key matters!
- Principles:
- Don’t put timestamps first (monotonically increasing keys send all new writes to the same node)
- Don’t hash values (keep row keys human-readable; because row keys are sorted lexicographically, hashing scatters related rows and prevents scanning them as a range)
- Zero-pad integers (and sometimes timestamps) so each key component has a fixed width and keys sort correctly lexicographically (unpadded, "3" sorts after "20"; padded, "03" sorts before "20")
- I’ve observed that keys often follow one of the following formats:
{large-component}#{small-component}#{timestamp}
{large-component}#{small-component}#{reverse-timestamp}
- Supports column families
- Performance scales roughly linearly with the number of nodes
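The row-key principles above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bigtable client code: the field names (device_id, source), the 13-digit epoch-millisecond width, MAX_TS_MS, and the split points are all hypothetical. tablet_index is a toy model of a table being split into contiguous row-key ranges, not a Bigtable API call.

```python
import bisect
from typing import List

# Assumption: timestamps are epoch milliseconds, which fit in 13 digits until the year 2286.
TS_WIDTH = 13
MAX_TS_MS = 10**TS_WIDTH - 1


def pad_int(value: int, width: int) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("pad_int only handles non-negative integers")
    return str(value).zfill(width)


def reverse_ts(ts_ms: int) -> int:
    """Invert a timestamp so newer rows sort before older ones."""
    return MAX_TS_MS - ts_ms


def make_row_key(device_id: str, source: str, ts_ms: int, newest_first: bool = False) -> bytes:
    """Build a hypothetical {large-component}#{small-component}#{timestamp} key."""
    ts = reverse_ts(ts_ms) if newest_first else ts_ms
    return f"{device_id}#{source}#{pad_int(ts, TS_WIDTH)}".encode("utf-8")


def tablet_index(row_key: bytes, split_points: List[bytes]) -> int:
    """Toy model: index of the contiguous row-key range (tablet) a key falls into."""
    return bisect.bisect_right(split_points, row_key)


if __name__ == "__main__":
    # Unpadded numbers sort lexicographically, not numerically.
    print(sorted(["3", "20", "100"]))  # ['100', '20', '3']
    print(sorted(pad_int(n, 3) for n in (3, 20, 100)))  # ['003', '020', '100']

    # Reverse timestamps put the newest reading first within a device/source prefix.
    readings = [1_700_000_000_000, 1_700_000_060_000, 1_700_000_120_000]
    for key in sorted(make_row_key("sensor-42", "temp", t, newest_first=True) for t in readings):
        print(key.decode("utf-8"))

    # Hypothetical split points dividing the key space into four tablets.
    splits = [b"d", b"m", b"t"]
    id_first = [make_row_key(d, "temp", readings[0]) for d in ("alpha", "kilo", "romeo", "zulu")]
    ts_first = [f"{pad_int(t, TS_WIDTH)}#alpha".encode("utf-8") for t in readings]
    print("id-first tablets:", sorted({tablet_index(k, splits) for k in id_first}))  # [0, 1, 2, 3]
    print("timestamp-first tablets:", sorted({tablet_index(k, splits) for k in ts_first}))  # [0]
```

Real Bigtable splits and rebalances tablets on its own, so the fixed split points here are only illustrative. The point is that timestamp-first keys all share one narrow, ever-growing range, so current writes pile up on whichever node serves the end of it, while id-first keys spread across the key space.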