Bigtable in a Nutshell
This post is part of a series of notes I’m writing while studying for Google’s Professional Data Engineer Certification.
This particular post covers Bigtable in a nutshell.
Please read this disclaimer.
- HBase-compatible (supports the HBase API)
- Requires configuration and management of nodes
- Designing/choosing a good row-key (index) is critical to avoiding hot-spots (where some nodes have significantly more to process than others)
- Rows are sorted lexicographically by row-key
- Generally a long, compound key (e.g.
- Order of a compound key matters!
- Don’t put timestamps first (sequential writes would then all land on the same node)
- Don’t hash values (keep values human-readable, since row-keys are sorted lexicographically)
- Pad integers (and sometimes timestamps) so all row-keys are the same length and sort sensibly in lexicographic order
- I’ve observed that keys often follow one of the following formats:
- Supports column families
- Performance increases linearly with the number of nodes
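
To make the row-key notes above concrete, here is a minimal sketch in plain Python (no Bigtable client involved). The field names (`device_id`, `sensor_no`, `epoch_seconds`), the `#` delimiter, and the pad widths are my own assumptions for illustration, not anything Bigtable prescribes. It only shows why padding matters under lexicographic sorting and what an "identifier first, timestamp last" compound key might look like.

```python
def make_row_key(device_id: str, sensor_no: int, epoch_seconds: int) -> str:
    """Build a hypothetical compound row key: identifier first, timestamp last."""
    # Zero-pad the integers so that string order matches numeric order
    # and every key built from same-length identifiers has the same length.
    return f"{device_id}#{sensor_no:05d}#{epoch_seconds:010d}"


# Unpadded integers sort as strings, so "10" lands before "2".
unpadded = sorted(str(n) for n in [2, 10, 1])

# Zero-padded integers sort in the order you would expect.
padded = sorted(f"{n:05d}" for n in [2, 10, 1])

# Compound keys group by device first, then sensor, then time.
keys = sorted([
    make_row_key("device-b", 2, 1700000000),
    make_row_key("device-a", 10, 1700000100),
    make_row_key("device-a", 2, 1700000050),
])

if __name__ == "__main__":
    print(unpadded)
    print(padded)
    for key in keys:
        print(key)
```

Because the device identifier leads the key, rows for the same device sit next to each other, while writes arriving at the same moment from different devices are spread across the key space instead of piling up at one end.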