Bigtable in a Nutshell

#Google Cloud #Professional Data Engineer Certification #Bigtable #No-SQL #Database

Intro

This post is part of a series of notes I’m taking while studying for Google’s Professional Data Engineer Certification.

This particular post covers Bigtable in a nutshell.

Disclaimer

Please read this disclaimer.

Bigtable

Compatible with the Apache HBase API
Requires configuration and management of nodes
Designing/choosing a good row-key (index) is critical to avoiding hot-spots (where some nodes have significantly more to process than others)
- Rows are sorted lexicographically by row-key
- Generally a long, compound key (e.g. {id}#{source}#{timestamp})
- Order of a compound key matters!
- Principles:
  - Don’t put timestamps first
  - Don’t hash values (keep values human-readable, as row-keys are sorted lexicographically)
  - Pad integers (and sometimes timestamps) so all row keys have the same length and sort lexicographically in a sensible order
- I’ve observed that keys often follow one of the following formats:
  - {large-component}#{small-component}#{timestamp}
  - {large-component}#{small-component}#{reverse-timestamp}
Supports column families
Performance increases linearly with the number of nodes
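
To make the row-key principles above concrete, here is a minimal Python sketch of building a compound key with padded integers and an optional reverse timestamp. It does not use any Bigtable client library; the helper names (pad_int, reverse_timestamp, make_row_key), the 13-digit millisecond width, and the MAX_TS_MILLIS bound are my own assumptions for illustration.

```python
# Hypothetical helpers for illustration only; not part of any Bigtable client.

# Assumed upper bound for millisecond timestamps (13 digits), used to
# compute reverse timestamps that still fit in a fixed width.
MAX_TS_MILLIS = 10**13 - 1


def pad_int(value: int, width: int = 10) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("only non-negative integers can be zero-padded this way")
    return str(value).zfill(width)


def reverse_timestamp(ts_millis: int) -> str:
    """Return a fixed-width value that gets smaller as the timestamp gets newer."""
    return pad_int(MAX_TS_MILLIS - ts_millis, 13)


def make_row_key(large: str, small: str, ts_millis: int,
                 newest_first: bool = False) -> str:
    """Build {large-component}#{small-component}#{timestamp or reverse-timestamp}."""
    if newest_first:
        ts_part = reverse_timestamp(ts_millis)
    else:
        ts_part = pad_int(ts_millis, 13)
    # The timestamp goes last, so writes are spread across the leading components.
    return "#".join([large, small, ts_part])


if __name__ == "__main__":
    # Unpadded integers sort as strings, which puts "10" before "9".
    print(sorted([str(9), str(10)]))
    # Padded integers sort in numeric order.
    print(sorted([pad_int(9), pad_int(10)]))
    # With a reverse timestamp, the newest row for a key prefix sorts first.
    print(make_row_key("device42", "sensorA", 1_700_000_000_000, newest_first=True))
```

With newest_first=True, a scan over the prefix device42#sensorA# would return the most recent rows first, which is the usual reason to pick the reverse-timestamp format.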
