Bigtable in a Nutshell

Intro

This post is part of a series of notes I’m writing while studying for Google’s Professional Data Engineer Certification.

This particular post covers Bigtable in a nutshell.

Disclaimer

Please read this disclaimer.

Bigtable

  • Compatible with the Apache HBase API
  • Requires configuration and management of nodes
  • Designing/choosing a good row-key (index) is critical to avoiding hot-spots (where some nodes have significantly more to process than others)
    • Rows are sorted lexicographically by row-key
    • Generally a long, compound key (e.g. {id}#{source}#{timestamp})
    • Order of a compound key matters!
    • Principles:
      • Don’t put timestamps first
      • Don’t hash values (keep values human-readable, since row-keys are sorted lexicographically)
      • Pad integers (and sometimes timestamps) so that all row-keys have the same length and sort sensibly in lexicographic order
    • I’ve observed that keys often follow one of the following formats:
      • {large-component}#{small-component}#{timestamp}
      • {large-component}#{small-component}#{reverse-timestamp}
  • Supports column families
  • Performance increases roughly linearly with the number of nodes
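
To make the row-key principles above concrete, here is a minimal Python sketch of how a padded, compound key could be built. Everything in it is an assumption for illustration: the function name make_row_key, the 10-digit id width, the 13-digit millisecond timestamp width, and the MAX_TS_MILLIS bound are all hypothetical, and none of it comes from the Bigtable client library.

```python
# Hypothetical helper for illustration only; not part of any Bigtable library.
# Assumed bound: timestamps are epoch milliseconds that fit in 13 digits.
MAX_TS_MILLIS = 10**13 - 1


def make_row_key(entity_id: int, source: str, ts_millis: int,
                 reverse: bool = False) -> str:
    """Build a key shaped like {id}#{source}#{timestamp}.

    The id is zero-padded to 10 digits and the timestamp to 13 digits so
    that lexicographic order matches numeric order. With reverse=True the
    timestamp is subtracted from MAX_TS_MILLIS, so newer rows sort first.
    """
    if not 0 <= ts_millis <= MAX_TS_MILLIS:
        raise ValueError("timestamp outside the assumed 13-digit range")
    if not 0 <= entity_id < 10**10:
        raise ValueError("id outside the assumed 10-digit range")
    ts = MAX_TS_MILLIS - ts_millis if reverse else ts_millis
    return f"{entity_id:010d}#{source}#{ts:013d}"


# Without padding, string order disagrees with numeric order.
unpadded = sorted(str(n) for n in [9, 10, 100])

# With padding, sorting the keys as strings keeps ids in numeric order.
padded = sorted(make_row_key(n, "web", 1700000000000) for n in [100, 9, 10])

# With a reverse timestamp, the newer event sorts before the older one.
older = make_row_key(42, "web", 1700000000000, reverse=True)
newer = make_row_key(42, "web", 1700000001000, reverse=True)
```

Because the timestamp is the last component, writes that arrive at the same moment are spread across the key space by id and source instead of landing on adjacent rows, which is the reasoning behind not putting timestamps first.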