GCP PDE Exam Principles

Intro

This is a post in a series recording some notes as I’m studying for Google’s Professional Data Engineer Certification.

In this post, I provide some principles I am keeping in mind as I prepare for the exam.

Disclaimer

Please read this disclaimer.

GCP PDE Exam Principles

  • [security] Follow the Principle of Least Privilege
  • When torn between multiple options, the simplest answer is probably right
  • When torn between multiple options, the cheapest answer is probably right
  • An answer’s complexity is likely proportional to the question’s complexity
    • For example, a simple question like “What is your favorite colour?” probably has a simple answer
    • A more complex question (e.g. “What is the Air-Speed Velocity of an Unladen Swallow on a Tuesday in a head-wind of 30 km/h and 70% relative humidity?”) probably requires a more complex answer
  • When creating Bigtable row keys, the key focus is to avoid hot spots
    • DON’T start a Bigtable row key with a timestamp (sequential writes would all land on the same node)
  • Prefer pre-trained ML models to custom-trained ones
    • These solutions are typically good enough for basic use-cases and can be implemented more quickly
  • [security] Assign roles to groups and add users to groups (don’t apply roles to users)
  • If something is self-joined and running into problems when scaling, the answer likely involves normalizing
  • If the question mentions “realtime” and Pub/Sub, it probably requires a push from Pub/Sub rather than a pull
  • [BQ] With multiple wildcard (sharded) tables, partitioning is preferred over sharding
  • Typically, a serverless/cloud-native solution is preferable to managed services or dedicated machines/VMs
  • Files that would otherwise live in HDFS should be stored in GCS (Cloud Storage)
  • It is not uncommon for multiple services to be used together (b/c each service is often optimized to do one thing very well)
  • If the question involves “ANSI SQL”, the answer probably involves BigQuery
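
To make the Bigtable row key principle above more concrete, here is a minimal sketch in plain Python (no Bigtable client involved). It assumes a hypothetical IoT-style workload in which each event has a device ID and a timestamp; the function name, the `#` separator, and the reversed-timestamp constant are illustrative choices, not something prescribed by the exam or by Bigtable. The idea is simply to lead the key with a high-cardinality field so that writes spread across the key space, and to push the timestamp to the end.

```python
from datetime import datetime, timezone

# Assumption for illustration: reverse timestamps against the max signed
# 64-bit value so that newer events sort first within a device's rows.
MAX_TS_MILLIS = 2**63 - 1


def make_row_key(device_id: str, event_time: datetime) -> bytes:
    """Build a row key that starts with the device ID, not the timestamp."""
    millis = int(event_time.timestamp() * 1000)
    reversed_ts = MAX_TS_MILLIS - millis
    # Zero-pad to 19 digits so lexicographic order matches numeric order.
    return f"{device_id}#{reversed_ts:019d}".encode("utf-8")


if __name__ == "__main__":
    older = datetime(2024, 1, 1, tzinfo=timezone.utc)
    newer = datetime(2024, 1, 2, tzinfo=timezone.utc)
    print(make_row_key("sensor-a", older))
    print(make_row_key("sensor-a", newer))
```

With keys shaped like this, rows for different devices start with different prefixes, and within one device the most recent event sorts first. Whether a reversed timestamp is appropriate depends on the read pattern; a plain timestamp suffix would work just as well if scans usually run oldest to newest.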