GCP PDE Recipes
Intro
This post is part of a series of notes I’m taking while studying for Google’s Professional Data Engineer certification.
In this post, I list some recipes that come up often in Google Cloud.
Disclaimer
Please read this disclaimer.
GCP PDE Recipes
The Classic
Pub/Sub -> Dataflow -> BQ (batch inserts)
Variations:
- Pub/Sub -> Dataflow -> BQ (streaming inserts)
- Pub/Sub -> Dataflow -> Bigtable -> BQ (BQ queries Bigtable through a federated query)
  - For use cases that need both analytics and the low-latency reads and writes Bigtable provides
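As a rough illustration of The Classic, here is a minimal Apache Beam (Python SDK) sketch that reads from Pub/Sub, parses each message, and appends rows to BQ. Several things here are my own assumptions rather than part of the recipe: the messages are UTF-8 JSON, the `user_id` and `amount` fields and their schema are hypothetical, and the 300-second `triggering_frequency` is an arbitrary value. I map "batch inserts" to Beam's `FILE_LOADS` method (periodic load jobs) and the streaming variation to `STREAMING_INSERTS`.

```python
import json


def parse_event(message: bytes) -> dict:
    """Turn one Pub/Sub payload (assumed UTF-8 JSON) into a BigQuery row dict.

    The field names are hypothetical and only serve the example.
    """
    record = json.loads(message.decode("utf-8"))
    return {"user_id": str(record["user_id"]), "amount": float(record["amount"])}


def run(topic: str, table: str, streaming_inserts: bool = False) -> None:
    """Build and run the Pub/Sub -> Dataflow -> BQ pipeline.

    topic: e.g. "projects/<project>/topics/<topic>"
    table: e.g. "<project>:<dataset>.<table>"
    """
    # Imported here so parse_event can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Pub/Sub is an unbounded source, so the pipeline runs in streaming mode.
    options = PipelineOptions(streaming=True)

    if streaming_inserts:
        method = beam.io.WriteToBigQuery.Method.STREAMING_INSERTS
        extra = {}
    else:
        # Batch inserts: rows are buffered and written with periodic load jobs.
        method = beam.io.WriteToBigQuery.Method.FILE_LOADS
        extra = {"triggering_frequency": 300}  # seconds; arbitrary choice

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=topic)
            | "ParseJson" >> beam.Map(parse_event)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table,
                schema="user_id:STRING,amount:FLOAT",  # hypothetical schema
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                method=method,
                **extra,
            )
        )
```

To run this on Dataflow rather than locally, the usual runner, project, and region options would also need to be passed to `PipelineOptions`; I left them out to keep the sketch short.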
Uploading Data to Google Cloud
- gsutil
  - From on-premises (if practical given network bandwidth and data size)
  - Good for < 1 TB
- Storage Transfer Service
  - From another cloud or an on-premises data center with sufficient bandwidth
  - Good for > 1 TB
- Transfer Appliance
  - Physical storage device that you fill with data and ship back to Google
  - For large amounts of on-premises data and/or a low-bandwidth location where gsutil is impractical
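To make "practical based on network bandwidth and data size" more concrete, here is a small sketch that estimates transfer time and picks one of the three options above. The 1 TB boundary comes from the notes; the 7-day cutoff for "impractical" (`max_days`) and both function names are my own assumptions, not official guidance, and the estimate ignores protocol overhead and link sharing.

```python
TB = 10**12  # bytes in one decimal terabyte


def transfer_days(size_bytes: float, bandwidth_mbps: float) -> float:
    """Estimate days needed to move size_bytes over a bandwidth_mbps (megabits/s) link."""
    bits = size_bytes * 8
    seconds = bits / (bandwidth_mbps * 1_000_000)
    return seconds / 86_400  # seconds per day


def suggest_transfer_tool(size_bytes: float, bandwidth_mbps: float, max_days: float = 7.0) -> str:
    """Pick an upload option using the rules of thumb from the list above.

    max_days is a hypothetical cutoff for when a network transfer stops being practical.
    """
    if transfer_days(size_bytes, bandwidth_mbps) > max_days:
        return "Transfer Appliance"
    if size_bytes < TB:
        return "gsutil"
    return "Storage Transfer Service"


if __name__ == "__main__":
    # 1 TB over a 100 Mbps link: 8e12 bits / 1e8 bits per second = 80,000 s, about 22 hours.
    print(round(transfer_days(TB, 100) * 24, 1), "hours")
    print(suggest_transfer_tool(0.5 * TB, 100))
    print(suggest_transfer_tool(10 * TB, 1000))
    print(suggest_transfer_tool(100 * TB, 100))
```

With these assumptions, 100 TB over a 100 Mbps link works out to roughly three months of transfer time, which is the kind of case where Transfer Appliance makes sense.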