Mapping GCP Products to Open-Source Systems
Intro
This is the first post in a series of notes I’m taking while studying for Google’s Professional Data Engineer (PDE) certification.
Each post will cover a particular service relevant to the GCP PDE cert.
Disclaimer
Please read this disclaimer.
Mapping GCP Products to OSS
I find it helpful to map each GCP product to the open-source system it is built on, compatible with, or conceptually closest to.
Here’s a table with my current understanding:
GCP Product | Open-Source System | Description |
---|---|---|
Bigtable | Apache HBase | |
Cloud SQL | MySQL/PostgreSQL | Cloud SQL provides managed instances of common relational databases |
Dataproc | Apache Hadoop/Spark | |
Cloud Storage | HDFS | A common use case is to keep data that would otherwise live in HDFS (on Dataproc or a Hadoop cluster running on VMs) in Cloud Storage instead |
Dataflow | Apache Beam | |
Cloud Composer | Apache Airflow | |
Memorystore | Redis and Memcached | |
Firestore | MongoDB | Firestore's native API is not compatible with MongoDB's, but its document model is conceptually similar |
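
To make the Cloud Storage / HDFS row a bit more concrete: on Dataproc, the Cloud Storage connector lets Hadoop and Spark jobs read `gs://` paths much like `hdfs://` paths, so moving data off HDFS is often mostly a matter of changing URIs. Below is a minimal sketch of that path rewrite. The function name `hdfs_to_gcs` and the bucket `my-example-bucket` are made up for illustration and are not part of any GCP library.

```python
from urllib.parse import urlsplit


def hdfs_to_gcs(hdfs_uri: str, bucket: str) -> str:
    """Rewrite an hdfs:// URI to a gs:// URI in `bucket` (hypothetical helper)."""
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {hdfs_uri!r}")
    # Drop the namenode host:port and keep only the file path.
    return f"gs://{bucket}{parts.path}"


# Example: the bucket name here is an assumption, not a real bucket.
print(hdfs_to_gcs("hdfs://namenode:8020/data/events/part-0000.parquet",
                  "my-example-bucket"))
```

This only rewrites the string; actually copying the data (for example with a distributed copy job) is a separate step.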