Mapping GCP Products to Open-Source Systems

This is the first post in a series of notes I’m taking while studying for Google’s Professional Data Engineer (PDE) certification.

The plan is for each post to discuss a particular service relevant to the GCP PDE cert.


Mapping GCP Products to OSS

I find it helpful to map each GCP product to the open-source system it is built on or most closely resembles.

Here’s a table with my current understanding:

| GCP Product | Open-Source System | Description |
|---|---|---|
| Cloud SQL | MySQL/PostgreSQL | Cloud SQL provides managed instances of common databases |
| Dataproc | Apache Hadoop/Spark | |
| Cloud Storage | HDFS | A common use case is to store data that would otherwise live in HDFS (from Dataproc or a Hadoop cluster running on a VM) in Cloud Storage |
| Dataflow | Apache Beam | |
| Cloud Composer | Apache Airflow | |
| Memorystore | Redis and Memcached | |
| Firestore | MongoDB | Firestore is not API compatible with MongoDB but is conceptually similar |
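
As a study aid, the table above can also be kept as a small lookup structure for self-quizzing in either direction. This is only a sketch of my own notes, not anything official: the names `GCP_TO_OSS`, `oss_equivalents`, and `gcp_products_for` are made up for this example, and the entries simply mirror the table.

```python
# Study-aid sketch: the mapping table as a Python dict.
# All names here are hypothetical and exist only for this example.
GCP_TO_OSS = {
    "Cloud SQL": ["MySQL", "PostgreSQL"],
    "Dataproc": ["Apache Hadoop", "Apache Spark"],
    "Cloud Storage": ["HDFS"],
    "Dataflow": ["Apache Beam"],
    "Cloud Composer": ["Apache Airflow"],
    "Memorystore": ["Redis", "Memcached"],
    "Firestore": ["MongoDB"],  # conceptually similar, not API compatible
}


def oss_equivalents(product: str) -> list[str]:
    """Return the open-source systems mapped to a GCP product (case-insensitive)."""
    normalized = {name.lower(): systems for name, systems in GCP_TO_OSS.items()}
    return normalized.get(product.strip().lower(), [])


def gcp_products_for(system: str) -> list[str]:
    """Reverse lookup: GCP products whose mapping mentions the given system."""
    needle = system.strip().lower()
    return [
        product
        for product, systems in GCP_TO_OSS.items()
        if any(needle in s.lower() for s in systems)
    ]
```

Calling `oss_equivalents("Dataflow")` looks up the open-source side for a product, and `gcp_products_for("spark")` goes the other way, which is the direction exam questions tend to take ("we run Spark on-prem, which service do we migrate to?").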