I was recently going through the endless list of cloud products and services offered by Google. While reading various articles on the Internet comparing GCP with its competitors, such as AWS and Azure, I came across a well-written answer on Quora. I was instantly compelled to share it with everyone: the author not only compares the various cloud platforms accurately but also lays out facts showing why Google has been a leader in every field it has dived into.
Here are some of the major points that amazed me.
It is important to understand how GCP differs from competitors such as AWS and Azure. Most people might not immediately appreciate this, but GCP is technically far superior to what either of them can ever aspire to become. That said, it makes sense to use Azure if your company thrives in the Microsoft ecosystem.
AWS offers a myriad of products for different industries, but it can quickly get overwhelming. I am not an expert on AWS, but it is quite evident that AWS (Redshift specifically) cannot compete with BigQuery on sheer performance. BigQuery eats products such as Redshift for breakfast! And there is a reason for that: AWS was built by an e-commerce company that did not design its products with a global focus. It is very important to understand this.
AWS’s infrastructure is mostly regional, concentrated in North America, the UK and a few other places where it originally served its customers, so it was not developed from the ground up for multi-regional products. For example, AWS has nothing like Pub/Sub, which scales practically without limit, and globally at that. Good luck sustaining 2 million reads/writes per second for real-time streaming on AWS.
Google started its business as a search-engine company. In the early phase of their journey, they faced numerous technical challenges around scaling out and had to develop solutions such as extremely fast ways to scan all the web pages in the world. This required a very fast internal network that could connect regions across the world with the least possible latency. That is why they now have an internal network with petabit-per-second-class bandwidth, built on the most advanced fiber technology.
It is claimed that one in every five servers sold in the world is owned by Google. Take a moment to wrap your mind around that! Such powerful infrastructure enabled them to develop solutions such as BigQuery and Bigtable, where reading data from separate storage such as Cloud Storage (backed by Colossus) with near-zero latency is a reality. Data no longer has to live on the compute nodes at all!
Even if Google released its BigQuery code as open source, no one in the world could use that code to replicate the technology, because no company can afford the kind of hardware data farms Google owns to implement BQ. BigQuery scales to over 2,000 nodes in a matter of seconds. This kind of performance is astonishingly fast; my jaw dropped when we ran a SQL query on 4 TB of Wikipedia data and the output came back in under 30 seconds! Good luck doing that with AWS. This is one of the reasons big names such as Yahoo are moving from AWS to BigQuery.
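If you want to try something similar yourself, here is a minimal sketch (my own, not from the Quora answer) of querying BigQuery's public Wikipedia sample table. It assumes the `google-cloud-bigquery` package and default GCP credentials; the query itself is illustrative.

```python
# SQL against BigQuery's public Wikipedia revision-history sample table.
QUERY = """
SELECT title, COUNT(*) AS revisions
FROM `bigquery-public-data.samples.wikipedia`
GROUP BY title
ORDER BY revisions DESC
LIMIT 10
"""

def run_query(sql):
    """Run a query and return its rows as a list.

    Requires `pip install google-cloud-bigquery` and GCP credentials,
    so it is not executed here -- a sketch, not a tested benchmark.
    """
    from google.cloud import bigquery
    client = bigquery.Client()
    return list(client.query(sql).result())
```

BigQuery scans the full table under the hood, yet on a dataset this size the result typically comes back in seconds, which is the point the anecdote above is making.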
Google is inherently an engineering company (not a social-network or e-commerce company), and it shows in the way it offers its products. They published the white paper on GFS that led Doug Cutting to develop HDFS, and thus Hadoop was born. But note that while Hadoop was still in its nascent stage, limping its way toward so-called distributed processing at massive scale, Google had already moved past MapReduce as its main internal batch-processing framework and was using far more advanced technologies, such as Colossus for storage and Dremel, the engine behind BigQuery.
Google has also seeded many Apache open-source projects: its Bigtable paper inspired HBase, and its Dataflow model became Apache Beam (batch and stream). In GCP, they are simply offering the technologies they use within their own organization, and we all know they excel at running these kinds of complex data-processing pipelines.
Google offers very few products (compared to AWS), but they cover more or less everything an average organization needs! Being an engineering company, their products are somewhat skewed toward requiring coding skills to configure your pipelines. For example, streaming services such as Pub/Sub or Dataflow require you to write a fair amount of code in Java or Python. The good part is that they scale automatically, and practically without limit. You don’t have to worry about performance tuning or about the quality of your SQL in BigQuery, because resources are not a constraint.
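To give a feel for the amount of code involved, here is a minimal Python sketch of publishing events to Pub/Sub. The project and topic ids are hypothetical placeholders, and the publish step assumes the `google-cloud-pubsub` package and GCP credentials.

```python
import json

def make_event(user_id, action):
    # Pub/Sub message bodies are raw bytes, so JSON-encode the payload.
    return json.dumps({"user": user_id, "action": action}).encode("utf-8")

def publish_events(project_id, topic_id, payloads):
    """Publish byte payloads to a topic and return their message ids.

    Requires `pip install google-cloud-pubsub` and GCP credentials;
    project_id and topic_id are placeholders for your own resources.
    """
    from google.cloud import pubsub_v1
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    # Each publish() returns a future; resolving it yields the message id.
    futures = [publisher.publish(topic_path, data) for data in payloads]
    return [f.result() for f in futures]
```

That is essentially all the application-side code: there is no capacity planning or partition management, because the service scales the topic for you.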
Of course, you still need to take steps to reduce the amount of data you scan in BQ to keep charges down (unless your organization is on BigQuery’s flat-rate plan). Below are the main products that GCP offers:
- GCS (Google Cloud Storage)
- GCE (Google Compute Engine)
- Cloud SQL
- Cloud Spanner
- DataPrep
- Data Studio
- TensorFlow and machine learning – Google is marketing it heavily, and it is widely regarded as one of the best ML frameworks there is. Period.
I would recommend cloning the Git repositories Google provides, such as training-data-analyst. Not only do they contain Python/Java code for Pub/Sub, Dataflow, Datalab, Data Studio etc., which makes for excellent hands-on practice, they can also be leveraged as a starting point for your own organization’s application code. For Dataflow, Google provides multiple templates that cover many common use cases. The code for each template is also on Git, and I highly recommend using it as a reference and modifying it to your needs.
I’d definitely encourage everyone to go through the GCP documentation and have a look at the fabulous technology stack it offers.