As more enterprises embrace multi-cloud and hybrid-cloud strategies, Leverege wanted to evaluate how we might support our IoT platform on AWS. On GCP, our architecture consists of microservices running on GKE, with Pub/Sub as the message queue, Cloud Functions for serverless workloads, and Firebase and BigQuery as our real-time and OLAP databases, respectively. Not every product had a one-to-one match on AWS in terms of feature parity, so we had to consider alternatives.
We summarize our findings below, but the bottom line is that this evaluation taught us that GCP is still the best cloud for IoT. Ultimately, we decided not to support AWS and to remain committed to GCP.
GCP uses Projects under Organizations for logical separation, whereas AWS requires separate Accounts, which can be grouped under an AWS Organization. Once the AWS Organization was set up, the differences between the two paradigms were negligible, although switching between Projects on GCP was easier (at least in the web console) than switching between Accounts on AWS.
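The big difference within Projects/Accounts is that GCP, running on its global backbone, does not partition services by region: resources in every region are managed from the same project, and a single VPC network can span regions, whereas most AWS services, including VPCs, are scoped to a region. This is a significant advantage for companies running multi-region architectures, but since we mostly used single-region deployments, the difference was minimal for us.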
GKE provides an excellent Kubernetes experience, managing the master node, networking plugins, seamless upgrades, logging/monitoring, and autoscaling. On EKS, nothing comes pre-configured besides the master node. The first challenge we ran into was how difficult it was to spin up a new cluster through the console UI. In the end, we settled on Terraform to create the EKS cluster, but the experience was not as easy as clicking "Create cluster" in the GKE console.
The next thing we noticed was the lack of built-in Kubernetes tooling. While GKE provides a web management UI for workloads that is integrated with Stackdriver for monitoring, EKS expects users to install the Kubernetes Metrics Server and Kubernetes Dashboard themselves. Sharing access to the Dashboard also requires an extra step, either an Identity-Aware Proxy or a VPN, to protect it against unauthorized access. Running StatefulSets on EKS was also tricky: an EBS volume lives in a single Availability Zone, and getting Cluster Autoscaler to respect each volume's location meant deploying Cluster Autoscaler per node group based on workload type. Finally, Kubernetes upgrades on EKS required manual patching of cluster components, whereas GKE automatically updates components on a schedule within a maintenance window.
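The StatefulSet problem comes down to one constraint: a pod bound to an EBS volume can only run on a node in that volume's Availability Zone. The following is a simplified model of that constraint, not Cluster Autoscaler's actual algorithm; the node group names, zones, and helper functions are hypothetical. It shows why a node group spanning several zones is not a safe scale-up target for a pod whose volume sits in one specific zone.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NodeGroup:
    name: str
    zones: Tuple[str, ...]  # Availability Zones the group may launch nodes into


@dataclass(frozen=True)
class PendingPod:
    name: str
    volume_zone: Optional[str]  # AZ of the pod's bound EBS volume; None if stateless


def can_place(pod: PendingPod, group: NodeGroup) -> bool:
    """True if any node `group` launches is guaranteed to be usable by `pod`."""
    if pod.volume_zone is None:
        return True  # stateless pods can run in any zone
    # A multi-AZ group may put the new node in a zone where the EBS volume
    # cannot be attached, so only a group pinned to the volume's zone is safe.
    return group.zones == (pod.volume_zone,)


def scale_up_candidates(pod: PendingPod, groups: List[NodeGroup]) -> List[str]:
    """Names of node groups that could safely be scaled up for `pod`."""
    return [g.name for g in groups if can_place(pod, g)]


# Hypothetical layout: one shared multi-AZ group plus one group per AZ for stateful pods.
groups = [
    NodeGroup("general-multi-az", ("us-east-1a", "us-east-1b")),
    NodeGroup("stateful-us-east-1a", ("us-east-1a",)),
    NodeGroup("stateful-us-east-1b", ("us-east-1b",)),
]

print(scale_up_candidates(PendingPod("db-0", "us-east-1b"), groups))
# ['stateful-us-east-1b']
print(scale_up_candidates(PendingPod("web-0", None), groups))
# ['general-multi-az', 'stateful-us-east-1a', 'stateful-us-east-1b']
```

In this model, stateful workloads end up on zone-pinned node groups, each of which the autoscaler has to be set up to manage, which is the extra operational burden described above.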
At Leverege, we use Firebase for two purposes: (1) as a NoSQL database that stores the last known state of IoT devices, and (2) as a push mechanism that updates all Firebase clients (e.g., iOS/Android apps and web applications). On AWS, no single product replaced both functions. One option was to use DynamoDB Streams and write custom listeners on every client. The other recommended approach was to push data to AWS AppSync and use its real-time subscription features. Neither option was ideal given the added complexity, so we decided to keep Firebase and treat it as a SaaS key-value store.
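To make concrete what we would have had to rebuild on AWS, here is a minimal in-memory sketch of the two behaviors we rely on: storing each device's last known state and pushing every change to subscribed clients, including delivering the current value as soon as a client subscribes. The class and method names are hypothetical; this is not the Firebase SDK, DynamoDB Streams, or AppSync.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Listener = Callable[[str, Dict[str, Any]], None]


class LastKnownStateStore:
    """Hypothetical in-memory stand-in for a realtime key-value store."""

    def __init__(self) -> None:
        self._state: Dict[str, Dict[str, Any]] = {}
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, device_id: str, listener: Listener) -> None:
        self._listeners[device_id].append(listener)
        # Deliver the current value immediately, so late subscribers are up to date.
        if device_id in self._state:
            listener(device_id, dict(self._state[device_id]))

    def write(self, device_id: str, reading: Dict[str, Any]) -> None:
        # Merge the new reading into the last known state (new fields win).
        merged = {**self._state.get(device_id, {}), **reading}
        self._state[device_id] = merged
        # Push the updated state to every subscriber (each gets its own copy).
        for listener in self._listeners[device_id]:
            listener(device_id, dict(merged))

    def get(self, device_id: str) -> Dict[str, Any]:
        return dict(self._state.get(device_id, {}))


store = LastKnownStateStore()
received = []
store.subscribe("truck-42", lambda device_id, state: received.append(state))
store.write("truck-42", {"lat": 41.8, "lon": -71.4})
store.write("truck-42", {"battery": 87})
print(store.get("truck-42"))  # {'lat': 41.8, 'lon': -71.4, 'battery': 87}
print(len(received))  # 2

late = []
store.subscribe("truck-42", lambda device_id, state: late.append(state))
print(late)  # [{'lat': 41.8, 'lon': -71.4, 'battery': 87}]
```

With Firebase, both behaviors come with the client SDKs; with DynamoDB Streams, the push half would have to be built and maintained for every client platform.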
The big draw of BigQuery for us was the decoupling of storage and analysis. We can stream all IoT data into BigQuery and be charged for analysis only according to how much data is queried. Since our read-to-write ratio was heavily skewed toward writes, BigQuery's pricing model suited our usage. Amazon Redshift, on the other hand, bundles storage and compute, so the cost of pre-provisioning nodes to run Redshift was prohibitive until our read workloads scaled up. Our compromise was to use Snowflake running on AWS. This way, we could still decouple storage and analysis while having the underlying system run on AWS infrastructure.
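The pricing argument can be made concrete with a back-of-the-envelope model. This is a minimal sketch under simplifying assumptions: the decoupled warehouse bills storage per TB-month plus analysis per TB scanned, while the provisioned cluster bills a flat per-node-hour rate covering both storage and compute. It ignores ingestion charges, discounts, and reserved pricing. All rates are placeholders, not actual BigQuery or Redshift prices, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # approximate average hours in a month


def decoupled_monthly_cost(tb_stored: float, tb_scanned: float,
                           storage_rate_per_tb: float, scan_rate_per_tb: float) -> float:
    """Storage and analysis billed separately; analysis scales with data scanned."""
    return tb_stored * storage_rate_per_tb + tb_scanned * scan_rate_per_tb


def provisioned_monthly_cost(nodes: int, node_rate_per_hour: float,
                             hours: float = HOURS_PER_MONTH) -> float:
    """Storage and compute bundled into always-on nodes; cost is fixed regardless of reads."""
    return nodes * node_rate_per_hour * hours


def break_even_tb_scanned(tb_stored: float, storage_rate_per_tb: float,
                          scan_rate_per_tb: float, provisioned_cost: float) -> float:
    """TB scanned per month at which the decoupled model costs as much as the cluster."""
    remaining = provisioned_cost - tb_stored * storage_rate_per_tb
    return max(0.0, remaining / scan_rate_per_tb)


# Placeholder rates for illustration only; these are NOT real prices.
STORAGE_RATE = 20.0  # $/TB-month (hypothetical)
SCAN_RATE = 5.0      # $/TB scanned (hypothetical)
NODE_RATE = 1.0      # $/node-hour (hypothetical)

cluster = provisioned_monthly_cost(nodes=2, node_rate_per_hour=NODE_RATE)
print(decoupled_monthly_cost(5, 2, STORAGE_RATE, SCAN_RATE))  # 110.0
print(cluster)  # 1460.0
print(break_even_tb_scanned(5, STORAGE_RATE, SCAN_RATE, cluster))  # 272.0
```

Under these made-up numbers, a write-heavy workload that scans only a few TB a month stays far below the break-even point, which is the situation described above; as read volume grows toward that point, a provisioned cluster becomes easier to justify.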
Cloud Pub/Sub is the only managed messaging service GCP provides. On AWS, Kinesis, SNS, and SQS each serve different needs. Since most of our payloads are small and require multiple consumers (e.g., data written to both the raw data store and historical data storage), Pub/Sub's support for many-to-many publisher-subscriber relationships fit our message flow well. On AWS, Kinesis was too expensive for small payloads, and neither SNS nor SQS alone supported the fan-out architecture we needed. The solution was to subscribe multiple SQS queues to the same SNS topic, replicating the Pub/Sub behavior we relied on.
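The SNS-to-SQS fan-out can be illustrated without touching AWS at all. The sketch below simulates its semantics in memory: every message published to the topic is copied into each subscribed queue, and each consumer drains its own queue independently, much as each Pub/Sub subscription receives its own copy of every message. The `Topic` and `Queue` classes and the topic and queue names are hypothetical stand-ins, not the AWS SDK. In a real deployment, each SQS queue also needs an access policy that allows the SNS topic to send messages to it.

```python
from collections import deque
from typing import Deque, List, Optional


class Queue:
    """Stand-in for an SQS queue: one consumer group drains its own copy."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._messages: Deque[str] = deque()

    def enqueue(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # Returns the oldest message, or None if the queue is empty.
        return self._messages.popleft() if self._messages else None

    def __len__(self) -> int:
        return len(self._messages)


class Topic:
    """Stand-in for an SNS topic: every publish is copied to every subscribed queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Queue] = []

    def subscribe(self, queue: Queue) -> None:
        self._subscribers.append(queue)

    def publish(self, message: str) -> None:
        for queue in self._subscribers:
            queue.enqueue(message)


telemetry = Topic("device-telemetry")
raw_store = Queue("raw-data-writer")
history = Queue("historical-data-writer")
telemetry.subscribe(raw_store)
telemetry.subscribe(history)

telemetry.publish('{"device": "truck-42", "temp": 21.5}')
telemetry.publish('{"device": "truck-42", "temp": 21.7}')
print(len(raw_store), len(history))  # 2 2

first = raw_store.receive()  # consuming from one queue leaves the other untouched
print(len(raw_store), len(history))  # 1 2
```

Because each consumer has its own queue, a slow historical writer cannot hold back the raw data writer, which is the decoupling Pub/Sub subscriptions gave us on GCP.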
Google holds the definitive edge in containers and Kubernetes. After all, Google created Kubernetes and has more experience running containers than anyone in the world. Although AWS had comparable products in other categories, our evaluation led us to conclude that GCP is still the best cloud for IoT, with its unrivaled security, leading AI/ML products, and the lowest overall costs.