One of the downsides of a microservice architecture is the increased complexity of monitoring. How does one monitor a cluster of distributed applications that communicate with each other? First, we need to monitor the health of individual pods and applications. Are the pods scheduled and deployed as intended? Are the applications inside those pods running without errors and without degradation in performance? Second, we need to monitor the health of the entire Kubernetes cluster. Is Kubernetes properly managing the resource utilization of each node? Are all the nodes themselves healthy?
In this chapter, we’ll examine some of the native tools Kubernetes provides, as well as some Google Cloud Platform (GCP)-specific and open-source tools that we’ve found useful in production. Please note that this chapter is by no means a comprehensive overview of all the monitoring solutions available to Kubernetes users. That said, the combination of Stackdriver and Prometheus/Grafana has proved to be a robust and reliable toolset for our IoT deployments on Google Kubernetes Engine (GKE).