Kubernetes monitoring is the practice of collecting and reporting cluster health data to support proactive management of clusters. Monitoring a Kubernetes cluster eases management of containerized infrastructure by tracking utilization of cluster resources including memory, CPU, and storage. Cluster operators can monitor and receive alerts if the desired number of pods is not running, if resource utilization is approaching critical limits, or when failures or misconfiguration cause pods or nodes to become unable to participate in the cluster.
Kubernetes Metrics Server aggregates resource usage data collected by the kubelet on each node and exposes it through the Metrics API, which a number of visualization tools can then consume. Key metrics to track include node and pod CPU and memory utilization, and the number of desired versus running pods.
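As a quick sketch of how this data is consumed, the commands below query the Metrics API with kubectl; they assume Metrics Server is already installed in the cluster.

```shell
# Per-node CPU and memory usage, aggregated by Metrics Server
kubectl top nodes

# Per-pod usage across all namespaces
kubectl top pods -A

# The same data, fetched directly from the raw Metrics API
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```

These commands require a live cluster with Metrics Server running; without it, `kubectl top` reports that the Metrics API is unavailable.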
The explosive growth of containers in enterprise-level businesses has brought many advantages to developers, DevSecOps, and IT teams worldwide. However, the flexibility and scalability that Kubernetes brings to deploying containerized applications also present new challenges. Since there is no longer a 1-to-1 correlation between an application and the server it runs on, keeping track of the health of applications — abstracted by containers and abstracted once again by Kubernetes — can be daunting without the proper tools.
Because pods and their containers are in constant motion and dynamically scheduled, monitoring tools rely on Services as stable endpoints. A Service exposes an IP address that can be reached from outside the pods, allowing clients to communicate with the workload on a continual basis even as individual pods and containers are created and deleted.
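A minimal Service manifest illustrates this decoupling; the names and ports here are hypothetical.

```yaml
# Hypothetical Service providing a stable endpoint for a set of pods.
apiVersion: v1
kind: Service
metadata:
  name: web-backend        # hypothetical Service name
spec:
  selector:
    app: web-backend       # selects pods by label, not by individual identity
  ports:
    - port: 80             # stable port exposed by the Service
      targetPort: 8080     # port the containers actually listen on
```

Clients address the Service's stable IP and port; Kubernetes keeps the Service's endpoint list in sync as matching pods come and go.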
The health of a Kubernetes node indicates its ability to run the pods assigned to it. Kubernetes provides a node problem detector DaemonSet, which collects problem information from node daemons and reports it back to the API server as node conditions and events.
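Node conditions surface in the node's status, which can be inspected with `kubectl describe node`. The excerpt below is an illustrative fragment of a node's `status.conditions`; the `KernelDeadlock` condition is one example of a condition added by the node problem detector.

```yaml
# Illustrative excerpt of a node's status.conditions as reported to the API server.
status:
  conditions:
    - type: Ready              # node is healthy and accepting pods
      status: "True"
    - type: MemoryPressure     # node memory is low when "True"
      status: "False"
    - type: DiskPressure       # node disk capacity is low when "True"
      status: "False"
    - type: KernelDeadlock     # example condition set by node problem detector
      status: "False"
```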
Prometheus is an open source monitoring system, hosted by the CNCF, for collecting metrics on Kubernetes health. Prometheus runs data exporter pods on each node in the cluster, and its server scrapes metrics from nodes, pods, and jobs. The collected time-series data is saved into Prometheus's database, and alerts can be automatically generated based on predefined conditions.
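To discover scrape targets in a cluster, Prometheus uses its Kubernetes service discovery. The fragment below is an illustrative piece of a `prometheus.yml` that discovers and scrapes the kubelet on every node; the file paths shown are the in-cluster service account defaults.

```yaml
# Illustrative prometheus.yml fragment: scrape each node's kubelet over HTTPS.
scrape_configs:
  - job_name: kubernetes-nodes
    kubernetes_sd_configs:
      - role: node           # one scrape target per cluster node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
```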
Prometheus has its own dashboard with limited capabilities, commonly augmented by external visualization tools like Grafana, which use the Prometheus database to enable sophisticated queries, debugging, and reporting that can be tailored for dev, test, and production teams.
Prometheus offers support for bridging in data from other tools via exporters, and it can forward collected metrics to a broad range of third-party databases through its remote-write interface.
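Forwarding to long-term storage is configured with a `remote_write` block; this sketch uses a hypothetical endpoint URL and credentials path.

```yaml
# Illustrative remote_write fragment: forward samples to a long-term store.
remote_write:
  - url: https://metrics.example.com/api/v1/write   # hypothetical endpoint
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote-pass    # hypothetical secret path
```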
The Kubernetes dashboard provides a way to manage cluster resources and debug containerized applications using a simple web interface. The Kubernetes dashboard offers a simple overview of resources both cluster-wide and on individual nodes. It also provides a rundown of all the namespaces in the cluster as well as all of the storage classes that have been defined. Some uses for the dashboard include deploying containerized applications, inspecting the state of workloads, and viewing logs or editing resources for basic troubleshooting.
There are three measures of the health of a Kubernetes pod: liveness, readiness, and startup. Each is determined by a probe managed by the kubelet.
The liveness probe determines if a container within a pod needs to restart. It can help identify when pods have become unresponsive.
The readiness probe indicates to the cluster when containers in the pod are ready to begin processing traffic. The pod is considered ready only when all of its containers are ready. If a pod is not ready, it is removed from Service load balancers and does not receive incoming traffic.
The startup probe indicates when the application in the pod has started successfully. If a startup probe is configured, liveness and readiness probes are disabled until the startup probe succeeds, ensuring that those probes do not interfere with the startup process.
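The three probes are declared per container in the pod spec. The manifest below is a minimal sketch; the pod name, image, and health-check paths are hypothetical.

```yaml
# Hypothetical pod demonstrating all three probe types.
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo                        # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0 # hypothetical image
      ports:
        - containerPort: 8080
      startupProbe:                       # gates the other probes until it succeeds
        httpGet:
          path: /healthz                  # hypothetical health endpoint
          port: 8080
        failureThreshold: 30              # allow up to 30 * 10s for slow startup
        periodSeconds: 10
      livenessProbe:                      # container is restarted if this fails
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
      readinessProbe:                     # pod leaves Service endpoints if this fails
        httpGet:
          path: /ready                    # hypothetical readiness endpoint
          port: 8080
        periodSeconds: 5
```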
cAdvisor collects metrics about resource usage, historical data, and resource isolation, from the node level down to individual containers. Its data collection is used as the underlying technology for many other metrics collection systems.
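cAdvisor runs embedded in the kubelet, and its metrics can be read per node through the API server's node proxy. The command below assumes a live cluster; `NODE` is a placeholder for a node name from `kubectl get nodes`.

```shell
# Fetch the kubelet's embedded cAdvisor metrics for one node
# (Prometheus exposition format; piped to head to truncate output).
kubectl get --raw /api/v1/nodes/NODE/proxy/metrics/cadvisor | head
```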
Since Kubernetes applications always run in application pods, the health of the applications can be measured by the liveness and readiness probes in the respective pods. If applications report they are ready to process new requests, and the node on which they are running is not reporting any errors, the application is likely healthy.