Monitoring Azure Kubernetes Service (AKS) with Azure Monitor & Log Analytics

Part of running Kubernetes is being able to monitoring the cluster, the nodes, and the workloads running in it. Running production workloads regardless of PaaS, VM’s, or containers requires a solid level of reliability. Azure Kubernetes Service comes with monitoring provided from Azure bundled with the semi-managed service. Kubernetes also has built in monitoring that can also be utilized.

It is important to note that AKS is a free service and Microsoft aims to achieve at least 99.5% availability for the Kubernetes API server on the master node side.

But due to AKS being a free service Microsoft does not carry an SLA on the Kubernetes cluster service itself. Microsoft does provide an SLA for the availability of the underlying nodes in the cluster via the Azure Virtual Machines SLA. Without an official SLA for the Kubernetes cluster service it becomes even more critical to understand your deployment and have the right monitoring tooling and plan in place so when an issue arises the DevOps or CloudOps team can address, investigate, and resolve any issues with the cluster.

The monitoring service included with AKS gives you monitoring from two perspectives including the first one being directly from an AKS cluster and the second one being all AKS clusters in a subscription. The monitoring looks at two key areas “Health status” and “Performance charts” and consists of:

Insights – Monitoring for the Kubernetes cluster and containers.

Metrics – Metric based cluster and pod charts.

Log Analytics – K8s and Container logs viewing and search.

Azure Monitor

Azure Monitor has a containers section. Here is where you will find a health summary across all clusters in a subscription including ACS. You also will see how many nodes and system/user pods a cluster has and if there are any health issues with the a node or pod. If you click on a cluster from here it will bring you to the Insights section on the AKS cluster itself.

If you click on an AKS cluster you will be brought to the Insights section of AKS monitoring on the actual AKS cluster. From here you can access the Metrics section and the Logs section as well as shown in the following screenshot.

Insights

Insights is where you will find the bulk of useful data when it comes to monitoring AKS. Within Insights you have these 4 areas Cluster, Nodes, Controllers, and Containers. Let’s take a deeper look into each of the 4 areas.

Cluster

The cluster page contains charts with key performance metrics for your AKS clusters health. It has performance charts for your node count with status, pod count with status, along with aggregated node memory and CPU utilization across the cluster. In here you can change the date range and add filters to scope down to specific information you want to see.

Nodes

After clicking on the nodes tab you will see the nodes running in your AKS cluster along with uptime, amount of pods on the node, CPU usage, memory working set, and memory RSS. You can click on the arrow next to a node to expand it displaying the pods that are running on it.

What you will notice is that when you click on a node, or pod a property pane will be shown on the right hand side with the properties of the selected object. An example of a node is shown in the following screenshot.

Controllers

Click on the Controllers tab to see the health of the clusters controllers. Again here you will see CPU usage, memory working set, and memory RSS of each controller and what is running a controller. As an example shown in the following screenshot you can see the kubernetes dashboard pod running on the kubernetes-dashboard controller.

The properties of the kubernetes dashboard pod as shown in the following screenshot gives you information like the pod name, pod status, Uid, label and more.

You can drill in to see the container the pod was deployed using.

Containers

On the Containers tab is where all the containers in the AKS cluster are displayed. An as with the other tabs you can see CPU usage, memory working set, and memory RSS. You also will see status, the pod it is part of, the node its running on, its uptime and if it has had any restarts. In the following screenshot the CPU usage metric filter is used and I am showing a containers that has restarted 71 times indicating an issue with that container.

 In the following screenshot the memory working set metric filter is shown.

You can also filter the containers that will be shown through using the searching by name filter.

You also can see a containers logs in the containers tab. To do this select a container to show its properties. Within the properties you can click on View container live logs (preview) as shown in the following screenshot or View container logs. Container log data is collected every three minutes. STDOUT and STDERR is the log output from each Docker container that is sent to Log Analytics.

Kube-system is not currently collected and sent to Log Analytics. If you are not familiar with Docker logs more information on STDOUT and STDERR can be found on this Docker logging article here:  https://docs.docker.com/config/containers/logging.

Read more

OMS: Service Map overview

Recently the Operations Management Suite (OMS) team at Microsoft announced the private preview of Service Map in OMS formally known as Application Dependency Map. Service Map has been a long awaited feature in OMS. Service Map is a feature that is a part of OMS that discovers and maps Windows & Linux app and system dependencies. Service Map displays these dependencies in application maps within OMS. Service Map did not start with OMS. It actually started as a standalone product named Fact Finder and later was integrated with SCOM. The integration of FactFinder with SCOM allowed Bluestripe to automatically create Distributed Applications in SCOM. Well Microsoft acquired BlueStripe and the rest is history.

In this post I will set out to explore and break down Service Map, how it is installed, info about the agent, how it works, key points about it, how the data flows and more. NOTE: Click on any of the images in this post to display larger in a new window. Also this post is my first effort in taking one of my PowerPoint’s and converting into a post! The following graphic describes some of the benefits of having application maps including in your monitoring solutions along with information about FactFinder:

oms-servicemap-overview-1

Now let’s take a look at what Service Map does and how it looks.

oms-servicemap-overview-2

Now let’s take a look at one of the Service Maps aka Application Maps in OMS. Notice on the left hand side the breakdown of the interface. In Service Map there is a focus machine in the center. There are front end and back end connections into that focus machine. These are the dependencies flowing in and out of the focus machine giving the mappings. Notice on the left-hand side you can control the time controls and select either a Windows or Linux machine from the list. Finally, on the left-hand side are the details of the current selection. The current selection can be a machine or process.

oms-servicemap-overview-3

Also notice that SM integrates with Change Tracking, Alerts, Performance, Security, and updates. What this means is that when you have a focus machine selected you can click on the corresponding solution on the right hand. When you click on the solution i.e. updates or security the update or security dashboard widget will be shown and you can drill down from there for further detail.

oms-servicemap-overview-4

oms-servicemap-overview-5

A common question that comes up when discussion Service Map is how does it work. The following graphic displays the process from the solution add to the actual mapping within OMS.

oms-servicemap-overview-6

Other key information about Service Map is detailed in the following graphics.

oms-servicemap-overview-7

The next graphic looks at deploying the SM agent and locations for logs. The process is as simple as downloading and installing the agent from OMS.

Here is some more critical information you need to know about the SM agent.

oms-servicemap-overview-9

This next graphic details how Service Map dependency data flows into OMS.

oms-servicemap-overview-10

At this current time Service Map supported Operating Systems at this time are:

Windows Linux
  • Windows 10
  • Windows 8.1
  • Windows 8
  • Windows 7
  • Windows Server 2016
  • Windows Server 2012 R2
  • Windows Server 2012
  • Windows Server 2008 R2 SP1
  • Oracle Enterprise Linux 5.8-5.11, 6.0-6.7, 7.0-7.1
  • Red Hat Enterprise Linux 5.8-5.11, 6.0-6.7, 7.0-7.2
  • CentOS Linux (Centos Plus kernel is not supported)
  • SUSE Linux Enterprise Server 10SP4, 11-11SP4

Service Map’s computer and process inventory data is available for search in OMS Log Analytics. This is very cool as the log analytics and searching capability in OMS is powerful and most important very FAST. Having application components, service dependencies, and supporting infrastructure configuration data at your fingertips through the log analytics gives you a powerful troubleshooting and forensics tool. I am sure over time the query capabilities will be expanded to include even more.

 oms-servicemap-overview-11  oms-servicemap-overview-12
Type=ServiceMapComputer_CL Type=ServiceMapProcess_CL

A few Service Map Log Analytic query examples:

List the physical memory capacity of all managed computers:

Type=ServiceMapComputer_CL | select TotalPhysicalMemory_d, ComputerName_s | Dedup ComputerName_s

List computer name, DNS, IP, and OS version:

Type=ServiceMapComputer_CL | select ComputerName_s, OperatingSystemVersion_s, DnsNames_s, IPv4s_s | dedup ComputerName_s

List Process Map by process name:

Type=ServiceMapProcess_CL (ProductName_s=TeamViewer)

Thanks for reading and I hope you enjoyed this post on OM’s Service Map. Now go out and add the public preview right away.

Read more