Observability with .NET Core and Elastic APM


Implementing the concept of observability with .NET Core 3.1 and Elastic APM

As the microservices development model grows, the need to understand precisely how to observe these services grows with it. Understanding how and why problems occur matters, because problems can and will happen. Take a basic example: how do we know that a sales microservice sent a request to a stock microservice? The expected result is that the stock is debited after the sale, right? But if something unexpected happens, how do you find out what happened during this transaction? This is a basic example, and real systems can be far more complex, mixing different technologies and distributed systems. Working with Docker and Kubernetes, for example, makes the infrastructure more abstract, which also makes it harder to understand what is actually being executed.

In this type of solution, failure is inevitable, so having something that lets us identify where a failure happened and why is extremely important for the health and evolution of your project. That is why I will talk a little about Observability, a concept that lets us observe the characteristics and attributes of the components involved, which must be exposed for us to understand how the system operates.

It is important to note that Observability is different from Monitoring: monitoring only CPU, memory, disk space, and so on is unlikely to reveal the root cause of a problem. Observability can be described in terms of 3 basic elements:

  • Tracing
  • Logging
  • Metrics

Tracing

Tracing is the ability to track the complete route of a request. In distributed systems, it lets us understand the flow of requests and see which services were executed and how long each one took, which is essential for understanding where a problem may be.

Logging

Capturing and storing logs is something we use to identify and resolve problems. Logs are very effective for understanding the flow of an application. In complex systems, however, the many interdependencies make it difficult to understand problems by analyzing logs in isolation.

Metrics

Metrics are a numerical representation of the state of a system, and they are fundamental to understanding its behavior. In complex systems, metrics matter above all because they make alerts possible. Alerting has always been an important practice for understanding what we can improve in our system: knowing where an alert comes from, we can be more assertive in solving the problem.

With these concepts in hand, we will create a scenario that puts Observability into practice in a system made up of microservices.

There are tools that help us achieve this Observability. For our practical example, we will use Elastic APM (Application Performance Monitoring), instrumenting our microservices so that they are observed and show us the necessary information about each request we make to the API.

For example, it lets us:

  • Understand where a service spends its time and why it stops working when it fails
  • See how services interact with each other and view bottlenecks
  • Discover bottlenecks and performance errors in advance to correct them
  • Increase the productivity of the development team
  • Monitor the end-user experience in the browser

Now for the practical part: we will create microservices for sales, inventory, and payment, and apply Observability to them to see the points where these services interact.

Prerequisites

To follow this tutorial, you are expected to have the following installed on your machine:

  • Docker (you may use Docker for Windows, just remember to switch to Linux containers!)
  • Visual Studio Code or Visual Studio 2019 (version 16.4 or later; Visual Studio 2017 does not support .NET Core 3.1)
  • .NET Core 3.1

In addition to Elastic APM, we will also be using Elasticsearch and Kibana. Below is a drawing of how Elastic APM works with Elasticsearch and Kibana.

You can see that we are going to use 3 images from Elastic: Kibana, Elasticsearch, and Elastic APM.

Let’s open Kibana at http://localhost:5601, and then go to the Elastic APM page: http://localhost:5601/app/apm#/services

To enable our microservices to be observed, we need to add the Elastic APM agent reference to each one of them.

As we will run Elastic’s services as containers, I created a docker-compose file for this solution to orchestrate our microservices along with those containers.

https://github.com/marraia/Observabilidade/blob/master/docker-compose.yml
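To give an idea of the shape of such a file, here is a minimal sketch of a compose file covering only the three Elastic containers. It is not the file from the repository: the service names, the image tag (7.6.2), and the settings are assumptions for illustration, and the microservices themselves are left out. It also assumes the APM Server image's default configuration, which is expected to look for Elasticsearch at the host name elasticsearch on port 9200.

```yaml
# Hypothetical sketch, NOT the repository's docker-compose.yml.
# Service names and image tags below are assumptions.
version: "3.7"

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
    environment:
      # Run a single node so no cluster discovery is needed locally
      - discovery.type=single-node
    ports:
      - "9200:9200"

  kibana:
    image: docker.elastic.co/kibana/kibana:7.6.2
    environment:
      # Point Kibana at the elasticsearch service defined above
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  apm-server:
    image: docker.elastic.co/apm/apm-server:7.6.2
    # Assumes the image's default config targets elasticsearch:9200
    ports:
      - "8200:8200"
    depends_on:
      - elasticsearch
      - kibana
```

With ports published like this, Kibana would be reachable on port 5601 and the APM Server on port 8200 of the host machine.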

After adding the reference, configure your appsettings.json file like this: https://github.com/marraia/Observabilidade/blob/master/src/Services/Stock.API/appsettings.json
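As a rough illustration of what the agent section of that file can look like, here is a minimal sketch. The values are assumptions, not a copy of the repository's file: the service name and environment are placeholders, and the URL assumes the APM Server is published on port 8200 of the local machine. When the API itself runs inside docker-compose, the host would be the APM Server's service name rather than localhost. The key name for the server address has changed between agent versions (ServerUrls in older ones, ServerUrl in newer ones), so check the documentation for the version you install.

```json
{
  // Hypothetical values for illustration only
  "ElasticApm": {
    // Address of the APM Server (assumed to be published on port 8200)
    "ServerUrls": "http://localhost:8200",
    // Name under which this service appears in the APM UI (placeholder)
    "ServiceName": "Stock.API",
    // Free-form environment label (placeholder)
    "Environment": "Development"
  }
}
```

The comments are only there to mark the assumptions; the ASP.NET Core 3.1 JSON configuration provider skips them, but you can remove them if your tooling complains.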

These settings are required for the agent to work in your microservices. Right after that, add this line to the Configure method in Startup.cs to tell the application to use the Elastic APM agent:

app.UseAllElasticApm(Configuration);

Then execute the following command:

docker-compose up -d

After execution, all containers should be up and running.
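To show where that call sits in context, here is a minimal sketch of a Startup class for an ASP.NET Core 3.1 API. It assumes the agent was added through the Elastic.Apm.NetCoreAll NuGet package, which is where the UseAllElasticApm extension method lives. The rest of the class is a generic controller-based setup, not the author's actual Startup.cs.

```csharp
using Elastic.Apm.NetCoreAll;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

// Illustrative sketch: a generic Startup, not the repository's own class.
public class Startup
{
    public Startup(IConfiguration configuration)
    {
        Configuration = configuration;
    }

    // Gives the agent access to the ElasticApm section of appsettings.json
    public IConfiguration Configuration { get; }

    public void ConfigureServices(IServiceCollection services)
    {
        services.AddControllers();
    }

    public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
    {
        // Register the agent before the other middleware so that it can
        // observe each request from the start of the pipeline.
        app.UseAllElasticApm(Configuration);

        app.UseRouting();
        app.UseEndpoints(endpoints => endpoints.MapControllers());
    }
}
```

Placing the agent first in the pipeline is a design choice: middleware registered before it would run outside the transaction the agent records.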

After that, we go to Elastic APM to see what was observed during the requests we made.

Notice that it shows us the steps the API took to process the sale: from the initial request, to the stock withdrawal with the PUT request, to the payment of the sale with the POST request. It also shows us how long the whole request took, around 501 ms, and the time of each request to the other microservices.

Now, let’s make another request, this time with a card number of fewer than 16 digits. The API has a business rule that returns an error if the card number has fewer than 16 digits. Because the service is being observed, this error will show up in Elastic APM.

With this information, it is much easier to understand which services are involved in your process. With a simple analysis, you can also check whether one of the services is experiencing performance problems and adjust it to improve response time. You can also see which errors are happening and how often, so you can fix them quickly.

This tool can help you a lot in other ways too, for example by showing the number of requests hitting your API in the Elastic APM dashboard.

You can find the source code on my GitHub at the following link:

https://github.com/marraia/Observabilidade