What is observability? – Enabling the Observability of Your Workloads

Mike Naughton | May 21st, 2021


Simply put, observability is about understanding the current state of a running system from the work it is doing and the data it emits. Developing a solid observability strategy is not a one-time effort; there will always be scope for optimization as your business needs evolve. Before you can understand what is going on, though, you need to make sure the system emits enough data for you to reason about its behavior. But what kind of data?

Observability has three foundational pillars that allow you to convert data into information and derive insights from that information, which ultimately point to the actions that need to be taken:

  • Logs: These are the discrete events that occur across the components of your systems while they serve a customer request. Storing these logs in a centralized data store is invaluable, because the information can then be securely extracted and analyzed whenever it is needed. Logs are particularly helpful when a component stops pushing metrics to an event store because of an unhandled exception, which could be caused by third-party dependencies, network issues, or other unforeseen circumstances. Because logs are the ultimate source of truth for the events happening in a system, it’s important to enforce security guardrails that prevent them from being modified after they have been created. In AWS, it’s good practice to encrypt these logs at rest using a service such as AWS Key Management Service (AWS KMS).
  • Metrics: These are the raw data points that reflect the performance of your systems over time. They are particularly useful for alarming, trend analysis, and scenario forecasting. For on-premises deployments, organizations typically rely on third-party tools and application performance monitoring (APM) solutions to gather underlying infrastructure and application metrics. In AWS, however, most services automatically publish key metrics without any additional configuration or cost to the user. These metrics can be used to define alarms for threshold breaches or to perform mathematical analysis, for example, finding the maximum IOPS for a disk in any 5-minute interval.
  • Traces: When working with multiple microservices, a holistic view is often needed to understand the request flow and identify bottlenecks. Traces let software teams zoom in on how a particular request traversed multiple systems and pinpoint where something didn’t go as expected. Traces are also valuable input for service maps, which visually depict the dependencies of each service and highlight problem areas.
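To make the three pillars more concrete, here is a minimal, standard-library-only Python sketch of the kind of data each pillar produces and how it can be turned into information. All function names, field names, and sample values (`log_event`, `max_per_window`, `breaches`, `service_map`, the IOPS numbers, and the service names) are hypothetical illustrations. They are not the API of Amazon CloudWatch or any other AWS service, and a production system would use a logging library, a metrics backend, and a tracing SDK instead.

```python
import json
from collections import defaultdict


# --- Logs: one discrete, structured event emitted while serving a request ---
def log_event(level, message, **fields):
    """Return a JSON log line (hypothetical format); a real system ships it to a central store."""
    record = {"level": level, "message": message, **fields}
    return json.dumps(record, sort_keys=True)


# --- Metrics: raw (timestamp_seconds, value) data points ---
def max_per_window(datapoints, window_seconds=300):
    """Group data points into fixed windows (default 5 minutes) and keep each window's maximum."""
    windows = {}
    for ts, value in datapoints:
        start = int(ts) - int(ts) % window_seconds  # align to the window boundary
        windows[start] = max(value, windows.get(start, value))
    return dict(sorted(windows.items()))


def breaches(window_maxima, threshold):
    """Return the start times of windows whose maximum exceeded the alarm threshold."""
    return [start for start, peak in window_maxima.items() if peak > threshold]


# --- Traces: spans of one request; parent links reveal service dependencies ---
def service_map(spans):
    """Derive caller -> callee edges from spans (dicts with 'id', 'parent', 'service')."""
    by_id = {span["id"]: span for span in spans}
    edges = defaultdict(set)
    for span in spans:
        parent = by_id.get(span["parent"])
        # Only a call that crosses a service boundary adds an edge to the map.
        if parent and parent["service"] != span["service"]:
            edges[parent["service"]].add(span["service"])
    return {caller: sorted(callees) for caller, callees in edges.items()}


if __name__ == "__main__":
    print(log_event("ERROR", "payment declined", request_id="r-123", service="checkout"))

    iops = [(0, 900), (120, 1500), (310, 700), (590, 800)]  # hypothetical samples
    maxima = max_per_window(iops)
    print(maxima)  # {0: 1500, 300: 800}
    print(breaches(maxima, 1000))  # [0]

    spans = [
        {"id": "a", "parent": None, "service": "frontend"},
        {"id": "b", "parent": "a", "service": "checkout"},
        {"id": "c", "parent": "b", "service": "payments"},
        {"id": "d", "parent": "b", "service": "checkout"},
    ]
    print(service_map(spans))  # {'frontend': ['checkout'], 'checkout': ['payments']}
```

The fixed, boundary-aligned windows are a simple stand-in for a statistic computed over a 5-minute period, and the service map only records calls that cross a service boundary, so internal work within one service (span `d`) does not add an edge.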

AWS’s Swiss Army knife for observability is Amazon CloudWatch, which customers can use in a variety of ways to continuously monitor their applications and the underlying infrastructure.
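As one example of acting on metrics, the sketch below assembles a CloudWatch alarm request that fires when an EC2 instance’s maximum `CPUUtilization` exceeds a threshold within a 5-minute period. It assumes the boto3 SDK’s `put_metric_alarm` call; the alarm name format, the 80 percent default threshold, and the helper names `build_cpu_alarm` and `create_alarm` are hypothetical choices, and no notification action (such as an Amazon SNS topic) is attached.

```python
def build_cpu_alarm(instance_id, threshold_percent=80.0):
    """Keyword arguments for a CloudWatch PutMetricAlarm request (hypothetical naming)."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 300,           # evaluate the metric over 5-minute periods
        "EvaluationPeriods": 1,  # alarm after a single breaching period
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(cloudwatch_client, instance_id, threshold_percent=80.0):
    """Send the request through the given client and return the alarm name."""
    params = build_cpu_alarm(instance_id, threshold_percent)
    cloudwatch_client.put_metric_alarm(**params)
    return params["AlarmName"]
```

In real use you would pass `boto3.client("cloudwatch")` as `cloudwatch_client`, which requires AWS credentials with permission to call `PutMetricAlarm`. Passing the client in, rather than creating it inside the helper, keeps the request-building logic testable without AWS access.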

Now that we have covered the three main pillars of observability, let’s look at the main benefits they help achieve.
