Ensure that all components of your system emit events – Enabling the Observability of Your Workloads

Mike Naughton | March 5th, 2023


Whether through logs, metrics, or traces, ideally no component in your system should act as a black box. Every service, managed or unmanaged, should send events to your central observability platform. This is also an important criterion when selecting a service from a cloud provider: validate which logs are offered out of the box, which metrics are made available, and how these events can be aggregated into your tool of choice.
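As a rough illustration of what "emitting events" can look like at the code level, here is a minimal sketch that writes one JSON object per log line using only the Python standard library. The logger name `todo-app`, the `service` field, and the value `todo-api` are assumptions made for this example, not names taken from the book's code base.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            # "service" is a hypothetical field passed in via `extra=`.
            "service": getattr(record, "service", "unknown"),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def emit_sample_event() -> dict:
    """Log one event into an in-memory buffer and parse it back."""
    buffer = io.StringIO()
    logger = build_logger("todo-app", buffer)
    logger.info("task created", extra={"service": "todo-api"})
    return json.loads(buffer.getvalue())


if __name__ == "__main__":
    print(emit_sample_event())
```

Structured lines like these tend to be easier for a central platform to index and filter than free-form text, which is why the sketch emits JSON rather than a plain message string.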

It’s important to note that some components often are a black box: they were inherited from another team, the source code is no longer available, or some form of technical debt prevents you from adding instrumentation at the code level. In such cases, you can explore sidecar patterns, where a minimum level of instrumentation comes from a separate piece of software. These tools are deployed alongside such components and can trace calls made to the underlying kernel or other libraries, usually with little performance overhead.
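To make the sidecar idea more concrete, the following sketch builds an ECS task definition, as a plain Python dictionary, in which an uninstrumented application container runs next to a second container acting as the observability agent. The top-level keys follow the ECS task definition format, but the family name, container names, image names, and log group are hypothetical placeholders, and this is only one possible layout.

```python
import json


def build_task_definition(
    app_image: str, agent_image: str, log_group: str, region: str
) -> dict:
    """Return a task definition with an app container and a sidecar."""

    def awslogs(prefix: str) -> dict:
        # Send the container's stdout/stderr to CloudWatch Logs.
        return {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": log_group,
                "awslogs-region": region,
                "awslogs-stream-prefix": prefix,
            },
        }

    return {
        "family": "legacy-service",  # hypothetical name
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "256",
        "memory": "512",
        "containerDefinitions": [
            {
                # The component we cannot instrument at the code level.
                "name": "legacy-app",
                "image": app_image,
                "essential": True,
                "logConfiguration": awslogs("app"),
            },
            {
                # Sidecar deployed alongside it; marked non-essential so
                # an agent failure does not stop the application.
                "name": "observability-agent",
                "image": agent_image,
                "essential": False,
                "logConfiguration": awslogs("agent"),
            },
        ],
    }


if __name__ == "__main__":
    task = build_task_definition(
        app_image="example/legacy-app:1.0",           # hypothetical image
        agent_image="example/observability-agent:1",  # hypothetical image
        log_group="/ecs/legacy-service",
        region="eu-west-1",
    )
    print(json.dumps(task, indent=2))
```

Keeping the agent in its own container means the legacy component's image stays untouched, which is the main appeal of the pattern when the code can no longer be changed.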

Next, we’ll dive into the test application. This will help us gain hands-on experience with some services from AWS and the open source community.

Defining your observability strategy for workloads hosted in AWS

No observability strategy is perfect or gives you the most granular view of your systems at scale from day one. Rather, it’s an ongoing journey that keeps evolving as you release new features and updates to your software application. You continue to adapt the observability stack so that it shows, at any point in time, how well you are meeting key business goals. To kick off your observability journey on AWS, you could consider the following aspects.

Deploying an observability stack for a test application hosted in ECS

In Chapter 7, Running Containers in AWS, we saw how easy it was to run containerized workloads on AWS. We developed a To-Do List Manager application using the Python Flask framework and deployed it on Amazon ECS, with infrastructure components rolled out with CDK. By leveraging ECS’s Fargate deployment model, we were able to offload the management of underlying container nodes to AWS while we focused on just the application logic and the security guardrails around it. Taking the same example further, we will learn how to go about adding observability constructs for such an application. As a best practice, this is something you should start focusing on when you’re writing the first few lines of code. Where would you process the logs, how would you visualize the metrics, what are the key thresholds you should be alerted for, and so on?

This section focuses on a hands-on implementation to help you experience the key areas of observability we have been discussing throughout this chapter. We will learn how to forward logs to CloudWatch, process metrics in Prometheus, and finally visualize everything on a Grafana dashboard. We will extend our code base from the previous chapter, Running Containers in AWS, and integrate it with other tools that take the observability of our application to the next level. Let’s start with an overview of the changes we plan to implement.
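Before looking at the actual changes, it may help to see what Prometheus consumes when it scrapes an application. The sketch below hand-rolls a tiny counter and renders it in the Prometheus text exposition format using only the standard library. In a real application you would more likely rely on a client library such as `prometheus_client`; the `Counter` class here, the metric name `todo_requests_total`, and its labels are assumptions made purely to show the format.

```python
from collections import defaultdict


class Counter:
    """A minimal labelled counter, for illustration only."""

    def __init__(self, name: str, help_text: str, label_names: tuple):
        self.name = name
        self.help_text = help_text
        self.label_names = label_names
        self._values = defaultdict(float)

    def inc(self, *label_values: str, amount: float = 1.0) -> None:
        """Increase the counter for one combination of label values."""
        if len(label_values) != len(self.label_names):
            raise ValueError("expected one value per label name")
        self._values[label_values] += amount

    def render(self) -> str:
        """Return the counter in Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for label_values, value in sorted(self._values.items()):
            # Note: label values are not escaped in this sketch.
            labels = ",".join(
                f'{key}="{val}"'
                for key, val in zip(self.label_names, label_values)
            )
            lines.append(f"{self.name}{{{labels}}} {value}")
        return "\n".join(lines) + "\n"


def sample_metrics() -> str:
    """Record a few hypothetical requests and render the result."""
    requests = Counter(
        "todo_requests_total",
        "Total HTTP requests handled by the To-Do app.",
        ("method", "path"),
    )
    requests.inc("GET", "/tasks")
    requests.inc("GET", "/tasks")
    requests.inc("POST", "/tasks")
    return requests.render()


if __name__ == "__main__":
    print(sample_metrics(), end="")
```

An application would typically serve this text from an HTTP endpoint that Prometheus scrapes on a schedule, and Grafana would then query Prometheus to chart the values.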
