Overview of the target architecture – Enabling the Observability of Your Workloads

Mike Naughton | January 30th, 2022


Before we dive into the code-level changes, let’s get a visual understanding of the components we plan to add around our test application stack, and how they communicate with each other. We will focus on capabilities that help us monitor the application’s logs and metrics with tools of our choice.

We will extend our existing application architecture with the following components to capture and display observability data:

  • CloudWatch Logs: Amazon ECS offers integrations with several logging platforms, with CloudWatch being one of them. We can make use of CloudWatch Logs to ingest all data written to STDOUT by our application containers running in ECS.
  • Amazon Managed Prometheus: The biggest challenge with self-hosted Prometheus installations is data retention and dynamically scaling the storage infrastructure as the rate of metrics ingestion increases. As with other managed AWS services, we would like to offload the maintenance of the Prometheus server and its storage to the cloud provider. AWS simply offers us a data ingestion endpoint that applications can use to push metrics data to the platform.
  • Amazon Managed Grafana: To visualize the metrics captured by Prometheus, we will use Grafana as our dashboarding solution. AWS offers us the same experience we would get by running an open source Grafana installation ourselves. The additional benefits are the ready-made integrations with AWS data sources and the configurations that come out of the box when using the service offered by AWS.
  • OpenTelemetry Collector: A common deployment pattern for using third-party tools in containerized environments is to deploy them as a sidecar. In our case, we will deploy the OpenTelemetry collector distribution from AWS in the same ECS task definition that hosts the web application and MongoDB containers. This allows the collector agent to scrape metrics from the application container’s /metrics endpoint.
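To make the last two bullets concrete, here is a sketch of what the relevant parts of the ECS task definition could look like: the application container ships its STDOUT to CloudWatch Logs via the awslogs log driver, and the AWS OpenTelemetry collector runs alongside it as a sidecar. The image URI, log group name, region, and container names are illustrative assumptions, not values from this chapter:

```json
{
  "family": "todo-app",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "<your-application-image>",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/todo-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "web"
        }
      }
    },
    {
      "name": "aws-otel-collector",
      "image": "public.ecr.aws/aws-observability/aws-otel-collector:latest",
      "essential": true
    }
  ]
}
```

Because both containers live in the same task, the sidecar can reach the application’s metrics endpoint over localhost without any extra networking setup.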

All these components and the corresponding information flows are highlighted in Figure 8.3. In this case, the Flask-based web application forwards logs to the CloudWatch Logs service. The application metrics are scraped by the OpenTelemetry sidecar container and then forwarded to the Amazon Managed Prometheus (AMP) service. Finally, we configure Amazon Managed Grafana to query data from both these sources and display it in a centralized dashboard:
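For the collector to have something to scrape, the application must expose its metrics in the Prometheus text exposition format. In practice you would use a client library such as prometheus_client, but the minimal sketch below shows the mechanics with only the standard library; the metric name and port are assumptions for illustration:

```python
# Minimal /metrics endpoint in the Prometheus text exposition format.
# A real Flask app would use prometheus_client instead of hand-rolling this.
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUEST_COUNT = {"value": 0}  # naive in-process counter (illustrative)

def render_metrics() -> str:
    """Render current metrics as Prometheus exposition text."""
    return (
        "# HELP app_requests_total Total HTTP requests served.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {REQUEST_COUNT['value']}\n"
    )

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Every other request just bumps the counter.
            REQUEST_COUNT["value"] += 1
            self.send_response(200)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet

if __name__ == "__main__":
    # Assumed port; the collector's scrape target must match it.
    HTTPServer(("0.0.0.0", 5000), MetricsHandler).serve_forever()
```

The sidecar then scrapes this endpoint on its configured interval, so the application never needs to know where the metrics ultimately end up.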

Figure 8.3 – Observability stack for the To-Do List Manager test application

Note

If you have worked with Prometheus tooling in the past, you have most likely experienced a pull-based approach, where the Prometheus server fetches the metrics directly from the relevant targets and stores them. In this case, however, we are using the OTEL collector sidecar pattern from AWS, which scrapes the application endpoints and then pushes the metrics to the Amazon Managed Prometheus workspace.
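This scrape-then-push flow is expressed in the collector’s own configuration: a prometheus receiver pulls from the application, and a prometheusremotewrite exporter pushes to the AMP workspace. The sketch below assumes the app from this chapter listens on port 5000 and that the workspace lives in us-east-1; the job name and placeholder workspace ID are illustrative:

```yaml
extensions:
  sigv4auth:
    region: us-east-1          # signs remote-write requests with AWS SigV4

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "todo-app"
          scrape_interval: 15s
          static_configs:
            - targets: ["localhost:5000"]   # sidecar shares the task network

exporters:
  prometheusremotewrite:
    endpoint: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write"
    auth:
      authenticator: sigv4auth

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
```

The pull side (the scrape_configs) looks exactly like a classic Prometheus setup; only the storage destination changes, which is what makes the migration to AMP largely transparent to the application.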

As a next step, let’s get our hands dirty with some code-level modifications to realize the architecture we just discussed.
