Monitoring is the hidden hero inside container infrastructure. Much like when driving a car, we don’t think about what’s happening under the hood until something goes wrong. We want to pivot away from reacting to infrastructure and container problems and move to a more proactive state of operation, using monitoring insights to keep us up and running.
Gone are the days of just monitoring physical servers and their resources. We now need to know which containers are running and where they’re running, monitor applications and containers, and provide alerting and metrics information to operations teams. This article will assist you in navigating the business decisions required to find the right monitoring tool for the job.
Build vs. Buy
When considering new software, the decision to build or buy your monitoring system should be at the top of the list. This argument is very similar to building a house from scratch versus buying an existing, finished house. Building a house allows you to make changes and improvements as you perform the build, compared to buying a house where the decisions have mostly been made for you. Building is resource intensive, as it requires testing different components, trial-and-error experiences and learning how to handle new tools.
Buying a monitoring package offers a finished product that’s ready to run. This approach can save time and money, but there can also be drawbacks. You need to consider the anticipated growth of your organization. A rapidly changing environment with many new services being added can require significant additional time and effort to calibrate the monitoring solution. If many more people are expected to need access to the monitoring solution, then difficulties with identity management could be costly if the vendor charges for seat licenses.
A solution’s pricing model can be a deal-breaker depending on the size of your current or future deployment. Many monitoring solutions charge per agent, while we have seen a few that charge per container monitored. Others use usage-based pricing, for example, based on the data ingest rate of the agent.
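To see how the choice of pricing model interacts with growth, it can help to run the arithmetic for your own deployment. The following is a minimal sketch, not a vendor calculator: the `Deployment` fields, the function names and every price are hypothetical placeholders that you would replace with real quotes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Deployment:
    hosts: int                    # machines running one monitoring agent each
    containers_per_host: int      # average containers scheduled per host
    gb_ingested_per_host: float   # monthly data volume shipped by each agent


def monthly_cost_per_agent(d: Deployment, price_per_agent: float) -> float:
    return d.hosts * price_per_agent


def monthly_cost_per_container(d: Deployment, price_per_container: float) -> float:
    return d.hosts * d.containers_per_host * price_per_container


def monthly_cost_by_ingest(d: Deployment, price_per_gb: float) -> float:
    return d.hosts * d.gb_ingested_per_host * price_per_gb


def compare(d: Deployment) -> Dict[str, float]:
    # All prices are made-up placeholders, not real vendor quotes.
    return {
        "per_agent": monthly_cost_per_agent(d, price_per_agent=15.0),
        "per_container": monthly_cost_per_container(d, price_per_container=1.0),
        "by_ingest": monthly_cost_by_ingest(d, price_per_gb=0.5),
    }


if __name__ == "__main__":
    today = Deployment(hosts=20, containers_per_host=10, gb_ingested_per_host=40)
    next_year = Deployment(hosts=50, containers_per_host=30, gb_ingested_per_host=60)
    for label, deployment in (("today", today), ("next year", next_year)):
        costs = compare(deployment)
        cheapest = min(costs, key=costs.get)
        print(label, costs, "cheapest:", cheapest)
```

With these placeholder numbers, per-container pricing is cheapest for the current deployment, but per-agent pricing becomes cheapest once container density grows, which is why projected growth belongs in the comparison.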
Whether you are building or buying, vetting is required. While a mature product will have reduced risk, rapidly changing ones often have the latest features. An active community of users is important, whether you are going the vendor route or building something based on open source projects. The community is often responsible for keeping documentation and code updated.
Buying an off-the-shelf solution alleviates a lot of these questions, but a comparison between solutions should be performed to see exactly what the support provides, what service-level agreements (SLAs) are available and if professional services can assist with installation and customization.
Support and SLAs
Support is a critical component of any monitoring system. Self-hosted solutions require maintenance, but you may not have the in-house operational capacity to build and manage a monitoring system on your own. More to the point: who is responsible for supporting the monitoring solution? It is also important to ensure that the solution you choose adheres to your company’s SLA and IT governance.
Flexibility and Customization
Having the ability to customize a monitoring stack exactly to your requirements is an enormous advantage, but it comes at a price, and that price is the time required to make these customizations. The build approach offers more flexibility and customization as you are working with open source software (OSS) that allows you to make your own changes. Buying offers some flexibility and customization dependent on the vendor’s offering, but is relatively more restrictive compared to OSS alternatives.
Cloud-Based vs. Hosted On-Premises
Many monitoring solutions can run either in the cloud or on-premises. Each option has its own advantages and disadvantages, as well as several considerations based on your workload. Cloud generally means less control of your workload, whereas on-premises offers more fine-grained control. You should also consider whether your monitoring needs have to adhere to regulatory or data storage requirements. Generally, on-premises costs more because of the higher operational and support costs associated with running a data center, whereas a cloud-based provider handles these for you.
Moving to the cloud from on-premises is a business case in itself. Cloud offerings enable reduced costs by sharing resources with other cloud customers. The ability to scale quickly with demand and the need for fewer specialized employees are also benefits of cloud-based services. Central management of your infrastructure, including monitoring, is made easier with cloud vendors like Amazon Web Services (AWS), Google Cloud Platform (GCP), IBM Bluemix and Microsoft Azure.
Several cloud vendors also offer one-click deployment of monitoring solutions, which allows you to get quickly up and running with a Software as a Service (SaaS) or open source solution. However, you must consider where the cloud is located compared to your workload. Some users will want to have their monitoring stack located as close to their container workload as possible.
Not everything can run in the cloud for a variety of reasons, including data privacy, licensing, internet connectivity and performance. On-premises solutions allow you to be in complete control of your environment. You’re able to add more bandwidth and use custom hardware configurations. On-premises is sometimes combined with cloud services to offer hybrid solutions, which offer the best of both worlds. Additionally, not all SaaS monitoring solutions offer an on-premises solution, so ask about it when vetting different vendors.
There are other valid reasons for looking at hosting your own monitoring solution. The golden rule of going down the self-hosted route is that, if you are relying on alerting from your monitoring solution to notify you of critical incidents, then never host it in or on the same infrastructure as your application. That may seem obvious, but you would be surprised at how often it happens. If you take the cloud route, then that shouldn’t be a problem.
The on-premises route offers a lot of flexibility and configuration possibilities but will be more challenging to install and configure. Getting a self-hosted monitoring solution up and running will take considerably longer than a SaaS solution that could be deployed instantly. These are all factors to weigh when considering an on-premises solution.
Compliance and Data Protection
Something that is often overlooked when considering a monitoring solution is the regulatory implications of data handling. Can data be stored outside of your data center, city, region or country? In some cases, depending on what industry you are working in, the government may also have regulations on how the data is handled.
If you install a cloud monitoring agent on a host machine that is shipping both metrics and information on what processes are running, then data ownership can become an issue. If your metrics include customer data, make sure your cloud provider can remove user data from the transactions it captures.
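One way to reduce this exposure is to strip customer data before metrics leave your hosts, rather than relying solely on the provider to remove it afterwards. The following is a minimal sketch of that idea under stated assumptions: the `scrub` function, the `SENSITIVE_KEYS` set and the email pattern are hypothetical, and a real deployment would need rules matched to its own data.

```python
import re
from typing import Dict

# Hypothetical label names that should never be shipped to a third party.
SENSITIVE_KEYS = {"email", "user_id", "customer_name"}

# Deliberately simple pattern for email-like strings embedded in values.
EMAIL_RE = re.compile(r"[^@\s/]+@[^@\s/]+\.[^@\s/]+")


def scrub(labels: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the metric labels that is safer to ship off-host."""
    clean = {}
    for key, value in labels.items():
        if key.lower() in SENSITIVE_KEYS:
            continue  # drop the whole label
        # Mask email-like substrings that leak into otherwise harmless labels.
        clean[key] = EMAIL_RE.sub("<redacted>", value)
    return clean


if __name__ == "__main__":
    raw = {
        "service": "checkout",
        "email": "jane@example.com",
        "path": "/users/jane@example.com/orders",
    }
    print(scrub(raw))
```

Dropping a label outright is the safer default; masking is only a fallback for values, such as URL paths, that are useful for monitoring but may carry identifiers.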
It’s important to fully understand what is currently running in your environment. Based on what you’re running, you can start answering questions such as which programming languages, applications and tools currently run, or are planned to run, in your containers. This information will help you build a requirements list. This list will help you choose which client libraries and integrations are important to your company now and in the future.
It is important to consider how orchestrators are monitored and to what extent. Consider the discovery of new services as they are brought online and what configurations are required to monitor these new services. Some monitoring tools will auto-discover new services or orchestrator nodes as they go online or offline, while other tools require configuration or integration with each service or node. The more auto-discovery capabilities available within the monitoring system, the less operational support is required.
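Conceptually, auto-discovery is a reconciliation loop: compare what the orchestrator reports as running with what the monitoring system is currently watching, then add and remove targets to close the gap. The following sketch illustrates that idea only; the function names and service names are hypothetical, and real tools obtain the discovered set from an orchestrator API rather than a hard-coded value.

```python
from typing import Set, Tuple


def reconcile(discovered: Set[str], monitored: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (targets to start monitoring, targets to stop monitoring)."""
    to_add = discovered - monitored      # services that came online
    to_remove = monitored - discovered   # services that went offline
    return to_add, to_remove


def apply_changes(monitored: Set[str], to_add: Set[str], to_remove: Set[str]) -> Set[str]:
    """Return the new set of monitored targets after one reconciliation pass."""
    return (monitored | to_add) - to_remove


if __name__ == "__main__":
    monitored = {"api", "db", "cache"}
    # Stand-in for what an orchestrator would report on this pass.
    discovered = {"api", "db", "worker"}
    to_add, to_remove = reconcile(discovered, monitored)
    monitored = apply_changes(monitored, to_add, to_remove)
    print("added:", sorted(to_add), "removed:", sorted(to_remove))
    print("now monitoring:", sorted(monitored))
```

A tool without auto-discovery leaves both of these steps to an operator, which is where the extra support effort comes from.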
Client Libraries and Integrations
Client libraries allow developers to define internal metrics within an application and expose them directly to a monitoring solution. Monitoring systems also integrate directly with the more common applications like databases and proxies. These integrations allow you to easily implement your monitoring solution and gather application-specific information. For example, HAProxy can be integrated to pull all the HTTP traffic statistics and display this information in monitoring dashboards. However, some integrations require additional agents, containers or sensors to collect metrics. Depending on the integration, setup could range from easy to very complicated and may require additional configuration and testing.
If you are currently using a cloud provider or Platform as a Service (PaaS), chances are they’re already exposing monitoring metrics. Find out what is available within your current Containers as a Service (CaaS), PaaS or Infrastructure as a Service (IaaS) stack that you can leverage. Can you integrate metrics from the cloud service stack into your monitoring solution to create a single view of infrastructure and containers? You may want to centralize metrics into a single solution rather than using many different tools without integrations. Be sure to find out which of your current services can be repurposed into your new monitoring solution.
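To make the idea of a client library concrete, the sketch below shows roughly what one does inside an application: it keeps a labeled counter in memory and renders it as text that a monitoring system could collect. This is an illustration using only the Python standard library, not the API of any real client library; the `Counter` class and its methods are hypothetical, and the output format is loosely modeled on the plain-text style that some monitoring systems scrape.

```python
import threading
from typing import Dict, Tuple


class Counter:
    """Hypothetical in-process counter, keyed by a tuple of label values."""

    def __init__(self, name: str, help_text: str, label_names: Tuple[str, ...] = ()):
        self.name = name
        self.help_text = help_text
        self.label_names = label_names
        self._values: Dict[Tuple[str, ...], float] = {}
        self._lock = threading.Lock()  # request handlers may run concurrently

    def inc(self, *label_values: str, amount: float = 1.0) -> None:
        if len(label_values) != len(self.label_names):
            raise ValueError("label values must match label names")
        with self._lock:
            self._values[label_values] = self._values.get(label_values, 0.0) + amount

    def render(self) -> str:
        """Render the current values as one line per label combination."""
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        with self._lock:
            for label_values, value in sorted(self._values.items()):
                labels = ",".join(
                    f'{k}="{v}"' for k, v in zip(self.label_names, label_values)
                )
                suffix = f"{{{labels}}}" if labels else ""
                lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    requests = Counter("http_requests_total", "Total HTTP requests.", ("method", "status"))
    requests.inc("GET", "200")
    requests.inc("GET", "200")
    requests.inc("POST", "500")
    print(requests.render(), end="")
```

When vetting a monitoring solution, check whether a maintained client library exists for each language on your requirements list, since writing and maintaining this kind of code yourself adds to the cost of the build approach.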
We have explained the use cases for build versus buy, cloud versus on-premises, integrations and native functionality for monitoring solutions. Based on this information, you can start building a matrix of information and make a comparison between the different monitoring solutions currently available on the market, both open source and SaaS solutions. Do your homework and gather information internally first to understand your requirements. This requirements list will be your key to choosing the right monitoring tool for the job.
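One simple way to turn the requirements list into a comparison matrix is a weighted score: assign each criterion a weight, rate every candidate against it, and compute a weighted average. The sketch below assumes a 1 to 5 rating scale; the criteria, weights, candidate names and ratings are all hypothetical placeholders for your own findings.

```python
from typing import Dict


def weighted_scores(
    weights: Dict[str, float],
    ratings: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Return the weighted average rating for each candidate solution."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    result = {}
    for candidate, scores in ratings.items():
        missing = set(weights) - set(scores)
        if missing:
            raise ValueError(f"{candidate} is missing ratings for {sorted(missing)}")
        weighted_sum = sum(weights[c] * scores[c] for c in weights)
        result[candidate] = weighted_sum / total_weight
    return result


if __name__ == "__main__":
    # Hypothetical criteria drawn from the topics discussed in this article.
    weights = {"cost": 3, "auto_discovery": 2, "on_prem_option": 1, "integrations": 2}
    ratings = {
        "build_on_oss": {"cost": 4, "auto_discovery": 2, "on_prem_option": 5, "integrations": 3},
        "saas_vendor": {"cost": 2, "auto_discovery": 5, "on_prem_option": 1, "integrations": 4},
    }
    scores = weighted_scores(weights, ratings)
    for candidate, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
        print(f"{candidate}: {score:.3f}")
```

The scores are only as good as the weights behind them, so agree on the weights internally before rating any product.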