Tigera Adds Full-Stack Observability with Calico Enterprise 3.5
Tigera has released the latest version of its Calico Enterprise SaaS security and observability platform for Kubernetes at this week’s KubeCon+CloudNativeCon Europe, adding a set of features that together offer real-time troubleshooting and full-stack observability for cloud native application developers and site reliability engineers (SREs).
Based on the open source networking and network security Calico project, Calico Enterprise offers expanded enterprise-level features, with Calico Enterprise 3.5 specifically adding several features that the company says deliver full-stack observability in “an easy-to-understand and action-oriented view that is otherwise extremely difficult and time-consuming to come by, due to the abstracted, ephemeral, and distributed nature of Kubernetes infrastructure.”
“Full-stack observability is really providing a complete observability for the DevOps and SREs all the way from their underlying infrastructure to the application stack. The reason this is very critical for microservices-based applications is that they’re distributed apps running on a distributed infrastructure, so now the boundary lines are blurred between what’s infrastructure and what’s application,” said Gupta. “You could have a performance issue in your application, or your latency could be going up because a DNS request is taking too long.”
To deliver this full-stack observability, Calico Enterprise 3.5 adds four distinct features that can operate across any multicloud infrastructure to help troubleshoot distributed applications.
First, the dynamic service graph provides a point-to-point topographical representation of traffic between namespaces, microservices, and deployments, and Gupta explains that this simplifies troubleshooting once a certain level of complexity is reached.
“Once a user has passed about 20 to 30 microservices running in their Kubernetes clusters, it is quite hard and difficult for them to be able to understand the dependency graph,” said Gupta. “It’s very important because that’s the base starting point, as you are looking for performance issues or infrastructure issues in your application.”
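The dependency graph Gupta describes can be illustrated with a small sketch. This is not Calico Enterprise's implementation or data format; the flow records, service names, and the helpers `build_service_graph` and `downstream` are all hypothetical, and the example only shows how observed point-to-point traffic could be folded into a graph that answers "what does this service depend on?"

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical flow record: (source service, destination service).
Flow = Tuple[str, str]


def build_service_graph(flows: Iterable[Flow]) -> Dict[str, Set[str]]:
    """Fold observed flows into an adjacency map of service -> callees."""
    graph: Dict[str, Set[str]] = {}
    for src, dst in flows:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure leaf services appear as nodes
    return graph


def downstream(graph: Dict[str, Set[str]], service: str) -> Set[str]:
    """Return every service reachable from `service`, directly or indirectly."""
    seen: Set[str] = set()
    stack = [service]
    while stack:
        node = stack.pop()
        for callee in graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


if __name__ == "__main__":
    # Made-up traffic between made-up services, for illustration only.
    observed = [("frontend", "cart"), ("cart", "db"), ("frontend", "auth")]
    graph = build_service_graph(observed)
    print(sorted(downstream(graph, "frontend")))
```

With a few dozen services, a traversal like `downstream` is what turns a flat list of flows into the starting point for troubleshooting that Gupta refers to.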
Next, application-level observability can both detect and prevent anomalous behaviors, while the domain name system (DNS) dashboard lets users discern DNS issues from application issues.
“A very common situation is your performance latency goes up and really the root cause sits with a higher DNS latency. When all these microservices, they’re communicating over the network. These are not procedural calls inside a library. So if they are communicating on the network, any degradation in DNS performance is going to directly impact your application performance,” said Gupta. “We have seen that multiple times, so we have added a set of capabilities for users to be able to take application response times all the way down to infrastructure and DNS response.”
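To make the DNS point concrete, the following minimal sketch times name resolution separately from the TCP connect that follows it, using only Python's standard library. It is an illustration of the general idea, not a description of how Calico Enterprise gathers its DNS metrics; the helper names `time_call`, `dns_share`, and `measure`, and the host used in the usage example, are assumptions made for this example.

```python
import socket
import time
from typing import Callable, Tuple


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds spent running fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def dns_share(dns_seconds: float, total_seconds: float) -> float:
    """Fraction of total latency spent on DNS resolution, capped at 1.0."""
    if total_seconds <= 0:
        return 0.0
    return min(dns_seconds / total_seconds, 1.0)


def measure(host: str, port: int = 80) -> Tuple[float, float]:
    """Time DNS resolution and the TCP connect separately for one host."""
    resolved = []
    dns_seconds = time_call(
        lambda: resolved.extend(
            socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        )
    )
    family, socktype, proto, _, sockaddr = resolved[0]

    def connect() -> None:
        # Connect to the already-resolved address so DNS is not counted twice.
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(5)
            sock.connect(sockaddr)

    connect_seconds = time_call(connect)
    return dns_seconds, connect_seconds


if __name__ == "__main__":
    # Placeholder host; requires network access to run.
    dns_s, connect_s = measure("example.com")
    share = dns_share(dns_s, dns_s + connect_s)
    print(f"DNS: {dns_s:.4f}s, connect: {connect_s:.4f}s, DNS share: {share:.0%}")
```

A consistently high DNS share across many requests would point toward the resolver rather than the application, which is the distinction the DNS dashboard is meant to surface.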
Finally, Calico Enterprise 3.5 adds dynamic packet capture, automatically retrieving pcap files and allowing users to customize the duration and packet size for packet capture.
“Once a DevOps engineer or an SRE identifies where the performance hotspots are or where the issues could be, the ability for them to be able to do further troubleshooting, further diagnostic, could be as simple as doing a packet capture to identify the payload of the request so that they can identify whether the performance issues are coming from the infrastructure or a poorly designed application query,” explained Gupta. “Some of these troubleshooting steps could be very complex in a Kubernetes world, because you need the ability to access the underlying infrastructure to be able to generate this packet capture file. We make it truly seamless and RBAC-driven, so the platform engineers can let the service teams do this by themselves in a self-service fashion.”
All of the data gathered by Calico Enterprise is available in a single pane of glass on the service and can also be streamed to specific endpoints. For example, Calico Enterprise can be configured to send data to Syslog endpoints or Splunk, forward it to Elasticsearch using Fluentd, or expose it to be scraped by Prometheus, among other possible use cases. In addition, action can be taken directly within Calico Enterprise itself, or the data can be used to trigger events externally.
Moving forward, Gupta said that users can expect a further strengthening of observability features, with machine learning-based algorithms to help perform root cause analysis of issues. He also said that the company would continue to expand the types of data collected and provided.
“We have made good strides with our full-stack observability, but the users can expect to see more and more expanded data sets so that it’s very optimal for them to troubleshoot performance and availability for their applications,” said Gupta.