Click and Span: How to Optimize with Code Profiling
With the scale, complexity and demands of modern applications, any delay can mean the difference between growing user adoption and lost retention. As much as we optimize and test to prevent this, certain issues only surface in production, and that is where finding the root cause quickly is paramount. One crucial tool in your optimization investigation is code profiling, which adds visibility into your runtime, saving your users time while conserving the precious resources of your environment.
Now What Does It Actually Do?
With code profiling, like many elements in observability monitoring, context is key. You will need a tool to extract traces, spans and code, and associate them with one another.
For our use today, we’ll focus on using OpenTelemetry for collecting the data, specifically for associating a call stack with the rest of your application’s metrics. The call stack traces are augmented with the rest of your observability data, such as spans from a specific user session, giving you the full story of how backend code affects your users, and vice versa.
This better understanding of the relationship between backend code and user experience will highlight which elements in your code to focus optimization on as the application continually develops.
What Is a Call Stack?
The core of code profiling is the ability to grab snapshots of your application’s runtime, commonly referred to as the call stack. The call stack sheds light on which functions are running at any given moment within your application. Instead of only seeing that “Application A is running,” developers can see how the user journey calls specific functions that reach the backend infrastructure; code profiling lets them know where their code matters within their observability monitoring.
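As a small illustration, Python’s standard library can capture a snapshot of the current call stack at any point; the user-journey function names below are hypothetical stand-ins for real application code, but the `traceback` calls are standard:

```python
import traceback

def capture_call_stack():
    """Return the function names currently on the call stack, outermost first."""
    # extract_stack() returns a FrameSummary for every active frame,
    # including this helper itself, which we drop from the snapshot.
    frames = traceback.extract_stack()
    return [frame.name for frame in frames[:-1]]

# Hypothetical user-journey functions, standing in for real application code.
def load_profile_page():
    return fetch_user_data()

def fetch_user_data():
    # A sampling profiler would record snapshots like this at regular intervals.
    return capture_call_stack()

snapshot = load_profile_page()
print(snapshot)  # e.g. ['<module>', 'load_profile_page', 'fetch_user_data']
```

A profiler automates exactly this, taking thousands of such snapshots and aggregating them to show where time is spent.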
As powerful as it is to drill down into individual sessions, code profiling does its best work in conjunction with tagging and other aggregation methods, unlocking insights about types of users and sessions rather than leaving you with a flood of disparate spans and call stacks.
Being able to determine which code paths different users of your application exercise, and to see the differences in their performance, is a key benefit of code profiling integrated with application performance monitoring (APM). This is generally done by tagging spans belonging to particular customers or types of customers, then aggregating those spans, sorted by those tags, to get insights on performance.
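As a simplified sketch of that tag-and-aggregate step (plain Python dictionaries standing in for real APM spans and tags, not any particular vendor’s API), grouping span durations by a hypothetical `customer_tier` tag might look like:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical spans, shaped roughly as an APM backend might store them:
# a name, a duration and a bag of tags.
spans = [
    {"name": "/checkout", "duration_ms": 120, "tags": {"customer_tier": "free"}},
    {"name": "/checkout", "duration_ms": 340, "tags": {"customer_tier": "enterprise"}},
    {"name": "/checkout", "duration_ms": 95,  "tags": {"customer_tier": "free"}},
    {"name": "/checkout", "duration_ms": 410, "tags": {"customer_tier": "enterprise"}},
]

def average_duration_by_tag(spans, tag_key):
    """Group span durations by one tag's value and report the mean per group."""
    groups = defaultdict(list)
    for span in spans:
        groups[span["tags"].get(tag_key, "untagged")].append(span["duration_ms"])
    return {value: mean(durations) for value, durations in groups.items()}

# free averages 107.5 ms, enterprise 375 ms for the sample data above.
print(average_duration_by_tag(spans, "customer_tier"))
```

In a real deployment the aggregation happens in your APM backend, but the principle is the same: the tags turn a flood of individual spans into comparable cohorts.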
The human-readable format of viewing your call traces varies, but one common approach is to display them as flame graphs.
Viewing the time each particular function takes relative to the rest of the application can make it easy to discern which functions are using too much CPU or memory and optimize for all the customer journeys your application faces.
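As a minimal sketch of this per-function view, Python’s built-in `cProfile` ranks functions by cumulative time, conveying in text what a flame graph conveys with frame widths; the workload functions here are hypothetical:

```python
import cProfile
import io
import pstats

# Hypothetical workload: one cheap function and one deliberately hot one.
def render_header():
    return "".join(str(i) for i in range(100))

def serialize_cart():
    # Heavier loop so it dominates the profile.
    return sum(i * i for i in range(200_000))

def handle_request():
    render_header()
    return serialize_cart()

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Rank functions by cumulative time, the same ordering a flame graph's widths convey.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

In the printed table, `serialize_cart` sits near the top, which is exactly the signal that tells you where optimization effort will pay off.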
What Does It Mean for Me?
Code-profiling tools give you insight into the CPU and memory use of your application, broken down by user task. Improving the utilization of these resources can make a marked improvement in the overall speed, latency and user experience of your application, and may mean you no longer need as large an instance to keep your application running smoothly.
With the increasing availability of hardware and the ease of scaling your application, many of the problems an application faces can, and likely will, be solved in production by “throwing” more hardware at it. Although this spares you an immediate investigation, it shows up in your cloud provider bill at the end of the month and still comes at the cost of a slower user experience. More resources wasted means fewer resources for additional features; more waiting for the end user means less time engaging with content. Without streamlined profiling, you will see the costs on both sides of the balance sheet.
With code profiling in your toolkit, the added efficiency can decrease costs, provide a faster experience for your end users and give your engineers more time for new features.