
Profiling Code to Solve Performance Issues Faster 

When logs, traces and metrics aren’t enough to catch a performance bottleneck, profiling may be your answer.
May 4th, 2023 7:01am

Can’t figure out why your users are experiencing slow load times and poor UI performance? Profiling may be the solution that helps you pinpoint the exact line of code causing performance issues. In this Q&A, Indragie Karunaratne, director of engineering at Sentry, answers questions about what a profiler is, why a developer might want to use one and what types of application performance issues you can solve using profiling data.

Q: What types of performance issues can a profiling tool uncover? 

Profiling tools are most useful for uncovering performance issues in code that is CPU-bound, where most of the time in the application is spent on operations running on the CPU rather than idling or waiting for I/O. However, even for I/O-bound operations, a profiling tool can augment tracing by showing you the code that is waiting on an I/O-bound operation, thus improving your ability to debug the problem.

On mobile apps, profiling is commonly used to diagnose performance issues like slow app startup or view load times, excessive battery drain, UI jank and poor scroll performance. On the backend, profiling can diagnose latency issues caused by expensive code running within a service, such as a long wait on a connection to a database or slow serialization/deserialization.

Q: Developers often also use logs, traces and metrics to monitor performance. Why might that not be enough to catch a performance bottleneck?

Logs, traces, metrics and profiles all have distinct use cases.

Logs are a simple way of collecting diagnostics about what is happening within a service or application, but they require manually adding events where you want to collect data. Even if you manage to add logging to every potential performance bottleneck, aggregating and analyzing data from logs can be inefficient and time-consuming.
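
As a hypothetical sketch of that manual effort, every timing measurement has to be written by hand at each call site you suspect, and the results still have to be aggregated out of the log stream afterward:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def handle_request(payload):
    # Each timing log line must be added manually, one suspect site at a time.
    start = time.perf_counter()
    result = [x * 2 for x in payload]  # stand-in for real work
    elapsed_ms = (time.perf_counter() - start) * 1000
    log.info("handle_request took %.2f ms", elapsed_ms)
    return result

handle_request(range(1000))
```

A profiler captures this kind of timing data for every function automatically, without sprinkling measurement code through the application.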

Traces are primarily useful for understanding failures and latency at a request level in a distributed system, especially for high-level analysis of systems where most of the performance bottlenecks are I/O-bound. Traces are less useful for understanding what is happening within a single service, when the bottlenecks are lower-level and CPU-bound.

Metrics like load times, CPU usage and memory usage can give you a bird’s-eye view of your overall application health. However, while metrics can alert you when something is wrong, understanding the root cause of the problem requires using additional tools.

Profiling fills a specific gap that other solutions miss: low-effort, automatic instrumentation of CPU-bound bottlenecks within an application. By enabling profiling in production for your services and applications, you get aggregated information about the functions taking the longest to execute — so you can catch performance bottlenecks and get down to the lines of code causing them.

Q: I’m already using a local profiler, like Python’s cProfile or Apple’s Instruments. Why would I need other profiling tools?

Local profilers are useful for debugging issues that reproduce in your local development environment but may not be representative of what your users actually experience in production. This is because controlling for all of the variables that affect performance in production — like network latency and connection quality, hardware and OS/language/runtime version — is impractical in local development.
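
For reference, a typical local profiling session with Python's built-in cProfile looks like the following (the `slow_sum` function is a made-up stand-in for CPU-bound work):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive CPU-bound work to profile.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Print the top functions sorted by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

This tells you exactly where time went on your machine, in your environment, which is precisely the limitation: it says nothing about the hardware, OS versions and network conditions your users actually run.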

In contrast, a production profiler captures data from every environment your product runs on, with minimal additional engineering effort. Aggregating this data from all of the environments provides an accurate overview of what your users are experiencing and allows you to diagnose issues that may only be affecting a subset of users — which would be very difficult to debug using a local profiling tool alone.

Q: I’ve seen that profiling tools often visualize performance using a flame chart. What is a flame chart?

A flame chart is a way of visualizing the code that runs in your application over time. The X-axis of the chart is time, usually specified in milliseconds, and the Y-axis represents the call stacks for functions executing in the application. Each block on the flame chart is a function, and the width of the block indicates how long the function took to execute. To analyze performance using a flame chart, the typical workflow is to look for wide blocks (long-running functions), then zoom into that region of the flame chart to understand when and why the function is being called.
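
Under the hood, a flame chart is built from captured call stacks. A minimal sketch of the aggregation step, using hypothetical stack samples, is to collapse identical stacks into weighted entries; each unique stack becomes a column of nested blocks whose width is proportional to its weight:

```python
from collections import Counter

# Hypothetical call-stack samples; each tuple is one captured stack,
# ordered from the root function to the leaf.
samples = [
    ("main", "render", "layout"),
    ("main", "render", "layout"),
    ("main", "render", "paint"),
    ("main", "fetch_data"),
]

# Collapse identical stacks into counts. A flame chart renderer draws
# each unique stack as nested blocks sized by its count.
folded = Counter(";".join(stack) for stack in samples)
for stack, count in folded.most_common():
    print(f"{stack} {count}")
```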

Flame charts seem daunting at first, but they quickly become an indispensable tool when you need to manually analyze a performance problem using all of the context available to you. For example, if you are trying to identify a pattern of expensive code that executes repetitively but don’t know precisely which functions you’re looking for, a flame chart is an excellent visualization for doing that.

Q: Besides flame charts, what are some other ways that I can profile data?

For more common performance scenarios, we can extract and summarize profiling data to give you actionable insights without having to look at flame charts. For example, we can aggregate information about how long specific functions take to run across many profiles and simply tell you what the slowest-running functions are in your application. This list of functions gives you a quick starting point for your performance optimization efforts.

Another use case is augmenting tracing data with profiling data. If you see a long-running span in a trace, and you’re not sure what’s causing the span to take a long time, we can pull additional context from the profiling data and show you which call stacks executed most frequently during that span. You get a full stack trace complete with file names and line numbers without having to open a flame chart and without having to manually add additional span instrumentation.

Finally, we can automatically detect common performance antipatterns using profiling data and surface them as issues. For example, the profiling data tells us when you call long-running functions that block the main thread of your mobile app and cause poor UI performance. Instead of you having to actively track down this problem by looking through flame charts, we can create an issue with the simplified context you need to solve the problem and alert you when it happens.

Q: How much performance overhead does profiling add to an application?

All profilers add some overhead, but certain design decisions can minimize this when performance really matters, like when you’re profiling in production.

Broadly speaking, a deterministic profiler captures information about every function call that occurs. Deterministic profilers produce the most accurate, highest-fidelity data, but they also add significant runtime overhead — 100% or more, depending on the tool. They are commonly used for local profiling, where performance overhead matters less.

In contrast, sampling profilers — the type Sentry uses in all of our SDKs — minimize overhead by making reasonable data fidelity tradeoffs. Instead of capturing information about every function call, we capture samples 100 times per second. Sampling at 100Hz produces ~10 millisecond granularity on function timing information, which is still enough to identify most performance problems while adding significantly less overhead than a deterministic profiler. At Sentry, we target 1% to 5% overhead.
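
To make the mechanism concrete, here is a toy sampling profiler — not Sentry's implementation, just a sketch of the technique. A background thread wakes every 10ms (100Hz), snapshots the main thread's call stack via `sys._current_frames`, and tallies the stacks it sees; the functions appearing in the most samples are the likely bottlenecks:

```python
import sys
import threading
import time
import traceback
from collections import Counter

def sample_stacks(target_thread_id, stop_event, counts, interval=0.01):
    """Sample the target thread's stack every `interval` seconds (100 Hz)."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            # Record the stack as a tuple of function names, root to leaf.
            stack = tuple(f.name for f in traceback.extract_stack(frame))
            counts[stack] += 1
        time.sleep(interval)

def busy_work():
    # CPU-bound work for the sampler to catch.
    total = 0
    for i in range(5_000_000):
        total += i
    return total

stop = threading.Event()
counts = Counter()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.main_thread().ident, stop, counts),
    daemon=True,
)
sampler.start()
busy_work()
stop.set()
sampler.join()

# Stacks seen in the most samples point at the bottleneck.
for stack, n in counts.most_common(3):
    print(" -> ".join(stack), n)
```

Note the fidelity tradeoff the interview describes: a function that completes between two samples is invisible to this profiler, but the sampling thread does almost no work, so the overhead stays low regardless of how busy the profiled code is.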

To learn more about Sentry Profiling, now available on Python, PHP, Node.js, Android and iOS, check out our website and docs. Set up performance monitoring (which takes just five lines of code), then update your SDK to get started with Profiling. If you’re new to Sentry, you can try it for free or request a demo to get started.
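
As a rough sketch, enabling profiling alongside performance monitoring in the Python SDK looks like the following at the time of writing; the DSN is a placeholder, and the sample rates of 1.0 are illustrative — you would typically lower them in production:

```python
import sentry_sdk

sentry_sdk.init(
    # Placeholder DSN; use your project's DSN from the Sentry dashboard.
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    # Enables tracing; sample 100% of transactions (illustrative only).
    traces_sample_rate=1.0,
    # Enables profiling for sampled transactions.
    profiles_sample_rate=1.0,
)
```

This is a configuration fragment rather than a runnable program — it needs a real project DSN to report anything.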
