Top 5 Best Practices for Naming OpenTelemetry Attributes

For data to be valuable in troubleshooting and post-mortems, attribute names need to be consistent across every telemetry type, tool and service.
Dec 1st, 2023 4:00am
Image from Asmus Koefoed on Shutterstock.

When it comes to using OpenTelemetry (OTel) distributed tracing data, simply collecting it isn’t enough; you need to have practices in place to make sure the data is easy to find and correlates with other data. That’s the goal of having good attribute naming standards.

Effective attribute naming is not just a best practice; it’s a critical requirement. For data to be valuable in troubleshooting and post-mortems, attribute names need to be consistent across every telemetry type, every tool and every service. Without this uniformity, the usefulness of your OTel data is significantly reduced.

Semantic conventions and best practices for OTel make data more connected, more portable and more usable throughout your cloud native environment. Contextual data is the most beneficial type of data for observability teams, and best practices ensure you can maximize data usage and effectiveness.

These guidelines and best practices will help position your organization to get the most benefit from collected tracing data.

Establishing Effective Adoption for OTel Attributes 

To implement effective and useful OTel attributes, it’s crucial to involve all affected teams early in the process. For a successful adoption, consider conducting workshops to align everyone on the positive outcomes that come from having a clear, consistent naming standard across all layers of the stack. Consistency creates clarity, which is crucial during incident response and debugging.

Get buy-in from software and systems architects by illustrating the benefits of a naming standard and focus on areas that are unique to your company and applications.

Then draft a detailed document that outlines the naming conventions, including syntax, structure and examples. Devise a process for modifying the standard, improving it through feedback and addressing any gaps that you find after the fact.

Best Practices for Naming OTel Attributes

There are five main best practices that you can use as part of your OTel attribute naming conventions to get the most out of your observability data.

1. Use Semantic and Descriptive Attributes

Semantic names help ensure efficient root-cause analysis.

  • Make sure your attributes are clear, descriptive and apply to the entirety of the resource they describe. Names like http.status_code and db.system are easy to identify and provide immediate insights into the nature of a problem, whether it’s in the database or a web service.
  • Avoid non-semantic names like attribute, info or session_data; they are too generic and lead to confusion when analyzing telemetry data later on.
    • A descriptive, namespaced name such as app.service.version is far easier to interpret.
  • Define namespaces for your attributes.
    • Example: app.component.name
    • This is especially important when multiple service teams have their own standard attributes.
  • Keep attribute names short.
    • Example: http.url
  • Set error attributes on error spans.
    • Example: client.error

With descriptive attribute names, you can look at a resource and immediately have all the necessary context to know what it is, what it includes and what it relates to. For an excellent explanation of the existing semantic conventions, visit the official spec, where you can browse the general and system attributes, find them organized by signal or operation type (such as HTTP or database), and see technology-specific conventions.
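
The naming rules above can be enforced mechanically. Here is a minimal sketch of a hypothetical linter (the function name, regex and the blocklist are illustrative, not part of any OTel API) that rejects generic names and requires lowercase, dot-namespaced keys:

```python
import re

# Dot-namespaced, lowercase, underscores allowed within a segment,
# e.g. "http.status_code" or "app.service.version".
ATTR_NAME_RE = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$")

# Generic names the article warns against (illustrative list).
NON_SEMANTIC = {"attribute", "info", "session_data", "data", "value"}

def check_attribute_name(name: str) -> list[str]:
    """Return a list of problems with an attribute name (empty if OK)."""
    problems = []
    if name in NON_SEMANTIC:
        problems.append(f"'{name}' is too generic; use a descriptive name")
    if not ATTR_NAME_RE.match(name):
        problems.append(f"'{name}' should be lowercase and dot-namespaced,"
                        " e.g. app.service.version")
    return problems
```

Running such a check in CI against the attribute keys your services emit catches drift before it reaches your telemetry backend.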

2. Use a Shared Library

The practice of creating a library of known attributes helps you catalog the data you care about, and documenting those attributes creates a record of the data that is important to your customers.

When multiple teams will be sharing attributes, it is important to standardize them to avoid discrepancies. Discrepancies in attribute naming conventions across teams can make correlating data difficult or outright impossible. For example, if the backend team names latency as latency, but the frontend team names it duration, queries to compare or aggregate latency across services won’t work properly. Standardized attributes enable teams to leverage shared resources (think dashboards or alerts), and allow you to draw insights across multiple systems and services.
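
In practice, a shared library can be as simple as a module of named constants that every team imports instead of typing raw strings. The module name, keys and descriptions below are hypothetical, a sketch of the idea rather than a prescribed layout:

```python
# shared_attributes.py - a hypothetical module shared by all service teams,
# so every team records the same key for the same concept.

# One agreed-upon key for request latency, instead of "latency" on the
# backend and "duration" on the frontend.
REQUEST_DURATION_MS = "app.request.duration_ms"
MEMBERSHIP_LEVEL = "app.customer.membership_level"
PRODUCT_ID = "app.product.id"

# Documenting each key creates the record of data you care about.
ALL_ATTRIBUTES = {
    REQUEST_DURATION_MS: "Wall-clock request latency in milliseconds",
    MEMBERSHIP_LEVEL: "Customer tier, e.g. 'free' or 'premium'",
    PRODUCT_ID: "Internal product identifier",
}
```

A team would then write something like span.set_attribute(shared_attributes.REQUEST_DURATION_MS, elapsed_ms), and cross-service latency queries work because the key is identical everywhere.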

3. Create Custom Attributes

Occasionally you might need to create a new attribute for a specific aspect of your company or application. Before you do, though, it’s a good idea to consult the OpenTelemetry Attributes Registry to be absolutely sure one doesn’t already exist for what you need. Once you confirm there isn’t one that matches what you need, you can create a new one. It’s important to follow the tips in the OTel Attribute Naming guide, especially regarding the use of prefixes.

Prefixes in attribute names help to distinguish your custom attribute names from the standard names, names chosen by other projects, vendors or companies that you work with. If a custom attribute accidentally shares a name with another attribute, it can lead to incorrect conclusions and decisions, faulty dashboards and alerts, and make it challenging to track the flow or state of transactions accurately.

To avoid conflicts with other projects, vendors or companies, it is wise to consider using a prefix based on your company’s domain name, in reverse, like io.chronosphere.myapp.

Even if you are absolutely sure the name will never be used outside the confines of your application and only inside your company, prefixes are still essential for preventing collisions. Consider using a prefix name associated with your app or project, like bluebook.widget_count.

You might be tempted to piggyback on an existing prefix that belongs to OpenTelemetry or another project or vendor. Sharing prefixes can result in a name clash down the line, leaving you and your peers struggling to find ways to separate someone else’s data from your own during an incident.
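
A small helper can apply your prefix consistently and refuse names that stray into reserved namespaces. This is a sketch under the reverse-domain example above; the reserved-prefix list is illustrative, not exhaustive:

```python
# Hypothetical reverse-domain prefix for custom attributes, per the
# article's io.chronosphere.myapp example.
COMPANY_PREFIX = "io.chronosphere.myapp."

# Namespaces owned by OpenTelemetry or other projects; piggybacking on
# them risks a name clash. (Illustrative, not a complete list.)
RESERVED_PREFIXES = ("otel.", "http.", "db.", "messaging.", "k8s.")

def custom_attribute(name: str) -> str:
    """Qualify a local attribute name with the company prefix,
    rejecting names that would collide with reserved namespaces."""
    if any(name.startswith(p) for p in RESERVED_PREFIXES):
        raise ValueError(f"'{name}' collides with a reserved namespace")
    return COMPANY_PREFIX + name
```

Centralizing the prefix in one function means a future rename (say, a company rebrand) touches one line instead of every instrumented service.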

4. Focus on Service Levels

When deciding what attributes to apply to your traces, remember that your application’s focus is to provide a high-quality software experience to customers. This mission is encoded in your service/application’s service-level objectives (SLOs), maybe in the form of a 99.999% uptime expectation. From the SLO, you can narrow down which service-level indicators (SLIs) best support or are most likely to threaten achieving SLOs. Your attributes should support your service levels.

For example, if you have latency SLOs that differ between segments of traffic, using attributes that provide segment dimensionality like ProductID, FeatureID or RegionID can help you organize alerts accordingly.
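Once latency samples carry segment attributes, checking a per-segment SLO is a simple group-by. The sample data and function below are hypothetical, a sketch of the aggregation rather than any particular backend's query language:

```python
from collections import defaultdict

# Hypothetical latency samples: (latency_ms, span attributes) pairs
# tagged with the segment dimensions discussed above.
samples = [
    (120, {"RegionID": "us-east", "membership.level": "premium"}),
    (480, {"RegionID": "us-east", "membership.level": "free"}),
    (90,  {"RegionID": "eu-west", "membership.level": "premium"}),
]

def worst_latency_by(attr, data):
    """Group samples by a segment attribute and report the worst latency
    per segment, e.g. to check a per-segment latency SLO."""
    worst = defaultdict(int)
    for latency_ms, attrs in data:
        key = attrs.get(attr, "unknown")
        worst[key] = max(worst[key], latency_ms)
    return dict(worst)
```

Alerting on worst_latency_by("membership.level", samples) rather than a global maximum lets the premium tier have a tighter threshold than free traffic.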

5. Think about New Use Cases

Think of attributes as the root source of pattern-matching in a distributed system. If you want to investigate relationships across and between categories, attributes are the vehicle for sorting and comparing.

Incrementally experiment with different attributes and see what shakes out. Let’s consider an example.

Are your premium customers contacting support about an invoice error? Didn’t the order service deploy a new build a few minutes ago? Correlating attributes such as service.version and membership.level against an error metric for service.name:order could help identify whether the elevated error rates for premium members are highly correlated with the new version of the order service.
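
The investigation above boils down to counting errors by attribute combinations. Here is a minimal sketch with hypothetical error records; the events and version numbers are invented for illustration:

```python
from collections import Counter

# Hypothetical error events from the order service, each carrying the
# attributes discussed above.
errors = [
    {"service.name": "order", "service.version": "2.4.1", "membership.level": "premium"},
    {"service.name": "order", "service.version": "2.4.1", "membership.level": "premium"},
    {"service.name": "order", "service.version": "2.4.0", "membership.level": "free"},
]

def error_breakdown(events, *keys):
    """Count errors by a combination of attribute keys, e.g. to see
    whether premium errors cluster on the newly deployed version."""
    return Counter(tuple(e.get(k) for k in keys) for e in events)
```

If error_breakdown(errors, "service.version", "membership.level") shows premium errors concentrated on the new version, you have a rollback candidate within minutes.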

Useful Attribute Types 

A great deal of careful consideration has been put into the development of the standard attributes for OpenTelemetry, and this list is constantly evolving. Although there are more categories than can be mentioned here, it can be useful to explore what exists when building your internal naming standards and to call out what would be useful to teams when investigating regressions. Here are a few examples from the registry:

  • General attributes: General attributes provide broad context about the overall environment and network.
    • server.address: The address of the server.
    • destination.address: The address of the destination.
    • network.carrier.name: The name of the network carrier.
    • code.filepath: The file path of the code.
  • Messaging systems: Attributes related to messaging systems, aiding in tracing and diagnosing issues in message processing.
    • messaging.destination: Describes the logical entity messages are published to.
    • messaging.kafka.consumer.group: Kafka Consumer Group that is handling the message.
    • messaging.message.body.size: The size of the message body in bytes.
  • HTTP: Essential for tracing HTTP requests and responses, providing insights into web transactions.
    • http.url: Full HTTP request URL.
    • http.status_code: HTTP response status code.
    • user_agent.original: Value of the HTTP User-Agent header from the client.
  • Resource attributes: These attributes provide detailed context about the service, infrastructure and operational environment.
    • service.version: Version of the service.
    • k8s.cluster.name: Name of the Kubernetes cluster.
    • gcp.gce.instance.name: Name of the Google Compute Engine instance.
    • aws.ecs.container.arn: Amazon Resource Name (ARN) of the ECS container.

What about Events?

There is one special span feature called span events that often gets overlooked. Span events are very similar to logs, but they are attached to a specific span, which makes them a great place to put contextual information that could be useful when troubleshooting a problem with a transaction.

When deciding what goes into a span event, scrub any private user data from the payload, record the events that happen within the span, include a shorthand summary of what occurred, and capture any exceptions or full error messages along with additional context.
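
To make the shape of a span event concrete, here is a tiny stand-in class, loosely modeled on an add_event-style interface but not the real OpenTelemetry SDK; the class and field names are assumptions for illustration:

```python
import time
import traceback

class SketchSpan:
    """A minimal stand-in for a span that collects timestamped events."""

    def __init__(self):
        self.events = []

    def add_event(self, name, attributes=None):
        self.events.append({
            "name": name,
            "time": time.time(),
            "attributes": attributes or {},
        })

    def record_exception(self, exc):
        # Per the spec, exception details belong on an event whose
        # name is "exception", not in custom span attributes.
        self.add_event("exception", {
            "exception.type": type(exc).__name__,
            "exception.message": str(exc),
            "exception.stacktrace": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
        })
```

The same pattern applies with the real SDK: summaries and context go in ordinary events, while exceptions get the spec-mandated "exception" event.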

Attribute Practices to Avoid

We’ve been focusing on the “do’s” of attributes, but here is a closer look at some attribute pitfalls to avoid.

  • Having cryptic attribute names, such as errorcode, as they only cause confusion and make getting information harder.
  • Using the otel.* namespace, unless you think the name is applicable to other applications in the industry. In that case you can submit a proposal to add the new name to the semantic conventions.
  • Creating attributes you aren’t using, even if it seems like it might be useful to someone in the future. Unless you have solid evidence of the usefulness of an attribute, it’s best to hold off adding it.
  • Placing stack traces, UUIDs (universally unique identifiers) or exception info inside custom attributes. It is recommended to record exceptions as an event on the span where they occurred, and the name of the event must be "exception". See exceptions in the spec.
  • Attribute key duplication — either overwriting a key on the same span or recording the same value under two different names. Duplicate attribute keys can cause collisions and overwrite data, and they complicate queries and analysis.
  • Unset or empty values. An attribute without a value provides no useful information; it takes up storage without helping troubleshooting or analysis, and it can distort analytics by skewing totals and causing confusion.
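
The last two pitfalls are easy to flag automatically. Here is a hypothetical lint pass over the (key, value) pairs a span would receive; the function name and messages are illustrative:

```python
def lint_attributes(pairs):
    """Flag duplicate keys and unset/empty values in a list of
    (key, value) pairs as they would be set on a span."""
    issues, seen = [], set()
    for key, value in pairs:
        if key in seen:
            issues.append(f"duplicate key '{key}' overwrites an earlier value")
        seen.add(key)
        if value is None or value == "":
            issues.append(f"'{key}' is unset/empty: storage cost, no insight")
    return issues
```

Running such a check in instrumentation tests keeps these anti-patterns from ever reaching production telemetry.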

There are many more useful insights and recommendations in the OpenTelemetry documentation, so it’s a good idea to check the latest spec when working on your attribute standards.

Conclusion

Trace data collection is a necessary part of observability. But it requires having processes in place to ensure the data is useful, accessible and insightful. Naming conventions take upfront work, but by embracing these best practices — from ensuring semantic clarity and maintaining a unified library to understanding data, aligning with service levels and anticipating new use cases — your team can elevate the utility of your telemetry.

This approach doesn’t just streamline troubleshooting, it helps you build an effective culture of observability within your organization. The result of this work is a rich OTel data set full of accessible insights, enabling smarter, quicker decision-making.
