
Demystifying the Metrics Store and Semantic Layer

A semantic layer is a data representation for business that allows end-users to access data independently using conventional business words.
Jun 1st, 2022 10:00am by
Feature image via Pixabay.

Joanna He
Joanna He is the senior director of product growth at Kyligence. She is a seasoned BI architect who is enthusiastic about trends in the data landscape. Prior to her current position, she worked as the product manager who helped Kyligence transition the open source project Apache Kylin into a cloud native enterprise product.

The market landscape for data architectures has changed dramatically over the last 20 years. Many enterprises have made the journey from traditional on-premises BI/DW (business intelligence and data warehouse) architectures to big data-based distributed architectures like Hadoop in just the last decade. And as cloud computing becomes more prevalent, the data environment is shifting once more and embracing cloud native architectures.

The dominant data architectures in today’s market differ greatly from preceding generations. Modern data stacks center around cloud data warehouses (Snowflake, Amazon Web Services’ Amazon Redshift, Google BigQuery, etc.), cloud data lakes (Databricks) and data lakehouses.

What’s driving all of this change? Put simply, cloud data warehouses and cloud data lakes tick the right boxes for most corporations: they store enormous volumes of data cost-effectively, require little specialized expertise to run and offer consumption-based (pay-as-you-go) pricing.

Why Would You Need a Semantic Layer or Metrics Store?

This new paradigm for data architectures isn’t without its limits, however. Gaps between the data platform and the way businesses wish to use their data can prevent analysts and decision-makers from fully leveraging that data to innovate.

Why does this happen? 

First, many critical data assets end up isolated on local servers, data centers and cloud services. Unifying them poses a significant challenge. Often, there are also no standardized data and business definitions, which makes it harder for businesses to tap into the full value of their data. Companies embarking on new data management projects need to address these concerns, yet many avoid the issue for one reason or another. The result is new data silos across the business.

Second, as every data warehouse practitioner is aware, it’s difficult for most business users to interpret the data in the warehouse. Because technical metadata like table names, column names and data types typically mean little to business users, data warehouses alone aren’t enough to let users conduct analysis on their own.

From a business user’s perspective, what can be done to solve this problem?

Two popular solutions are metrics stores and semantic layers, but which is the best approach? And what’s the difference between them?

This article aims to demystify both metrics stores and semantic layers to help you understand the similarities and differences between these powerful approaches to the challenges we’ve outlined above.

What Is a Metrics Store?

In the simplest terms, a metrics store is a layer that sits between upstream data warehouses/data sources and downstream business applications. Metrics platform, headless BI, metrics layer and metrics store are all terms that refer to the same idea.

Unlike typical BI reporting, metrics stores separate metrics definitions from BI reporting and visualizations. The teams in charge of managing the metrics are able to define them once inside the metrics store, creating a single source of truth. They can then reuse these definitions consistently across BI, automation tools, business workflows and advanced analytics operations.

What Is a Semantic Layer?

A semantic layer is a data representation for business that allows end users to access data independently using conventional business words. The semantic layer accomplishes this by translating complex data into common business terms like product, customer and revenue, resulting in a uniform, consolidated view of data across the enterprise.

Semantic layers frequently contain data in the form of measures, such as sales, distance, duration and weight, which can be totaled, averaged or both. They can also include dimensions, such as sales rep, city and product, which are categorical buckets that can be used to segment, filter or group data. Additionally, metrics and KPIs, which are quantitative measures used to track and assess performance, can be built on top of these measures and dimensions.
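
To make the distinction concrete, here is a minimal Python sketch of measures, dimensions and a metric built on top of them. The sample rows, the field names and the `sales_per_unit_weight` metric are all hypothetical and chosen only for illustration; real semantic layers express the same ideas in their own modeling languages.

```python
from collections import defaultdict

# Hypothetical sample rows; every field name here is illustrative only.
ROWS = [
    {"city": "Austin", "sales_rep": "Ana", "sales": 120.0, "weight": 3.0},
    {"city": "Austin", "sales_rep": "Ben", "sales": 80.0, "weight": 1.0},
    {"city": "Boston", "sales_rep": "Ana", "sales": 200.0, "weight": 5.0},
]

# Measures: a numeric source field paired with the way it is aggregated.
MEASURES = {
    "total_sales": ("sales", sum),
    "total_weight": ("weight", sum),
    "avg_weight": ("weight", lambda values: sum(values) / len(values)),
}


def aggregate(rows, dimension, measure_names):
    """Group rows by one dimension and aggregate the requested measures."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row)

    result = {}
    for key, members in groups.items():
        result[key] = {}
        for name in measure_names:
            field, func = MEASURES[name]
            result[key][name] = func([m[field] for m in members])
    return result


def sales_per_unit_weight(rows, dimension):
    """A metric derived from two measures: total sales / total weight."""
    totals = aggregate(rows, dimension, ["total_sales", "total_weight"])
    return {
        key: values["total_sales"] / values["total_weight"]
        for key, values in totals.items()
    }


by_city = aggregate(ROWS, "city", ["total_sales", "avg_weight"])
metric_by_city = sales_per_unit_weight(ROWS, "city")
```

Here `city` and `sales_rep` play the role of dimensions, the entries in `MEASURES` are the measures, and the metric is defined once in terms of those measures rather than against raw columns.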

Similarities Between Semantic Layers and Metrics Stores

User Personas: Both semantic layers and metrics stores can accommodate many analytics roles, such as consumers, explorers, innovators and experts.

Values: Both semantic layers and metrics stores support the following business priorities.

Outcome-Oriented: Both align with the overall goals of the organization.

End-User Democracy: Both approaches benefit end users across the business. Data is accessible to a larger group of users, is more adaptable, enables more sophisticated analytics and is more economical.

Reusability and Availability: Both can act as a single source of truth that is easily accessible, integrates into apps and workflows, and is reusable across different systems and users.

Security: For both approaches, governance, as well as advanced identification, access, and security management, is a central component.

Cost and SLA Optimization: Semantic layers and metrics stores both deliver performant, dependable platforms that provide high-quality data at the lowest cost.

Differences Between Semantic Layers and Metrics Stores

Scope: Semantic layers provide a business-friendly set of logical data models, measures and metrics, whereas metrics stores only provide a business-friendly set of metrics. For metrics stores, the data model is usually controlled by the underlying data source, such as a data warehouse or data mart.

Ease of Use: Semantic layers may be too complex for end users to utilize, customize and update in some circumstances. IT also needs to be involved in the maintenance and update of the semantic layer. As a result, business users can only ever really be a semantic layer’s consumers.

On the other hand, metrics stores offer easy-to-use metrics as code or even a simple interface for business users to generate and change metrics, allowing businesses to achieve a higher level of self-service and increase acceptance and utilization.

Virtual vs. Physical: Most metrics stores serve as a virtual abstraction tier containing business-oriented metric logic. Data is rarely physically stored in the metrics store itself. Typically, metrics stores translate metric logic into queries against the underlying data source, which remains responsible for storing the data.
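
The translation step described above can be sketched roughly as follows. This is not how any particular metrics store works; the `Metric` class, the `compile_metric` function and the `orders` table with its `amount`, `city` and `order_date` columns are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A hypothetical metric definition: only logic, no stored data."""
    name: str
    expression: str  # SQL aggregate expression, e.g. "SUM(amount)"
    table: str       # table in the underlying data source


def compile_metric(metric, group_by=(), filters=()):
    """Turn a metric definition into a SQL query for the data source.

    Plain string formatting is used for brevity; a real system would
    validate identifiers and parameterize filter values.
    """
    columns = list(group_by) + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(columns)} FROM {metric.table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


# Defined once, then reused with different groupings and filters.
REVENUE = Metric(name="revenue", expression="SUM(amount)", table="orders")

total_query = compile_metric(REVENUE)
by_city_query = compile_metric(
    REVENUE,
    group_by=["city"],
    filters=["order_date >= '2022-01-01'"],
)
```

The metric definition itself holds no rows. Each request is compiled into a query that the warehouse executes, which is what keeps the layer virtual.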

By contrast, the semantic layer can be a virtual or physical tier that sits between the data source and the downstream applications. In addition, the semantic layer may offer a set of performance optimization techniques, such as pushdown, intermediate servers, caching and precomputation, to make it more performant across a range of sources and analytics use cases.
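
As a rough illustration of precomputation, one of the techniques listed above, the sketch below aggregates a measure once per combination of dimension values and then answers queries from those stored totals instead of rescanning the raw rows. The `PrecomputedSum` class and the sample data are hypothetical; production systems use far more elaborate storage and pruning strategies.

```python
from collections import defaultdict


class PrecomputedSum:
    """Hypothetical physical tier: stores sums per dimension combination."""

    def __init__(self, rows, dimensions, measure):
        self.dimensions = tuple(dimensions)
        self._cells = defaultdict(float)
        # Build step: scan the raw rows once and store aggregated cells.
        for row in rows:
            key = tuple(row[d] for d in self.dimensions)
            self._cells[key] += row[measure]

    def query(self, **filters):
        """Sum the measure over the precomputed cells matching the filters."""
        unknown = set(filters) - set(self.dimensions)
        if unknown:
            raise KeyError(f"not a precomputed dimension: {sorted(unknown)}")
        total = 0.0
        for key, value in self._cells.items():
            cell = dict(zip(self.dimensions, key))
            if all(cell[d] == wanted for d, wanted in filters.items()):
                total += value
        return total


# Hypothetical raw rows that would normally live in the data source.
SALES_ROWS = [
    {"city": "Austin", "product": "bike", "sales": 120.0},
    {"city": "Austin", "product": "helmet", "sales": 80.0},
    {"city": "Boston", "product": "bike", "sales": 200.0},
    {"city": "Boston", "product": "bike", "sales": 50.0},
]

cube = PrecomputedSum(SALES_ROWS, dimensions=["city", "product"], measure="sales")
austin_sales = cube.query(city="Austin")
bike_sales = cube.query(product="bike")
```

The trade-off is the usual one: the build step and the stored cells cost time and space up front, in exchange for queries that touch a small number of aggregated cells rather than every raw row.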

Query Language: Some semantic layers support MDX queries, whereas metrics stores, based on the modern data stack, are typically SQL-based.

Location Options: Various generations of analytics and business intelligence (A&BI) tools, data marts, data warehouses, query accelerators, knowledge graph/data fabric and stand-alone virtualization platforms are all possible locations for the semantic layer. Also, many semantic layer solutions provided by vendors can be deployed both on premises and in the cloud. 

Because the metrics store concept arises from the modern data stack, metrics stores usually reside on top of a cloud data warehouse or cloud data lake.

Summary

It will be fascinating to watch how the data landscape develops over the coming years. The demand for something like a metrics store or semantic layer appears to be gaining traction, and consolidating data in a data lake rather than spreading it across many warehouses looks increasingly likely.

For any organizations planning to adopt a metrics store or semantic layer, here’s some advice from those who have already made the move: 

Getting the onboarding right is the key to success.

Even if teams agree that a universal layer is needed, the challenge is in making it simple for people across the business to accept and incorporate it into their work. Businesses that overcome this challenge will have a significant advantage in making the metrics store or semantic layer a reality for their enterprise.
