Overcoming data inconsistency with a ‘universal semantic layer’

According to Gartner, bad data costs companies $12.9 million (approximately 17.798 billion won) per year. Naturally, data leaders have spent decades searching for a single source of truth for business intelligence and analytics, so that everyone’s decisions are based on the same data and definitions.

BI solution providers introduced the concept of a semantic layer to achieve data consistency. The semantic layer is an abstraction layer that sits between raw data, described by row, column, and field names that only data experts can understand, and the business users who need insights from that data. It hides the complexity of the data and maps it to business definitions, logic, and relationships. The semantic layer allows business users to perform self-service analytics using standard terms such as sales and profit.

Proliferation of Semantic Layers

Semantic layers were a welcome development, at least until BI tools and their associated semantic layers began to sprawl. Business Objects built the first lightweight semantic layer in the 1990s, in the product now known as SAP BusinessObjects. The problem was that early BI solutions such as BusinessObjects were monolithic and not very user-friendly. Frustrated users turned to Tableau, Power BI, and Looker, which were easier to use. The problem now is that these tools have proliferated and multiplied throughout the enterprise, destroying any hope of a single source of truth.

Different parts of the enterprise now use different BI, analytics, and data science tools to create their own data definitions, dimensions, metrics, logic, and context. Each team also maintains its own semantic layer. As a result, data interpretation, business logic, and definitions differ from one user group to the next, which leads to distrust in the reports and intelligence derived from the data.

This inconsistency also causes confusion among teams. For example, is an active customer someone who has an ongoing paid subscription to your service, someone who logged in within the last 7 days, or someone who signed up for a 7-day free trial? Inconsistent definitions interfere with finance’s ability to bill, the renewals team’s ability to identify customers, and operations’ ability to accurately process and report products sold.
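To make the ambiguity concrete, here is a minimal Python sketch that applies the three candidate definitions of an active customer to the same records. The customer data, field names, and reference date are invented for illustration and are not from any real system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Customer:
    name: str
    has_paid_subscription: bool
    last_login: date
    on_free_trial: bool


# Fixed reference date (an assumption) so the example is reproducible.
TODAY = date(2024, 6, 30)

# Hypothetical sample records.
CUSTOMERS = [
    Customer("ana", True, date(2024, 5, 1), False),
    Customer("ben", False, date(2024, 6, 28), True),
    Customer("cho", True, date(2024, 6, 29), False),
    Customer("dev", False, date(2024, 6, 27), False),
]

# Three competing definitions of "active customer".
DEFINITIONS = {
    "paid_subscription": lambda c: c.has_paid_subscription,
    "recent_login": lambda c: TODAY - c.last_login <= timedelta(days=7),
    "free_trial": lambda c: c.on_free_trial,
}

# Count "active customers" under each definition.
counts = {
    label: sum(1 for c in CUSTOMERS if rule(c))
    for label, rule in DEFINITIONS.items()
}

for label, count in counts.items():
    print(f"{label}: {count} active customers")
```

The same four customers yield three different "active customer" counts, which is the kind of disagreement that erodes trust in reports.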

The Rise of Semantic Layers in Data Warehouses

As if the data environment were not complex enough, data architects have begun implementing semantic layers within data warehouses. Architects may think of the data assets they manage as a single source of truth for all use cases, but that is generally not the case, because structures with millions of denormalized tables are usually not ‘business ready’. Once a semantic layer is embedded in each warehouse, data engineers must connect analytics use cases to the data by designing and maintaining data pipelines with transformations that produce ‘analytics-ready’ data.

Without a consistent semantic layer, data engineers must hard-code semantic meaning into dedicated pipelines to support data consumers. These hard-coded definitions quickly become rigid, making it difficult for the central architecture team to keep pace with the domain-specific requirements of various work groups. As the code grows, it becomes harder to maintain and more inconsistent. This approach introduces delays and dependencies that hinder data-driven decision making.

Further sprawl of local semantic layers

To make matters worse, when a data warehouse moves to the cloud, user queries can become unbearably slow. When performance is poor, business users almost always extract the data and load it into their preferred analytics platform, where it is easier to work with and faster to query. This creates yet more sprawl in the form of localized semantic layers.

Today, in most cases, several semantic layers are scattered across the data stack: in cloud data warehouses, in transformation pipelines, and a little in each BI tool. This semantic sprawl is extremely inefficient, because every time a data engineer designs a new data pipeline, he or she recreates common business concepts (such as annual forecasting or currency conversion). Whenever a new business question requires an answer that involves different data definitions or business logic, data teams must recreate concepts that already exist, piecemeal, in various semantic layers. It is like playing whack-a-mole all day: duplicated engineering effort and a waste of time and resources.

Creating a universal semantic layer

What is needed is a universal semantic layer that defines all metrics and metadata for every possible data experience (visualization tools, customer-facing analytics, embedded analytics, AI agents). A universal semantic layer ensures that everyone in the enterprise agrees on a standard set of definitions for terms like ‘customer’ and ‘prospect’ and on standard relationships between data (standard business logic and definitions), allowing data teams to build a single, consistent semantic data model.

The universal semantic layer sits on top of the data warehouse and provides data semantics (context) to various data applications. It works seamlessly with transformation tools, allowing companies to define metrics, prepare data models, and expose them to a variety of BI and analytics tools.
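The idea of defining a metric once and exposing it to many consumers can be sketched in a few lines of Python. This is an illustrative toy, not the API of any particular product: the `Metric` class, the `METRICS` registry, the `compile_query` function, and the `orders` table with its `amount` and `region` columns are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    sql: str    # aggregate expression, defined once
    table: str  # hypothetical source table in the warehouse


# Single registry of business definitions (all names are hypothetical).
METRICS = {
    "revenue": Metric("revenue", "SUM(amount)", "orders"),
    "order_count": Metric("order_count", "COUNT(*)", "orders"),
}


def compile_query(metric_name: str, dimensions: list[str]) -> str:
    """Translate a metric plus optional dimensions into a SQL string."""
    metric = METRICS[metric_name]
    select = ", ".join([*dimensions, f"{metric.sql} AS {metric.name}"])
    query = f"SELECT {select} FROM {metric.table}"
    if dimensions:
        query += " GROUP BY " + ", ".join(dimensions)
    return query


# A dashboard and an AI agent both ask for "revenue" by name
# and therefore receive the same definition.
dashboard_sql = compile_query("revenue", ["region"])
agent_sql = compile_query("revenue", ["region"])
print(dashboard_sql)
```

Because every consumer requests `revenue` by name rather than writing its own aggregation, changing the definition in the registry changes it everywhere at once.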

To build a universal semantic layer, the data team must first establish the business logic, computations, and context that apply to the semantic data model. That means identifying the real problem to be solved, collecting the necessary data, encoding the relationships between the data, and defining governance and security policies to achieve trusted access. Metadata is then used to build an abstraction of the data that consistently exposes dimensions, hierarchies, and calculations to downstream data consumers.
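As a rough sketch of how calculations and access policies might live together in one model, the Python example below defines measures once and checks a per-role policy before answering a query. The rows, role names, policy structure, and the `query` function are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical raw rows from the warehouse.
ROWS = [
    {"region": "EMEA", "amount": 120.0, "cost": 70.0},
    {"region": "EMEA", "amount": 80.0, "cost": 50.0},
    {"region": "APAC", "amount": 200.0, "cost": 90.0},
]

# Calculations are defined once and shared by every consumer.
MEASURES = {
    "sales": lambda rows: sum(r["amount"] for r in rows),
    "profit": lambda rows: sum(r["amount"] - r["cost"] for r in rows),
}

# Governance policy: readable measures and visible regions per role.
# A value of None for "regions" means no row-level restriction.
POLICIES = {
    "analyst": {"measures": {"sales", "profit"}, "regions": None},
    "emea_sales_rep": {"measures": {"sales"}, "regions": {"EMEA"}},
}


def query(role: str, measure: str, dimension: str = "region") -> dict:
    """Return the measure grouped by dimension, subject to the role's policy."""
    policy = POLICIES[role]
    if measure not in policy["measures"]:
        raise PermissionError(f"{role} may not read {measure}")
    allowed = policy["regions"]
    groups = defaultdict(list)
    for row in ROWS:
        if allowed is None or row["region"] in allowed:
            groups[row[dimension]].append(row)
    return {key: MEASURES[measure](rows) for key, rows in groups.items()}


print(query("analyst", "profit"))
print(query("emea_sales_rep", "sales"))
```

Keeping the policy next to the measure definitions means every downstream consumer is subject to the same access rules without reimplementing them.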

Once the underlying data and its meaning are established, the universal semantic layer must be integrated with data consumers such as generative AI, BI tools, spreadsheets, and embedded analytics. Cube Cloud is a universal semantic layer platform that provides numerous pre-built integrations and a robust set of APIs, enabling enterprises to model their data once and serve it anywhere. It also provides a variety of developer tools that make it easier to collaborate, build data models, set up caching and pre-aggregations, and maintain data access control.

Benefits of a universal semantic layer

A universal semantic layer increases governance and control for data teams and, when implemented properly, helps end users get more value from their data and reduces misunderstandings between teams. The result is greater efficiency and the assurance that every point of data consumption works with the same, accurate data. Whether the data is used by someone viewing a dashboard or by a large language model answering someone’s question, the data is consistent.

This makes it easier for data teams to quickly provide data to internal and external data consumers. Data teams can easily update metrics or define new ones, design domain-specific data views, and integrate new raw data sources. They can also enforce governance policies covering access control, definitions, and performance.

There is another advantage. As data volume grows, cloud computing costs soar. A universal semantic layer addresses this problem by pre-processing or pre-aggregating data, storing frequently used business metrics, and using them as the basis for analysis, thereby reducing cloud data bills. It also speeds up user queries by serving data across the enterprise with high performance and low latency.
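The cost-saving effect of pre-aggregation can be illustrated with a small Python sketch: the raw rows are scanned once to build a rollup, and repeated queries are answered from the rollup. The `PreAggregatedSales` class and the sample orders are hypothetical, and real platforms implement this very differently (for example with materialized tables and refresh schedules).

```python
from collections import defaultdict

# Hypothetical raw fact rows: (month, region, amount).
RAW_ORDERS = [
    ("2024-01", "EMEA", 100),
    ("2024-01", "APAC", 250),
    ("2024-02", "EMEA", 175),
    ("2024-02", "EMEA", 25),
]


class PreAggregatedSales:
    """Answers sales queries from a rollup built once from the raw rows."""

    def __init__(self, raw_rows):
        self.raw_rows = raw_rows
        self.raw_scans = 0  # how many times the raw data was scanned
        self._rollup = None

    def _build_rollup(self):
        self.raw_scans += 1
        rollup = defaultdict(int)
        for month, region, amount in self.raw_rows:
            rollup[(month, region)] += amount
        return dict(rollup)

    def sales(self, month, region):
        # Build the rollup lazily on the first query, then reuse it.
        if self._rollup is None:
            self._rollup = self._build_rollup()
        return self._rollup.get((month, region), 0)


store = PreAggregatedSales(RAW_ORDERS)
print(store.sales("2024-02", "EMEA"))
print(store.sales("2024-01", "APAC"))
print(store.raw_scans)
```

However many queries follow, the raw data is scanned only once in this sketch, which is where the savings on a warehouse billed by compute would come from.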

A single source of truth that finally becomes a reality

A universal semantic layer is needed to power the next generation of data-driven applications. We must accept that there will always be a variety of tools for visualizing and using data, and that the data itself will be stored in a variety of data sources. A universal semantic layer finally creates a single source of truth for enterprise metrics across all of them, giving decision makers consistent, fast, and accurate answers.

*Artyom Keydunov is the CEO of Cube.
editor@itworld.co.kr

Source: www.itworld.co.kr