In 1996, Ralph Kimball, one of the leading innovators in the data industry, introduced an influential approach to data warehousing in his book The Data Warehouse Toolkit: the concept of dimensional modeling (DM). The idea became a part of his broader work on data warehouse design, known as the Kimball Lifecycle or the Business Dimensional Lifecycle. Dimensional data models are optimized for read (SELECT) queries and serve to create scalable, functional, and flexible warehouses that can be implemented in various use cases.
Kimball's work made it easier and more efficient for businesses working with data warehouses to retrieve, analyze, and summarize data through fact and dimension tables. This is essential for querying warehouse data in BI tools to extract actionable insights and generate business reports.
This article provides an overview of dimensional data modeling, lists the main elements of a dimensional data model, and explains how it can be implemented.
What is dimensional data modeling?
Dimensional data modeling is based on the joining of fact and dimension tables within databases. In this regard, fact tables refer to the fields typically populated by quantitative data and transactional records. On the other hand, dimension tables contain contextual information about the facts from the fact table and give meaning to statistics.
It is helpful to compare dimensions and facts to nouns and verbs to understand how they function in dimensional data models.
Dimensions can be thought of as nouns, as they are the entities that take part in or are affected by some action. For example, employees and customers are dimensions as they are the subjects that perform a transaction, i.e., a sale or purchase. Similarly, products are also categorized as dimensions as they are the objects being sold or purchased.
Expanding on the same analogy, facts can be compared to verbs as they represent what is done to the dimensions. An event that occurs in relation to any of the dimensions will be recorded in the fact table. For instance, the sale of a product or the addition of a new customer will be entered in the fact table.
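The noun and verb analogy can be made concrete with a small sketch. The example below is illustrative only: the table contents, the key values, and the helper name describe_sales are all hypothetical, and plain Python dictionaries stand in for database tables.

```python
# Hypothetical sample data; names and values are illustrative only.

# Dimension tables (the "nouns"), each keyed by a primary key.
customers = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
products = {
    10: {"name": "Notebook", "category": "Stationery"},
    11: {"name": "Pen", "category": "Stationery"},
}

# Fact table (the "verbs"): each row records one sale event and points
# to the dimensions involved through foreign keys.
sales_facts = [
    {"customer_id": 1, "product_id": 10, "quantity": 2, "amount": 9.00},
    {"customer_id": 2, "product_id": 11, "quantity": 5, "amount": 7.50},
    {"customer_id": 1, "product_id": 11, "quantity": 1, "amount": 1.50},
]


def describe_sales(facts, customers, products):
    """Join each fact row to its dimension rows to give the numbers context."""
    return [
        (
            customers[f["customer_id"]]["name"],
            products[f["product_id"]]["name"],
            f["quantity"],
            f["amount"],
        )
        for f in facts
    ]


report = describe_sales(sales_facts, customers, products)
for row in report:
    print(row)
```

On their own, the fact rows are just numbers and keys; the lookups into the dimension tables are what turn them into statements such as "Ada bought two notebooks".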
Elements of a dimensional data model
Facts are the metrics and quantitative measurements that arise from a business process. Examples of facts that describe an item's sale may include the sale price, the number of items sold, discounts, and the total cost.
Dimensions are the contextual information that represents one aspect of the business process. For example, sales, claims, inventory, and customers are dimensions in the retail store data model.
Attributes comprise the details and specifics for a single dimension. Item name, item type, number of items in stock, shelf life and expiry date are all attributes for the inventory dimension.
Fact tables store the facts for a business process together with the 'foreign keys' that relate each fact to its dimensions. Facts are categorized as additive, semi-additive, or non-additive, depending on whether they can be meaningfully summed across all, some, or none of the dimensions. Fact tables typically have a small number of columns and many rows.
Dimension tables store background information and explanations for the fact table. These help business analysts make sense of the facts stored in the fact table and analyze them better. Dimension tables are usually denormalized structures containing many descriptive columns, based on the different business processes the enterprise may be involved in. The primary key in a dimension table serves as a foreign key in the fact table, and thus, both these tables are used together to reference and retrieve data.
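The three categories of facts are easier to tell apart with numbers. The following is a minimal sketch with made-up data: amounts sold are treated as additive, units on hand in a daily inventory snapshot as semi-additive, and unit price as non-additive. All names and values are assumptions chosen for illustration.

```python
# Hypothetical sales facts: (date, product, quantity, amount)
sales = [
    ("2024-01-01", "pen", 10, 15.0),
    ("2024-01-02", "pen", 4, 6.0),
    ("2024-01-02", "notebook", 5, 22.5),
]

# Hypothetical daily inventory snapshots: (date, product, units_on_hand)
snapshots = [
    ("2024-01-01", "pen", 100),
    ("2024-01-01", "notebook", 40),
    ("2024-01-02", "pen", 90),
    ("2024-01-02", "notebook", 35),
]

# Additive: the sale amount can be summed across every dimension.
total_amount = sum(amount for _, _, _, amount in sales)

# Semi-additive: units on hand can be summed across products for a single
# date, but summing across dates would count the same stock repeatedly.
stock_on_jan_2 = sum(units for date, _, units in snapshots if date == "2024-01-02")

# Non-additive: a unit price is a ratio, so prices are never summed.
# Instead, recompute the ratio from its additive components.
total_quantity = sum(quantity for _, _, quantity, _ in sales)
average_unit_price = total_amount / total_quantity

print(total_amount, stock_on_jan_2, round(average_unit_price, 2))
```

A practical consequence is that ratios and percentages are often better stored as their additive numerator and denominator, so that they can be recomputed correctly at any level of aggregation.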
How to implement dimensional modeling in your data warehouse?
The design of a dimensional model depends on the level of detail and the number of business processes that your data warehouse caters to. Before implementing the dimensional model, it is good practice to consider the results that you might need to extract from your data. Typically, a dimensional data model is designed to explain the what, who, where, how, why, and when of your business process in great depth. The steps to create a dimensional data model for your enterprise are explained below:
1. Select the Business Process
The first step in creating a dimensional data model is identifying the business process you need to track and observe. For example, monitoring the performance of the sales staff in a company would require fact and dimension tables that provide relevant data. The business process can be described using plain text; however, this may become tedious and confusing over time. Therefore, to conveniently design the model, notations such as the Business Process Model and Notation (BPMN) and the Unified Modeling Language (UML) can be used.
2. Declare the Grain
The grain refers to the level of detail that the data analysis requires; in practice, it defines what a single row in the fact table represents. The grain must be consistent within a fact table, and facts from different sources must be brought to the same grain before they are combined, to ensure that the data stored in the data warehouse is high quality. For example, the sales reports for two different branches of the same company need to be compiled at the same grain to be compared and consolidated in the data warehouse.
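The two-branch example can be sketched in a few lines. The snippet below assumes, purely for illustration, that one branch reports every transaction while the other reports daily totals; the helper roll_up_to_daily and all data values are hypothetical.

```python
from collections import defaultdict

# Hypothetical Branch A data at transaction grain:
# one row per individual sale (branch, date, product, amount).
branch_a = [
    ("A", "2024-01-01", "pen", 3.0),
    ("A", "2024-01-01", "pen", 4.5),
    ("A", "2024-01-02", "pen", 1.5),
]

# Hypothetical Branch B data already at daily grain:
# one row per branch, date, and product.
branch_b = [
    ("B", "2024-01-01", "pen", 6.0),
    ("B", "2024-01-02", "pen", 2.0),
]


def roll_up_to_daily(rows):
    """Aggregate transaction-level rows to one row per branch, date, product."""
    totals = defaultdict(float)
    for branch, date, product, amount in rows:
        totals[(branch, date, product)] += amount
    return [(b, d, p, amt) for (b, d, p), amt in sorted(totals.items())]


# Only after both sources share the same grain are they consolidated.
daily_sales = roll_up_to_daily(branch_a) + branch_b
for row in daily_sales:
    print(row)
```

Rolling up is only possible in one direction: detailed rows can be aggregated to a coarser grain, but daily totals cannot be split back into transactions. This is one reason to declare the finest grain the analysis may need.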
3. Identify the Dimensions
Here, the descriptive context related to the business process is identified and recorded according to the grain set in step 2. The dimensions for a company's sales performance could include the salesperson, the product, the branch, the date, etc. Each dimension contains a primary key that can be matched with the foreign key in a fact table to join them and retrieve the required data.
4. Identify the Facts
This involves identifying the quantitative measurements of the business process and relating them to the dimensions that describe them. For example, the exact sales values for a particular salesperson in a month for a specific product are facts. Different facts are combined and retrieved together from the data warehouse to generate business reports.
5. Build the Schema
The last step involves defining a structure for the database, known as the schema. This ties together the fact and dimension tables so that the data warehouse can reference the correct information for each field. Two of the most popular schemas used in dimensional modeling are the Star Schema and the Snowflake Schema. Generally, the Snowflake Schema contains greater structural detail and can be considered a normalized version of the Star Schema, in which dimension tables are split into additional related tables.
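As a rough illustration of a Star Schema, the sketch below builds one fact table surrounded by three dimension tables in an in-memory SQLite database, using Python's standard sqlite3 module. The table names, column names, and sample rows are assumptions made for this example, loosely following the sales staff scenario from step 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One central fact table whose foreign keys point at the dimension tables.
# All table and column names here are hypothetical.
conn.executescript("""
CREATE TABLE dim_salesperson (
    salesperson_key INTEGER PRIMARY KEY, name TEXT, branch TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE fact_sales (
    salesperson_key INTEGER REFERENCES dim_salesperson(salesperson_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL);
""")

conn.executemany("INSERT INTO dim_salesperson VALUES (?, ?, ?)",
                 [(1, "Ada", "North"), (2, "Grace", "South")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(10, "Notebook", "Stationery"), (11, "Pen", "Stationery")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240101, "2024-01-01", "2024-01"),
                  (20240102, "2024-01-02", "2024-01")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 10, 20240101, 2, 9.0),
                  (1, 11, 20240102, 1, 1.5),
                  (2, 11, 20240101, 5, 7.5)])

# A typical report: join facts to dimensions, then aggregate.
query = """
SELECT s.name, d.month, SUM(f.amount) AS total_amount
FROM fact_sales AS f
JOIN dim_salesperson AS s ON s.salesperson_key = f.salesperson_key
JOIN dim_date AS d ON d.date_key = f.date_key
GROUP BY s.name, d.month
ORDER BY s.name, d.month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In a Snowflake Schema, a column such as category in dim_product would typically move into its own table that dim_product references, which reduces redundancy at the cost of an extra join in queries like the one above.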
Data analysts need to consider factors such as the cost of building a data warehouse, ease of data retrieval, the level of detail required, the storage space available, and their staff's technical expertise when working on data warehouse architecture. While dimensional modeling promises many benefits, including speedy data retrieval, clearly depicted business processes, and a higher degree of scalability, it also has certain limitations. Choosing the right dimensions and modifying the data warehouse once its architecture has been set up are particularly challenging for data professionals when implementing the dimensional model. Each enterprise must consider the pros and cons of the data warehouse design in the context of its business processes before deciding whether to choose dimensional models or normalized relational models.