Dimensional Modeling - Lecture notes 7 PDF

Title Dimensional Modeling - Lecture notes 7
Author Aamir Khan
Course Data Mining
Institution Bahria University
Pages 6
File Size 298.5 KB
File Type PDF
Total Downloads 47
Total Views 139

Summary

Dimensional Modeling lecture for Data Warehouse/Data Mining...


Description

LECTURE: DIMENSIONAL MODELING IN DATA WAREHOUSE

What is Dimensional Data Modeling? Dimensional data modeling is one of the data modeling techniques used in data warehouse design. The main goal of this modeling is to improve the data retrieval, it is optimized for the SELECT operation. Dimensional data modeling in data warehouse is different than the ER modeling where main goal is to normalize the data by reducing redundancy. This model gives us the advantage of storing data in such a way that it is easier to store and retrieve the data once stored in the data warehouse. Dimensional model is the underlying data model used by many of the OLAP systems. Dimensional model is developed by the legendary Ralph Kimball.

Let’s start with listing the basic components of a dimensional model, later tying them all together to get a complete picture:

Dimension Table

Dimension tables store the descriptive details of each business process; the what, who, where, and when. In the store dimension here, the dimension table holds the address and other details related to the store’s location. Each of these ‘attributes’ represents a column in the database.

LECTURE: DIMENSIONAL MODELING IN DATA WAREHOUSE

Fact Table

A fact table stores transaction or event data in a numeric format. In the example above, the fact table holds values for ‘price’ and ‘quantity’. These are called ‘measures’. The rest of the entries you see in this fact table are ‘foreign keys’ of dimensions that are dependent on this fact table. The dimensions, therefore, associated with our fact tables are product, time, store, employee, and sales.

Primary and Foreign Keys Each dimension has a primary key, which is stored in the dimension table as a separate column. To reference these primary keys and therefore associated records in a dimension table, foreign keys are used, which are stored in a separate column. As seen in the fact table diagram above, foreign keys that reference primary keys to each of the associated dimensions are contained.

Tying the Components Together When we bring dimensions, facts, and primary and foreign keys together, we get a dimensional modeling schema: Below diagram illustrates a star schema, the most popular dimensional modeling schema used in data warehousing. In a star schema, there can be one or more fact tables, and each fact table is associated with multiple dimensions. The resultant shape resembles a star, thus the name ‘star schema’. This schema is the simplest form of a dimensional model.

LECTURE: DIMENSIONAL MODELING IN DATA WAREHOUSE

Building a Dimensional Model Let’s understand the steps to create dimensional model with the help of example, say, you want to store the information of how many paracetamol and diclofenac tablets sold from single MedPlus store every day. Everything we model fall into two tables , a fact, that hold measures and dimension, hold information that qualifies the measure. Below are the steps for data warehouse dimensional modeling example:

Step 1: Chose Business Objective The first step in data modeling is, identify the business objective. In our example, the business objective is to store the information of how many paracetamol and diclofenac tablets sold from single MedPlus store every day.

LECTURE: DIMENSIONAL MODELING IN DATA WAREHOUSE

Step2: Identify Granularity Granularity is the lowest level of information stored in the table. E.g. if the table contains daily sales data then granularity is “Daily”. In our example, say, a specific MedPlus shop sells 1,000 paracetamol tablets on a specific day then granularity is daily and 10,000 on specific month then granularity would be monthly. It is very important to set the granularity of the information required. In our case, it is daily.

Step 3: Identify Dimension and its attributes Dimensions are objects or things. In our example, we are dealing with 3 things, a “Shop”, “Medicine”, and “Day”. We have 3 dimension tables here “Shop”, “Medicine” – paracetamol and diclofenac, and “Day”. Below are the dimension table structures for our simple dimensional model. Complex, systems may have very complex table structure.

Medicine SK 1 2

NAME Paracetamol Diclofenac

Shop SK 1 2 3

NAME Shop1 Shop2 Shop3

Day SK 1 2 3

NAME 2020-01-01 2020-01-02 2020-01-03

LECTURE: DIMENSIONAL MODELING IN DATA WAREHOUSE

Step 3: Identify Fact Fact table holds something that is measurable. In our example, number of tablets sold is a measure. We create separate table called Fact to store the measures.

In our example, granularity is medicine getting sold per day; we will add the SK column from Shop, Medicine, and Day to fact table as shown below

Finally, to summarise the process, we have created 3 dimension tables (Shop, Medicine and Day) and 1 fact table. Primary key from the dimension table flows into Fact table as a foreign key. This model looks like star and is also called “Star Schema”

LECTURE: DIMENSIONAL MODELING IN DATA WAREHOUSE

Sometimes, you may want to keep the history of the data that is changing in dimension tables. There are lots of “slowly changing dimensions” methods available to preserve history. Advantages of Dimensional Modeling Following are the benefits of dimensional modeling are: Dimensional modeling is simple: Dimensional modeling methods make it possible for warehouse designers to create database schemas that business customers can easily hold and comprehend. There is no need for vast training on how to read diagrams, and there is no complicated relationship between different data elements. Dimensional modeling promotes data quality: The star schema enable warehouse administrators to enforce referential integrity checks on the data warehouse. Since the fact information key is a concatenation of the essentials of its associated dimensions, a factual record is actively loaded if the corresponding dimensions records are duly described and also exist in the database. By enforcing foreign key constraints as a form of referential integrity check, data warehouse DBAs add a line of defense against corrupted warehouses data. Performance optimization is possible through aggregates: As the size of the data warehouse increases, performance optimization develops into a pressing concern. Customers who have to wait for hours to get a response to a query will quickly become discouraged with the warehouses. Aggregates are one of the easiest methods by which query performance can be optimized.

Disadvantages of Dimensional Modeling To maintain the integrity of fact and dimensions, loading the data warehouses with a record from various operational systems is complicated. It is severe to modify the data warehouse operation if the organization adopting the dimensional technique changes the method in which it does business....


Similar Free PDFs