Create an Extended Date Dimension for a SQL Server Data Warehouse

Problem

The Date Dimension is a key dimension in SQL Server data warehousing, as it allows us to analyze data by different aspects of a date. Apart from the standard date attributes such as year, quarter and month, this article explains how the date dimension can be extended to support richer analysis in a SQL Server data warehouse.

Solution

A date dimension is mostly a static dimension that does not require a daily update. However, attributes that are relative to the current date, such as rolling-window flags, may need a daily update. The solution is to build an extended date dimension, populate it at the start of the year, and update only the changing attributes on a daily basis.

Why a Special Dimension

The first question is why a data warehouse needs a special dimension for dates at all. Let's look at the options we have when a date dimension is not present.

What are sales per year?

SELECT YEAR(OrderDate) Year ,SUM(SalesAmount) Amount
FROM FCT_Sales
GROUP BY YEAR(OrderDate)

What are the sales done on weekends?

SELECT SUM(SalesAmount)
FROM FCT_Sales
WHERE DATEPART(dw,OrderDate) IN (1,7) -- 1 = Sunday, 7 = Saturday under SET DATEFIRST 7 (the us_english default)

Both queries have a performance cost. A data warehouse fact table usually holds a very large number of records, and each query has to evaluate a function for every row. Indexes do not help either, because wrapping the OrderDate column in a function prevents SQL Server from seeking on an index on that column.

Apart from performance issues, there are functional limitations. For example, if you need the sales for a particular holiday or season, there are no built-in functions for that, so you have no choice but to use a special date dimension.

Given these functional and performance limitations, it is clear that dates need a dimension of their own. This is the date dimension that is commonly used in data warehouses.
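
As a rough sketch, the two queries above could be rewritten against a date dimension as follows. This assumes the Dim_Date table defined later in this article and assumes that FCT_Sales carries an OrderDateKey column in YYYYMMDD format; neither assumption is confirmed by the original queries, which use an OrderDate column.

```sql
-- Sales per year: group on the pre-computed [Year] attribute
-- instead of calling YEAR() for every fact row.
SELECT d.[Year], SUM(f.SalesAmount) AS Amount
FROM FCT_Sales AS f
INNER JOIN dbo.Dim_Date AS d
    ON d.DateKey = f.OrderDateKey   -- OrderDateKey is an assumed column
GROUP BY d.[Year]
ORDER BY d.[Year];

-- Weekend sales: filter on the IsWeekend flag, which does not
-- depend on the session's DATEFIRST setting at query time.
SELECT SUM(f.SalesAmount) AS WeekendAmount
FROM FCT_Sales AS f
INNER JOIN dbo.Dim_Date AS d
    ON d.DateKey = f.OrderDateKey
WHERE d.IsWeekend = 1;
```

Because the filter and the grouping use plain columns of the small dimension table, no function has to be evaluated for each fact row.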

Also, there are times when a fact table has more than one date column. In that case, the date dimension acts as a role-playing dimension in SQL Server Analysis Services, as shown in the figure below.


[Figure: the date dimension used as a role-playing dimension in SQL Server Analysis Services]

In the above example, OrderDateKey, DueDateKey and ShipDateKey are linked to the Date Dimension.

Please note that the role-playing dimension feature is not available in the Tabular model, where you need to add multiple instances of the date dimension.
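
One possible way to provide those multiple instances is to expose one view per role over the same physical table. The sketch below assumes the Dim_Date table defined later in this article; the view names and column aliases are hypothetical and only illustrate the idea.

```sql
-- One view per role; each can be imported as a separate date table.
CREATE VIEW dbo.vw_Dim_OrderDate AS
SELECT DateKey AS OrderDateKey,
       [Date]  AS OrderDate,
       [Year]  AS OrderYear,
       [Month] AS OrderMonth
FROM dbo.Dim_Date;
GO

CREATE VIEW dbo.vw_Dim_ShipDate AS
SELECT DateKey AS ShipDateKey,
       [Date]  AS ShipDate,
       [Year]  AS ShipYear,
       [Month] AS ShipMonth
FROM dbo.Dim_Date;
GO
```

Prefixing the column names with the role keeps the attributes distinguishable when several date instances appear in the same report.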

Surrogate Key

Typically, a surrogate key is an incremental number. For a date dimension, however, an integer in YYYYMMDD format is used as the surrogate key. This is to facilitate data partitioning in the data warehouse. Fact tables are normally partitioned by date. If plain incremental numbers were used for the date dimension, the fact table would carry the same meaningless numbers, which makes partition boundaries hard to define. With the YYYYMMDD format, the date is readable from the key itself, so it is much easier to write a partition function.
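
To illustrate, here is a minimal sketch of yearly partitioning on a YYYYMMDD key. The object names (pf_OrderDateKey, ps_OrderDateKey, FCT_Sales_Partitioned) and the use of a single PRIMARY filegroup are assumptions made for the example, not part of the original design.

```sql
-- Boundary values are ordinary integers that read as dates.
CREATE PARTITION FUNCTION pf_OrderDateKey (INT)
AS RANGE RIGHT FOR VALUES (20170101, 20180101, 20190101, 20200101);

-- For simplicity, every partition is mapped to the PRIMARY filegroup.
CREATE PARTITION SCHEME ps_OrderDateKey
AS PARTITION pf_OrderDateKey ALL TO ([PRIMARY]);

-- Hypothetical fact table partitioned on its date key.
CREATE TABLE dbo.FCT_Sales_Partitioned (
    OrderDateKey INT   NOT NULL,
    SalesAmount  MONEY NOT NULL
) ON ps_OrderDateKey (OrderDateKey);

-- Check which partition a given key falls into.
-- 20180615 lies in [20180101, 20190101), the third partition.
SELECT $PARTITION.pf_OrderDateKey(20180615) AS PartitionNumber; -- returns 3
```

With RANGE RIGHT, each boundary value belongs to the partition on its right, so 1 January of each year starts a new partition.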

Standard Columns

In a date dimension, it is better to include all the columns users are likely to need, rather than leaving users to derive those attributes themselves.

Below is an example of the basic columns for a Date Dimension.

CREATE TABLE dbo.Dim_Date (
DateKey INT NOT NULL PRIMARY KEY,
[Date] DATE NOT NULL,
[Day] TINYINT NOT NULL,
[DaySuffix] CHAR(2) NOT NULL,
[Weekday] TINYINT NOT NULL,
[WeekDayName] VARCHAR(10) NOT NULL,
[WeekDayName_Short] CHAR(3) NOT NULL,
[WeekDayName_FirstLetter] CHAR(1) NOT NULL,
[DOWInMonth] TINYINT NOT NULL,
[DayOfYear] SMALLINT NOT NULL,
[WeekOfMonth] TINYINT NOT NULL,
[WeekOfYear] TINYINT NOT NULL,
[Month] TINYINT NOT NULL,
[MonthName] VARCHAR(10) NOT NULL,
[MonthName_Short] CHAR(3) NOT NULL,
[MonthName_FirstLetter] CHAR(1) NOT NULL,
[Quarter] TINYINT NOT NULL,
[QuarterName] VARCHAR(6) NOT NULL,
[Year] INT NOT NULL,
[MMYYYY] CHAR(6) NOT NULL,
[MonthYear] CHAR(7) NOT NULL,
IsWeekend BIT NOT NULL
)

Note that there are three columns for the month name. MonthName stores the full month name, such as January or February. Some reports show a shortened month name, such as Jan or Feb, which can be stored in the MonthName_Short column. The MonthName_FirstLetter column stores the first character of the month (J, F, M, etc.), giving users more options. Similarly, there are three columns for the weekday name.

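Most of these attributes can be generated with built-in SQL Server functions such as YEAR, MONTH, DATEPART and DATENAME. In the following script, the date range is controlled by the @CurrentDate (start) and @EndDate variables. Note that the script also populates the IsHoliday column described in the Holidays and Special Days section, so that column must exist in the table before the script is run.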

SET NOCOUNT ON
TRUNCATE TABLE dbo.Dim_Date
DECLARE @CurrentDate DATE = '2016-01-01'
DECLARE @EndDate DATE = '2020-12-31'
-- Use <= so that @EndDate itself is loaded
WHILE @CurrentDate <= @EndDate
BEGIN
INSERT INTO [dbo].[Dim_Date] (
[DateKey],
[Date],
[Day],
[DaySuffix],
[Weekday],
[WeekDayName],
[WeekDayName_Short],
[WeekDayName_FirstLetter],
[DOWInMonth],
[DayOfYear],
[WeekOfMonth],
[WeekOfYear],
[Month],
[MonthName],
[MonthName_Short],
[MonthName_FirstLetter],
[Quarter],
[QuarterName],
[Year],
[MMYYYY],
[MonthYear],
[IsWeekend],
[IsHoliday]
)
SELECT DateKey = YEAR(@CurrentDate) * 10000 + MONTH(@CurrentDate) * 100 + DAY(@CurrentDate),
DATE = @CurrentDate,
Day = DAY(@CurrentDate),
[DaySuffix] = CASE
WHEN DAY(@CurrentDate) = 1
OR DAY(@CurrentDate) = 21
OR DAY(@CurrentDate) = 31
THEN 'st'
WHEN DAY(@CurrentDate) = 2
OR DAY(@CurrentDate) = 22
THEN 'nd'
WHEN DAY(@CurrentDate) = 3
OR DAY(@CurrentDate) = 23
THEN 'rd'
ELSE 'th'
END,
WEEKDAY = DATEPART(dw, @CurrentDate),
WeekDayName = DATENAME(dw, @CurrentDate),
WeekDayName_Short = UPPER(LEFT(DATENAME(dw, @CurrentDate), 3)),
WeekDayName_FirstLetter = LEFT(DATENAME(dw, @CurrentDate), 1),
[DOWInMonth] = (DAY(@CurrentDate) - 1) / 7 + 1, -- nth occurrence of this weekday within the month
[DayOfYear] = DATEPART(dy, @CurrentDate),
[WeekOfMonth] = DATEPART(WEEK, @CurrentDate) - DATEPART(WEEK, DATEADD(MM, DATEDIFF(MM, 0, @CurrentDate), 0)) + 1,
[WeekOfYear] = DATEPART(wk, @CurrentDate),
[Month] = MONTH(@CurrentDate),
[MonthName] = DATENAME(mm, @CurrentDate),
[MonthName_Short] = UPPER(LEFT(DATENAME(mm, @CurrentDate), 3)),
[MonthName_FirstLetter] = LEFT(DATENAME(mm, @CurrentDate), 1),
[Quarter] = DATEPART(q, @CurrentDate),
[QuarterName] = CASE
WHEN DATEPART(qq, @CurrentDate) = 1
THEN 'First'
WHEN DATEPART(qq, @CurrentDate) = 2
THEN 'Second'
WHEN DATEPART(qq, @CurrentDate) = 3
THEN 'Third'
WHEN DATEPART(qq, @CurrentDate) = 4
THEN 'Fourth'
END,
[Year] = YEAR(@CurrentDate),
[MMYYYY] = RIGHT('0' + CAST(MONTH(@CurrentDate) AS VARCHAR(2)), 2) + CAST(YEAR(@CurrentDate) AS VARCHAR(4)), -- RIGHT keeps both digits of months 10 to 12
[MonthYear] = CAST(YEAR(@CurrentDate) AS VARCHAR(4)) + UPPER(LEFT(DATENAME(mm, @CurrentDate), 3)),
[IsWeekend] = CASE
WHEN DATENAME(dw, @CurrentDate) = 'Sunday'
OR DATENAME(dw, @CurrentDate) = 'Saturday'
THEN 1
ELSE 0
END,
[IsHoliday] = 0
SET @CurrentDate = DATEADD(DD, 1, @CurrentDate)
END

Holidays and Special Days

Holidays will be handled by the following columns.

IsHoliday BIT NOT NULL,
HolidayName VARCHAR(20) NULL,
SpecialDays VARCHAR(20) NULL

As holidays depend on the country or region for which you are implementing the data warehouse, a customized script is needed for the holidays and special days.
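
One possible shape for such a customized script is sketched below. The two holidays are US examples chosen purely for illustration and are not taken from the original article; the script also assumes that the dimension was loaded with English day names. Replace the rules with those of your own country or region.

```sql
-- Fixed-date holiday: 1 January of every year.
UPDATE dbo.Dim_Date
SET IsHoliday = 1,
    HolidayName = 'New Year''s Day'
WHERE [Month] = 1
  AND [Day] = 1;

-- Floating holiday: the fourth Thursday of November.
-- ([Day] - 1) / 7 + 1 gives the occurrence of the weekday within
-- the month (days 1-7 -> 1, days 8-14 -> 2, and so on).
UPDATE dbo.Dim_Date
SET IsHoliday = 1,
    HolidayName = 'Thanksgiving Day'
WHERE [Month] = 11
  AND WeekDayName = 'Thursday'
  AND ([Day] - 1) / 7 + 1 = 4;
```

Both names fit within the VARCHAR(20) HolidayName column. Longer holiday names would need a wider column.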