By: Koen Verbeeck
Problem

In the first part we introduced the reasoning behind type 2 slowly changing dimensions. We also discussed different implementation options in Integration Services (SSIS). Finally, we started with an implementation in the data flow using out-of-the-box transformations only, so we can build an optimized data flow for loading a type 2 dimension. In this tip, we’ll continue the implementation.
Solution

In the first part of the tip we ended with a data flow where we checked if a row was an insert or an update using the lookup component. If you haven’t read the first part yet, it’s recommended to do so.
For the updated rows, we’re going to verify if a row has columns that have changed compared with the most recent row of that business key. First, we’re going to check for type 2 changes. If a type 2 change has occurred, a new row will be inserted and the previous version will get a timestamp for the ValidTo column, to indicate how long that version was valid in time.
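The exact definition of the dimension isn’t shown in this tip, but as a point of reference it could look like the sketch below. The types of CustomerName and Email follow the update tables created later on; the Location type and the identity surrogate key are assumptions.

CREATE TABLE [dbo].[DimCustomer](
    [SK_Customer] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
    [CustomerName] [varchar](50) NOT NULL,                  -- business key
    [Email] [varchar](50) NULL,                             -- type 1 attribute
    [Location] [varchar](50) NULL,                          -- type 2 attribute
    [ValidFrom] [date] NOT NULL,
    [ValidTo] [date] NULL                                   -- NULL (or a sentinel date) for the current version
);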
Add a conditional split to the data flow canvas and connect it with the match output of the lookup component.

In the condition, we check if the new value of the field Location is different from the current value (which was retrieved in the lookup):

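A sketch of the expression used in the conditional split, assuming the lookup returns the current value in a column named LKP_Location (REPLACENULL guards against NULL values):

REPLACENULL(Location, "") != REPLACENULL(LKP_Location, "")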
If a change has occurred, a new row has to be inserted into the dimension. But at the same time, we also need to update the previous version. To solve this, we’ll use a multicast to create two copies of the row: one is sent to the OLE DB Destination, the other to the update logic. We’re going to reuse the OLE DB Destination that inserts new rows into the dimension. To do this, we place a UNION ALL to merge the two streams together.
The data flow looks like this:

The UNION ALL component has the following configuration:

The insert of new rows is now covered in the data flow. We’re still left with the updates though. There are two types of updates:
Updating the ValidTo field of the previous version when a type 2 change has occurred
Updating all other fields when type 1 changes have occurred

When using the SCD Type 2 wizard in SSIS, the OLE DB Command is used to issue the updates against the database. The problem with this transformation is that an update is sent to the database for every single row. This puts a burden on the transaction log and is much slower than a single batch update. For large dimensions, this can cause performance issues. To work around this, we’ll create two update tables. The following T-SQL is used:
DROP TABLE IF EXISTS [dbo].[UPD_DimCustomer_SCD2];
CREATE TABLE [dbo].[UPD_DimCustomer_SCD2](
    [SK_Customer] [int] NOT NULL,
    [ValidFrom] [date] NOT NULL
);

DROP TABLE IF EXISTS [dbo].[UPD_DimCustomer_SCD1];
CREATE TABLE [dbo].[UPD_DimCustomer_SCD1](
    [CustomerName] [varchar](50) NOT NULL,
    [Email] [varchar](50) NULL
);
Keep in mind that DROP TABLE IF EXISTS is only available as of SQL Server 2016. These T-SQL statements are executed in an Execute SQL Task right before the data flow task.

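On versions older than SQL Server 2016, an existence check against the system catalog achieves the same result, for example:

IF OBJECT_ID(N'dbo.UPD_DimCustomer_SCD2', N'U') IS NOT NULL
    DROP TABLE [dbo].[UPD_DimCustomer_SCD2];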
After the data flow task, another Execute SQL Task can be added to drop the update tables, as some sort of clean-up. While debugging, it might be interesting not to drop these tables as you might want to inspect their contents.

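The clean-up statement itself can be as simple as:

DROP TABLE IF EXISTS [dbo].[UPD_DimCustomer_SCD2];
DROP TABLE IF EXISTS [dbo].[UPD_DimCustomer_SCD1];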
To make sure this pattern works, you have to set the DelayValidation property of the data flow to True. If not, the data flow will fail validation because the update tables won’t exist yet when the package starts.

Let’s continue with the data flow. Add another OLE DB Destination to write the type 2 changes to the dbo.UPD_DimCustomer_SCD2 table:

Here we’ll map the surrogate key retrieved from the lookup and the ValidFrom field:

Finally, add a third OLE DB Destination to write the type 1 changes to the dbo.UPD_DimCustomer_SCD1 table.

In the mapping pane, the business key of the dimension (CustomerName) and all type 1 fields of the dimension are mapped to the update table.

The data flow is now finished. The last step is configuring the updates. Add an Execute SQL Task between the data flow task and the last Execute SQL Task:

First, we need a T-SQL update statement that sets the ValidTo field of the previous version of a row:
UPDATE d
SET [ValidTo] = DATEADD(DAY, -1, [u].[ValidFrom])
FROM [dbo].[DimCustomer] d
JOIN [dbo].[UPD_DimCustomer_SCD2] u ON [u].[SK_Customer] = [d].[SK_Customer];
Because we join on the surrogate key, this update statement will have good performance since this column is typically the primary key of the dimension (and thus has a clustered index on it when the defaults were followed).
The ValidTo field is set to one day prior to the ValidFrom field of the new version. For example, if the new row has a ValidFrom of the 4th of August, the previous version gets a ValidTo of the 3rd of August.
Next, we update the type 1 columns using the business key:
UPDATE d
SET [Email] = u.[Email]
FROM [dbo].[DimCustomer] d
JOIN [dbo].[UPD_DimCustomer_SCD1] u ON u.[CustomerName] = d.[CustomerName]
WHERE ISNULL(d.[Email], '') <> ISNULL(u.[Email], '');
Here we join on the business key. Ideally, an index is placed on the business key columns; note that in a type 2 dimension the business key alone is not unique, since every change adds a new version, so a unique index needs to include ValidFrom as well. With such an index in place, this update statement should have good performance too. To minimize the number of updates, we also check if the value for Email has changed at all. To take NULL values into account, we wrap both columns in an ISNULL function. If the column is not nullable, this isn’t necessary of course.
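A sketch of such an index (the index name is made up):

CREATE UNIQUE NONCLUSTERED INDEX [UQ_DimCustomer_BusinessKey]
    ON [dbo].[DimCustomer] ([CustomerName], [ValidFrom]);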
Testing the Package

The package is now finished, so let’s test some scenarios. Currently, the following data is in the staging table and in the dimension:

Let’s change the location to California in the staging table, triggering a type 2 change. The data flow looks like this when running the package:

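As an illustration, the type 2 change above could be triggered with a statement like this sketch; the staging table name and the customer name are made up:

UPDATE [dbo].[StagingCustomer]
SET [Location] = 'California'       -- type 2 attribute, so this triggers a new version
WHERE [CustomerName] = 'John Doe';  -- hypothetical business key value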
The data in the tables:

The package was run on the 4th of August. This means the ValidFrom of the new row is the 4th of August, while the ValidTo of the previous version is set to the 3rd of August.