Continuation from the previous 98 parts, the whole series can be found at http://www.nikoport.com/columnstore/ .
This blog post is focused on the MERGE statement for Columnstore Indexes, or as I call it, the worst enemy of the Columnstore Indexes. It is extremely difficult to imagine a statement better suited for making the worst out of Columnstore Indexes than the infamous MERGE statement. Why? Because it does not merely make Columnstore Indexes perform slowly, it makes them perform MUCH SLOWER than any Rowstore Indexes. Yes, you have read that right: slower than ANY_ROWSTORE_INDEXES. In fact, that reads like a query hint that gets applied to the MERGE statement whenever it is executed against Columnstore Indexes!
I decided to dedicate a whole blog post to this matter, mainly to warn people about this pretty problematic statement. I hope not to see it being used with Columnstore Indexes in the future!
The MERGE T-SQL statement has a huge number of bugs and potential problems, with some statements delivering incorrect results or being cancelled. For the details, I recommend that you read Use Caution with SQL Server’s MERGE Statement, where Aaron Bertrand went into the details of why one should strive to avoid using this statement.
You might point out that a couple of years ago I already published a blog post on the dangers of using the UPDATE statement with Columnstore Indexes, but as I keep seeing the MERGE statement on production servers, it clearly deserves its own post.
As in the other blog post, I will be using a generated copy of the TPCH database (the 10GB version this time, because I want my tests to reflect bigger workloads), which I have generated with HammerDB (free software).
Below you will find the script for restoring the backup of TPCH from the C:\Install\ folder:
USE [master]
GO
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'tpch')
BEGIN
    ALTER DATABASE [tpch] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
END
RESTORE DATABASE [tpch]
    FROM DISK = N'C:\Install\tpch_10gb.bak'
    WITH FILE = 1, NOUNLOAD, STATS = 1;
ALTER DATABASE [tpch] SET MULTI_USER;
GO
ALTER DATABASE [tpch] SET COMPATIBILITY_LEVEL = 130
GO
USE [tpch]
GO
EXEC dbo.sp_changedbowner @loginame = N'sa', @map = false
GO
USE [master]
GO
ALTER DATABASE [tpch] MODIFY FILE ( NAME = N'tpch', FILEGROWTH = 2561520KB )
GO
ALTER DATABASE [tpch] MODIFY FILE ( NAME = N'tpch_log', SIZE = 1200152KB, FILEGROWTH = 256000KB )
GO

I will be showing the data loading on the sample table dbo.lineitem, and as previously, here is the setup script for creating the Clustered Columnstore Index on it:
USE [tpch]
GO
DROP TABLE IF EXISTS dbo.lineitem_cci;

-- Data Loading
SELECT [l_shipdate]
    ,[l_orderkey]
    ,[l_discount]
    ,[l_extendedprice]
    ,[l_suppkey]
    ,[l_quantity]
    ,[l_returnflag]
    ,[l_partkey]
    ,[l_linestatus]
    ,[l_tax]
    ,[l_commitdate]
    ,[l_receiptdate]
    ,[l_shipmode]
    ,[l_linenumber]
    ,[l_shipinstruct]
    ,[l_comment]
    INTO dbo.lineitem_cci
    FROM [dbo].[lineitem];

-- Create Clustered Columnstore Index
CREATE CLUSTERED COLUMNSTORE INDEX cci_lineitem_cci
    ON dbo.lineitem_cci;

Now we have a sweet 60 million rows within our lineitem table.
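If you want to double-check the freshly built index, here is a small query (my own addition, assuming SQL Server 2016 or later, where the sys.dm_db_column_store_row_group_physical_stats DMV is available) that shows the row groups and their states:

-- Inspect the row groups of the new Clustered Columnstore Index
SELECT rg.row_group_id,
    rg.state_desc,
    rg.total_rows,
    rg.deleted_rows
    FROM sys.dm_db_column_store_row_group_physical_stats rg
    WHERE rg.object_id = OBJECT_ID('dbo.lineitem_cci')
    ORDER BY rg.row_group_id;

On a healthy freshly built index you should see compressed row groups with zero deleted rows.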
For the data loading test, I will be using the following staging table:
USE [tpch]
GO
DROP TABLE IF EXISTS [dbo].[lineitem_cci_stage];

CREATE TABLE [dbo].[lineitem_cci_stage](
    [l_shipdate] [date] NULL,
    [l_orderkey] [bigint] NOT NULL,
    [l_discount] [money] NOT NULL,
    [l_extendedprice] [money] NOT NULL,
    [l_suppkey] [int] NOT NULL,
    [l_quantity] [bigint] NOT NULL,
    [l_returnflag] [char](1) NULL,
    [l_partkey] [bigint] NOT NULL,
    [l_linestatus] [char](1) NULL,
    [l_tax] [money] NOT NULL,
    [l_commitdate] [date] NULL,
    [l_receiptdate] [date] NULL,
    [l_shipmode] [char](10) NULL,
    [l_linenumber] [bigint] NOT NULL,
    [l_shipinstruct] [char](25) NULL,
    [l_comment] [varchar](44) NULL
) ON [PRIMARY]
GO

CREATE CLUSTERED COLUMNSTORE INDEX [cci_lineitem_cci_stage]
    ON [dbo].[lineitem_cci_stage];
GO

For the test, I decided to load from the staging table and update our lineitem_cci table with 1 million and with 5 million rows.
First of all, let us start with extracting 1 million rows and putting them into the staging table:
TRUNCATE TABLE [dbo].[lineitem_cci_stage];

INSERT INTO [dbo].[lineitem_cci_stage] WITH (TABLOCK)
SELECT TOP 1000000
    l_shipdate, l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_returnflag, l_partkey, l_linestatus, l_tax, l_commitdate, l_receiptdate, l_shipmode, l_linenumber, l_shipinstruct, l_comment
    FROM dbo.lineitem_cci;

Let’s MERGE the data from the Stage table into our main one, like a lot of people are doing. Surely this statement won’t hurt my test VM, running on the Azure Standard DS12 v2 (4 cores, 28 GB memory):
MERGE INTO [dbo].[lineitem_cci] AS [Target]
USING (
    SELECT l_shipdate, l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_returnflag, l_partkey, l_linestatus, l_tax, l_commitdate, l_receiptdate, l_shipmode, l_linenumber, l_shipinstruct, l_comment
        FROM dbo.lineitem_cci_stage
) AS [Source]
ON Target.L_ORDERKEY = Source.L_ORDERKEY
    AND Target.L_LINENUMBER = Source.L_LINENUMBER
WHEN MATCHED THEN
    UPDATE SET l_shipdate = Source.l_shipdate,
        l_orderkey = Source.l_orderkey,
        l_discount = Source.l_discount,
        l_extendedprice = Source.l_extendedprice,
        l_suppkey = Source.l_suppkey,
        l_quantity = Source.l_quantity,
        l_returnflag = Source.l_returnflag,
        l_partkey = Source.l_partkey,
        l_linestatus = Source.l_linestatus,
        l_tax = Source.l_tax,
        l_commitdate = Source.l_commitdate,
        l_receiptdate = Source.l_receiptdate,
        l_shipmode = Source.l_shipmode,
        l_linenumber = Source.l_linenumber,
        l_shipinstruct = Source.l_shipinstruct,
        l_comment = Source.l_comment
WHEN NOT MATCHED BY TARGET THEN
    INSERT (l_shipdate, l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_returnflag, l_partkey, l_linestatus, l_tax, l_commitdate, l_receiptdate, l_shipmode, l_linenumber, l_shipinstruct, l_comment)
    VALUES (Source.l_shipdate, Source.l_orderkey, Source.l_discount, Source.l_extendedprice, Source.l_suppkey, Source.l_quantity, Source.l_returnflag, Source.l_partkey, Source.l_linestatus, Source.l_tax, Source.l_commitdate, Source.l_receiptdate, Source.l_shipmode, Source.l_linenumber, Source.l_shipinstruct, Source.l_comment);

It took 27.3 seconds on average to execute this operation, with 29.5 seconds of CPU time burnt (notice again, I am not even talking here about transactions being cancelled because of repeatedly touched rows, or about any of the other bugs and inconsistencies).
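Before judging that number, it is worth remembering what an UPDATE against a compressed row group does internally: each updated row is flagged in the delete bitmap and re-inserted. Here is a minimal sketch (my own addition, assuming SQL Server 2016+) to inspect the delete bitmap right after the MERGE:

-- How many rows the MERGE flagged in the delete bitmap (illustrative check)
SELECT SUM(rg.deleted_rows) AS total_deleted_rows,
    SUM(rg.total_rows) AS total_rows
    FROM sys.dm_db_column_store_row_group_physical_stats rg
    WHERE rg.object_id = OBJECT_ID('dbo.lineitem_cci');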
Is it good?
Is it bad?
Let us find out by running the same data loading procedure, but this time with the help of DELETE & INSERT statements:
DELETE Target
    FROM dbo.lineitem_cci AS Target
    WHERE EXISTS (
        SELECT 1
            FROM dbo.lineitem_cci_stage AS Source
            WHERE Target.L_ORDERKEY = Source.L_ORDERKEY
                AND Target.L_LINENUMBER = Source.L_LINENUMBER
    );

INSERT INTO dbo.lineitem_cci WITH (TABLOCK)
SELECT l_shipdate, l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_returnflag, l_partkey, l_linestatus, l_tax, l_commitdate, l_receiptdate, l_shipmode, l_linenumber, l_shipinstruct, l_comment
    FROM [dbo].[lineitem_cci_stage];
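One caveat worth noting: unlike MERGE, which is a single atomic statement, the DELETE and the INSERT above run as two separate transactions. If you need the same all-or-nothing behaviour, a minimal sketch (my own addition for illustration) would simply wrap both statements into one explicit transaction:

-- Making the DELETE & INSERT pair atomic, as a single MERGE statement would be
BEGIN TRANSACTION;

DELETE Target
    FROM dbo.lineitem_cci AS Target
    WHERE EXISTS ( SELECT 1
            FROM dbo.lineitem_cci_stage AS Source
            WHERE Target.L_ORDERKEY = Source.L_ORDERKEY
                AND Target.L_LINENUMBER = Source.L_LINENUMBER );

INSERT INTO dbo.lineitem_cci WITH (TABLOCK)
SELECT l_shipdate, l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_returnflag, l_partkey, l_linestatus, l_tax, l_commitdate, l_receiptdate, l_shipmode, l_linenumber, l_shipinstruct, l_comment
    FROM [dbo].[lineitem_cci_stage];

COMMIT TRANSACTION;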