Continuation from the previous 94 parts, the whole series can be found at http://www.nikoport.com/columnstore/ .
In this blog post I am continuing showing some of the solutions for the common problematic places for the Columnstore Indexes.
Here I will focus on 2 important aspects of the query execution plans that are using the Columnstore indexes:
Non-Hash Joins
Query construction (focus on the engine improvements)
This is not a complete list of the patterns that I am pretending to publish, more stuff will be added as soon, but these 2 are quite common in my experience.
Before advancing I will use my own generated copy of the TPCH database (1GB version), that I have done with the help of the HammerDB .
As usually, the location of my backup file is C:\Install and I will use the following script to restore and reconfigure the database on SQL Server 2016 instance by making sure that the compatibility level is set to 130:
/* * This script restores backup of the TPC-H Database from the C:\Install */ USE [master] if exists(select * fromsys.databaseswherename = 'tpch') begin alterdatabase [tpch] setSINGLE_USERWITHROLLBACKIMMEDIATE; end RESTOREDATABASE [tpch] FROMDISK = N'C:\Install\tpch_1gb_new.bak' WITHFILE = 1, NOUNLOAD,STATS = 1 alterdatabase [tpch] setMULTI_USER; GO GO ALTERDATABASE [tpch] SETCOMPATIBILITY_LEVEL = 130 GO USE [tpch] GO EXECdbo.sp_changedbowner @loginame = N'sa', @map = false GO USE [master] GO ALTERDATABASE [tpch] MODIFYFILE ( NAME = N'tpch',FILEGROWTH = 256152KB ) GO ALTERDATABASE [tpch] MODIFYFILE ( NAME = N'tpch_log', SIZE = 1200152KB , FILEGROWTH = 256000KB )Let’s convert 2 biggest tables of the TPCH (lineitem & orders) to the columnstore, by creating the copies of the respective tables with Clustered Columnstore Indexes:
USE [tpch] GO droptableif existsdbo.lineitem_cci; -- DataLoding SELECT [l_shipdate] ,[l_orderkey] ,[l_discount] ,[l_extendedprice] ,[l_suppkey] ,[l_quantity] ,[l_returnflag] ,[l_partkey] ,[l_linestatus] ,[l_tax] ,[l_commitdate] ,[l_receiptdate] ,[l_shipmode] ,[l_linenumber] ,[l_shipinstruct] ,[l_comment] intodbo.lineitem_cci FROM [dbo].[lineitem]; GO -- CreateClusteredColumnstoreIndex createclusteredcolumnstoreindexcci_lineitem_cci ondbo.lineitem_cci; USE [tpch] GO DROPTABLEIF EXISTSdbo.orders_cci; SELECT [o_orderdate] ,[o_orderkey] ,[o_custkey] ,[o_orderpriority] ,[o_shippriority] ,[o_clerk] ,[o_orderstatus] ,[o_totalprice] ,[o_comment] intodbo.orders_cci FROM [dbo].[orders]; createclusteredcolumnstoreindexcci_orders_cci ondbo.orders_cci; GO Non-Hash JoinsThe importance of having Hash Joins in the SQL Server with the Columnstore Indexes can not be overstated. Unless you are processing very small amounts of data out of the Columnstore Indexes (under 1000), the only join type you will want to see is the HASH JOIN.
The difference can be abysmal in the terms of the used resources and overall performance that it will take you down not only on the level of the rowstore indexes, but because you might be dealing with the
Consider running the following query which is returning top 100 orders and the sum of the top 10 line item volume sales for each of the orders:
setstatisticstime, ioon selectTOP 100 o_orderkey, o.o_orderdate, MAX(o.o_totalprice) as orderprice, SUM(Volume) as Top10VolumeSales fromdbo.orders_cci o crossapply ( selecttop 10 (l_extendedprice * (1 - l_discount)) as volume fromlineitem_ccili whereo_orderkey = li.l_orderkey orderby(l_extendedprice * (1 - l_discount)) desc ) f where o.o_orderstatus <> 'F' groupby o.o_orderdate, o_orderkey ORDERBYSUM(o.o_totalprice) DESCOn my VM with 4 cores it takes 33 seconds to execute this query on SQL Server 2016 with Service Pack 1, while it burns almost 48 seconds of the CPU Time .
The relevant part of the execution plan can be found below,

showing so many performance problems that this query is suffering, such as INNER LOOP JOIN, INDEX SPOOL, besides even worse part that is actually hidden and is identifiable only once you open the properties of any of the lower tree (left side of the LOOP JOIN), seeing that it all runs with the Row Execution Mode actually.

To show you the problem, on the left side you will find the properties of the sort iterator that is to be found in the lower (left) part of the LOOP Join that was executed around 770.000 times in the Row Execution Mode, effectively taking any chances away from this query to be executed in a fast way. One might argue that it might that it might be more effective to do the loop part in Row Mode, but given that we are sorting around 3.1 Million Rows there for me there is no doubt that it would be faster to do it within a Batch Execution Mode. Consulting the last sort iterator in the execution plan (TOP N SORT), you will find that it is running with the help of the Batch Execution Mode, even though it is processing around 770.000 rows.
One of the possible solutions that I see in such cases is to rewrite the query where possible without the CROSS APPLY part, thus enabling the query to be executed with the Batch Execution Mode completely by using HASH JOIN for the connection between the 2 tables:
setstatisticstime, ioon selectTOP 100 o.o_orderkey, o.o_orderdate, MAX(o.o_totalprice) as orderprice, SUM(Volume) as Top10Volume fromdbo.orders_cci o innerjoin ( selectROW_NUMBER() OVER (PARTITIONBYl_orderkeyORDERBYsum(l_extendedprice * (1 - l_discount)) desc) AS rowNumber, sum(l_extendedprice * (1 - l_discount)) as volume, li.l_orderkey fromlineitem_ccili groupbyl_orderkey ) f ono_orderkey = f.l_orderkey where o.o_orderstatus <> 'F' and rowNumber < 10 groupby o.o_orderdate, o.o_orderkey ORDERBYSUM(o.o_totalprice) DESCTake a look at the relevant part of the execution plan below:

here we have every single iterator running in the Batch Execution Mode, making this query lightning fast to be executed under 1 seconds with just 2.8 seconds spent off the CPU time . There is no need to compare the performance of these 2 queries even though both of them are using Columnstore Indexes and the first one is using Batch Execution Mode in around 40% of the existing iterators.
Naturally it is not always possible to rewrite a complex DWH query in such a light way as in this example, but the idea here is to provide the tip that the LOOP join is typically not a good thing to be found within your execution plan with Columnstore Indexes when dealing with hundreds of thousands or millions of rows. There are times when Query Optimiser will select the LOOP JOIN because of the statistics distribution or any other possible reasons, and if there is nothing that you can do about it, then your natural option should be forcing the HASH JOIN with the OPTION (HASH JOIN) or HASH JOIN hints.
Another important aspect to notice here is that the CROSS APPLY does not automatically mean that the execution plan will use NESTED LOOP JOIN, please consider the notice the following query below which is using HASH JOINS and runs fast as it should:
selectTOP 100 o_orderkey, o.o_orderdate, MAX(o.o_totalprice) as orderprice, SUM(Volume) as