Real-time Operational Analytics in SQL Server 2016

Introduction

SQL Server 2016 introduces the Real-time Operational Analytics feature, which allows you to run both transactional/operational and analytics workloads on the same system without having a need for another analytics system for reporting.

In the last article of the series, we looked into different design considerations for leveraging the Real-time Operational Analytics feature, we looked into different scenarios where it fits or where it does not fit and then finally we got started with it for disk based tables.

In this article we will learn about creating a filtered, non-clustered columnstore index to minimize the overhead of maintaining it and will learn about making use of this feature on memory-optimized tables.

Getting Started with Real-time Operational Analytics

SQL Server 2016 supports real-time operational analytics for both traditional disk-based and memory-optimized tables without requiring any changes to your transactional workload.

Disk Based Tables

Related Articles Getting Started with Stretch Database Functionality in SQL Server 2016 Part 1 Getting Started with Stretch Database Functionality in SQL Server 2016 - Part 2 Row Level Security with SQL Server 2016: Part 1 Allow Access to Only a Subset of Rows Using Row Level Security Row Level Security with SQL Server 2016: Part 2 - Blocking Updates at the Row Level Altering an Existing Table to Support Temporal Data Getting Started with JSON Support in SQL Server 2016 Part 2 Getting Started with Temporal Table in SQL Server 2016 Part 2 Getting Started with Temporal Table in SQL Server 2016 Part 1

In my last article, we learned about creating a non-clustered columnstore index for real-time operational analytics. We also saw how usage of a non-clustered columnstore index improves performance of analytical queries several folds. To summarize, here is what we’ve learned about running analytical queries in SQL Server 2016:

SQL Server 2016 allows you to create an update-able non-clustered columnstore index (as shown in the figure below) and the table below has details on how that update happens: Operation B-Tree (Non-Clustered Index - NCI) Non-Clustered Columnstore Index (NCCI) Insert

Insert row into btree

Insert row into btree (delta store)

Delete

1. Seek row(s) to be deleted

2. Delete the row

1. Seek for the row in the btree (delta stores) - there can be multiple

2. If row found, then delete it in btree (delta stores)

3. Otherwise insert the key into delete bitmap

Update

1. Seek row(s) to be updated

2. Update row(s)

1. Delete the existing row(s) - steps same as above

2. Insert the updated row into btree (delta stores)

You can consider dropping all other indexes that you created for analytics as newly created non-clustered columnstore index will be used for analytics Query Optimizer will choose non-clustered columnstore index whenever needed
Real-time Operational Analytics in SQL Server 2016 - Part 3

Real-time Operational Analytics in SQL Server 2016 - Part 3

B-tree Index

Notes As you can see in the table above, fragmentation starts occurring for non-clustered columnstore index when: A compressed row is deleted as it is actually not removed immediately but rather marked as deleted. It continues to consume disk-storage and memory when one or more columns of the compressed row group are read into memory. It also degrades query performance because these rows need to be filtered out before returning the query results. A compressed row is updated, as an existing row is deleted (as described above) and then the new row is inserted into delta store/row group. For this workload pattern, we will end up compressing the same row several times along with overhead of deleted rows as described in the previous point.

Usage of Filtered Index

As you can guess by looking at the figure and table above, an update is often a multistep process, expensive operation on non-clustered columnstore index and hence SQL Server 2016 not only makes non-clustered columnstore index update-able but it goes one step further to provide an option to make it a filtered index in order to minimize the overhead of maintaining non-clustered columnstore index (on disk-based tables only). This means you can define a filtered predicate to create a non-clustered columnstore index only on the warm or slowly changing data.

B-tree Clustered Index

This means, non-clustered columnstore index exists on slowly changing data and makes a good case for application like order management system. For example, when an order is placed, it moves through several phases (hot data) and once it is delivered it is marked as received (once the order has been received, it rarely changes after that and therefore can be considered warm data).

Now suppose if you want to run an analytic query for received order only, it will access only warm data from the non-clustered columnstore index but if you run analytic query for received and shipped orders, the analytics query transparently accesses both warm and hot data as needed to provide real-time analytics.

--Create the non-clustered columnstore index with a filtered condition CREATE NONCLUSTERED COLUMNSTORE INDEX [NCCS_OrderHeaderWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex_All] ON OrderHeaderWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex (AccountKey, CustomerName, PurchasePrice, OrderStatus) WHERE OrderStatus = 5 -- 5 => 'Order Received'

SQL Server optimizer will consider, but not always choose, the non-clustered columnstore index for the analytical query execution. When the query optimizer chooses the filtered non-clustered columnstore index, it transparently combines the rows both from the columnstore index as well as the rows that do not meet the filtered condition to allow real-time analytics. This is another improvement and different from a regular filtered index, which can be used only in queries that restrict themselves to the rows present in the filtered index.

Here are summary, considerations and best practices for improving performance using a Filtered Index:

To minimize the overhead of maintaining a non-clustered columnstore index for operational workload, you can use a filtered condition to create a non-clustered columnstore index only on the warm or slowly changing data. Filtered columnstore index is only supported on disk-based tables. It is not supported on memory-optimized tables. Have a clustered B-Tree index on the column(s) used in the filtered condition to identify the ‘hot’ rows efficiently. The idea behind having clustered B-Tree index and filtered non-clustered columnstore index is to access frequently changing “hot” data from the B-Tree index, and to access the more stable "warm" or "cold" data from the filtered non-clustered columnstore index. Usage of Compression Delay

In situations where you cannot classify data based on a filter condition to provide a time duration on how long the data is going to be hot, SQL Server 2016 provides an option to classify it (warm and hot data) based on time using a setting called Compression Delay . Rows are placed in the delta store/row group initially and moved to columnstore indexes (in compressed form) after the compression delay elapses.

B-tree Clustered Index

In the figure above, the hot data is present in the delta rowgroups until the compression delay time elapses, at which point it will be converted to a columnstore compressed format. You can set the compression delay up to as far as seven days (the valid range for disk-based table is between (0, 10080) minutes and for memory-optimized table is zero or between (60, 10080) minutes) to stay in the delta stores/rowgroups. This gives you flexibility to have an exponentially increasing delta rowgroups to accommodate all the DML activity for the rows during the compression delay.

If you think of this from a different perspective, Compression Delay helps in scenarios where you don't have a filter predicate (to differentiate between warm and hot data) as well as it also helps to minimize columnstore fragmentation (which mostly happens as part of a delete or update on columnstore compressed data). If you have large fragmentation, it causes inefficient utilization of memory/storage, negatively impacting performance of the analytical queries as it needs to filter the deleted rows from the result set. Hence with Compression Delay you can control the compression for data between slowly changing or frequently changing data.

Memory-optimized Tables

SQL Server 2016 combines memory-optimized table for extreme OLTP and in-memory analytics using columnstore to deliver real-time operational analytics, without changes to your operational workload. If you want to use real-time operational analytics on the in-memory table, you can go ahead and create a columnstore index on top of your in-memory table, which has either Hash index or a Range index or both.

As you can see in the figure below, a columnstore index is created, shown at the bottom along with Deleted Row Table (you can think of Deleted Rows Table - DRT as Delete bitmap, which keeps track of all the deleted rows in the columnstore index on the table).

All the recently changed data due to the recent DML activity is written to the Tail of the in-memory table (you can think of Tail as Delta store, which contains all the rows in the in-memory table which are not yet compressed into columnstore format).

Like non-clustered columnstore index, when the number of rows in the Tail of the in-memory table exceeds 1 million, a background thread compresses rows in the unit of 1 million row group. Also an analytics query will combine rows both from the columnstore index and the Tail automatically without requiring any changes to the query.

Hash Index

Doing real-time operation analytics on an in-memory table has a few differences though when compared to doing it on traditional disk based tables:

You need to include all the columns of the memory optimized table in the columnstore index and hence it is referred to as a clustered columnstore index in the declaration (you can think of this as a special type of non-clustered columnstore index). Columnstore index is fully memory resident, which consumes more memory though the size requirement for storing columnstore index should be much smaller because of the aggressive compression it uses. SQL 2016 also increases the size limit for in-memory OLTP up to 2TB. As you can see in the figure above, there is no explicit delta store. The new rows are only inserted into the memory-optimized table. The rows that are in the memory-optimized table but still not in columnstore are referred to as the Tail or "virtual delta store".

You can reference the script below for creating a memory optimized table with columnstore index. As you can see, the columnstore index is clustered and hence we don’t need to specify individual columns as all the columns of the table are already included.

-- Create a memory-optimized table with a columnstore index CREATE TABLE SalesOrderDetailWithMemoryOptimizedAndClusteredColumnStoreIndex( [SalesOrderID] [int] NOT NULL , [SalesOrderDetailID] [int] NOT NULL PRIMARY KEY NONCLUSTERED, [CarrierTrackingNumber] [nvarchar](25) NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL, [LineTotal] [money], [ModifiedDate] [datetime] NOT NULL, INDEX SalesOrderDetail_CCI CLUSTERED

Real-time Operational Analytics in SQL Server 2016 - Part 3

Trending Articles

《沈冰自述——我和周永康的故事》全本

Moog - Subsequent 25

出售: 林憶蓮•回來愛的身邊 (東芝1A1頭版)

筆記 - 使用 PowerShell 清除停用 AD 帳號與 OU

df-dferh-01 中国区 Android 安装 Google Play Store 后报错的解决办法

「一棒接一棒、棒棒強棒」108學年度家長會長交接典禮

吸烟与MBTI类型判断捷径 (豆瓣 INFJ的奇幻之旅小组)

acermark龍璿國際展出多款包裝設備

枋寮北勢寮隆山宮睽違12年再辦迎王祭典

日本女优有村千佳COS集锦：狂三&黑白岩&亚丝娜&绫波丽

有遇到过这个问题么。/jsb-videoplayer.js not found, possible missing file.

MAS v2.8 magicgenius 汉化版 - 11.11更新

出售: Monster Cable Interlink Reference 2

福建佛教人士望云和尚(林斌)的九仙禅寺被强行收走，望云妈妈被赶出寺庙

R 语言中的OpenBLAS*和英特尔® 数学核心函数库的性能比较

[转载]煞貢、直星、人專吉日\金神七煞歌

HAKERS哈克士戶外 12月8~14日廠拍

OBS Studio 23.2.1 免安裝中文版 - 免費網路實況廣播軟體實況主必備軟體取代Fraps

<請教>行駛中安卓機會重新開機

Udp2raw-tunnel 及其一键安装脚本