By: Eduardo Pivaral
Problem
Sometimes you must perform DML processes (insert, update, delete, or combinations of these) on large SQL Server tables. If your database has high concurrency, these processes can lead to blocking or fill up the transaction log, even when you run them outside of business hours. So maybe you were tasked with optimizing some processes to avoid large log growth and minimize locks on tables. How can this be done?
Solution
We will do these DML processes in batches with the help of @@ROWCOUNT. This also gives you the ability to implement custom “stop-resume” logic. We will show you a general method, so you can use it as a base to implement your own processes.
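For the “stop-resume” part, one simple option is to persist the batch control variable after every successful batch, so a stopped process can pick up where it left off. A minimal sketch, assuming a hypothetical control table dbo.BatchControl (none of this is part of the code shown later in the tip):

-- hypothetical control table: remembers the last key processed by a named process
IF OBJECT_ID('dbo.BatchControl') IS NULL
    CREATE TABLE dbo.BatchControl (processName VARCHAR(100) NOT NULL PRIMARY KEY, lastId BIGINT NOT NULL)

DECLARE @id_control BIGINT

-- on startup: resume from the saved position, or start from 0 the first time
SELECT @id_control = ISNULL((SELECT lastId FROM dbo.BatchControl WHERE processName = 'MyTestTable cleanup'), 0)

-- ... run one batch here, exactly as in the loop shown in the next section ...

-- after each successful batch: save the new position (insert the row if it does not exist yet)
UPDATE dbo.BatchControl SET lastId = @id_control WHERE processName = 'MyTestTable cleanup'
IF @@ROWCOUNT = 0
    INSERT INTO dbo.BatchControl (processName, lastId) VALUES ('MyTestTable cleanup', @id_control)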
Please note that we will not focus on indexes in this tip. Of course indexes can help these queries, but I want to show you a worst-case scenario, and index creation is another topic.
Basic algorithm
The basic batch process is something like this:
DECLARE @id_control INT
DECLARE @batchSize INT
DECLARE @results INT
SET @results = 1 --stores the row count after each successful batch
SET @batchSize = 10000 --How many rows you want to operate on each batch
SET @id_control = 0 --current batch
-- when 0 rows returned, exit the loop
WHILE (@results > 0)
BEGIN
-- put your custom code here
SELECT * -- OR DELETE OR UPDATE
FROM <any table>
WHERE <your logic evaluations>
AND (
    <your PK> > @id_control
    AND <your PK> <= @id_control + @batchSize
)
-- very important to obtain the latest rowcount to avoid infinite loops
SET @results = @@ROWCOUNT
-- next batch
SET @id_control = @id_control + @batchSize
END
To explain the code: we use a WHILE loop and run our statements inside it, and we set a batch size (a numeric value) to indicate how many rows we want to operate on in each batch.
For this approach, I am assuming the primary key is an int or numeric data type, so for this algorithm to work you will need that type of key. For alphanumeric or GUID keys this approach won't work as written, but you can implement some other type of custom batch processing with a bit of additional coding, as in the sketch below.
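For example, with a non-sequential key you can fall back on TOP-based batching, where each pass removes (or otherwise excludes) the rows it already touched so the remaining set keeps shrinking. A minimal DELETE sketch, assuming a hypothetical table dbo.MyGuidTable with a uniqueidentifier key and an integer column to filter on (neither is part of this tip's test setup):

DECLARE @batchSize INT
DECLARE @results INT
SET @batchSize = 10000
SET @results = 1

WHILE (@results > 0)
BEGIN
    -- each pass deletes up to @batchSize qualifying rows; no key range is needed
    DELETE TOP (@batchSize)
    FROM dbo.MyGuidTable
    WHERE dataInt > 600 -- your filter condition

    -- when no qualifying rows remain, @@ROWCOUNT returns 0 and the loop exits
    SET @results = @@ROWCOUNT
END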
So, with the batch size and the key control variable, we limit each pass to only the rows whose key falls within the current range.
Important Note: Your process will need to operate on at least some rows in every batch. If a batch touches no rows at all, @@ROWCOUNT returns 0 and the loop ends prematurely, even if later key ranges still contain qualifying rows. If you have a situation where only a small subset of rows from a large table will be affected, it is better and safer to use an index and a single DML statement. Another approach for these cases is to use a temporary table to filter the rows to be processed, and then use this temp table in the loop to control the process, as sketched below.
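To illustrate that last idea, here is a minimal sketch of a temp-table-driven loop. It reuses the [MyTestTable] table defined in the next section; the filter and the updated value are only placeholders, not the tip's exact implementation:

DECLARE @batchSize INT
DECLARE @results INT
SET @batchSize = 10000
SET @results = 1

-- stage only the keys of the rows that actually need to be touched
SELECT id
INTO #rowsToProcess
FROM [dbo].[MyTestTable]
WHERE dataInt > 600 -- your filter condition

WHILE (@results > 0)
BEGIN
    -- process one batch of staged keys
    UPDATE t
    SET t.dataVarchar = N'Processed in batch'
    FROM [dbo].[MyTestTable] t
    WHERE t.id IN (SELECT TOP (@batchSize) id FROM #rowsToProcess ORDER BY id)

    -- remove the keys we just handled; when none remain, @@ROWCOUNT is 0 and the loop ends
    DELETE FROM #rowsToProcess
    WHERE id IN (SELECT TOP (@batchSize) id FROM #rowsToProcess ORDER BY id)

    SET @results = @@ROWCOUNT
END

DROP TABLE #rowsToProcess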
Our example setup
We will use a test table [MyTestTable] with this definition:
CREATE TABLE [dbo].[MyTestTable](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[dataVarchar] [nvarchar](50) NULL,
[dataNumeric] [numeric](18, 3) NULL,
[dataInt] [int] NULL,
[dataDate] [smalldatetime] NULL,
CONSTRAINT [PK_MyTestTable] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
It contains random information and 6,000,000 records.
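The tip does not include the script used to load the table; if you want to build a similar data set yourself, something along these lines (using CHECKSUM(NEWID()) for pseudo-random values) should work, though the exact values will of course differ:

SET NOCOUNT ON

INSERT INTO [dbo].[MyTestTable] (dataVarchar, dataNumeric, dataInt, dataDate)
SELECT TOP (6000000)
    N'Original data ' + CAST(ABS(CHECKSUM(NEWID()) % 1000) AS NVARCHAR(10)), -- pseudo-random text
    CAST(ABS(CHECKSUM(NEWID()) % 100000) AS NUMERIC(18, 3)) / 100,           -- pseudo-random numeric
    ABS(CHECKSUM(NEWID()) % 1000),                                           -- pseudo-random int (0-999)
    DATEADD(DAY, -ABS(CHECKSUM(NEWID()) % 3650), GETDATE())                  -- pseudo-random date within ~10 years
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
CROSS JOIN sys.all_objects c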
Executing a SELECT statement
Here we execute a simple SELECT statement over the entire table. Note that I enabled STATISTICS IO and cleared the data cache first so we have better numbers for comparison.
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
SELECT *
FROM [dbo].[MyTestTable]
WHERE dataInt > 600
These are the IO results:
Table 'MyTestTable'. Scan count 1, logical reads 65415, physical reads 2, read-ahead reads 65398, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The SELECT took 1 minute 8 seconds and retrieved 2,395,317 rows.

SELECT Statement using batches
For the same SELECT we implement the following process to do it in batches:
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
DECLARE @id_control INT
DECLARE @batchSize INT
DECLARE @results INT
SET @results = 1
SET @batchSize = 100000
SET @id_control = 0
WHILE (@results > 0)
BEGIN
-- put your custom code here
SELECT *
FROM [dbo].[MyTestTable]
WHERE dataInt > 600
AND id > @id_control
AND id <= @id_control + @batchSize
-- very important to obtain the latest rowcount to avoid infinite loops
SET @results = @@ROWCOUNT
-- next batch
SET @id_control = @id_control + @batchSize
END
The IO results (for each batch):
Table 'MyTestTable'. Scan count 1, logical reads 1092, physical reads 0, read-ahead reads 1088, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
If we multiply that by the 60 batches performed, we get around 65,500 logical reads in total (approximately the same as before, which makes sense since we are accessing the same data).
But if we look at the overall execution time, it improves by around 10 seconds, with the same number of rows:

A SELECT statement is probably not the best way to demonstrate this, so let's proceed with an UPDATE statement.
UPDATE Statement using batches
We will do an UPDATE on a varchar field over the table with random data (so our test is more realistic). After clearing the cache, we will execute the code.
This is a screenshot of the transaction log before the operation.

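By the way, the log usage shown in these screenshots can also be checked with a query; the built-in command below reports the current size and percent used of every database's transaction log:

-- reports log size (MB) and log space used (%) for every database on the instance
DBCC SQLPERF(LOGSPACE)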
DBCC DROPCLEANBUFFERS
BEGIN TRAN;
UPDATE [dbo].[MyTestTable]
SET dataVarchar = N'Test UPDATE 1'
WHERE dataInt > 200;
COMMIT TRAN;
The execution took 37 seconds on my machine.

To find the rows affected, we perform a simple count and we get 4,793,808 rows:
SELECT COUNT(1)
FROM [dbo].[MyTestTable]
WHERE dataVarchar = N'Test UPDATE 1'

Checking the log size again, we can see the file grew to 1.5 GB (the space inside it is then marked reusable, since the database is in SIMPLE recovery mode):

Let's proceed to execute the same UPDATE statement in batches. We will just change the text from Test UPDATE 1 to Test UPDATE 2, this time using the batch process. I also shrank the transaction log back to its original size and performed a cache cleanup before executing.
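If you want to reset the log the same way between tests, a DBCC SHRINKFILE call along these lines does it; the database and logical log file names below are placeholders for your own, and this kind of manual shrink is only meant for test comparisons, not for routine use in production:

USE [MyTestDB] -- placeholder database name
GO
DBCC SHRINKFILE (N'MyTestDB_log', 1) -- placeholder logical log file name; target size in MB
GO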
DBCC DROPCLEANBUFFERS
DECLARE @id_control INT
DECLARE @batchSize INT
DECLARE @results INT
SET @results = 1
SET @batchSize = 1000000
SET @id_control = 0
WHILE (@results > 0)
BEGIN
-- put your custom code here
BEGIN TRAN;