DELETE operation in SQL Server HEAPs

When I first recovered from the surprise of seeing the huge number of heaps in the database project that I’d just become involved with, I was left with a burning curiosity as to how and why it had been done that way. After all, a heap, which is a table without a clustered index, isn’t generally useful in SQL Server. It was then explained to me that all clustered indexes had been removed to curb the frequent deadlocks that afflicted the system. After the team reengineered the database structure, the deadlocks did indeed go away, but the team had not considered the drawbacks that inevitably come with heaps, and these drawbacks came on top of the disadvantages of losing the indexes! In this article, I’ll describe just one of these drawbacks: the complications of DELETE operations on heaps.

Preamble

I am not one of those who argue that heaps have no place in a SQL Server database. I personally favor using heaps as often as possible; data warehouses, for example, are good candidates for heaps. My own opinion about heaps has been influenced by two very interesting blog posts about the differences between heaps and clustered indexes:

"Unreasonable Defaults: Primary Key as Clustering Key" by Markus Winand
"Clustered Index vs. Heap" by Thomas Kejser

Although it is generally better to use clustered indexes than heaps, especially for reasons of maintenance, it is not universally so. However, you need to be aware of the drawbacks of heaps. These drawbacks stem from the structure of a nonclustered index. A nonclustered index on a heap stores the physical position of each record because there is no ordering key that could be used for a lookup. If a heap is rebuilt, the positions of the records change, so all the nonclustered indexes have to be updated, too. I recommend to my customers that, before they decide between a heap and a clustered index, they should analyze the following:

the workloads
the type of SELECT statements
the impact on nonclustered indexes
the maintenance requirements (fragmentation, forwarded records, DML operations)

When all of these conditions are favorable, it is better to use a heap rather than a clustered index. When even one of them is not, you are likely to regret the choice of heaps.

A customer of mine failed to analyze all of these issues, focusing only on the second one. What the team didn’t explore sufficiently was the deep impact of DELETE operations on heaps and the impact of UPDATE operations in conjunction with so-called forwarded records (you can read more about the drawbacks of forwarded records here: ‘How forwarded records are read and processed in a sql server heap’). To illustrate the impact of DELETE operations, we’ll need to run a test.

Environment for test

We will use a simple heap structure for all the following examples. The size of a record is 8,004 bytes (4 bytes for Id plus 8,000 bytes for C1), so only one record will fit on a data page. The table will be filled with 20,000 records.

-- Create a HEAP for the demo
CREATE TABLE dbo.demo_table
(
    Id INT NOT NULL IDENTITY (1, 1),
    C1 CHAR(8000) NOT NULL DEFAULT ('Das ist nur ein Test') -- "This is just a test"
);
GO

-- Now we fill the table with 20,000 records
SET NOCOUNT ON;
GO

INSERT INTO dbo.demo_table WITH (TABLOCK) (C1)
SELECT TOP 20000 text FROM sys.messages;

When the table is filled up, we will have a total of 20,001 data pages in the buffer pool of Microsoft SQL Server.

-- Which pages of the table dbo.demo_table are in the buffer pool now?
;WITH db_pages AS
(
    SELECT  DDDPA.page_type,
            DDDPA.allocated_page_file_id,
            DDDPA.allocated_page_page_id,
            DDDPA.page_level,
            DDDPA.page_free_space_percent,
            DDDPA.is_allocated
    FROM    sys.dm_db_database_page_allocations
            (
                DB_ID(),
                OBJECT_ID(N'dbo.demo_table', N'U'),
                NULL,
                NULL,
                'DETAILED'
            ) AS DDDPA
)
SELECT  DOBD.file_id,
        DOBD.page_id,
        DOBD.page_level,
        DOBD.page_type,
        DOBD.row_count,
        DOBD.free_space_in_bytes,
        DP.page_free_space_percent,
        DP.is_allocated
FROM    sys.dm_os_buffer_descriptors AS DOBD
        INNER JOIN db_pages AS DP ON
        (
            DOBD.file_id = DP.allocated_page_file_id
            AND DOBD.page_id = DP.allocated_page_page_id
            AND DOBD.page_level = DP.page_level
        )
WHERE   DOBD.database_id = DB_ID()
ORDER BY
        DP.page_type DESC,
        DP.page_level DESC,
        DOBD.page_id,
        DOBD.file_id;
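As a lighter-weight cross-check of the page count, the following sketch uses the documented function sys.dm_db_index_physical_stats. It assumes that dbo.demo_table from the demo exists in the current database. The alias DDIPS is only illustrative, and index_id 0 addresses the heap itself.

```sql
-- Sketch: summarize the physical layout of the heap (index_id = 0).
-- Assumes dbo.demo_table from the demo exists in the current database.
SELECT  DDIPS.index_type_desc,
        DDIPS.alloc_unit_type_desc,
        DDIPS.page_count,
        DDIPS.record_count,
        DDIPS.avg_page_space_used_in_percent,
        DDIPS.forwarded_record_count
FROM    sys.dm_db_index_physical_stats
        (
            DB_ID(),
            OBJECT_ID(N'dbo.demo_table', N'U'),
            0,          -- index_id 0 = the heap
            NULL,       -- all partitions
            'DETAILED'  -- reads every page, so record_count and space usage are populated
        ) AS DDIPS;
```

With only one record per page, page_count and record_count for the in-row data should come out about equal, and avg_page_space_used_in_percent should be close to 100.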
Every data page of the heap is completely filled up.

With the next statement, 1,000 records will be deleted from the heap. Because only one record fits on a data page, we would expect the page of each deleted record to become empty and ready to be recycled.

-- Now we delete 1,000 records with an even Id
SET ROWCOUNT 1000;
DELETE dbo.demo_table WHERE Id % 2 = 0;

-- Reset the row limit so that later statements are not truncated
SET ROWCOUNT 0;

Analyzing the buffer pool again shows what happens to the data pages once the records have been deleted from the heap.

We would expect the empty data pages to be deallocated and given back to the database, but what has happened is very different. The result shows that the pages are still allocated in the buffer pool. The empty data pages continue to occupy the buffer pool and, although they no longer store any data, each of them consumes 8,192 bytes. With 1,000 deleted records this adds up to nearly 8 MB of wasted space in the buffer pool.
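The wasted space can be estimated directly from the buffer pool. The following is a minimal sketch, not the query used for the results in this article. It assumes that emptied pages report a row_count of 0 in sys.dm_os_buffer_descriptors, and it maps buffer pages to the table through sys.allocation_units and sys.partitions. The aliases and output column names are illustrative.

```sql
-- Sketch: count the empty data pages of dbo.demo_table that are still
-- held in the buffer pool and convert them to megabytes.
-- Assumption: an emptied heap page shows row_count = 0.
SELECT  COUNT(*)                      AS empty_pages_in_buffer_pool,
        COUNT(*) * 8192 / 1048576.0   AS wasted_mb
FROM    sys.dm_os_buffer_descriptors AS BD
        INNER JOIN sys.allocation_units AS AU
            ON AU.allocation_unit_id = BD.allocation_unit_id
        INNER JOIN sys.partitions AS P
            ON P.hobt_id = AU.container_id
            AND AU.type IN (1, 3)     -- in-row and row-overflow data
WHERE   BD.database_id = DB_ID()
        AND P.object_id = OBJECT_ID(N'dbo.demo_table', N'U')
        AND BD.page_type = N'DATA_PAGE'
        AND BD.row_count = 0;
```

For 1,000 empty pages the arithmetic is 1,000 x 8,192 bytes = 8,192,000 bytes, which is roughly 7.8 MB.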

Why is this happening? We need to investigate further.

Reading data pages in a Heap

When a heap has no nonclustered index that supports the query, data can only be accessed by a Table Scan. A Table Scan always means accessing the entire table. Only when all data has been read is it filtered by the predicate.

SELECT *
FROM dbo.demo_table
WHERE Id = 10
OPTION (QUERYTRACEON 9130);

The above example generates an execution plan in which a Table Scan feeds a Filter operator. Please note that trace flag 9130 has been used to make the Filter operator visible in the execution plan!
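The cost of such a scan can also be made visible without the execution plan. The following sketch uses SET STATISTICS IO to report the page reads for the same single-row lookup. It assumes the demo table from above and is only meant as an illustration.

```sql
-- Sketch: measure the I/O of a single-row lookup on the heap.
-- Assumes dbo.demo_table from the demo exists in the current database.
SET STATISTICS IO ON;

SELECT *
FROM dbo.demo_table
WHERE Id = 10;

SET STATISTICS IO OFF;
```

Although only one record is returned, the logical reads reported in the messages output should be in the order of the total number of pages in the heap, because every page has to be read before the predicate can be applied.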

The structure of a heap is, by definition, a bunch of data without any specific order. For this reason a heap works like a jigsaw puzzle, and finding data in the heap is the same as having to check every single piece of the puzzle. Another important difference from any index structure is the fact that the data pages are not linked in a sequence. A single data page does not know where the next or the previous one is.

A data page in a heap is isolated and controlled by the IAM page (Index Allocation Map) of the table. Think about the IAM as the container which holds all the data pages together. When, for example, Microsoft SQL Server accesses page 110 it does not know what or where the next one is. For that reason, Microsoft SQL Server reads the IAM data page first to determine what pages it has to read for a full table scan. Based on this information, the process can start and ev
