Searching Strings in SQL Server is Expensive

In the classic spirit of my How to Think Like the Engine class, let’s go take a look at the StackOverflow.Users table and find all the users named Brent. There’s a DisplayName field, and I’m going to be querying it a lot in this blog post, so I’ll go ahead and create an index on it, then look for the Brents:
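Here is a sketch of that setup. It assumes the StackOverflow sample database, and the index name IX_Users_DisplayName is hypothetical. The query uses COUNT(*) to match the single-row result in the statistics output below:

```sql
-- Hypothetical index name; any name works.
CREATE INDEX IX_Users_DisplayName ON dbo.Users (DisplayName);
GO

-- Trailing wildcard only: SQL Server can seek straight to the
-- range of index rows that begin with 'Brent'.
SELECT COUNT(*)
FROM dbo.Users
WHERE DisplayName LIKE 'Brent%';
```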

Part 1: looking for Brent

There are 858 of us, and I can find them instantly. To get even more exact, we can turn on SET STATISTICS TIME, IO ON and then check out the Messages tab:
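Turning on the measurements around the same query looks roughly like this. The IO output reports pages read, and the TIME output reports CPU and elapsed milliseconds:

```sql
-- Report logical/physical reads (IO) and CPU/elapsed time (TIME)
-- in the Messages tab for every statement that follows.
SET STATISTICS TIME, IO ON;
GO

SELECT COUNT(*)
FROM dbo.Users
WHERE DisplayName LIKE 'Brent%';
GO

-- Switch the extra output back off when you're done measuring.
SET STATISTICS TIME, IO OFF;
```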

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)
Table 'Users'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

Woohoo! Nice and fast. Don’t be misled by “scan count 1”: it’s actually an index seek, going directly to the Brents in the table:

Query plan #1

Beautiful!

Except it’s wrong.

We’re everywhere

Those aren’t all the Brents: these are only the people whose DisplayName BEGINS with Brent. There are others, like Sun Brent, DesignerBrent, Dan Brentley, Ben Brenton, and more.

So I’m not happy about this, but I’m going to have to change my query from DisplayName LIKE 'Brent%' to DisplayName LIKE '%Brent%'. In other words, I’m going to have to use a leading wildcard.
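Here is the changed query, assuming the same COUNT(*) shape as before. Only the pattern differs:

```sql
-- Leading AND trailing wildcard: matches 'Brent' anywhere in DisplayName.
-- Rows can't be located by their first characters anymore, so SQL Server
-- has to read the whole index and test the pattern against every row.
SELECT COUNT(*)
FROM dbo.Users
WHERE DisplayName LIKE '%Brent%';
```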

How bad can it be?

It’s only a second or two

It’s actually not that bad: it only takes about a second or two to find the 914 %Brent%s. Sure, our beautiful, quick index seek has become an ugly scan of the entire index. No surprise there: we have to find people with Brent anywhere in their name, which means they could be anywhere in the index.

We already know that we’re going to be reading more data, and STATISTICS IO shows it:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
(1 row(s) affected)
Table 'Users'. Scan count 5, logical reads 22548, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5235 ms, elapsed time = 1672 ms.

Logical reads (the number of 8KB pages we looked at in order to build our query results) went from 8 all the way up to 22,548. So that’s why the query’s taking 1.7 seconds now, right?

Not so fast.

Try the query without a WHERE clause.

Literally, SELECT COUNT(*) FROM dbo.Users;

Wait, that was really fast

It takes no time at all, even though it’s counting all 5.3 million users!

What do the output statistics say?

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row(s) affected)
Table 'Users'. Scan count 5, logical reads 22637, physical reads 0, read-ahead reads 17, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 344 ms, elapsed time = 92 ms.

We still read over 22,000 pages, so why does it take just 92 milliseconds now?

It all comes down to the CPU work required to execute this line:

WHERE DisplayName LIKE '%Brent%'

In order to do that, SQL Server burns 5.2 seconds of CPU time cracking open each DisplayName string and moving through it, looking for the pattern Brent anywhere in the string. You can see the effect really clearly every time you execute this query:

CPU spikes to 100%

CPU goes straight to 100% across all four of my cores for 1-2 seconds each time this query runs, and this is one of the tiniest tables in the StackOverflow database.

Leading wildcard searches or anything to do with parsing strings just aren’t SQL Server’s strong point, and will burn up a ton of CPU power. When you’re only parsing a few records, like when we looked for Brent%, it’s not that big of a deal. But if I have to parse the whole table, it’s a hot, expensive mess.

How do we know if our code has this problem?

Run the query with SET STATISTICS TIME ON, and examine the output. Look for the execution times line:

SQL Server Execution Times:
CPU time = 5235 ms, elapsed time = 1672 ms.

If CPU time is higher than elapsed time, your query went parallel, and that alone isn’t a problem. Even if it didn’t go parallel, look at how much of the elapsed time is CPU. Say you’ve got 10 seconds of elapsed time, and 9.5 seconds of it is spent tearing up CPU. That’s bad.

When CPU time is unacceptably high and you’re reading a relatively low number of pages (like in our case), you might have this problem. To figure it out, go through the functions and string comparisons in your query, strip them out, and see if the query suddenly runs fast.
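For this post’s example, that strip-it-out test is just the two queries from above run back to back. If logical reads stay about the same but CPU time collapses, the string comparison is the likely culprit:

```sql
SET STATISTICS TIME, IO ON;
GO

-- 1. The full query, string comparison included.
SELECT COUNT(*) FROM dbo.Users WHERE DisplayName LIKE '%Brent%';
GO

-- 2. The same scan with the comparison stripped out.
--    Compare "logical reads" and "CPU time" against query 1.
SELECT COUNT(*) FROM dbo.Users;
GO

SET STATISTICS TIME, IO OFF;
```

On the numbers shown earlier, both read roughly 22,500 pages, but CPU time drops from 5,235 ms to 344 ms.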

So how do we fix it?

Option 1: live with it, and spend money on licensing. As your workload and your data volume grow, performance will get worse linearly: when you double the number of records you have to scan, CPU use will roughly double too. The good news here is that as long as your queries don’t have parallelism inhibitors, they’ll go parallel across multiple cores, and you can divide and conquer. It’s gonna be expensive, though:

SQL 2014 & prior: Standard Edition maxes out at 16 cores
SQL 2016: Standard Edition can do up to 24 cores
Beyond 16/24, you’re looking at SQL Server Enterprise Edition

Option B: fix the code. Stop letting users do leading wildcard searches.
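One way to enforce that is sketched below. It assumes the search box feeds a parameter, and the @SearchTerm name is hypothetical. The code appends only a trailing wildcard and escapes any [, % or _ the user types, so a user can’t sneak a leading wildcard back in. Because the pattern never starts with a wildcard, SQL Server should still be able to seek on the DisplayName index, as it did for Brent%:

```sql
-- Hypothetical parameter: whatever the user typed into the search box.
DECLARE @SearchTerm NVARCHAR(40) = N'Brent';

-- Escape LIKE's special characters so they match literally.
-- '[' is replaced first; otherwise the brackets added for % and _
-- would themselves get re-escaped.
DECLARE @Escaped NVARCHAR(200) =
    REPLACE(REPLACE(REPLACE(@SearchTerm, N'[', N'[[]'), N'%', N'[%]'), N'_', N'[_]');

-- Trailing wildcard only: matches names that BEGIN with the search term.
SELECT Id, DisplayName
FROM dbo.Users
WHERE DisplayName LIKE @Escaped + N'%';
```

The trade-off is the one the post started with. This finds Brent Ozar but not Dan Brentley, so it only works if the business can accept "starts with" semantics.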

But Brent, we have to let them do leading wildcards.

Sometimes the table design requires leading wildcards. For example, at StackOverflow.com, questions can be tagged with up to 5 tags to describe the question’s topic. Check it out in the database:

Tags concatenated into a single field

Yes, that’s how tags were initially stored in the StackOverflow database design. This meant that if you needed to search for questions tagged sql-server, you had to write:

SELECT * FROM dbo.Posts
WHERE Tags LIKE '%<sql-server>%'

That didn’t scale, especially for a crew whose motto is “performance is a feature,” so eventually Stack ended up moving on to…

Option III: create a child table. If you’re storing lists in a single field, break them apart into a separate child table. In Stack’s example, this might be a PostsTags table where:

Id INT IDENTITY
PostId INT, r
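Here is a minimal sketch of that idea. It assumes SQL Server 2016 or later, which STRING_SPLIT needs (database compatibility level 130+). The table name dbo.PostsTags and the Id/PostId columns come from the text. The Tag column, its size, and the index name are assumptions for illustration, not Stack’s actual design:

```sql
-- Hypothetical child table: one row per (post, tag) pair.
-- Tag and its NVARCHAR(50) size are assumptions.
CREATE TABLE dbo.PostsTags
(
    Id     INT IDENTITY(1, 1) PRIMARY KEY,
    PostId INT NOT NULL,
    Tag    NVARCHAR(50) NOT NULL
);
GO
CREATE INDEX IX_PostsTags_Tag ON dbo.PostsTags (Tag, PostId);
GO

-- One-time backfill from the concatenated column, e.g. '<sql-server><t-sql>':
-- strip every '<', split on '>', and drop the empty string left after the final '>'.
INSERT INTO dbo.PostsTags (PostId, Tag)
SELECT p.Id, s.value
FROM dbo.Posts AS p
CROSS APPLY STRING_SPLIT(REPLACE(p.Tags, N'<', N''), N'>') AS s
WHERE p.Tags IS NOT NULL
  AND s.value <> N'';
GO

-- The tag search is now an equality predicate on an indexed column
-- instead of LIKE '%<sql-server>%' against every post.
SELECT p.Id, p.Title
FROM dbo.PostsTags AS pt
JOIN dbo.Posts AS p ON p.Id = pt.PostId
WHERE pt.Tag = N'sql-server';
```

Leading the index with Tag means a lookup for one tag can seek straight to its rows, the same way the Brent% search did in Part 1.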
