Quantcast
Channel: CodeSection,代码区,SQL Server(mssql)数据库 技术分享 - CodeSec
Viewing all articles
Browse latest Browse all 3160

Searching Strings in SQL Server is Expensive

$
0
0

In the classic spirit of my How to Think Like the Engine class , let’s go take a look at the StackOverflow.Users table and find all the users named Brent. There’s a DisplayName field, and I’m going to be querying that a lot in this blog post, so I’ll go ahead and create an index on it, then look for the Brents:


Searching Strings in SQL Server is Expensive

Part 1: looking for Brent

There’s 858 of us, and I can find them instantly. To get even more exact, we can turn on SET STATISTICS TIME, IO ON andthen check out the Messages tab:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)
Table 'Users'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

Woohoo! Nice and fast. Don’t get misled by “scan count 1” it’s actually an index seek, seeking directly to the Brents in the table:


Searching Strings in SQL Server is Expensive

Query plan #1

Beautiful!

Except it’s wrong.


Searching Strings in SQL Server is Expensive

We’re everywhere

Those aren’t all the Brents these are only the people whose DisplayName BEGINS with Brent. There are others, like Sun Brent, DesignerBrent, Dan Brentley, Ben Brenton, and more.

So I’m not happy about this, but I’m going to have to change my query from DisplayName LIKE ‘Brent%’, and I’m going to have to use a leading wildcard.

How bad can it be?


Searching Strings in SQL Server is Expensive

It’s only a second or two

It’s actually not that bad only takes about a second or two to find the 914 %Brent%’s. Sure, our beautiful, quick index seek has become an ugly scan of the entire index . No surprise there we have to find people with Brent anywhere in their name, which means they could be anywhere in the index.

We already know that we’re going to be reading more data, and STATISTICS IO shows it:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
(1 row(s) affected)
Table 'Users'. Scan count 5, logical reads 22548, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5235 ms, elapsed time = 1672 ms.

Logical reads the number of 8K pages we looked at in order to build our query results went from 8, all the way up to 22,548. So that’s why the query’s taking 1.7 seconds now, right?

Not so fast.

Try the query without a where clause.

Literally, SELECT COUNT(*) FROM dbo.Users;


Searching Strings in SQL Server is Expensive

Wait, that was really fast

It takes no time at all even though it’s returning 5.3mm users!

What do the output statistics say?

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row(s) affected)
Table 'Users'. Scan count 5, logical reads 22637, physical reads 0, read-ahead reads 17, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 344 ms, elapsed time = 92 ms.

We still read over 22,000 pages but why is it taking just 92 milliseconds to happen now?

It all comes down to the CPU work required to execute this line:

WHERE DisplayName LIKE '%Brent%'

In order to do that, SQL Server burns 5.2 seconds of CPU time cracking open each DisplayName string, and moving through it looking for the pattern Brent anywhere in the string. You can see the effect of it really clearly anytime you execute this query:


Searching Strings in SQL Server is Expensive

CPU spikes to 100%

CPU goes straight to 100% across all four of my cores for 1-2 seconds each time this query runs and this is one of the tiniest tables in the StackOverflow database .

Leading wildcard searches or anything to do with parsing strings just aren’t SQL Server’s strong point, and will burn up a ton of CPU power. When you’re only parsing a few records, like when we looked for Brent%, it’s not that big of a deal. But if I have to parse the whole table, it’s a hot, expensive mess.

How do we know if our code has this problem?

Run the query with SET STATISTICS TIME ON, and examine the output. Look for the execution times line:

SQL Server Execution Times:
CPU time = 5235 ms, elapsed time = 1672 ms.

If CPU time is higher than elapsed time, that means your query went parallel now, that alone isn’t a problem.Even if it didn’t, though say you’ve got 10 seconds of elapsed time, and 9.5 seconds of that time is spent tearing up CPU. That’s bad.

When CPU time is unacceptably high, and you’re reading a relatively low number of pages (like in our case), you might havethis problem. To figure it out, go through the functions and string comparisons in your query, strip them out, and see if the query suddenly runs fast.

So how do we fix it?

Option 1: live with it, and spend money on licensing.As your workload and your volume scales, performance will get worse, linearly. When you double the number of records you have to scan, CPU use will exactly double. The good news here is that as long as your queries don’t have parallelism inhibitors , then they’ll go parallel across multiple cores, and you can divide and conquer. It’s gonna be expensive:

SQL 2014 & prior: Standard Edition maxes out at 16 cores SQL 2016: Standard Edition can do up to 24 cores Beyond 16/24, you’re looking at SQL Server Enterprise Edition

Option B: fix the code.Stop letting users do leading wildcard searches.

But Brent, we have to let them do leading wildcards.

Sometimes the table design requires leading wildcards. For example, at StackOverflow.com, questions can be tagged with up to 5 tags to describe the question’s topic. Check it out in the database:


Searching Strings in SQL Server is Expensive

Tags concatenated into a single field

Yes, that’s how tags were initially stored in the StackOverflow database design. This meant that if you needed to search for questions tagged sql-server, you had to write:

SELECT * FROM dbo.Posts
WHERE Tags LIKE '%<sql-server>%'

That didn’t scale especially for a crew whose motto is performance is a feature so eventually Stack ended upmoving on to…

Option III: create a child table.If you’re storing lists in a single field, break them apart into a separate child table. In Stack’s example, this might be a PostsTags table where:

Id INT IDENTITY PostId INT, r

Viewing all articles
Browse latest Browse all 3160

Trending Articles