The sniffing database

Your SQL Server instances, like people with hay fever who forget to take their antihistamines during summer, are sniffing all the time. Sniffing is a trick employed by the optimizer in an attempt to give you better execution plans.

The most common form of sniffing is parameter sniffing. Many people know about parameter sniffing, but there are a lot of misconceptions about this subject. I have heard people describe parameter sniffing as a bad thing, and I know people who claim that parameter sniffing is mostly good with some exceptions that they then call “bad parameter sniffing”. Neither of these statements is true; in reality parameter sniffing (like the other forms of sniffing) is sometimes good, sometimes bad, and very often irrelevant. I will explain this in more detail in a later post; this post focuses on explaining what parameter sniffing actually is, and on what other forms of sniffing exist.

Yes, other forms of sniffing. Under the right conditions, SQL Server will also sniff variables and cardinalities. Most SQL Server developers and DBAs appear to be blissfully unaware of variable sniffing, and the few speakers and authors that do mention it tend to get it wrong. And cardinality sniffing is, as far as I know, completely undocumented. I have mentioned it a few times in some of my presentations, but never written about it and I have never seen or heard anyone else describe this unique type of sniffing.

Parameter sniffing explained

To understand parameter sniffing, you have to know a bit about how SQL Server compiles queries into execution plans. When a query batch is submitted (either through an ad-hoc query tool such as Management Studio or the sqlcmd utility, or from a client application through e.g. the ADO.NET library or JDBC), SQL Server will first try to avoid the (expensive) compilation process: it checks the plan cache to see if the same batch has been executed before and its plans are still available. If that is not the case, then SQL Server will parse the entire batch, compile execution plans for each of the queries in the batch, and store them in the plan cache. After that, all of the plans for the batch (either taken from the plan cache, or compiled, stored in the plan cache and then taken from it) are executed, in sequence or out of sequence as dictated by control-of-flow logic in the batch.
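To make the plan cache lookup described above a bit more tangible, here is a minimal sketch that lists cached plans together with how often each has been looked up. It is not part of the demo script of this post; it only uses the standard dynamic management views sys.dm_exec_cached_plans and sys.dm_exec_sql_text, and it assumes you have VIEW SERVER STATE permission on the instance.

```sql
-- Sketch: inspect the plan cache (assumes VIEW SERVER STATE permission).
SELECT      cp.usecounts,     -- number of times the cache object was looked up
            cp.objtype,       -- e.g. 'Adhoc', 'Prepared', 'Proc'
            cp.cacheobjtype,  -- filtered to compiled plans below
            st.text           -- text of the batch or module the plan belongs to
FROM        sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE       cp.cacheobjtype = N'Compiled Plan'
ORDER BY    cp.usecounts DESC;
```

Running the same batch twice and then executing this query should show the batch with a usecounts value above 1, which indicates that the second execution reused the cached plan instead of compiling again.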

While compiling the query, the optimizer uses statistics about the data in the tables to estimate how many rows will satisfy any given condition. A condition such as “WHERE Age = 42” on its own is pretty broad. When you understand the data it operates on, your perception of the condition will change: in a database on men in a mid-life crisis, odds are that a rather high percentage of the rows will match; the same condition in the student database of a community college should generate at most a handful of hits. This reflects in the statistics that the optimizer uses, so the same query condition can result in different plans depending on the data distribution.
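As an illustration of the statistics the optimizer consults for a literal value, the sketch below shows the histogram for ProductID in the AdventureWorks2012 sample database that is used for the demo later in this post. It assumes that the index IX_SalesOrderDetail_ProductID, which ships with that sample database, is present; substitute your own table and statistics name otherwise.

```sql
-- Sketch, assuming AdventureWorks2012 and its index IX_SalesOrderDetail_ProductID.
-- The histogram lists, per step, the upper key value (RANGE_HI_KEY) and the
-- number of rows equal to that value (EQ_ROWS).
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
    WITH HISTOGRAM;
GO

-- With a literal in the predicate, the estimated row count in the execution
-- plan is derived from the histogram step that covers the value 898.
SELECT SUM(OrderQty)
FROM   Sales.SalesOrderDetail
WHERE  ProductID = 898;
GO
```

Comparing the estimated number of rows in the plan with the histogram step for the value should show where the estimate comes from.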

When the condition uses a variable (e.g. “WHERE Age = @Age”), then SQL Server cannot use the statistics in the same way. When the query is optimized, the optimizer knows the data type of the variable (because the parser has processed the DECLARE statement), but not the value, because it has not been assigned yet; the assignment occurs when the batch executes, after the optimization process. The optimizer will still use some statistics, but not for a specific age; instead it looks at the number of distinct values used and assumes that the data is evenly distributed. So for a kindergarten database, the number of distinct values for Age would probably be three (5, 6, and 7), and the optimizer would assume 33% matching rows for any value of @Age passed in; for the US census database that same Age column would have over 100 distinct values, and the optimizer would estimate that less than 1% of the rows will match the condition for any value of @Age.
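The even-distribution assumption described above can be checked against the density information in the statistics. The sketch below again assumes the AdventureWorks2012 sample database and its index IX_SalesOrderDetail_ProductID; the variable name and the value 898 are only there for illustration.

```sql
-- Sketch, assuming AdventureWorks2012 and its index IX_SalesOrderDetail_ProductID.
-- "All density" for ProductID is 1 / (number of distinct ProductID values).
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
    WITH DENSITY_VECTOR;
GO

-- The value of the variable is assigned at execution time, after the batch
-- has been compiled, so the estimate for this query is based on the density
-- (density * rows in the table) and not on the histogram step for 898.
DECLARE @ProductID int = 898;

SELECT SUM(OrderQty)
FROM   Sales.SalesOrderDetail
WHERE  ProductID = @ProductID;
GO
```

With the actual execution plan enabled, the estimated number of rows for this query should stay the same regardless of the value assigned to the variable, while the actual number of rows changes.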

A parameter looks for most purposes exactly like a regular variable. The difference is that a parameter is declared in the header of a separately executable code unit: a stored procedure, scalar user-defined function, or multi-statement user-defined function. (And since you should as a rule not use the latter two, I’ll use a stored procedure for my example). The optimizer treats the body of a stored procedure like an ad-hoc batch: when the procedure is invoked the plan cache is first checked, and when no cached plan for the procedure is found it is generated and then stored for future reuse. The key difference is how parameters are treated. To see this in action, run the below script in the AdventureWorks2012 sample database (though it probably also works in other versions of AdventureWorks), with the option to show the actual execution plan enabled. It contains four batches, to create a sample stored procedure, invoke it twice, and then drop the procedure again. The discussion below will focus on the second and third batches, with the two EXEC statements.

CREATE PROC dbo.ParameterSniffingDemo
    @ProductID int
AS
SELECT SUM(OrderQty)
FROM   Sales.SalesOrderDetail
WHERE  ProductID = @ProductID;
GO

-- Run the procedure, then check the execution plan.
EXEC dbo.ParameterSniffingDemo @ProductID = 898;
GO

-- Run the procedure again, for a different product.
EXEC dbo.ParameterSniffingDemo @ProductID = 897;
GO

-- Clean up
DROP PROC dbo.ParameterSniffingDemo;
GO

The second batch in the script above, like any other batch, is first parsed and compiled. It is important to be aware that only the batch itself is compiled at this time. Once the compilation is done, SQL Server executes the EXEC statement: it sets the parameter value to 898 and then passes control to the stored procedure. Since the procedure was just created, there is no plan in cache yet, so at this time the compiler is once more invoked to generate an execution plan for the procedure. If you read this sequence of events carefully, you will realize that, unlike “normal” variables, the value of parameter @ProductID has been set before the compiler is invoked. The optimizer can read this value to use the specific statistics for ProductID 898 instead of the generic statistics it would use for a normal variable, to create an execution plan that is optimal for
