Quantcast
Channel: CodeSection,代码区,SQL Server(mssql)数据库 技术分享 - CodeSec
Viewing all articles
Browse latest Browse all 3160

SQL Server v.Next : STRING_AGG Performance, Part 2

$
0
0

Last week, I made a couple of quick performance comparisons, pitting the new STRING_AGG() function against the traditional FOR XML PATH approach I've used for ages. I tested both undefined/arbitrary order as well as explicit order, and STRING_AGG() came out on top in both cases:

SQL Server v.Next : STRING_AGG() Performance, Part 1

For those tests, I left out several things (not all intentionally):

Mikael Eriksson and Grzegorz yp both pointed out that I was not using the absolute most efficient FOR XML PATH construct (and to be clear, I never have). I did not perform any tests on linux; only on windows. I don't expect those to be vastly different, but since Grzegorz saw very different durations, this is worth further investigation. I also only tested when output would be a finite, non-LOB string which I believe is the most common use case (I don't think people will commonly be concatenating every row in a table into a single comma-separated string, but this is why I asked in my previous post for your use case(s)). For the ordering tests, I did not create an index that might be helpful (or try anything where all the data came from a single table).

In this post, I'm going to deal with a couple of these items, but not all of them.

FOR XML PATH

I had been using the following:

... FOR XML PATH, TYPE).value(N'.[1]', ...

After this comment from Mikael , I have updated my code to use this slightly different construct instead:

... FOR XML PATH(''), TYPE).value(N'text()[1]', ... Linux vs. Windows

Initially, I had only bothered to run tests on Windows:

Microsoft SQL Server vNext (CTP1.1) - 14.0.100.187 (X64)
Dec 10 2016 02:51:11
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
Developer Edition (64-bit) on Windows Server 2016 Datacenter 6.3 (Build 14393: ) (Hypervisor)

But Grzegorz made a fair point that he (and presumably many others) only had access to the Linux flavor of CTP 1.1. So I added Linux to my test matrix:

Microsoft SQL Server vNext (CTP1.1) - 14.0.100.187 (X64)
Dec 10 2016 02:51:11
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
on Linux (Ubuntu 16.04.1 LTS)

Some interesting but completely tangential observations:

@@VERSION doesn't show edition in this build, but SERVERPROPERTY('Edition') returns the expected Developer Edition (64-bit) . Based on the build times encoded into the binaries, the Windows and Linux versions seem to now be compiled at the same time and from the same source. Or this was one crazy coincidence. Unordered tests

I started by testing the arbitrarily ordered output (where there is no explicitly defined ordering for the concatenated values). Following Grzegorz, I used WideWorldImporters (Standard) , but performed a join between Sales.Orders and Sales.OrderLines . The fictional requirement here is to output a list of all orders, and along with each order, a comma-separated list of each StockItemID .

Since StockItemID is an integer, we can use a defined varchar , which means the string can be 8000 characters before we have to worry about needing MAX. Since an int can be a max length of 11 (really 10, if unsigned), plus a comma, this means an order would have to support about 8,000/12 (666) stock items in the worst case scenario (e.g. all StockItemID values have 11 digits). In our case, the longest ID is 3 digits, so until data gets added, we would actually need 8,000/4 (2,000) unique stock items in any single order to justify MAX. In our case, there are only 227 stock items in total, so MAX isn't necessary, but you should keep an eye on that. If such a large string is possible in your scenario, you'll need to use varchar(max) instead of the default ( STRING_AGG() returns nvarchar(max) , but truncates to 8,000 bytes unless the input is a MAX type).

The initial queries (to show sample output, and to observe durations for single executions):

SET STATISTICS TIME ON; GO SELECT o.OrderID, StockItemIDs = STRING_AGG(ol.StockItemID, ',') FROM Sales.Orders AS o INNER JOIN Sales.OrderLines AS ol ON o.OrderID = ol.OrderID GROUP BY o.OrderID; GO SELECT o.OrderID, StockItemIDs = STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID) FROM Sales.OrderLines AS ol WHERE ol.OrderID = o.OrderID FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,'') FROM Sales.Orders AS o GROUP BY o.OrderID; GO SET STATISTICS TIME OFF; /* Sample output: OrderID StockItemIDs ======= ============ 1 67 2 50,10 3 114 4 206,130,50 5 128,121,155 Important SET STATISTICS TIME metrics (SQL Server Execution Times): Windows: STRING_AGG: CPU time = 217 ms, elapsed time = 405 ms. FOR XML PATH: CPU time = 1954 ms, elapsed time = 2097 ms. Linux: STRING_AGG: CPU time = 627 ms, elapsed time = 472 ms. FOR XML PATH: CPU time = 2188 ms, elapsed time = 2223 ms. */

I ignored the parse and compile time data completely, as they were always exactly zero or close enough to be irrelevant. There were minor variances in the execution times for each run, but not much the comments above reflect the typical delta in runtime ( STRING_AGG seemed to take a little advantage of parallelism there, but only on Linux, while FOR XML PATH did not on either platform). Both machines had a single socket, quad-core CPU allocated, 8 GB of memory, out-of-the-box configuration, and no other activity.

Then I wanted to test at scale (simply a single session executing the same query 500 times). I didn't want to return all of the output, as in the above query, 500 times, since that would have overwhelmed SSMS and hopefully doesn't represent real-world query scenarios anyway. So I assigned the output to variables and just measured the overall time for each batch:

SELECT sysdatetime(); GO DECLARE @i int, @x varchar(8000); SELECT @i = o.OrderID, @x = STRING_AGG(ol.StockItemID, ',') FROM Sales.Orders AS o INNER JOIN Sales.OrderLines AS ol ON o.OrderID = ol.OrderID GROUP BY o.OrderID; GO 500 SELECT sysdatetime(); GO DECLARE @i int, @x varchar(8000); SELECT @i = o.OrderID, @x = STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID) FROM Sales.OrderLines AS ol WHERE ol.OrderID = o.OrderID FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,'') FROM Sales.Orders AS o GROUP BY o.OrderID; GO 500 SELECT sysdatetime();

I ran those tests three times, and the difference was profound nearly an order of magnitude. Here is the average duration across the three tests:


SQL Server v.Next : STRING_AGG Performance, Part 2
Average duration, in milliseconds, for 500 executions of variable assignment

I tested a variety of other things this way as well, mostly to make sure I was covering the types of tests Grzegorz was running (without the LOB part).

Select

Viewing all articles
Browse latest Browse all 3160

Latest Images

Trending Articles