SQL Server v.Next : STRING_AGG() performance

While SQL Server on linux has stolen almost all of the headlines about v.Next, there are some other interesting advancements coming in the next version of our favorite database platform. On the T-SQL front, we finally have a built-in way to perform grouped string concatenation: STRING_AGG() .

Let's say we have the following simple table structure:

CREATE TABLE dbo.Objects ( [object_id] int, [object_name] nvarchar(261), CONSTRAINT PK_Objects PRIMARY KEY([object_id]) ); CREATE TABLE dbo.Columns ( [object_id] int NOT NULL FOREIGN KEY REFERENCES dbo.Objects([object_id]), column_name sysname, CONSTRAINT PK_Columns PRIMARY KEY ([object_id],column_name) );

For performance tests, we're going to populate this using sys.all_objects and sys.all_columns . But for a simple demonstration first, let's add the following rows:

INSERT dbo.Objects([object_id],[object_name]) VALUES(1,N'Employees'),(2,N'Orders'); INSERT dbo.Columns([object_id],column_name) VALUES(1,N'EmployeeID'),(1,N'CurrentStatus'), (2,N'OrderID'),(2,N'OrderDate'),(2,N'CustomerID');

If the forums are any indication, it is a very common requirement to return a row for each object, along with a comma-separated list of column names. (Extrapolate that to whatever entity types you model this way product names associated with an order, part names involved in the assembly of a product, subordinates reporting to a manager, etc.) So, for example, with the above data we'd want output like this:

object columns
--------- ----------------------------
Employees EmployeeID,CurrentStatus
Orders OrderID,OrderDate,CustomerID

The way we would accomplish this in current versions of SQL Server is probably to use FOR XML PATH , as I demonstrated to be the most efficient outside of CLR inthis earlier post. In this example, it would look like this:

SELECT [object] = o.[object_name], [columns] = STUFF( (SELECT N',' + c.column_name FROM dbo.Columns AS c WHERE c.[object_id] = o.[object_id] FOR XML PATH, TYPE ).value(N'.[1]',N'nvarchar(max)'),1,1,N'') FROM dbo.Objects AS o;

Predictably, we get the same output demonstrated above. In SQL Server v.Next, we will be able to express this more simply:

SELECT [object] = o.[object_name], [columns] = STRING_AGG(c.column_name, N',') FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c ON o.[object_id] = c.[object_id] GROUP BY o.[object_name];

Again, this produces the exact same output. And we were able to do this with a native function, avoiding both the expensive FOR XML PATH scaffolding, and the STUFF() function used to remove the first comma (this happens automatically).

What About Order?

One of the problems with many of the kludge solutions to grouped concatenation is that the ordering of the comma-separated list should be considered arbitrary and non-deterministic.

For the XML PATH solution, I demonstrated inanother earlier post that adding an ORDER BY is trivial and guaranteed. So in this example, we could order the column list by column name alphabetically instead of leaving it to SQL Server to sort (or not):

SELECT [object] = [object_name], [columns] = STUFF( (SELECT N',' +c.column_name FROM dbo.Columns AS c WHERE c.[object_id] = o.[object_id] ORDER BY c.column_name -- only change FOR XML PATH, TYPE ).value(N'.[1]',N'nvarchar(max)'),1,1,N'') FROM dbo.Objects AS o;

Output:

object columns
--------- ----------------------------
Employees CurrentStatus,EmployeeID
Order CustomerID,OrderDate,OrderID

CTP 1.1 adds WITHIN GROUP to STRING_AGG() , so using the new approach, we can say:

SELECT [object] = o.[object_name], [columns] = STRING_AGG(c.column_name, N',') WITHIN GROUP (ORDER BY c.column_name) -- only change FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c ON o.[object_id] = c.[object_id] GROUP BY o.[object_name];

Now we get the same results. Note that, just like a normal ORDER BY clause, you can add multiple ordering columns or expressions inside WITHIN GROUP () .

All Right, Performance Already!

Using quad-core 2.6 GHz processors, 8 GB of memory, and SQL Server CTP1.1 (14.0.100.187), I created a new database, re-created these tables, and added rows from sys.all_objects and sys.all_columns . I made sure to only include objects that had at least one column:

INSERT dbo.Objects([object_id], [object_name]) -- 656 rows SELECT [object_id], QUOTENAME(s.name) + N'.' + QUOTENAME(o.name) FROM sys.all_objects AS o INNER JOIN sys.schemas AS s ON o.[schema_id] = s.[schema_id] WHERE EXISTS ( SELECT 1 FROM sys.all_columns WHERE [object_id] = o.[object_id] ); INSERT dbo.Columns([object_id], column_name) -- 8,085 rows SELECT [object_id], name FROM sys.all_columns AS c WHERE EXISTS ( SELECT 1 FROM dbo.Objects WHERE [object_id] = c.[object_id] );

On my system, this yielded 656 objects and 8,085 columns (your system may yield slightly different numbers).

The Plans

First, let's compare the plans and Table I/O tabs for our two unordered queries, using Plan Explorer . Here are the overall runtime metrics:

SQL Server v.Next : STRING_AGG() performance

Runtime metrics for XML PATH (top) and STRING_AGG() (bottom)

The graphical plan and Table I/O from the FOR XML PATH query:

Plan and Table I/O for XML PATH, no order

And from the STRING_AGG version:

Plan and Table I/O for STRING_AGG, no ordering

For the latter, the clustered index seek seems a little troubling to me. This seemed like a good case for testing out the seldom-used FORCESCAN hint (and no, this would certainly not help out the FOR XML PATH query):

SELECT [object] = o.[object_name], [columns] = STRING_AGG(c.column_name, N',') FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c WITH (FORCESCAN) -- added hint ON o.[object_id] = c.[object_id] GROUP BY o.[object_name];

Now the plan and Table I/O tab look a lot better, at least on first glance:

Plan and Table I/O for STRING_AGG(), no ordering, with FORCESCAN

The ordered versions of the queries generate roughly the same plans. For the FOR XML PATH version, a sort is added:

Added sort in FOR XML PATH version

For STRING_AGG() , a scan is chosen in this case, even without the FORCESCAN hint, and no additional sort operation is required so the plan looks identical to the FORCESCAN version.

At Scale

Looking at a plan and one-off runtime metrics might give us some idea about whether STRING_AGG() performs better than the existing FOR XML PATH solution, but a larger test might make more sense. What happens when we perform the grouped concatenation 5,000 times?

SELECT SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STRING_AGG(c.column_name, N',') FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c ON o.[object_id] = c.[object_id] GROUP BY o.[object_name]; GO 5000 SELECT [string_agg, unordered] = SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STRING_AGG(c.column_name, N',') FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c WITH (FORCESCAN) ON o.[object_id] = c.[object_id] GROUP BY o.[object_name]; GO 5000 SELECT [string_agg, unordered, forcescan] = SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STUFF((SELECT N',' +c.column_name FROM dbo.Columns AS c WHERE c.[object_id] = o.[object_id] FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,1,N'') FROM dbo.Objects AS o; GO 5000 SELECT [for xml path, unordered] = SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STRING_AGG(c.column_name, N',') WITHIN GROUP (ORDER BY c.column_name) FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c ON o.[object_id] = c.[object_id] GROUP BY o.[object_name]; GO 5000 SELECT [string_agg, ordered] = SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STUFF((SELECT N',' +c.column_name FROM dbo.Columns AS c WHERE c.[object_id] = o.[object_id] ORDER BY c.column_name FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,1,N'') FROM dbo.Objects AS o ORDER BY o.[object_name]; GO 5000 SELECT [for xml path, ordered] = SYSDATETIME();

After running this script five times, I averaged the duration numbers and here are the results:

Duration (milliseconds) for various grouped concatenation approaches

We can see that our FORCESCAN hint really did make things worse while we shifted the cost away from the clustered index seek, the sort was actually much worse, even though the estimated costs deemed them relatively equivalent. More importantly, we can see that STRING_AGG() does offer a performance benefit, whether or not the concatenated strings need to be ordered in a specific way. As with STRING_SPLIT() , which I looked at back in March , I am quite impressed that this function scales well prior to "v1."

I have further tests planned, perhaps for a future post:

When all the data comes from a single table, with and without an index that supports ordering Similar performance tests on Linux

In the meantime, if you have specific use cases for grouped concatenation, please share them below (or e-mail me at abertrand@sentryone.com). I'm always open to making sure my tests are as real-world as possible.

SQL Server v.Next : STRING_AGG() performance

Trending Articles

[银色子弹字幕组][名侦探柯南][第1167集 17年前的真相皇后的谋略][WEBRIP][简繁日多语MKV][1080P]

徵求: STC 4304 CB

中国网民投诉：DeepSeek输出错误讯息

曾新有陶瓷甕走紅大陸市場

二零一七年五月二十八日大陆综合消息

名詞解釋：FATP (Final Assembly Test & Pack)最後整機組裝測試包裝

鱷魚口塞氣球「別驚擾大師」…豬隊友秒摔 4人手刀逃跑：啊啊啊

iOS 原生工程为swift + swiftUI实现，怎么封装UTS插件，官方Demo 也没找到案例

新店瑠公圳老宅拆除案新北：依法辦理文資審議程序

【漫画推荐】色情！血腥！重口！这些毁三观的漫画你看过吗？

[转载]煞貢、直星、人專吉日\金神七煞歌

Ps2024【Adobe Photoshop 2024 v25.12.2.1009 (x64) 多语 GenP 坡姐】..

SFC超級任天堂釣魚太郎1.2.3 (海釣太郎) 遊戲+金手指+模擬器！

Resource Hacker 3.6.0.92 免安裝中文版 (4.6.32 英文版) - 執行檔編輯工具

RAV4 E-Mirror電子式後視鏡無法連線

[閒聊] 資深玩家週年禮

Devart UniDAC v10.3.0 SOURCES Delphi / Lazarus [含附件]

免费翻墙节点大全

[游戏工具]FishingMan钓鱼小精灵下载(魔兽世界钓鱼辅助) v8.2.2.6 绿色版

神岡國小三好門