While SQL Server on linux has stolen almost all of the headlines about v.Next, there are some other interesting advancements coming in the next version of our favorite database platform. On the T-SQL front, we finally have a built-in way to perform grouped string concatenation: STRING_AGG() .
Let's say we have the following simple table structure:
CREATE TABLE dbo.Objects ( [object_id] int, [object_name] nvarchar(261), CONSTRAINT PK_Objects PRIMARY KEY([object_id]) ); CREATE TABLE dbo.Columns ( [object_id] int NOT NULL FOREIGN KEY REFERENCES dbo.Objects([object_id]), column_name sysname, CONSTRAINT PK_Columns PRIMARY KEY ([object_id],column_name) );For performance tests, we're going to populate this using sys.all_objects and sys.all_columns . But for a simple demonstration first, let's add the following rows:
INSERT dbo.Objects([object_id],[object_name]) VALUES(1,N'Employees'),(2,N'Orders'); INSERT dbo.Columns([object_id],column_name) VALUES(1,N'EmployeeID'),(1,N'CurrentStatus'), (2,N'OrderID'),(2,N'OrderDate'),(2,N'CustomerID');If the forums are any indication, it is a very common requirement to return a row for each object, along with a comma-separated list of column names. (Extrapolate that to whatever entity types you model this way product names associated with an order, part names involved in the assembly of a product, subordinates reporting to a manager, etc.) So, for example, with the above data we'd want output like this:
object columns--------- ----------------------------
Employees EmployeeID,CurrentStatus
Orders OrderID,OrderDate,CustomerID
The way we would accomplish this in current versions of SQL Server is probably to use FOR XML PATH , as I demonstrated to be the most efficient outside of CLR inthis earlier post. In this example, it would look like this:
SELECT [object] = o.[object_name], [columns] = STUFF( (SELECT N',' + c.column_name FROM dbo.Columns AS c WHERE c.[object_id] = o.[object_id] FOR XML PATH, TYPE ).value(N'.[1]',N'nvarchar(max)'),1,1,N'') FROM dbo.Objects AS o;Predictably, we get the same output demonstrated above. In SQL Server v.Next, we will be able to express this more simply:
SELECT [object] = o.[object_name], [columns] = STRING_AGG(c.column_name, N',') FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c ON o.[object_id] = c.[object_id] GROUP BY o.[object_name];Again, this produces the exact same output. And we were able to do this with a native function, avoiding both the expensive FOR XML PATH scaffolding, and the STUFF() function used to remove the first comma (this happens automatically).
What About Order?One of the problems with many of the kludge solutions to grouped concatenation is that the ordering of the comma-separated list should be considered arbitrary and non-deterministic.
For the XML PATH solution, I demonstrated inanother earlier post that adding an ORDER BY is trivial and guaranteed. So in this example, we could order the column list by column name alphabetically instead of leaving it to SQL Server to sort (or not):
SELECT [object] = [object_name], [columns] = STUFF( (SELECT N',' +c.column_name FROM dbo.Columns AS c WHERE c.[object_id] = o.[object_id] ORDER BY c.column_name -- only change FOR XML PATH, TYPE ).value(N'.[1]',N'nvarchar(max)'),1,1,N'') FROM dbo.Objects AS o;Output:
object columns--------- ----------------------------
Employees CurrentStatus,EmployeeID
Order CustomerID,OrderDate,OrderID
CTP 1.1 adds WITHIN GROUP to STRING_AGG() , so using the new approach, we can say:
SELECT [object] = o.[object_name], [columns] = STRING_AGG(c.column_name, N',') WITHIN GROUP (ORDER BY c.column_name) -- only change FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c ON o.[object_id] = c.[object_id] GROUP BY o.[object_name];Now we get the same results. Note that, just like a normal ORDER BY clause, you can add multiple ordering columns or expressions inside WITHIN GROUP () .
All Right, Performance Already!Using quad-core 2.6 GHz processors, 8 GB of memory, and SQL Server CTP1.1 (14.0.100.187), I created a new database, re-created these tables, and added rows from sys.all_objects and sys.all_columns . I made sure to only include objects that had at least one column:
INSERT dbo.Objects([object_id], [object_name]) -- 656 rows SELECT [object_id], QUOTENAME(s.name) + N'.' + QUOTENAME(o.name) FROM sys.all_objects AS o INNER JOIN sys.schemas AS s ON o.[schema_id] = s.[schema_id] WHERE EXISTS ( SELECT 1 FROM sys.all_columns WHERE [object_id] = o.[object_id] ); INSERT dbo.Columns([object_id], column_name) -- 8,085 rows SELECT [object_id], name FROM sys.all_columns AS c WHERE EXISTS ( SELECT 1 FROM dbo.Objects WHERE [object_id] = c.[object_id] );On my system, this yielded 656 objects and 8,085 columns (your system may yield slightly different numbers).
The PlansFirst, let's compare the plans and Table I/O tabs for our two unordered queries, using Plan Explorer . Here are the overall runtime metrics:

Runtime metrics for XML PATH (top) and STRING_AGG() (bottom)
The graphical plan and Table I/O from the FOR XML PATH query:


Plan and Table I/O for XML PATH, no order
And from the STRING_AGG version:


Plan and Table I/O for STRING_AGG, no ordering
For the latter, the clustered index seek seems a little troubling to me. This seemed like a good case for testing out the seldom-used FORCESCAN hint (and no, this would certainly not help out the FOR XML PATH query):
SELECT [object] = o.[object_name], [columns] = STRING_AGG(c.column_name, N',') FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c WITH (FORCESCAN) -- added hint ON o.[object_id] = c.[object_id] GROUP BY o.[object_name];Now the plan and Table I/O tab look a lot better, at least on first glance:


Plan and Table I/O for STRING_AGG(), no ordering, with FORCESCAN
The ordered versions of the queries generate roughly the same plans. For the FOR XML PATH version, a sort is added:

Added sort in FOR XML PATH version
For STRING_AGG() , a scan is chosen in this case, even without the FORCESCAN hint, and no additional sort operation is required so the plan looks identical to the FORCESCAN version.
At ScaleLooking at a plan and one-off runtime metrics might give us some idea about whether STRING_AGG() performs better than the existing FOR XML PATH solution, but a larger test might make more sense. What happens when we perform the grouped concatenation 5,000 times?
SELECT SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STRING_AGG(c.column_name, N',') FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c ON o.[object_id] = c.[object_id] GROUP BY o.[object_name]; GO 5000 SELECT [string_agg, unordered] = SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STRING_AGG(c.column_name, N',') FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c WITH (FORCESCAN) ON o.[object_id] = c.[object_id] GROUP BY o.[object_name]; GO 5000 SELECT [string_agg, unordered, forcescan] = SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STUFF((SELECT N',' +c.column_name FROM dbo.Columns AS c WHERE c.[object_id] = o.[object_id] FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,1,N'') FROM dbo.Objects AS o; GO 5000 SELECT [for xml path, unordered] = SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STRING_AGG(c.column_name, N',') WITHIN GROUP (ORDER BY c.column_name) FROM dbo.Objects AS o INNER JOIN dbo.Columns AS c ON o.[object_id] = c.[object_id] GROUP BY o.[object_name]; GO 5000 SELECT [string_agg, ordered] = SYSDATETIME(); GO DECLARE @x nvarchar(max); SELECT @x = STUFF((SELECT N',' +c.column_name FROM dbo.Columns AS c WHERE c.[object_id] = o.[object_id] ORDER BY c.column_name FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,1,N'') FROM dbo.Objects AS o ORDER BY o.[object_name]; GO 5000 SELECT [for xml path, ordered] = SYSDATETIME();After running this script five times, I averaged the duration numbers and here are the results:

Duration (milliseconds) for various grouped concatenation approaches
We can see that our FORCESCAN hint really did make things worse while we shifted the cost away from the clustered index seek, the sort was actually much worse, even though the estimated costs deemed them relatively equivalent. More importantly, we can see that STRING_AGG() does offer a performance benefit, whether or not the concatenated strings need to be ordered in a specific way. As with STRING_SPLIT() , which I looked at back in March , I am quite impressed that this function scales well prior to "v1."
I have further tests planned, perhaps for a future post:
When all the data comes from a single table, with and without an index that supports ordering Similar performance tests on LinuxIn the meantime, if you have specific use cases for grouped concatenation, please share them below (or e-mail me at abertrand@sentryone.com). I'm always open to making sure my tests are as real-world as possible.