Quantcast
Viewing all articles
Browse latest Browse all 3160

What’s In SQL Server 2019 CTP 2.0?

APPROX_COUNT_DISTINCT

This new aggregate function is designed for data warehouse scenarios, and is an equivalent for COUNT(DISTINCT()) . Instead of performing expensive distinct sort operations to determine actual counts, it relies instead on statistics to get something relatively accurate. You should find that the margin of error is within 2% of the precise count, 97% of the time, which is usually fine for high-level analytics, values that populate a dashboard, or quick estimates.

On my system I created a table with integer columns ranging from 100 to 1,000,000 unique values, and string columns ranging from 100 to 100,000 unique values. There were no indexes other than a clustered primary key on the leading integer column. Here are the results of COUNT(DISTINCT()) vs. APPROX_COUNT_DISTINCT() against those columns, so you can see where it is off by a bit (but always well within 2%):


Viewing all articles
Browse latest Browse all 3160

Trending Articles