Channel: CodeSection, SQL Server (mssql) database technology sharing - CodeSec

Using Power BI and SSRS for visualizing SQL Server and R data (Part 4)


Visualizing data is an important part of understanding the dataset we are trying to analyze, because it gives a different, pictorial view of the data. Instead of a tabular, matrix or numerical view, a presentation in the form of pictures, diagrams and graphs can broaden our insight into the data and take our understanding of it to a new level.

So far, in the previous article, we have discussed how to analyze sales data, and along the way visualization is often needed. In this article, we will discuss three ways of visualizing the data, namely:

With Power BI
With Reporting Services (SSRS)
With R Tools for Visual Studio / RStudio

Power BI

For this purpose, we will again use the WideWorldImportersDW demo database and visualize a clustering of the data. Let us take the following query:

-- Input query for the R script; inside the string literal every single quote is doubled
DECLARE @SQLStat NVARCHAR(4000)
SET @SQLStat = N'SELECT
SUM(fs.[Profit]) AS Profit
,c.[Sales Territory] AS SalesTerritory
,CASE
WHEN c.[Sales Territory] = ''Rocky Mountain'' THEN 1
WHEN c.[Sales Territory] = ''Mideast'' THEN 2
WHEN c.[Sales Territory] = ''New England'' THEN 3
WHEN c.[Sales Territory] = ''Plains'' THEN 4
WHEN c.[Sales Territory] = ''Southeast'' THEN 5
WHEN c.[Sales Territory] = ''Great Lakes'' THEN 6
WHEN c.[Sales Territory] = ''Southwest'' THEN 7
WHEN c.[Sales Territory] = ''Far West'' THEN 8
END AS SalesTerritoryID
,fs.[Customer Key] AS CustomerKey
,SUM(fs.[Quantity]) AS Quantity
FROM [Fact].[Sale] AS fs
JOIN dimension.city AS c
ON c.[City Key] = fs.[City Key]
WHERE
fs.[customer key] <> 0
AND c.[Sales Territory] NOT IN (''External'')
GROUP BY
c.[Sales Territory]
,fs.[Customer Key]
,CASE
WHEN c.[Sales Territory] = ''Rocky Mountain'' THEN 1
WHEN c.[Sales Territory] = ''Mideast'' THEN 2
WHEN c.[Sales Territory] = ''New England'' THEN 3
WHEN c.[Sales Territory] = ''Plains'' THEN 4
WHEN c.[Sales Territory] = ''Southeast'' THEN 5
WHEN c.[Sales Territory] = ''Great Lakes'' THEN 6
WHEN c.[Sales Territory] = ''Southwest'' THEN 7
WHEN c.[Sales Territory] = ''Far West'' THEN 8
END ;'
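-- R script: cluster the rows and return the plot as JPEG bytes (quotes doubled for T-SQL)
DECLARE @RStat NVARCHAR(4000)
SET @RStat = N'library(ggplot2)
# render the plot into a temporary JPEG file
image_file <- tempfile()
jpeg(filename = image_file, width = 400, height = 400)
# average-linkage hierarchical clustering on Profit, SalesTerritoryID and Quantity (columns 1, 3, 5)
clusters <- hclust(dist(Sales[,c(1,3,5)]), method = ''average'')
clusterCut <- cutree(clusters, 3)
# print() explicitly so the plot is drawn even when the script is not auto-printed
print(ggplot(Sales, aes(Profit, Quantity, color = SalesTerritory)) +
geom_point(alpha = 0.4, size = 2.5) + geom_point(col = clusterCut) +
scale_color_manual(values = c(''black'', ''red'', ''green'',''yellow'',''blue'',''lightblue'',''magenta'',''brown'')))
dev.off()
# return the image file as raw bytes (varbinary)
OutputDataSet <- data.frame(data=readBin(file(image_file, "rb"), what=raw(), n=1e6))'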
EXECUTE sp_execute_external_script
@language = N'R'
,@script = @RStat
,@input_data_1 = @SQLStat
,@input_data_1_name = N'Sales'
WITH RESULT SETS ((plot varbinary(max)))
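Because both the query and the R script are passed to sp_execute_external_script as T-SQL string literals, every single quote inside them has to be doubled (''Rocky Mountain'', ''average''), while double quotes such as "rb" can stay as they are. The Python sketch below shows that conversion as a round trip, which can be handy when moving R code between SSMS and a plain R editor. The helper names to_tsql_literal and from_tsql_literal are hypothetical, not part of any SQL Server tooling.

```python
def to_tsql_literal(r_code: str) -> str:
    """Wrap R code in an N'...' T-SQL literal, doubling embedded single quotes."""
    return "N'" + r_code.replace("'", "''") + "'"


def from_tsql_literal(literal: str) -> str:
    """Reverse of to_tsql_literal: strip N'...' (or '...') and un-double the quotes."""
    if literal.startswith("N'"):
        body = literal[2:]
    elif literal.startswith("'"):
        body = literal[1:]
    else:
        raise ValueError("not a T-SQL string literal")
    if not body.endswith("'"):
        raise ValueError("not a T-SQL string literal")
    # In a valid literal every quote is doubled, so pairs collapse back to one quote
    return body[:-1].replace("''", "'")
```

For example, to_tsql_literal("hclust(d, method = 'average')") gives N'hclust(d, method = ''average'')', which is the form used inside @RStat above.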

We will bring this into Power BI in a slightly different way. First, the data is imported using only the plain T-SQL query, without the sp_execute_external_script wrapper and without the doubled quotes:

SELECT
SUM(fs.[Profit]) AS Profit
,c.[Sales Territory] AS SalesTerritory
,CASE
WHEN c.[Sales Territory] = 'Rocky Mountain' THEN 1
WHEN c.[Sales Territory] = 'Mideast' THEN 2
WHEN c.[Sales Territory] = 'New England' THEN 3
WHEN c.[Sales Territory] = 'Plains' THEN 4
WHEN c.[Sales Territory] = 'Southeast' THEN 5
WHEN c.[Sales Territory] = 'Great Lakes' THEN 6
WHEN c.[Sales Territory] = 'Southwest' THEN 7
WHEN c.[Sales Territory] = 'Far West' THEN 8
END AS SalesTerritoryID
,fs.[Customer Key] AS CustomerKey
,SUM(fs.[Quantity]) AS Quantity
FROM [Fact].[Sale] AS fs
JOIN dimension.city AS c
ON c.[City Key] = fs.[City Key]
WHERE
fs.[customer key] <> 0
AND c.[Sales Territory] NOT IN ('External')
GROUP BY
c.[Sales Territory]
,fs.[Customer Key]
,CASE
WHEN c.[Sales Territory] = 'Rocky Mountain' THEN 1
WHEN c.[Sales Territory] = 'Mideast' THEN 2
WHEN c.[Sales Territory] = 'New England' THEN 3
WHEN c.[Sales Territory] = 'Plains' THEN 4
WHEN c.[Sales Territory] = 'Southeast' THEN 5
WHEN c.[Sales Territory] = 'Great Lakes' THEN 6
WHEN c.[Sales Territory] = 'Southwest' THEN 7
WHEN c.[Sales Territory] = 'Far West' THEN 8
END
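The query returns one row per sales territory and customer, with profit and quantity summed, and the CASE expression attaches a numeric SalesTerritoryID, presumably because dist() in R needs numeric columns. The Python sketch below mimics the same WHERE, GROUP BY and CASE logic over a few in-memory rows to make the shape of the result clearer; the function name and the sample rows are made up for illustration and are not WideWorldImportersDW data.

```python
from collections import defaultdict

# Same mapping as the CASE expression in the query
TERRITORY_IDS = {
    "Rocky Mountain": 1, "Mideast": 2, "New England": 3, "Plains": 4,
    "Southeast": 5, "Great Lakes": 6, "Southwest": 7, "Far West": 8,
}


def aggregate_sales(rows):
    """Mimic the query: one output row per (territory, customer).

    rows: iterable of (territory, customer_key, profit, quantity) tuples.
    Rows with customer_key 0 or territory 'External' are skipped,
    mirroring the WHERE clause.
    """
    totals = defaultdict(lambda: [0, 0])  # (territory, customer) -> [profit, quantity]
    for territory, customer, profit, quantity in rows:
        if customer == 0 or territory == "External":
            continue
        acc = totals[(territory, customer)]
        acc[0] += profit
        acc[1] += quantity
    return [
        {"Profit": p, "SalesTerritory": t, "SalesTerritoryID": TERRITORY_IDS.get(t),
         "CustomerKey": c, "Quantity": q}
        for (t, c), (p, q) in totals.items()
    ]
```

An unmapped territory gets SalesTerritoryID None, just as the CASE expression would return NULL.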

The R code is then used separately:

library(ggplot2)
clusters <- hclust(dist(Sales[,c(1,3,5)]), method = 'average')
clusterCut <- cutree(clusters, 3)
ggplot(Sales, aes(Profit, Quantity, color = SalesTerritory)) +
geom_point(alpha = 0.4, size = 2.5) + geom_point(col = clusterCut) +
scale_color_manual(values = c('black', 'red', 'green','yellow','blue','lightblue','magenta','brown'))
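hclust(..., method = 'average') starts with every row in its own cluster and repeatedly merges the two clusters whose average pairwise distance is smallest (dist() uses Euclidean distance by default); cutree(clusters, 3) then keeps the three groups that remain after the last merges are undone. The naive Python sketch below implements that idea for a handful of points so the mechanics are visible. It is an O(n^3) illustration under these assumptions, not a replacement for R's hclust, and the function name and sample points are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(points, k):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Merges the pair of clusters with the smallest mean pairwise Euclidean
    distance until k clusters remain. Returns one label (1..k) per point.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # mean of all point-to-point distances between clusters a and b
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected

    # number the groups in order of first appearance among the input points
    labels = [0] * len(points)
    for label, cluster in enumerate(sorted(clusters, key=min), start=1):
        for idx in cluster:
            labels[idx] = label
    return labels
```

For instance, average_linkage([(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)], 3) returns [1, 1, 2, 2, 3]: the two nearby pairs merge first and the distant point stays on its own.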

After opening Power BI Desktop, select Get Data -> SQL Server and enter the required connection details (server, database and the T-SQL query), as shown in the screenshot below:


[Screenshot: Power BI Get Data -> SQL Server database connection dialog]

After clicking OK, the data is imported into Power BI. The next step is to select “New visual” and choose the R script visual from the Visualizations pane.


[Screenshot: the R script visual in the Power BI Visualizations pane]

You might get a dialog asking you to enable R visuals. After that, select the fields needed for the graph. Based on the R code, we use the columns Profit, Quantity and SalesTerritoryID. All three columns appear in a predefined data.frame called dataset, which the R script visual creates by default:


[Screenshot: selected fields and the predefined dataset data.frame in the R script editor]

In the R script editor, we can paste the R code from the example above. It needs a few minor modifications: rename the data frame from Sales to dataset and adjust the column references. So from this:

library(ggplot2)
clusters <- hclust(dist(Sales[,c(1,3,5)]), method = ''average'')
clusterCut <- cutree(clusters, 3)
ggplot(Sales, aes(Profit, Quantity, color = SalesTerritory)) +
geom_point(alpha = 0.4, size = 2.5) + geom_point(col = clusterCut) +
scale_color_manual(values = c(''black'', ''red'', ''green'',''yellow'',''blue'',''lightblue'',''magenta'',''brown''))

into this:

library(ggplot2)
clusters <- hclust(dist(dataset[,c(1,2,3)]), method = 'average')
clusterCut <- cutree(clusters, 3)
# dataset only holds Profit, Quantity and SalesTerritoryID; factor() makes the ID discrete for scale_color_manual
ggplot(dataset, aes(Profit, Quantity, color = factor(SalesTerritoryID))) +
geom_point(alpha = 0.4, size = 2.5) + geom_point(col = clusterCut) +
scale_color_manual(values = c('black', 'red', 'green','yellow','blue','lightblue','magenta','brown'))
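The column indices change because the two environments build their data frames differently. In the sp_execute_external_script call, Sales holds all five columns of the query, so c(1,3,5) selects Profit, SalesTerritoryID and Quantity; Power BI's dataset holds only the three selected fields (presumably in the order they were added), so c(1,2,3) selects the same three columns. Since Euclidean distance does not depend on column order, dist() and therefore the clustering should come out the same. The Python check below illustrates this; the column lists mirror the article, while the helper names and the two sample rows are made up.

```python
import math

# Columns of the Sales data frame, in the SELECT order of the query
SQL_COLUMNS = ["Profit", "SalesTerritory", "SalesTerritoryID", "CustomerKey", "Quantity"]
# Columns of Power BI's dataset, in the order the fields were added (assumed)
POWERBI_COLUMNS = ["Profit", "Quantity", "SalesTerritoryID"]


def pick(columns, r_indices):
    """Translate R's 1-based column indices into column names."""
    return [columns[i - 1] for i in r_indices]


def euclidean(row_a, row_b, columns):
    """Euclidean distance between two rows (dicts) over the given columns."""
    return math.sqrt(sum((row_a[c] - row_b[c]) ** 2 for c in columns))
```

Both pick(SQL_COLUMNS, [1, 3, 5]) and pick(POWERBI_COLUMNS, [1, 2, 3]) name the same three columns, only in a different order, so the distance between any two rows is unchanged.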

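Also, check the quotes around string values in the R code: inside the T-SQL string they had to be doubled (''average''), whereas in the Power BI R editor they must be ordinary single or double quotes ('average'). After that, you will get a visualization in Power BI: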

