Quantcast
Channel: CodeSection,代码区,SQL Server(mssql)数据库 技术分享 - CodeSec
Viewing all articles
Browse latest Browse all 3160

Getting Started with Data Analysis on the Microsoft Platform - Examining Data

$
0
0

By: Nai Biao Zhou || Related Tips:More >SQL Server 2017

Problem We live in a world of data. Data are the facts and figures that are collected to make a decision [1]. Companies who use Microsoft technologies usually store their data in SQL Server databases. To extract business values from these data, we usually apply statistical techniques. Statistics involves collecting, classifying, summarizing, organizing, analyzing, and interpreting data [2]. R, which has become the worldwide language for statistics [3], can bridge the gap between statistics and business intelligence development. Furthermore, The Microsoft platform enable us to work with SQL Server databases and R together [4]. With powerful statistical tools, how can we get started to use statistical methods to extract meaningful information from voluminous amount of data? Solution We are going to use data from the AdventureWorks sample database "AdventureWorks2017.bak" [5]. We should always start with asking research questions when we analyze data. Here are research questions needed to be addressed through this study: Did an employee have a different sales performance in 2013 from 2012? Was an employee sales performance impacted by the seasonal factors? Were postal codes of customer addresses in the database valid? We will use " R Tools for Visual Studio sample projects " [6] as a starting point. While investigating employee sales performance, we will go through procedures to create and publish a stored procedure by using R Tools for Visual Studio (RTVS) and use line graphs to interpret data. Then, we will compute mean and median of the all employee sales to measure the "central tendency", which are indicators of typical middle value of all employees’ performance [1]. In the end, we will use regular expression to test Canadian postal codes in the database.

The solution was tested with SQL Server Management Studio V17.4 and Microsoft Visual Studio Community 2017 on windows 10 Home 10.0 <X64>. The DBMS is Microsoft SQL Server 2017 Enterprise Edition (64-bit).

A First Look at RTVS Line graphs are commonly used to present changes in data over a period [1]. We are going to look at employee monthly sales and reveal the employee performance changes through a line graph. In the meanwhile, we will create a stored procedure by using RTVS. 1 - Work on the first R project 1.1 Download the sample project and Open the solution file "Examples.sln".

For using RTVS effectively, we use "Data Science Setting" as the Visual Studio setting. Before switching on this setting, I recommend that we save the current window layout. We can revert to the previous window layout after we have done data analysis. Figure 1 shows the menu item that we can use to save the window layout.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 1 - Save current window layout

To change the setting for using RTVS, click on the menu item shown in the Figure 2.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 2 - Switch to the data science settings Figure 3 shows the window layout for using RTVS. It is noteworthy that we should verify if a correct version of R is used when we have multiple versions installed. We can find the version number in the "Workspaces" panel or in the bottom right corner of the IDE. For the best experience, we should follow the instructions in [6] to run the R codes of the "1-Getting_Started_with_R.R" file line-by-line.
Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 3 - IDE window layout for data analysis 1.2 Add a database connection to the project

Click on the menu item "Add Database Connection…", as shown in Figure 4.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 4 - Add database connection to the project

Configure the connection properties, as shown in the Figure 5.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 5 - Configure the connection properties

Click on the "OK" button. An R script file "Settings.R" is automatically added to the project, as shown in Figure 6. To access the connection string, we should run the codes immediately, consequently save the connection string in the "setting" variable.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 6 - Application settings file 2 - Create a new stored procedure 2.1 Add a new stored procedure

Right-click on the "A first look at R" folder and select "Add > New Item" from the context menu. A pop-up window shows up as illustrated in Figure 7. Select "SQL Stored Procedure with R" template and name the procedure as "sp_employee_sales_monthly".


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 7 - Create a new stored procedure

In the "Solution Explore" panel shown in Figure 8, we find three files have been created. This allows us to work on R scripts and SQL scripts, separately.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 8 - R Tool for Visual Studio creates three files for one stored procedure 2.2 Write a SQL query to retrieve data from the database In this tip, we adopt SQL queries used in the SQL Server Reporting Services Product Samples [7]. Open the file "sp_employee_sales_monthly.Query.sql" and replace the content with the following SQL query: -- Place SQL query retrieving data for the R stored procedure here
-- Employee sales
DECLARE @EmployeeID int
SET @EmployeeID = 283
SELECT P.FirstName + SPACE(1) + P.LastName AS Employee,
DATEPART(Year, SOH.OrderDate) AS [Year],
DATEPART(Month, SOH.OrderDate) AS MonthNumber,
DATENAME(Month, SOH.OrderDate) AS [Month],
SUM(DET.LineTotal) AS Sales
FROM [Sales].[SalesPerson] SP
INNER JOIN [Sales].[SalesOrderHeader] SOH ON SP.[BusinessEntityID] = SOH.[SalesPersonID]
INNER JOIN Sales.SalesOrderDetail DET ON SOH.SalesOrderID = DET.SalesOrderID
INNER JOIN [Person].[Person] P ON P.[BusinessEntityID] = SP.[BusinessEntityID]
WHERE SOH.SalesPersonID = @EmployeeID
and DATEPART(Year, SOH.OrderDate) in (2012, 2013)
GROUP BY P.FirstName + SPACE(1) + P.LastName,
SOH.SalesPersonID,
DATEPART(Year, SOH.OrderDate), DATEPART(Month, SOH.OrderDate),
DATENAME(Month, SOH.OrderDate)

Click on the arrow button, as shown in Figure 9, to run the query. If we run the query first time, a pop-up window may show up and ask us to establish a database connection.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 9 - Run the SQL query in Visual Studio 2.3 Write R script to plot a multiple line graph for an employee

Open the file "sp_employee_sales_monthly.R". The RTVS included some testing codes in the file, as shown in the Figure 10. These testing codes provide us a method to load data from a SQL Server database.


Getting Started with Data Analysis on the Microsoft Platform - Examining Data
Figure 10 - Initial R codes in the new file Unc

Viewing all articles
Browse latest Browse all 3160

Trending Articles