By: Nai Biao Zhou || Related Tips:More >SQL Server 2017
Problem We live in a world of data. Data are the facts and figures that are collected to make a decision [1]. Companies who use Microsoft technologies usually store their data in SQL Server databases. To extract business values from these data, we usually apply statistical techniques. Statistics involves collecting, classifying, summarizing, organizing, analyzing, and interpreting data [2]. R, which has become the worldwide language for statistics [3], can bridge the gap between statistics and business intelligence development. Furthermore, The Microsoft platform enable us to work with SQL Server databases and R together [4]. With powerful statistical tools, how can we get started to use statistical methods to extract meaningful information from voluminous amount of data? Solution We are going to use data from the AdventureWorks sample database "AdventureWorks2017.bak" [5]. We should always start with asking research questions when we analyze data. Here are research questions needed to be addressed through this study: Did an employee have a different sales performance in 2013 from 2012? Was an employee sales performance impacted by the seasonal factors? Were postal codes of customer addresses in the database valid? We will use " R Tools for Visual Studio sample projects " [6] as a starting point. While investigating employee sales performance, we will go through procedures to create and publish a stored procedure by using R Tools for Visual Studio (RTVS) and use line graphs to interpret data. Then, we will compute mean and median of the all employee sales to measure the "central tendency", which are indicators of typical middle value of all employees’ performance [1]. In the end, we will use regular expression to test Canadian postal codes in the database.The solution was tested with SQL Server Management Studio V17.4 and Microsoft Visual Studio Community 2017 on windows 10 Home 10.0 <X64>. The DBMS is Microsoft SQL Server 2017 Enterprise Edition (64-bit).
A First Look at RTVS Line graphs are commonly used to present changes in data over a period [1]. We are going to look at employee monthly sales and reveal the employee performance changes through a line graph. In the meanwhile, we will create a stored procedure by using RTVS. 1 - Work on the first R project 1.1 Download the sample project and Open the solution file "Examples.sln".For using RTVS effectively, we use "Data Science Setting" as the Visual Studio setting. Before switching on this setting, I recommend that we save the current window layout. We can revert to the previous window layout after we have done data analysis. Figure 1 shows the menu item that we can use to save the window layout.

Figure 1 - Save current window layout
To change the setting for using RTVS, click on the menu item shown in the Figure 2.

Figure 2 - Switch to the data science settings Figure 3 shows the window layout for using RTVS. It is noteworthy that we should verify if a correct version of R is used when we have multiple versions installed. We can find the version number in the "Workspaces" panel or in the bottom right corner of the IDE. For the best experience, we should follow the instructions in [6] to run the R codes of the "1-Getting_Started_with_R.R" file line-by-line.

Figure 3 - IDE window layout for data analysis 1.2 Add a database connection to the project
Click on the menu item "Add Database Connection…", as shown in Figure 4.

Figure 4 - Add database connection to the project
Configure the connection properties, as shown in the Figure 5.

Figure 5 - Configure the connection properties
Click on the "OK" button. An R script file "Settings.R" is automatically added to the project, as shown in Figure 6. To access the connection string, we should run the codes immediately, consequently save the connection string in the "setting" variable.

Figure 6 - Application settings file 2 - Create a new stored procedure 2.1 Add a new stored procedure
Right-click on the "A first look at R" folder and select "Add > New Item" from the context menu. A pop-up window shows up as illustrated in Figure 7. Select "SQL Stored Procedure with R" template and name the procedure as "sp_employee_sales_monthly".

Figure 7 - Create a new stored procedure
In the "Solution Explore" panel shown in Figure 8, we find three files have been created. This allows us to work on R scripts and SQL scripts, separately.

Figure 8 - R Tool for Visual Studio creates three files for one stored procedure 2.2 Write a SQL query to retrieve data from the database In this tip, we adopt SQL queries used in the SQL Server Reporting Services Product Samples [7]. Open the file "sp_employee_sales_monthly.Query.sql" and replace the content with the following SQL query: -- Place SQL query retrieving data for the R stored procedure here
-- Employee sales
DECLARE @EmployeeID int
SET @EmployeeID = 283
SELECT P.FirstName + SPACE(1) + P.LastName AS Employee,
DATEPART(Year, SOH.OrderDate) AS [Year],
DATEPART(Month, SOH.OrderDate) AS MonthNumber,
DATENAME(Month, SOH.OrderDate) AS [Month],
SUM(DET.LineTotal) AS Sales
FROM [Sales].[SalesPerson] SP
INNER JOIN [Sales].[SalesOrderHeader] SOH ON SP.[BusinessEntityID] = SOH.[SalesPersonID]
INNER JOIN Sales.SalesOrderDetail DET ON SOH.SalesOrderID = DET.SalesOrderID
INNER JOIN [Person].[Person] P ON P.[BusinessEntityID] = SP.[BusinessEntityID]
WHERE SOH.SalesPersonID = @EmployeeID
and DATEPART(Year, SOH.OrderDate) in (2012, 2013)
GROUP BY P.FirstName + SPACE(1) + P.LastName,
SOH.SalesPersonID,
DATEPART(Year, SOH.OrderDate), DATEPART(Month, SOH.OrderDate),
DATENAME(Month, SOH.OrderDate)
Click on the arrow button, as shown in Figure 9, to run the query. If we run the query first time, a pop-up window may show up and ask us to establish a database connection.

Figure 9 - Run the SQL query in Visual Studio 2.3 Write R script to plot a multiple line graph for an employee
Open the file "sp_employee_sales_monthly.R". The RTVS included some testing codes in the file, as shown in the Figure 10. These testing codes provide us a method to load data from a SQL Server database.

Figure 10 - Initial R codes in the new file Unc