By: Gauri Mahajan || Related Tips:More >SQL Server 2017
ProblemIn any organization, data is stored in various locations and in several formats. Importing and exporting files and data is a very routine task in SQL Server. Developers use different mechanisms and tools as well as customized code to import and export different formats of data programmatically into and out of SQL Server either temporarily or permanently. TypicallyETL tools are employed for this purpose.
But during the development phase, developers may not have ETL packages in place for importing and exporting data . Therefore, at times, they resort to manual methods or develop custom code for this. Maintaining different types of codebase to import/export data from various file formats requires more resources for developing, maintaining and operating the codebase. A better solution would be to have a common data gateway in the form of a library that can facilitate import/export of data into SQL Server, which would hide the underlying implementation complexity of importing/exporting data for each type of data source.
SolutionThe rio package in R-services integrated with SQL Server 2017 provides an ability to address the problem. We will demonstrate how simple it is to work with various data formats with R services in T-SQL in this tip.
R comprises of a powerful tool of open source libraries and 'rio' is one amongst them. rio simplifies the process of importing and exporting data into and out of R in SQL Server. It allows almost all common data formats to be read or exported with the same function. Besides making import-export easy, it also uses R as a simple data parsing and conversion utility.
Before we begin, ensure system has the required software components likeSQL Server 2017,SSMS, SQL Server R services (in-database), etc. Subsequently, to ensure rio is fully functional, we need to install this package. If you are new to R and installing R packages on SQL Server, consider following this link on how to install packages on R server .
Once the package is installed, open a new query window on SSMS and execute the code mentioned below. Let's go over the steps of the script.
EXECUTE sp_execute_external_script@language = N'R',
@script = N'
library(rio)
print (packageVersion("rio"))
'
The sp_execute_external_script is the store procedure that helps to execute external scripts (R in this case) inside the database engine.
@language indicates the script language, which is R. @script specifies the R script that will be executed and its data type is nvarchar(max).The library function references the rio package and function “packageVersion” returns the version of the rio package and is printed using the print function.
Successful execution of the code validates that the package has been installed correctly and is accessible to SQL Server.

Since the required environment has been set, lets go ahead and see how it works.
Import the Data into SQL ServerOne of the popular file formats is the flat file format like CSV, TSV, delimited, etc. Let’s get started with the CSV file format. You can download a sample CSV file fromhere or feel free to use any CSV file that you have created or you already have in your system.
Below is the CSV file that we will be using in this tip. It is named CSVFile.csv and is saved in directory C:\temp\CSVFile.csv.

Once the CSV file is ready, execute the below code in SSMS.
EXECUTE sp_execute_external_script@language = N’R’,
@script = N’
Library(rio)
csvfile <- import(“C:\\temp\\CSVFile.csv”)
print(csvfile)
In this code, apparently apart from referencing the rio package, we are using the 'import' function to read the CSV file followed by printing it in the output section using the print function.
Note: We need to use '\\'instead of '\'while providing the location of the CSV file. This is because '\'alone is considered an escape character.

Export Data from SQL Server
Now let’s try to output the same CSV data that we read in above into a better format than just printing it on the output console. In the below code, we are defining the output schema by using the 'WITH RESULT SETS' option and emitting output from the R-Script to T-SQL code by assigning the imported data to OutputDataSet data frame.
EXECUTE sp_execute_external_script@language = N'R',
@script = N'
library(rio)
CSVFile <- import("C:\\temp\\CSVFile.csv")
OutputDataSet <- CSVFile
'
WITH RESULT SETS((SepalLength numeric, SepalWidth numeric, PetalLength numeric,
PetalWidth numeric, Species nvarchar(10)))

Import an Excel File
Let’s switch to one another popular file format - Excel. We can read an Excel file same way as a CSV file. Execute the below code in order to import the Excel file and data can be seen in the result window. The Excel file is named ExcelFile.xlsx and is saved in directory C:\temp\ExcelFile.csv in the system. You can use any sample Excel data or can download the file fromhere.
EXECUTE sp_execute_external_script@language = N'R',
@script = N'
library(rio)
excelfile <- import("C:\\temp\\ExcelFile.xlsx")
print(excelfile)
'

Import JSON File
Now let’s try to work with one of the leading formats for storing and exchanging data typically over web, which is JSON. You can refer to any sample JSON file or can download one fromhere. Below is the JSON data file that we will be referring to in this tip.

Once the below code is executed, it will import the JSON data and structure and the JSON data can be queried using JSON$columns.
EXECUTE sp_execute_external_script@language = N'R',
@script = N'
library(rio)
JSONFile <- import("C:\\temp\\JSONFile.JSON")
print(JSONFile$columns)
'

Import Data from a GZip File
The rio package not only helps to read uncompressed/unzipped files, but it also allows direct import from compressed (.gzip) files. Zip files makes data storage and transfer way more efficient and faster by keeping related files together.
You can take any sample CSV file and zip it in the format of <filename>.csv.gz or can download it fromhere. Execute below code to import zip file in R.
EXECUTE sp_execute_external_script@language = N'R',
@script = N'
library(rio)
gzipFile <- import("C:\\temp\\CSVFile.csv.gz")
print(gzipFile)
'

These were a few examples of h