Introduction to Microsoft R Services in SQL Server 2016

R is a statistical programming language used mainly for statistical computing and analysis, data mining, and machine learning, and it is a very strong tool for graphics and data visualization. It is a free software environment that runs cross-platform (UNIX, Windows, macOS), with most library and package development driven by the community.

With R Services in SQL Server 2016, Microsoft makes the R language available for more flexible data analysis, makes sharing data insights much easier, and overcomes memory (RAM) limitations. This means that integrating R into SQL Server makes analysis of larger datasets, real-time OLTP analysis, and big data analysis (on systems such as Hadoop, Teradata and others) much easier than before.

R Integration

Before R was integrated into SQL Server, many users and organizations faced various barriers to successful data analysis. Microsoft has addressed some of the major ones. With R integrated into the SQL Server environment, data science knowledge can now be shared more easily and quickly as it moves from data scientists to data engineers and data stewards. The ability to offload this work also increases productivity: the data scientist can focus on core analysis, which shortens the time to deploy prediction models and facilitates real-time analytics.

The existing infrastructure (SQL Server databases, roles, access, security) also helps operationalize the results, as R in SQL Server is built to work at enterprise scale and can cope with large volumes of data. The use of R will grow beyond a small database and help bring cloud data and on-premises data closer together. In the end, this R environment is built to move faster and respond better to change (faster preparation of new data models, and faster deployment and use of machine learning algorithms).

Product family

Revolution Analytics was acquired by Microsoft in April 2015. Along with this acquisition, two versions of the R engine came together: Revolution R Open (RRO) for the community and Revolution R Enterprise (RRE) for commercial purposes.


After the acquisition, Microsoft kept Revolution R Open almost the same, and it became Microsoft R Open. Revolution R Enterprise became two products, both of which were already part of Revolution R Enterprise: SQL Server R Services (the in-database version of Microsoft R Server, for SQL Server on the Windows operating system) and Microsoft R Server. Microsoft R Server (also known as Microsoft R Server Standalone) is primarily for Linux (Red Hat or SUSE distributions) on a Hadoop or Teradata system, with support for a connection to the Azure cloud.


The Microsoft R product family therefore consists of the following products:

Microsoft R Open
Microsoft R Client
Microsoft R Server
SQL Server R Services

The components of Microsoft R Server are presented as:


[Figure: components of Microsoft R Server]

Microsoft R Open

R Open is Microsoft's enhanced distribution of R, and it is 100% open source. It is fully compatible with the standard R engine, so existing R code runs on it unchanged. The distribution adds high-performance multi-threaded enhancements (on both Windows and Linux) by using the Math Kernel Library (MKL) for vector- and matrix-based mathematical operations.

This edition is fully compatible with the CRAN repository, and GitHub packages can also be used with R Open. Unfortunately, R Open is limited by available memory: only data that fits into the computer's memory can be processed. The proprietary ScaleR algorithms and functions (the RevoScaleR library) will not run under R Open, but they are available in Microsoft R Client and Microsoft R Server.

R Open will run on any SQL Server 2016 edition, except on Express or Express with Tools. Microsoft R Client / Server will run only on Enterprise or Developer editions.

Microsoft R Client

R Client is a 100% free version built on top of Microsoft R Open. A data scientist can use any CRAN- or GitHub-based library. This edition also introduces the powerful RevoScaleR library, which uses ScaleR technology and its proprietary functions to allow heavy parallelization and multi-threaded computing.

There are some limitations to R Client. First, the data must fit into local memory. Second, ScaleR functions can use parallel computation, but processing is limited to two threads (regardless of whether the computer has more cores and supports multi-threaded operations). All computation is limited by the capabilities of the client: disk, RAM and speed.
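To make the two-thread limit concrete, here is a minimal Python sketch of the general split-and-combine pattern: the data is cut into chunks, each chunk is reduced to a partial result by a worker pool capped at two workers, and the partial results are merged. This is an illustration only, not how RevoScaleR is implemented. The names `partial_sum`, `parallel_mean`, `chunk_size` and `max_workers` are hypothetical, and Python threads will not show a real speedup for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Reduce one chunk to a (sum, count) pair."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=1000, max_workers=2):
    """Mean of `values`, computed chunk by chunk on a capped worker pool.

    max_workers=2 mirrors the two-thread cap described above (an assumption
    made for illustration, not a RevoScaleR setting).
    """
    if not values:
        raise ValueError("values must not be empty")
    # Split the data into fixed-size chunks.
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Each worker reduces one chunk at a time to a partial result.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Combine the partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 10001))))
```

Raising `max_workers` changes only how many chunks are reduced at once; the combine step stays the same, which is why the same analysis code can be pushed to a larger machine.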

A good thing about Microsoft R Client is that a user can push computational operations to Microsoft R Server, SQL Server R Services or R Server for Hadoop to achieve better performance. R Client therefore lets the user run high-performance analytics without having Microsoft R Server installed locally, while still benefiting from the computational power of Microsoft R Server.

Microsoft R Client is compatible with the following flavors of R Server: Microsoft R Server for Linux, Microsoft R Server for Teradata DB, Microsoft R Server for Hadoop, Microsoft R HDInsight, and both Microsoft R Server Standalone and SQL Server R Services.

Microsoft R Server

Microsoft R Server is the most often used version of R in the Microsoft R product family, especially for enterprise analytics. Like R Open and R Client, it supports statistical analysis, data mining and predictive analytics with machine learning, but it does so for big data as well. R Server is also fully compatible with the CRAN, GitHub and Bioconductor library repositories, and the ScaleR algorithms and their functions are capable of parallel and multi-threaded data processing and computation on data much larger than the server's memory size. An R-based application can use multiple platforms through ConnectR and can be deployed across multiple platforms as well (using DeployR functions). Disk scalability is also available with this version.
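Processing data larger than memory generally means reading it in chunks and keeping only small running totals between chunks. The Python sketch below illustrates that idea for a mean and sample variance, merging each chunk's statistics into the running result with the standard pairwise update. It is a conceptual illustration under stated assumptions, not the ScaleR implementation; `chunked_stats` and `read_chunks` are hypothetical names, and `read_chunks` stands in for a real disk or database reader.

```python
def chunked_stats(chunks):
    """Return (count, mean, sample variance) over an iterable of chunks.

    Only one chunk is held in memory at a time; the running count, mean and
    sum of squared deviations (m2) are merged with each chunk's statistics.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        k = len(chunk)
        if k == 0:
            continue
        # Statistics of the current chunk alone.
        c_mean = sum(chunk) / k
        c_m2 = sum((x - c_mean) ** 2 for x in chunk)
        # Merge the chunk into the running totals.
        delta = c_mean - mean
        total = n + k
        mean += delta * k / total
        m2 += c_m2 + delta * delta * n * k / total
        n = total
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


def read_chunks(limit, chunk_size):
    """Hypothetical data source: yields the integers 1..limit in small lists."""
    for start in range(1, limit + 1, chunk_size):
        yield list(range(start, min(start + chunk_size, limit + 1)))


if __name__ == "__main__":
    print(chunked_stats(read_chunks(1000, 64)))
```

Because the state carried between chunks is three numbers, memory use depends on the chunk size rather than on the size of the dataset.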

SQL Server R Services (in-database Microsoft R Server)

Microsoft SQL Server R Services is essentially the in-database version of Microsoft R Server, and it covers most of the ScaleR algorithms for a scalable, high-performance environment. Memory and disk are managed by your instance of SQL Server. To support R execution, an additional SQL Server service, called the SQL Server Trusted Launchpad, is installed locally. Microsoft R Client can also communicate with both SQL Server and R Server, with the slight difference that the DeployR and ConnectR functions will not work in the in-database version, as they were designed for other purposes. As already mentioned, the ScaleR algorithms are available in this in-database version as well.

Installation

Microsoft R Server is always installed separately, as a standalone version. So, once you have installed SQL Server 2016, you will need to install R Server as well if you want it on your server.


SQL Server R Services is offered during SQL Server setup, in feature selection, as R Services (In-Database).


Prior to the installation itself (in both cases of in-database or server version) you
