Channel: CodeSection, Code Area, SQL Server (MSSQL) Database Technology Sharing - CodeSec

Big Data Clusters In SQL Server 2019

While extract, transform, load (ETL) has its use cases, an alternative to ETL is data virtualization, which integrates data from disparate sources, locations, and formats, without replicating or moving the data, to create a single "virtual" data layer. The virtual data layer allows users to query data from many sources through a single, unified interface. Access to sensitive data sets can be controlled from a single location. The delays inherent to ETL need not apply; data can always be up to date. Storage costs and data governance complexity are minimized. See the pros and cons of data virtualization via Data Virtualization vs Data Warehouse and Data Virtualization vs. Data Movement.

SQL Server 2019 big data clusters with enhancements to PolyBase act as a virtual data layer to integrate structured and unstructured data from across the entire data estate (SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Azure Cosmos DB, MySQL, PostgreSQL, MongoDB, Oracle, Teradata, HDFS, Blob Storage, Azure Data Lake Store) using familiar programming frameworks and data analysis tools:
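The diagram that followed in the original post is not reproduced here. As a rough, hedged sketch of what that virtual layer looks like from T-SQL (the credential, server address, and object names below are illustrative assumptions, not values from this article), a PolyBase external table over one such source could be defined and queried like this:

-- Requires an existing database master key
CREATE DATABASE SCOPED CREDENTIAL OracleCredential
WITH IDENTITY = 'oracle_user', SECRET = 'StrongPassword1';  -- illustrative credential

CREATE EXTERNAL DATA SOURCE OracleSource
WITH (LOCATION = 'oracle://oracleserver:1521', CREDENTIAL = OracleCredential);

CREATE EXTERNAL TABLE dbo.RemoteOrders
(
    OrderID DECIMAL(10, 0) NOT NULL,
    OrderDate DATE NOT NULL
)
WITH (LOCATION = '[XE].[SALES].[ORDERS]', DATA_SOURCE = OracleSource);

-- Queried like any local table; the data stays in the remote system
SELECT TOP (10) OrderID, OrderDate FROM dbo.RemoteOrders;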


Slides and Scripts from IT/Dev Connections

By K. Brian Kelley, posted on 23 October 2018

If you attended either of my presentations for this past IT/Dev Connections, here are the promised slides and scripts.

As I mentioned at the presentation, there’s nothing special about the scripts, so feel free to use them as you’d like.

Top Down SQL Server Security (.zip file 815 KB)

SQL Server Forensics (.zip file 786 KB)


Categories: Databases, Infrastructure, Security

Brian Kelley is an author, columnist, and Microsoft SQL Server MVP focusing primarily on SQL Server security. He is a contributing author for How to Cheat at Securing SQL Server 2005 (Syngress), Professional SQL Server 2008 Administration (Wrox), and Introduction to SQL Server (Texas Publishing). Brian currently serves as an infrastructure and security architect. He has also served as a senior Microsoft SQL Server DBA, database architect, developer, and incident response team lead.

SQL Server and PHP CodeIgniter: displaying a one-to-many relationship

I want to show data from table_one where tableone_id is referenced in table_two, which is a one-to-many relationship. Here is the example.

table_one              table_two
id | name              id | name | tableone_id
1  | A                 1  | C    | 1
2  | B                 2  | D    | 1
                       3  | E    | 2
                       4  | F    | 2

The result I was expecting in my PHP is:

Number | Name | Linked Item
1      | A    | C
       |      | D
2      | B    | E
       |      | F

I already tried some code like:

Controller.php

$head = $this->db->query("SELECT * from table_one)->result_array();
foreach($head as $key => $value) {
    $head[$key]['items'] = $this->db->query("SELECT a.id, b.id, b.name as tabletwo_name, FROM table_one a JOIN table_two b on a.id = b.id where b.id =".$value['id'])->result_array();
}

The code I wrote fails. How do I write the right code for this case?

Thank you

Your DB queries should be in the model, not in the controller. You are missing a double quote in the first SELECT; your string is not closed. Also, why are you selecting all results from table_one and then doing a JOINed select? You are not even selecting the data you want to have in your output (like table_two.name).

Change your first query to:

$this->db->query("SELECT table_one.id as number, table_one.name as name, table_two.name as linked_item
    FROM table_one
    LEFT JOIN table_two ON table_one.id = table_two.tableone_id")->result_array();

and then just do this:

foreach($head as $key => $value) { $head[$key]['items'] = $value; }

Clear column data

Cassandra CQL 3.2.1 Clear column data

I need to clear the data from a column in a table using CQL I've tried the following test on a single node and it works fine. But is this going to fly on many nodes and different replication factors? DROP KEYSPACE IF EXISTS testColumnDrop; CREATE KEY

Set Up Your Workstation to Create SQL Server Reporting Services Reports for Dyna ...

Summary

The process to set up a workstation to create SQL Server Reporting Services (SSRS) reports for Dynamics 365 involves a lot of steps and causes some confusion. The report authoring tool is Visual Studio, but a fully licensed version of Visual Studio is not required. The Dynamics 365 Report Authoring Extensions has recently been updated to address the TLS 1.2 requirements of Dynamics 365 v9.

Revisited

A couple of years ago I wrote a posting on how to set up a workstation to create SSRS reports for Dynamics 365/CRM. It turned out to be one of my more popular posts. If you are writing reports for older versions of Dynamics 365/CRM, those instructions are still very relevant.

Click here for the original post.

Since that post, there have been a couple of updates and also an issue with connecting to Dynamics 365 v9 in regards to TLS 1.2 when using…

Implement SFTP File Transfer with SQL Server Integration Services and PSFTP

By: Koen Verbeeck || Related Tips: More > Integration Services Development

Problem

I want to transfer files from an FTP server to my local server. The server is an SFTP server however, so I can’t use the Integration Services FTP task. How can I resolve this?

Solution

In this tip, we’ll use the free tool PSFTP (from the PuTTy software family) in combination with SQL Server Integration Services (SSIS) to download a file from an SFTP server.

Some remarks:

- This tip is an update of the tip Using SFTP with SQL Server Integration Services. There were a couple of questions about the use of PSFTP in the comments, and this tip will try to answer those.
- There are plenty of other tools which can also handle SFTP (or FTPS) file transfers, such as WinSCP. This tip focuses solely on PSFTP, but the method used is very similar when you use other tools.
- The tip uses basic authentication with a username and password. A complex set-up with public/private keys is out of scope.
- You don't really need SSIS to transfer files with SFTP; you could use PSFTP (or other tools) from batch scripts just as easily. But typically, such a file transfer is part of a larger ETL process orchestrated by SSIS. For example, after downloading the file it could be ingested and written to a database, after which the file is archived. All of this can also be done in the same SSIS work flow.
- The example package in this tip is created with SSDT 15.7.3 for Visual Studio 2017. The project settings are configured to target SQL Server 2017, but the provided solution is valid for at least SSIS 2012 and up (probably SSIS 2005 and 2008 as well, but this hasn't been tested).

SFTP File Transfer with SSIS Test Set-up

When you want to transfer a file with SFTP, you need an SFTP server of course. The SolarWinds SFTP server is free to use and easy to set up. In the settings, we configure the root directory of the SFTP server, which is a folder on your hard drive:


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP

In the TCP/IP settings, you can optionally change the port:


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP

In the Users tab, we add a user called mssqlTIPS and we specify a password:


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP

Click OK to exit the settings. Don't forget to actually start the SFTP server. You might need administrative permissions to do this. You can also start the service from the Windows Services Manager (services.msc).


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP

As the last step of the test set-up, we place a random file in the root directory. This file will be “downloaded” by SSIS and PSFTP:


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP
Creating the SSIS Package

Add a new package to your SSIS project. On the control flow, add an Execute Process Task . This task will call the PSFTP executable.


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP

Open the editor of the task. Configure it as follows:


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP
RequireFullFileName: this configures whether the task fails if the executable cannot be found at the specified path. If the executable is added to the PATH environment variable, you can set this to False.

Executable: the name of the executable you want to execute, in our case psftp. If you don't have psftp added to the PATH environment variable, you can either copy psftp.exe to the working directory, or you can specify the full path to the executable: C:\Program Files\PuTTY\psftp.exe.

Arguments: the arguments that you want to pass to psftp. First, we specify the user and the server name, using the syntax user@servername. Then we specify the password with the -pw switch. The -be switch is used next, which means continue on error when using a batch file. We also use the -batch switch to avoid any interactive prompts. Finally, we specify a batch file containing all the SFTP commands we want to execute on the SFTP server, using the -b switch. Typically, you'll want to use an expression to pass the value to the arguments, so you can use parameters to make them more dynamic.

WorkingDirectory: this is the location where we want to put the downloaded file.

WindowStyle: this needs to be set to Hidden. We don't want any interactive windows.

A full list of all the arguments you can pass to psftp (including how to specify an alternative port) can be found here . The contents of the batch file SFTPCommands.bat are the following:

cd MSSQLTIPS
get BimlBasics.pptx

With the command cd , we specify the directory on the SFTP server we want to browse to. With the get command, we indicate which file we want to download to our working directory.

The Execute Process Task will give a warning when psftp is not located in the working directory:


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP

Let’s test the package. After running it, you should see the file in the working directory:


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP

You can also verify on the SFTP server itself:


Implement SFTP File Transfer with SQL Server Integration Services and PSFTP
Deploying the Package to a Server

Normally you’re not going to run the package manually, but rather schedule it on a server. In this tip, we’ll use SQL Server Agent to schedule the package. There’s a bit of prep work that needs to be done:

- Make sure psftp is also downloaded and installed on the server. Optionally add it to the PATH environment variable there as well.
- You can use the SQL Server Agent account to execute the SSIS package, but best practice is to use a proxy account. The tip Running a SSIS Package from SQL Server Agent Using a Proxy Account explains how you can set this up.
- Give the proxy account read/write permissions on the working directory and execute permissions on PSFTP.
- Use the same folder structure for your working directory, or parameterize your set-up.
- Adjust any firewall settings if necessary.
- Deploy the SSIS package/project (a sketch of starting the deployed package from T-SQL follows below).
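Once the project is deployed to the SSIS catalog, the package can be started from T-SQL, and therefore from a SQL Server Agent job step. The following is only a hedged sketch; the folder, project and package names are illustrative assumptions and not taken from this tip:

DECLARE @execution_id BIGINT;

EXEC SSISDB.catalog.create_execution
    @folder_name = N'SFTPDemo',           -- illustrative folder name
    @project_name = N'SFTPProject',       -- illustrative project name
    @package_name = N'SFTPDownload.dtsx', -- illustrative package name
    @use32bitruntime = 0,
    @reference_id = NULL,
    @execution_id = @execution_id OUTPUT;

EXEC SSISDB.catalog.start_execution @execution_id;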

More efficient SQL Server UPDATE for VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(M ...

By: Douglas Correa || Related Tips: More > T-SQL

Problem

You need to update SQL Server data in columns that have data types like VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) and that contain values like JSON or XML documents, without running into performance issues.

Solution

SQL Server 2005 introduced new large value data types to replace the deprecated text, ntext and image data types. These data types can store up to 2^31-1 bytes of data. Updates can be made without rewriting the entire column value, but there is no difference to inserting a large value with a regular INSERT.

To update a large value data type, the UPDATE can use the WRITE clause as follows:

UPDATE dbo.LargeTableObject
SET col.WRITE (expression, @offset, @length )
WHERE id = 1;
GO
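As a quick, hedged illustration of the .WRITE arguments (using the dbo.LargeTableObject table created later in this tip; the literal values are only examples): when @offset is NULL the expression is appended to the end of the existing value, and when both @offset and @length are supplied the expression replaces @length characters starting at that zero-based offset.

-- Append text to the end of the value (@offset = NULL, @length is ignored)
UPDATE dbo.LargeTableObject
SET col1.WRITE(' appended text', NULL, 0)
WHERE id = 1;

-- Replace 5 characters starting at zero-based offset 0
UPDATE dbo.LargeTableObject
SET col1.WRITE('START', 0, 5)
WHERE id = 1;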

You can see the difference between a regular UPDATE and UPDATE with the WRITE clause.

Example of using SQL Server UPDATE with WRITE Clause

Using this method can be a good choice and to demonstrate it I’ll create a table with a varchar(max) column data type.

DROP TABLE IF EXISTS LargeTableObject
GO
CREATE TABLE LargeTableObject (
    id BIGINT IDENTITY,
    col1 VARCHAR(MAX)
)
GO

Next, a row is inserted. I'm using a JSON file in this example. You can download the JSON file from here .

INSERT INTO dbo.LargeTableObject (col1)
SELECT BulkColumn
FROM OPENROWSET (BULK 'C:\temp\citylots.json', SINGLE_CLOB) as j
GO

We can see how SQL Server saves this data, by running the query below.

SELECT OBJECT_NAME(I.object_id)
    , I.name
    , AU.total_pages
    , AU.used_pages
    , AU.data_pages
    , P.rows
    , AU.type_desc
FROM sys.allocation_units AS AU
INNER JOIN sys.partitions AS P ON AU.container_id = P.partition_id
INNER JOIN sys.indexes AS I ON I.index_id = P.index_id AND I.object_id = P.object_id
WHERE P.object_id = OBJECT_ID('LargeTableObject')
GO

The image below shows we are using LOB_DATA pages and we are using 7513 total pages (each page is 8K) to store this data, which is about 60MB.


More efficient SQL Server UPDATE for VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(M ...

We can also see this data if we use sp_spaceused on this table.


More efficient SQL Server UPDATE for VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(M ...

The next step is to clear the plan cache and run a checkpoint to clear the transaction log. Also, we will set STATISTICS IO on to see how many pages are needed for the update.

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
CHECKPOINT
SET STATISTICS IO ON
SELECT * FROM fn_dblog(null,null)
GO

With the function fn_dblog you can check to see that the log is clean.


More efficient SQL Server UPDATE for VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(M ...

Before using the WRITE clause, let's update the column by adding text at the end using a regular UPDATE.

UPDATE dbo.LargeTableObject
SET col1 = col1 + REPLICATE(' regular update ', 2 )
WHERE id = 1
GO

The statistics io shows the following:

Table 'LargeTableObject'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 66981, lob physical reads 11, lob read-ahead reads 98106.

Table 'Worktable'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 77360, lob physical reads 0, lob read-ahead reads 25557.

(1 row affected)

And, let’s take a look at the transaction log again.

SELECT * FROM fn_dblog(null,null)
GO

We can see there are a lot more rows now.


More efficient SQL Server UPDATE for VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(M ...

In the next step I will use the WRITE clause for the UPDATE, but before that I'll clear my plan cache and transaction log again.

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
CHECKPOINT
SELECT * FROM fn_dblog(null,null)
GO
More efficient SQL Server UPDATE for VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(M ...

We will do a similar update as we did above, but use the WRITE clause.

UPDATE dbo.LargeTableObject
SET col1.WRITE (REPLICATE(' write update ', 2 ), NULL , NULL )
WHERE id = 1
GO

The statistics io for this update is as follows. This is a big difference compared to the regular update.

Table 'LargeTableObject'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 3, lob physical reads 2, lob read-ahead reads 0.

Now let's look at the transaction log.

SELECT * FROM fn_dblog(null,null)
GO

We can see there are a lot fewer rows after the update using the WRITE clause.


More efficient SQL Server UPDATE for VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(M ...
Compare Execution Plans

If we look at the execution plans for both queries, we can see they almost look the same. But as we saw, there is a lot less activity when using the WRITE clause.


More efficient SQL Server UPDATE for VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(M ...
Conclusion

There is a big difference when using the WRITE clause, and this improvement is a good reason to change the way you update large value data types. The execution plans will not show which is the better choice, so you need to use SET STATISTICS IO to get more information.

Next Steps

- Getting IO and time statistics for SQL Server Queries
- How to read the SQL Server Database Transaction Log
- More about the WRITE clause here.

Last Update: 2018-10-24


About the author
Douglas Correa is a database professional, focused on tuning, high-availability and infrastructure. View all my tips

Related Resources

More Database Developer Tips...

SDU Tools: List Foreign Key Columns in a SQL Server Database

In a previous post, I talked about the ListForeignKeys procedure as part of our free SDU Tools for developers and DBAs. That procedure returned one row per foreign key. Sometimes though, you need to process each column of a foreign key separately. So we've provided the ListForeignKeyColumns tool to do that.

The tool also detects any keys that are using system-generated names. (We don't recommend that).

You can see how to execute it in the main image above. The procedure takes these parameters:

@DatabaseName sysname - the database to process

@SchemasToList nvarchar(max) - a comma-delimited list of schemas to include (ie: 'Sales,Purchasing') or the word 'ALL'

@TablesToList nvarchar(max) - a comma-delimited list of tables to include (ie: 'Customers,Orders') or the word 'ALL'

One row is returned for each foreign key column, rather than for each foreign key.

The columns returned are ForeignKeyName, IsDisabled, IsNotTrusted, IsSystemNamed, SchemaName, TableName, ColumnID, ColumnName, ReferencedSchemaName, ReferencedTableName, ReferencedColumnName.
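The main image from the original post is not reproduced here. Based on the parameters listed above, a call would look roughly like the following; the database name is an illustrative assumption, and the procedure is assumed to live in the SDU_Tools schema used by the SDU Tools installation script:

EXEC SDU_Tools.ListForeignKeyColumns
    @DatabaseName = N'WideWorldImporters',  -- illustrative database name
    @SchemasToList = N'ALL',
    @TablesToList = N'ALL';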

You can see it in action here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com


Understanding Query Optimizer Timeouts

What Is Optimizer Timeout?

SQL Server uses a cost-based query optimizer. Therefore, it selects a query plan with the lowest cost after it has built and examined multiple query plans. One of the objectives of the SQL Server query optimizer (QO) is to spend a “reasonable time” in query optimization as compared to query execution. Therefore, QO has a built-in threshold of tasks to consider before it stops the optimization process. If this threshold is reached before QO has considered most, if not all, possible plans then it has reached the Optimizer TimeOut limit. An event is reported in the query plan as Time Out under “Reason For Early Termination of Statement Optimization.” It’s important to understand that this threshold isn’t based on clock time but on number of possibilities considered. In current SQL QO versions, over a half million possibilities are considered before a time out is reached.

Optimizer timeout is designed into Microsoft SQL Server, and in many cases encountering it is not a factor affecting query performance. However, in some cases the query plan choice may be affected by optimizer timeout and thus performance could be impacted. When you encounter such issues, understanding the optimizer timeout mechanism and how complex queries can be affected in SQL Server can help you troubleshoot and resolve the performance issue.
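The timeout is surfaced in the plan XML as StatementOptmEarlyAbortReason="TimeOut". As a rough, hedged way to spot affected plans in the plan cache (a simple string search is used here purely as a sketch), something like the following can be run:

-- Find cached plans whose optimization ended early because of a timeout
SELECT TOP (20)
       st.text AS query_text,
       qs.execution_count,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%StatementOptmEarlyAbortReason="TimeOut"%'
ORDER BY qs.total_worker_time DESC;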

SQL Server 2019 Enhanced PolyBase Part 1

SQL Server 2019 was recently launched by Microsoft at the Ignite 2018 event. We can get an overview of the SQL Server 2019 preview version and learn how to install it in a Windows environment by following the article SQL Server 2019 overview and installation.

We will explore the SQL Server 2019 enhanced PolyBase feature in a series of articles. In this first part, we will explore the topics below:

- Overview of ETL and PolyBase
- Install PolyBase into SQL 2019
- Overview and Installation of Azure Data Studio
- SQL Server 2019 preview extension in Azure Data Studio

Overview of ETL and PolyBase

In today's industry, we have data in various databases such as Oracle, MongoDB, Teradata, PostgreSQL, etc. Applications need to access data from these various data sources and combine the data into a single source, which is a challenging task for database developers and data scientists. We normally use ETL (Extract-Transform-Load) to move the data between the different sources.

Below are the steps involved in an ETL process:

- Read data from the data source of your choice and extract the specific data
- The Transform process works on this data based on the logic and rules, and converts the data
- The Load process writes the data to the destination database
SQL Server 2019 Enhanced PolyBase   Part 1

ETL provides great value: it applies business logic to the data, transforms data from various sources, and moves the data into a single destination or multiple formats. However, the ETL process has some challenges:

- We need to move data from the source, which requires extra resources in terms of disk space
- Data security is another aspect; the copy of the data should be secured from unauthorized access
- An ETL process is slow and requires effort to maintain due to its complex logic

In SQL Server 2016, we came across the new feature 'PolyBase' that allows querying relational and non-relational databases. This data virtualization allows integrating data from multiple sources without moving the data. It actually creates a virtual data layer, called a data lake or data hub. We can access all data from a single source, which also allows controlling security from a single point. We can query Hadoop and Azure Blob Storage using PolyBase in SQL Server 2016.

In the article SQL Server 2016 PolyBase tutorial, we explored querying a CSV file stored in Azure Blob Storage from SQL Server 2016 using PolyBase.

SQL Server 2019 enhances PolyBase to access data from various data sources such as Oracle, Teradata, MongoDB, and PostgreSQL. We can also access data from any data source with an ODBC driver. We can create external tables that link to these data sources (SQL Server, Oracle, Teradata, MongoDB, or any data source with an ODBC driver). Users can access this data through external tables, similar to a relational database table. These external tables are linked to the data sources, and when we execute a query, the data behind the external table is retrieved and shown to the user.

On the image below, we can see PolyBase in SQL Server 2019:


SQL Server 2019 Enhanced PolyBase   Part 1
Install PolyBase into SQL Server 2019

Let us first install PolyBase into SQL Server 2019. In an earlier article, SQL Server 2019 installation on Windows, we installed the SQL Server 2019 preview version. Therefore, I will not cover the complete installation here.

Check the box for 'PolyBase Query Service for External Data' on the feature selection page.


SQL Server 2019 Enhanced PolyBase   Part 1

You need to install Oracle JRE 7 Update 51 or higher to install PolyBase. If it is not installed, you will get the below error message while checking the installation rules.


SQL Server 2019 Enhanced PolyBase   Part 1

To fix this error, go to ‘ Java SE Runtime Environment 8 Downloads ‘ and download Java SE Runtime Environment 8u191E. Double click on the setup file to install it.


SQL Server 2019 Enhanced PolyBase   Part 1

On the next page, we need to do the PolyBase configuration. If we are installing PolyBase on a standalone instance, select the option 'Use this SQL Server as standalone PolyBase-enabled instance'.

We can also set up PolyBase in a scale-out configuration in which we define the head node and compute nodes. This provides a performance improvement for large data sets. You can get more information about this option from PolyBase scale-out groups, as shown in the below image obtained from this page.


SQL Server 2019 Enhanced PolyBase   Part 1

In this article, we will use PolyBase on a standalone SQL Server instance. Therefore, select the first option 'Use this server as a standalone PolyBase enabled instance' and click Next.


SQL Server 2019 Enhanced PolyBase   Part 1

On the next page, we can specify the service accounts for the two PolyBase services below. The service account should be the same for both services.

- SQL Server PolyBase engine
- SQL Server PolyBase data movement
SQL Server 2019 Enhanced PolyBase   Part 1

Review the configuration and click on Install .


SQL Server 2019 Enhanced PolyBase   Part 1

Below is the confirmation page after ‘PolyBase Query Service for External data’ service installation is successful.


SQL Server 2019 Enhanced PolyBase   Part 1

Check the services in SQL Server Configuration Manager. They should be in the running state.


SQL Server 2019 Enhanced PolyBase   Part 1
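After the installation, it is worth confirming from T-SQL that PolyBase is present and enabled on the instance. A small, hedged sketch (the 'polybase enabled' configuration option applies to SQL Server 2019, and a restart of the SQL Server and PolyBase services may be required after changing it):

-- Returns 1 when the PolyBase feature is installed
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Enable PolyBase for the instance
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;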
Overview and Installation of Azure Data Studio

In previous articles on SQL Operations Studio, we learned that SQL Operations Studio is a new GUI-based tool that works on the Windows, macOS and Linux operating systems. It connects to SQL Server, Azure SQL Database, and SQL Data Warehouse.

Azure Data Studio is the new name for SQL Operations Studio. In the October release, Azure Data Studio provides support for new SQL Server 2019 features such as big data clusters, enhanced PolyBase, Azure notebooks, and the Azure resource explorer.

We can install Azure Data Studio on Windows, Linux, and macOS. In this article, we will install it in a Windows environment.

Follow the below steps:

Download the latest October release of Azure Data Studio from the link
SQL Server 2019 Enhanced PolyBase   Part 1

Once the setup download is complete, double-click to launch the setup wizard.



SQL Server 2019 Enhanced PolyBase   Part 1

Accept the license agreement and click on Next .


SQL Server 2019 Enhanced PolyBase   Part 1

Specify the destination directory. The default location is 'C:\Program Files\Azure Data Studio'. We need at least 365.2 MB of free disk space.


SQL Server 2019 Enhanced PolyBase   Part 1

Setup creates a Start Menu folder. We can select the folder in the Start Menu. If we do not want to create the Start Menu folder, check 'Don't create a Start Menu folder'.


SQL Server 2019 Enhanced PolyBase   Part 1

We can also choose to create a desktop icon and add Azure Data Studio to the PATH environment variable.


SQL Server 2019 Enhanced PolyBase   Part 1

We can also register Azure Data Studio as an editor for the supported file types. To do so, check the option as shown below.


SQL Server 2019 Enhanced PolyBase   Part 1

The configuration is now complete. Click on Install to complete the installation process of Azure Data Studio.


SQL Server 2019 Enhanced PolyBase   Part 1
SQL Server 2019 Enhanced PolyBase   Part 1

We get the below screen once the setup for Azure Data Studio is complete. We can launch Azure Data Studio from here or from the Start Menu.


SQL Server 2019 Enhanced PolyBase   Part 1

The default screen for Azure Data Studio is shown below. SQL Server 2019 is in a preview state, so here we get the option to enable preview features. Click on Yes to enable the preview features.


SQL Server 2019 Enhanced PolyBase   Part 1

Enter the connection details like instance name, authentication type, server group (we can select existing server group or create a new group).

The recent release of Azure Data Studio also allows specifying a friendly name for the connection.


SQL Server 2019 Enhanced PolyBase   Part 1

As shown below, we are connected to SQL 2019 preview instance with the friendly name in Azure Data Studio.


SQL Server 2019 Enhanced PolyBase   Part 1

Now, in order to use all the features of the SQL Server 2019 preview version, we need to install the 'SQL Server 2019 (Preview)' extension from the Marketplace.

Click on the ‘SQL Server 2019 (Preview)’ extension in the Marketplace and we can get an overview of the preview extension. You can go through it to get more information about the extension.


SQL Server 2019 Enhanced PolyBase   Part 1
Clicking on Install opens up a webpage where we can download the SQL Server 2019 extension (preview) .vsix file.
SQL Server 2019 Enhanced PolyBase   Part 1

Now go to file -> “Install Extensions from VSIX Package” and provide the path of the downloaded .vsix file.


SQL Server 2019 Enhanced PolyBase   Part 1

Click Yes to install the extension. This will take some time to install this SQL Server 2019 preview extension.


SQL Server 2019 Enhanced PolyBase   Part 1

We get the below message after the extension is successfully installed. Click on Reload Now to install its dependencies and for the extension to take effect.


SQL Server 2019 Enhanced PolyBase   Part 1
Conclusion

In this article, we took an overview of the SQL Server 2019 PolyBase enhancements, the Azure Data Studio installation, and its extension to support SQL Server 2019 preview features. In the next article, we will create sample database objects in Oracle and create external tables to access these objects from SQL Server 2019 through PolyBase.

Data Classification in SQL Server 2019

One of the areas that Redgate is working on is making data classification easier. Microsoft added some capabilities to SSMS 17.5 and Redgate has an EAP out for the next version of our data catalog tool .

Azure SQL Database has had some advanced options they were building into the database engine, and we get our first look in the on-premises version with SQL Server 2019 CTP 2.0.

The ADD SENSITIVITY CLASSIFICATION and DROP SENSITIVITY CLASSIFICATION DDL is now available, and here are a few examples of how this works.

Let’s look at a database that has some potential data to classify. I’ve got a sample database with a few tables. In fact, if I look at the data classification suggestions in SSMS, I see 7 columns.


Data Classification in SQL Server 2019

I can accept any of these, but if I do, these are written to extended properties, which isn’t the best way of storing this data.

However, the ADD SENSITIVITY CLASSIFICATION syntax works well. If I take that dbo.Contacts.Email column and decide this is Confidential according to the GDPR, I can do this:

ADD SENSITIVITY CLASSIFICATION TO dbo.Contacts.Email WITH (LABEL = 'Confidential - GDPR')

If I then query my meta data table, I’ll see this:


Data Classification in SQL Server 2019

There are other items I can add, such as the information type and then IDs for the label and type. I can, however, update that data like this:

ADD SENSITIVITY CLASSIFICATION TO dbo.Contacts.Email WITH (INFORMATION_TYPE = 'Contact', INFORMATION_TYPE_ID = '5BFAE3B8-4549-4989-BEB6-F9BF6434DAD1')

Note I still haven’t given the Label_ID a value, but that’s OK. This allows me to add human readable metadata to columns, as well as add IDs that I might get from some external auditing system.


Data Classification in SQL Server 2019
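The metadata shown above lives in the sys.sensitivity_classifications catalog view. A small, hedged sketch of reading it back, joining to the catalog views purely to resolve the object and column names:

SELECT SCHEMA_NAME(o.schema_id) AS schema_name,
       o.name AS table_name,
       c.name AS column_name,
       sc.label,
       sc.label_id,
       sc.information_type,
       sc.information_type_id
FROM sys.sensitivity_classifications AS sc
JOIN sys.objects AS o ON o.object_id = sc.major_id
JOIN sys.columns AS c ON c.object_id = sc.major_id AND c.column_id = sc.minor_id;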

This feels primitive, but it’s slightly better than extended properties, and it’s somewhat built into the engine, so we can code this as a part of development and ensure classification is added to our sensitive data.

If this is an area you’re interested in, we’d love to have you try the Redgate tool and give us feedback. We’re working on this problem and trying to find ways that are both useful and sustainable over time.

SQL Server 2019 Enhanced PolyBase Part 2

In the previous article of the series, we took an overview of PolyBase in SQL Server 2019. We also learned about Azure Data Studio and the SQL Server 2019 preview extension to explore SQL Server 2019 features.

In this article, we will use PolyBase to connect to an Oracle database and see how we can create external tables pointing to the Oracle database and access its data without moving the data into the SQL Server 2019 database.

Therefore, in this article, we will explore the topics below:

- Install Oracle Express Edition database
- Insert a sample database into the DB
- Create an external table using Azure Data Studio
- Access data from an external table pointing to the Oracle DB

Install Oracle Express Edition database

Firstly, we will install Oracle Express Edition 11g Release 2 and prepare a sample database and tables. Later we will access this table from SQL Server 2019 using an external table.

Download Oracle Express Edition 11g Release 2 from the link for the Windows x64 platform.


SQL Server 2019 Enhanced PolyBase   Part 2

Once the setup file download is complete, double-click on it to launch the installation wizard.


SQL Server 2019 Enhanced PolyBase   Part 2

We get the below welcome screen to install Oracle Database 11g Express Edition. Click on Next .


SQL Server 2019 Enhanced PolyBase   Part 2

Accept the license agreement and click on Next .


SQL Server 2019 Enhanced PolyBase   Part 2

By default, Setup installs Oracle Database 11g Express Edition into the C:\oraclexe folder. If we want to change this, click on Browse and give a new path.


SQL Server 2019 Enhanced PolyBase   Part 2

Enter the password for the SYS and SYSTEM database admin accounts. The password will be the same for both accounts. Both accounts are created automatically during the installation.

Both the SYS and SYSTEM accounts can perform administration tasks in Oracle, but the SYSTEM account cannot perform backup, recovery and database upgrade operations. You can refer to SYS and SYSTEM Users for more details.


SQL Server 2019 Enhanced PolyBase   Part 2

On this page, review the installation settings. We can see here that the default locations are:

Oracle Home: C:\oraclexe\app\oracle\product\11.2.0\server

Oracle Base: C:\oraclexe

Port for Oracle Database listener: 1521


SQL Server 2019 Enhanced PolyBase   Part 2

Click on Install to begin installing Oracle Database 11g Express Edition.


SQL Server 2019 Enhanced PolyBase   Part 2

We can see the progress of the installation as shown below:


SQL Server 2019 Enhanced PolyBase   Part 2

We get below message once the Oracle Database 11g Express Edition is installed successfully.


SQL Server 2019 Enhanced PolyBase   Part 2

We can see a new folder, "Oracle Database 11g Express Edition", in the Start Menu.


SQL Server 2019 Enhanced PolyBase   Part 2

Click on Get Started and it opens a web page for Oracle Database XE 11.2 with all the configuration options, session and parameter details, SQL editor, etc.


SQL Server 2019 Enhanced PolyBase   Part 2

Log in with a database user that has the DBA role. We can log in here with the SYSTEM account created during the installation.


SQL Server 2019 Enhanced PolyBase   Part 2
In the next step, we will create a shared work area (workspace) which works as a virtual private database. Enter the database username, application express username, and password.
SQL Server 2019 Enhanced PolyBase   Part 2
We can see in below image that the workspace is created successfully. Now we will log in to the workspace with the credentials created.
SQL Server 2019 Enhanced PolyBase   Part 2

Enter the credentials.


SQL Server 2019 Enhanced PolyBase   Part 2

We can see the workspace where we can run the SQL query, create objects etc.


SQL Server 2019 Enhanced PolyBase   Part 2
In the next step, we will run the script that will create the sample objects and insert data into the objects. Copy the script and provide a name to the script.
SQL Server 2019 Enhanced PolyBase   Part 2

Click on Run Now to execute the script.


SQL Server 2019 Enhanced PolyBase   Part 2

We can see that the script is executed successfully.


SQL Server 2019 Enhanced PolyBase   Part 2

Now go to the object browser and we can see the objects and their data. For example, in the below screen, we can see the data in the Employees table.


SQL Server 2019 Enhanced PolyBase   Part 2

Now we have the Oracle database and sample objects ready. In the next step, we will use Azure Data Studio to create an external table for the Oracle data source.

Azure Data Studio to access external data in Oracle using PolyBase

As discussed so far, below are the requirements to access an Oracle database using PolyBase with Azure Data Studio:

- SQL Server 2019 preview
- Azure Data Studio with the SQL Server 2019 extension
- Oracle data source
- PolyBase services should be running with the SQL Server database services
SQL Server 2019 Enhanced PolyBase   Part 2

If PolyBase is not installed, we will get the error “the Operation requires PolyBase to be enabled on the target server”.


SQL Server 2019 Enhanced PolyBase   Part 2

This feature is available for SQL Server 2019 only; we get the below error if we try to use the external table wizard for instances other than SQL Server 2019.


SQL Server 2019 Enhanced PolyBase   Part 2
Steps to Create External Tables in Azure Data Studio

In this step, we will configure the external table using PolyBase with the help of External table wizard in Azure Data Studio.

Right-click on the database and select Create External Table.
SQL Server 2019 Enhanced PolyBase   Part 2

This launches the below external table wizard. This shows the two data sources: SQL Server and Oracle.

By default, SQL Server is highlighted. In this article, we want to create a data source for Oracle.


SQL Server 2019 Enhanced PolyBase   Part 2

In this step, we will create the Database Master Key. We will provide the master key password.

If a master key already exists on the database, we get the message that master key already exists on this database.

Alternatively, we can create the database master key using the below script:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Complex password';
SQL Server 2019 Enhanced PolyBase   Part 2

Click on Next to create a connection to Data source. Enter the below details:

Server name: The server name should be in the format server:port

Database Name: The default service name for Oracle Express Edition is XE. We can give the service name as per our DB configuration.

Credentials: Enter the database-scoped credential or we can create new credentials here.

Click on Next to move forward.


SQL Server 2019 Enhanced PolyBase   Part 2

In the next step, we will choose the external table to access from the SQL Server. In this demo, we will select DEMOUSER.Employees table.

Once we select this table, we can see the source table and its corresponding external table name. We can also see the source and destination column mapping and properties.


SQL Server 2019 Enhanced PolyBase   Part 2

The next step shows a summary of the settings, such as the destination database, database-scoped credential name, external data source name, and external table name.

If we want to generate a script for this external table configuration, click on Generate Script . This will create a script in a new query window.


SQL Server 2019 Enhanced PolyBase   Part 2

Click on Create to create an external table.


SQL Server 2019 Enhanced PolyBase   Part 2

In the task history, we can see that the external table is created successfully.


SQL Server 2019 Enhanced PolyBase   Part 2

We can see that the dbo.Employees table exists in the database. We can easily identify external tables by the EXTERNAL keyword shown as a suffix to the table name in Azure Data Studio.

As shown below, we can view the records in the table similar to a relational database table.


SQL Server 2019 Enhanced PolyBase   Part 2
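For example, a query like the following treats the external table as if it were a local table, while the data itself is fetched from Oracle at query time (a hedged sketch based on the columns in the generated script below):

SELECT TOP (10) EMPLOYEE_ID, FIRST_NAME, LAST_NAME, SALARY
FROM dbo.EMPLOYEES
ORDER BY EMPLOYEE_ID;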

Below is the script generated by the external table creation wizard in Azure Data Studio. We will explain this script in further articles.

BEGIN TRY
    BEGIN TRANSACTION T35c299624c5449ae8a5e37d96282f89
        USE [SQLShackDemo];
        CREATE DATABASE SCOPED CREDENTIAL [test] WITH IDENTITY = 'system', SECRET = 'ABC@system1';
        CREATE EXTERNAL DATA SOURCE [Test] WITH (LOCATION = 'oracle://192.168.225.185:1521', CREDENTIAL = [test]);
        CREATE EXTERNAL TABLE [dbo].[EMPLOYEES]
        (
            [EMPLOYEE_ID] DECIMAL(6,0) NOT NULL,
            [FIRST_NAME] VARCHAR(20) COLLATE Latin1_General_CI_AS,
            [LAST_NAME] VARCHAR(25) COLLATE Latin1_General_CI_AS NOT NULL,
            [EMAIL] VARCHAR(25) COLLATE Latin1_General_CI_AS NOT NULL,
            [PHONE_NUMBER] VARCHAR(20) COLLATE Latin1_General_CI_AS,
            [HIRE_DATE] DATE NOT NULL,
            [JOB_ID] VARCHAR(10) COLLATE Latin1_General_CI_AS NOT NULL,
            [SALARY] DECIMAL(8,2),
            [COMMISSION_PCT] DECIMAL(2,2),
            [MANAGER_ID] DECIMAL(6,0),
            [DEPARTMENT_ID] DECIMAL(4,0)
        )
        WITH (LOCATION = '[XE].[DEMOUSER].[EMPLOYEES]', DATA_SOURCE = [Test]);
    COMMIT TRANSACTION T35c299624c5449ae8a5e37d96282f89
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION T35c299624c5449ae8a5e37d96282f89
    DECLARE @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE();
    DECLARE @ErrorSeverity INT = ERROR_SEVERITY();
    DECLARE @ErrorState INT = ERROR_STATE();
    RAISERROR(@ErrorMessage, @ErrorSeverity, @ErrorState);
END CATCH;

In SQL Server Management Studio, the external table is present in tables -> external tables section.


SQL Server 2019 Enhanced PolyBase   Part 2

If we view the query execution plan for this external table in Azure Data Studio, we can see the Remote Query operator, which shows that data is extracted from the remote data source when we run the query and that the external table itself does not hold any data.


SQL Server 2019 Enhanced PolyBase   Part 2

Similarly, we can get more details of the execution plan and the operator as shown below. We can see that the remote source is Polybase_ExternalConfiguration.


SQL Server 2019 Enhanced PolyBase   Part 2

Let us update a record in the Oracle database. In the below example, we can see that the employee name for employee ID 100 is updated from Steven King to Rajendra Gupta.


SQL Server 2019 Enhanced PolyBase   Part 2
Now let us verify the updated employee name using the external table. We can view the live data through the external table; we do not need to bring the data over again since it accesses live data from the data source. It does not store a copy of the data.
SQL Server 2019 Enhanced PolyBase   Part 2

We can create statistics on an external table to get optimal performance.

CREATE STATISTICS EMPLOYEESKeyStatistics ON Employees (Employee_ID) WITH FULLSCAN;
SQL Server 2019 Enhanced PolyBase   Part 2
Conclusion

SQL Server 2019 preview (SQL Server vNext CTP 2.0) provides the ability to access relational and non-relational data using the PolyBase data virtualization technique. These are very useful enhancements for accessing all data from a single place. We can access this data in a similar way to relational data. In the next article, we will create an external table using T-SQL for the same Oracle data source and explore more features of external tables.

SQL Server Custom Rounding

I have a situation where:

24.9999 should be 25
24.5000 should be 25
24.4999 should be 24
24.1111 should be 24

I tried CEILING, but then the result is 25 for all of them, whereas FLOOR gives 24 for all of them.

How to accomplish this?

Thanks a lot for your time.

Note: it might be helpful to let you know that I want this functionality to be inside a computed column.

Use:

Round(YourNumber, 0)

The 0 indicates the precision (i.e. number of decimal places); if you wanted to round 42.51 to 42.5, you'd replace the 0 with 1, for example.

Make sure not to use floats - they can sometimes be approximated, which causes values to be rounded incorrectly on occasion.
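Since the question mentions a computed column, here is a minimal, hedged sketch of using ROUND there (the table and column names are only illustrative); T-SQL's ROUND rounds .5 away from zero, which matches the requested behaviour:

CREATE TABLE dbo.Measurements
(
    RawValue DECIMAL(10, 4) NOT NULL,
    RoundedValue AS CAST(ROUND(RawValue, 0) AS INT)  -- computed column
);

INSERT INTO dbo.Measurements (RawValue)
VALUES (24.9999), (24.5000), (24.4999), (24.1111);

SELECT RawValue, RoundedValue FROM dbo.Measurements;
-- RoundedValue: 25, 25, 24, 24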

Shortcut: Configuring registered servers in SQL Server Management Studio

When working with SQL Server systems, it can be hard to remember the names of all the servers, to remember connection details for the ones that need SQL logins (instead of Windows authentication), and to remember other details of those servers, such as which environments they are part of (eg: production, UAT, test).

SQL Server Management Studio (SSMS) has a facility to help you to do this. It allows you to register server details in a single place.

By default, the window isn't shown, but from the View menu, you can choose Registered Servers .


Shortcut: Configuring registered servers in SQL Server Management Studio

When the window opens, you can see this:


Shortcut: Configuring registered servers in SQL Server Management Studio

Note the toolbar near the top of the window. It is showing that we're configuring database servers but the other icons let you know that you can also work with Analysis Services, Integration Services, and Reporting Services servers.

The first decision that you need to take is where the details will be stored. Local Server Groups are stored on your local system, i.e. the system that is running SSMS. If you move to a different system to work, you won't have those connection details. Alternatively, a Central Management Server can be configured. This is a server that agrees to hold connection details. While this seems a great idea (because the details would be held in a single place), one of the downsides of this arrangement is that only Windows authentication can then be used. Local Server Groups can also work with SQL logins.

Let's create a server group as an example. If I right-click Local Server Groups, here are the available options:


Shortcut: Configuring registered servers in SQL Server Management Studio

Note that there is an option to Import (and Export) these details. This at least allows you to move details between systems.

Let's create a new Server Group:


Shortcut: Configuring registered servers in SQL Server Management Studio

It just needs a name and an optional description, then OK. When it's created, right-click it, and choose New Server Registration:


Shortcut: Configuring registered servers in SQL Server Management Studio

I've connected to the server SDUPROD and I've given the registered server the same name. Note that you don't need to do that. I could have called it PayrollServer or some other more meaningful name. You'll also notice that there are tabs for configuring other connection properties.

I've then created a second server called HRServer and under the covers, I've pointed it to another server.

Now I have all my servers in groups, in an appropriate location. I can right-click them to open new queries to them, and to do much more.


Shortcut: Configuring registered servers in SQL Server Management Studio

Discover how SQL Server can use Python to access any NoSQL engine

By: Maria Zakourdaev || Related Tips: More > Python

Problem

In the world of polyglot persistence, each data asset should get the best matching database management tool. Many companies these days keep their data assets in multiple data stores. Many companies that I have worked at have used other database systems alongside SQL Server, such as MySQL instances, Redis, Elasticsearch or Couchbase. There are situations when an application that uses SQL Server as its main database needs to access data from another database system. For instance, in legacy applications, developers might demand a transparent solution at the database level, so their interfaces would not change.

Some datastores have ODBC/JDBC drivers so you can easily add a linked server and use it inside your procedures. Some datastores do not have such an easy way of accessing its data, so we will look at what we can do with SQL Server and Python to access these other platforms.

Solution

Python support was introduced in SQL Server 2017 and opens a new perspective on querying remote databases. Using the sp_execute_external_script procedure allows us to query any database that has a Python library. The query results can be used later on in any Transact SQL logic inside SQL Server stored procedures. Such solutions offload the hard work of scanning, filtering and aggregating data to the remote database while your application continues to work with SQL Server. Moreover, you can access huge tables, that may cause query performance issues, from any fast datastore and still serve the queries through SQL Server.

Preliminary Setup Steps

In order to use the sp_execute_external_script procedure to execute Python scripts, you will need to follow these 3 steps:

Step 1.Enable external scripts execution:

exec sp_configure 'external scripts enabled', 1
RECONFIGURE

Step 2. In order to avoid firewall errors, I disabled the Windows Firewall, but the right way would be to figure out the correct configuration. If the Windows Firewall is on, Python scripts that try to access external resources will fail.

Step 3. In order to use additional Python libraries, we will install them using the PIP utility. The PIP utility is installed as a part of the SQL Server installation wizard (Machine Learning > Python support). See this tip for detailed installation steps.

For the purpose of this tip I will show how to access data stored in the NoSQL database - Elasticsearch. Elasticsearch is widely used as an addition to other databases, more like a fast data reading layer. You can read more about Elasticsearch .

We will install the elasticsearch library to connect to the Elasticsearch cluster. Start Windows cmd. Note that the path below can vary depending on the disk where you have installed the SQL Server binaries.

cd C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\Scripts
pip install <library>   (in my case: pip install elasticsearch)
Discover how SQL Server can use Python to access any NoSQL engine

That’s it, we are ready to go.

Query Elasticsearch Cluster

In Elasticsearch, data is stored in indexes, which are logical data namespaces similar to a database or table, depending on the implementation. You can read more about Elasticsearch indexes here.

The below example is a simple group-by query. The Elasticsearch query language is somewhat weird, but understandable. This script searches the history_data index. We add a range filter on the report_date column between 20160101 and 20180819, calculate the number of documents (count(*)) per country_name, and alias the new aggregation as by_country_name. This query returns the top 100 groups.

{"query": {
"range" : {
"<strong>report_date</strong>" : {. ―- <strong>here is our filter</strong>
"gte": 20160101, ―- <strong>report_date greater than 20160101</strong>
"lte": 20180819 ― <strong>report_date less then 20180819</strong>
}
}
},
"aggs" : {
"<strong>by_country_name</strong>" : {. <strong>Terms aggregation means GROUP BY , by default uses count function</strong>
"terms" : {
"field" : “<strong>country_name</strong>", <strong>― aggregated field</strong>
<strong>"size" : 100 ― bring top 100 groups</strong>
}
}
},
size:0
}

We will use 2 external libraries in our script:

- The pandas library, which is preinstalled during the SQL Server installation, to pass a Python result set back to SQL Server.
- The elasticsearch library, to connect to the Elasticsearch cluster, which we installed at the beginning of this tip.

import pandas as pd
from elasticsearch import Elasticsearch

Connection to Elasticsearch cluster:

es = Elasticsearch(hosts=["servername"], request_timeout=100, timeout=100)

We will execute Elasticsearch query and will print the result set:

cnt = es.search(index=IndexName, body=Query)
print (cnt)

This is what the Elasticsearch query result looks like.


Discover how SQL Server can use Python to access any NoSQL engine

In order to convert this JSON document to a SQL Server resultset, we will do the following:

- Define a jvalues variable as a list (a collection of items)
- Loop over the ["aggregations"]["by_country_name"]["buckets"] part of the JSON document
- Define another variable, res, as a dictionary (a collection of key:value pairs)
- Get the first key, "key" (surprisingly), whose value contains the country name
- Call the second key "total"; it will contain "doc_count", the count of documents that belong to this country
- Add the created dictionary to the jvalues list
- After the loop is over, feed the jvalues list into a DataFrame, the two-dimensional tabular data structure from the pandas library. This is the standard way that SQL Server expects to get results from a Python external script.

jvalues = []
for val in cnt["aggregations"]["by_country_name"]["buckets"]:
    res = {}
    res["key"] = val["key"]
    res["total"] = int(val["doc_count"])
    jvalues.append(res)
dataset = pd.DataFrame(jvalues)
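The pieces above would typically be wrapped in sp_execute_external_script so that the result lands back in SQL Server as a regular rowset. The following is only a hedged sketch of that wrapper; the server name, index name and result shape are assumptions based on the snippets above:

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import pandas as pd
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=["servername"], request_timeout=100, timeout=100)
cnt = es.search(index="history_data", body={
    "query": {"range": {"report_date": {"gte": 20160101, "lte": 20180819}}},
    "aggs": {"by_country_name": {"terms": {"field": "country_name", "size": 100}}},
    "size": 0})

jvalues = []
for val in cnt["aggregations"]["by_country_name"]["buckets"]:
    jvalues.append({"key": val["key"], "total": int(val["doc_count"])})

OutputDataSet = pd.DataFrame(jvalues)
'
WITH RESULT SETS ((country_name NVARCHAR(200), total_documents BIGINT));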

I have a broken leg

(Be sure to check out the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.) Today's blog posting is a little bit more personal than technical. It seems that I don't have that much luck this year, because since Sunday evening I have a broken right foot

I don't want to go into the details about how it happened, but at first it seemed that I had "only" a strain trauma. But on Monday and Tuesday the pain wasn't really getting better, so we went to the hospital on Tuesday evening, and this is how my right leg has looked since then:


The diagnosis was that the right ankle joint is broken. I got a hard plaster cast, which will accompany me for the next 12 weeks (3 months!). And of these 12 weeks, I'm not allowed to put any weight on the leg for the first 8-10 weeks. So more or less my life is now an alternation between sitting and lying in bed. That's a lot of fun for me, because in a "normal" week I was constantly running around 50-70 kilometers. Running? Haha.

When the hard plaster comes off in January 2019, I won't be able to move my right ankle anymore, I will have lost muscle, and my whole endurance will be gone. So I have to start again with everything that relates to running. But as a friend told me yesterday afternoon: this is also a second chance, because now I can do everything right with running. And trust me: I did a lot of things wrong the first time (because I didn't know better), and now I can retry everything with a better understanding of running and all the other things around it that are also very important (running style, strength training, etc.).


Besides my running "problem", I also have the real problem that I can't go anywhere, which means I can't visit any customers in person for the next 10 weeks. As I said, I'm not allowed to put any weight on the leg, so flying is also a no-go option.
So everything that I'm doing now has to be done online. And that's where you come into play. As you might know, over the next few weeks I'm running some online trainings about SQL Server in the Cloud and about SQL Server Performance Troubleshooting. I'm still looking for some sign-ups so that I can run these online trainings. If you are interested in these topics, just sign up and make me happy.

If you or your company is interested in other training topics around SQL Server and/or how to run SQL Server successfully on VMware, just drop me a note. I have a lot of different training content that I can deliver immediately as online trainings. Or maybe you are facing some critical SQL Server performance problems and are interested in a SQL Server Health Check to find out the root cause?

The great thing about this downtime is that I now also have plenty of time for my various research projects. I want to learn a lot of new things about SQL Server 2019, Linux, Docker, Kubernetes, VMware NSX, and VMware PKS. And as you know me, I also have a lot of other side projects, like my CPU and OS development. And we have a Netflix subscription…

Thanks for your time,

-Klaus

Overview of the SQL Delete statement in SQL Server


This article on the SQL Delete is a part of the SQL essential series on key statements, functions and operations in SQL Server.

Removing a row from a table is accomplished through a Data Manipulation Language (DML) statement, using the DELETE keyword. The SQL delete operation is by far the simplest of all the DML commands. On execution of the delete command, we don't have to worry about getting any form of data from the table, and we don't have to worry about working with any data that we get back from the table(s). We simply tell the database to delete a specific record, and it either does or it doesn't. It's that simple.

First, let's quickly review what a SQL delete statement looks like. We need to tell the database which table it should delete data from. It's a good idea to add a condition clause to set the scope of the data deletion; otherwise, it will delete everything in the table.
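As a minimal sketch of the general form (the table and column names here are placeholders of my own, not objects from the sample database):

DELETE FROM dbo.SomeTable
WHERE SomeColumn = 'some value';   -- omit the WHERE clause and every row in the table is deleted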

Let's take a look at our table and remove some records.

How to delete rows with no where clause

The following example deletes all rows from the Person.Person table in the AdventureWorks2014 database. There is no restriction enforced on the SQL delete statement, as no WHERE clause is used.

USE AdventureWorks2014;
GO
DELETE FROM [Person].[Person];

How to delete rows with where clause

The following example deletes rows from the [Person].[Person] table in the AdventureWorks2014 database in which the value in the BusinessEntityID column is greater than 30,000.

USE AdventureWorks2014;
GO
DELETE FROM [Person].[Person]
WHERE BusinessEntityID > 30000;

Note: An unfortunate mistake that may occur is to accidentally run a SQL Delete with no Where clause and inadvertently delete all of your data. To prevent this from happening, consider using the Execution guard feature in ApexSQL Complete to warn against such potentially damaging actions before you execute them. Learn more: Execution alerts
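Another simple safeguard, not tied to any particular tool, is to wrap the delete in an explicit transaction and check the affected row count before committing. A minimal sketch, reusing the filter from the example above for illustration (the 1,000-row threshold is an arbitrary value of my own):

BEGIN TRANSACTION;

DELETE FROM [Person].[Person]
WHERE BusinessEntityID > 30000;

-- inspect how many rows were removed before making the change permanent
IF @@ROWCOUNT > 1000
    ROLLBACK TRANSACTION;   -- more rows than expected, undo the delete
ELSE
    COMMIT TRANSACTION;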

How to delete rows using Top with where clause

The following example deletes 50 random rows from the Person.Person table in the AdventureWorks2014 database. The value in the BusinessEntityID column must be between 30,000 and 40,000.

USE AdventureWorks2014;
GO
DELETE TOP (50) FROM [Person].[Person]
WHERE BusinessEntityID BETWEEN 30000 AND 40000;

Note: When the TOP (n) clause is used with the SQL Delete statement, or any DML statement (i.e. Select, Insert, Delete and Update), the operation is performed on a random selection of the number of rows specified in the Top clause.
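Because TOP limits how many rows a single statement touches, it is often used to delete large volumes of data in small batches so the transaction log and locking stay manageable. A minimal sketch of that pattern (the batch size of 50 and the filter are taken from the example above; the loop itself is my own illustration):

DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (50) FROM [Person].[Person]
    WHERE BusinessEntityID BETWEEN 30000 AND 40000;

    SET @rows = @@ROWCOUNT;   -- loop until no more rows match the filter
END;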

How to delete duplicate rows

In the real world, we tend to gather data from different sources, so it's not uncommon to have duplicate records. One approach to the duplicate problem is first to identify where the duplicates have occurred and run a select query on those columns.

CREATE TABLE tb_spaceused
(database_name NVARCHAR(128),
 database_size VARCHAR(18),
 [unallocated space] VARCHAR(18),
 reserved VARCHAR(18),
 data VARCHAR(18),
 index_size VARCHAR(18),
 unused VARCHAR(18)
);
INSERT INTO tb_spaceused
EXEC sp_msforeachdb @command1 = 'use ? exec sp_spaceused @oneresultset = 1';
SELECT * FROM tb_spaceused ORDER BY database_name;
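If the INSERT above is run more than once, tb_spaceused will contain duplicate rows. A quick way to see where duplicates have occurred, as suggested above, is to group on the candidate columns and keep only the groups with more than one row; a minimal sketch:

SELECT database_name, database_size, [unallocated space], reserved, data, index_size, unused,
       COUNT(*) AS duplicate_count
FROM tb_spaceused
GROUP BY database_name, database_size, [unallocated space], reserved, data, index_size, unused
HAVING COUNT(*) > 1;   -- only rows that appear more than once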

The following example uses the PARTITION BY argument to partition the query result set by all the columns of the tb_spaceused table. ROW_NUMBER() is a window function, which means it operates over an ordered set. The ORDER BY clause specified in the OVER clause orders the rows within each partition.

WITH CTE AS
(SELECT *,
        ROW_NUMBER() OVER(PARTITION BY database_name,
                                       database_size,
                                       [unallocated space],
                                       reserved,
                                       data,
                                       index_size,
                                       unused
                          ORDER BY database_name) AS Row_Num
 FROM tb_spaceused)
SELECT *
FROM CTE
WHERE Row_Num <> 1;

Replacing the Select statement with a Delete removes all the duplicates from the table.

WITH CTE AS
(SELECT *,
        ROW_NUMBER() OVER(PARTITION BY database_name,
                                       database_size,
                                       [unallocated space],
                                       reserved,
                                       data,
                                       index_size,
                                       unused
                          ORDER BY database_name) AS Row_Num
 FROM tb_spaceused)
--SELECT *
--FROM CTE
--WHERE Row_Num <> 1;
DELETE FROM CTE
WHERE Row_Num <> 1;
How to delete rows using SQL sub-queries

In the following example, the rows in one table are deleted based on data in another table. Here, rows from the SalesPersonQuotaHistory table are deleted based on the SalesYTD column of the SalesPerson table.

DELETE FROM Sales.SalesPersonQuotaHistory
WHERE BusinessEntityID IN
    (SELECT BusinessEntityID
     FROM Sales.SalesPerson
     WHERE SalesYTD > 4900000.00);
GO

How to delete rows using SQL Joins

In this section, we will use the SQL Delete statement to delete data from the AdventureWorks2014 database. Deleting data, at first sight, sounds trivial, but once we get into a large database design, things might not be so simple anymore.

In many cases, the tables are related via a primary and foreign key relationship. In the following example, we use joins to delete data from the Sales.SalesPersonQuotaHistory table.

DELETE sq
FROM Sales.SalesPersonQuotaHistory sq
    INNER JOIN Sales.SalesPerson sp ON sq.BusinessEntityID = sp.BusinessEntityID
WHERE sp.SalesYTD > 4500000.00;
GO

How to delete rows from a remote table using linked servers and OpenQuery

The following example uses the SQL delete statement to delete rows from a remote table using the linked server named hqdbt01. The remote table is queried using the four-part object naming convention to delete the rows from the remote table.

DELETE FROM [hqdbt01].AdventureWorks2014.[HumanResources].[Shift] WHERE ShiftID = 2;

In the following example, the remote table is queried by specifying the OPENQUERY rowset function along with the delete command.

DELETE OPENQUERY (hqdbt01, 'SELECT * FROM AdventureWorks2014.HumanResources.Department WHERE DepartmentID = 18');

How to delete rows using SSMS

Using the SQL Server Management Studio (SSMS) graphical user interface (GUI) to delete rows involves a manual search. In reality, it is much easier and quicker to delete records with a SQL query.

Let's go ahead and locate the table on which to use a SQL delete statement; in this case, the dbo.cities table is selected. Now, right-click and choose Edit Top 200 Rows. This opt

SQL Server Multi Statement Table Value Function (MTVFS) Performance Difference B ...

Problem

As part of the SQL Server programming language, you can create user-defined functions, which are routines that accept parameters, perform calculations, and return a value based on the action performed. There are different types of functions supported in SQL Server: user-defined functions and built-in system functions. Among user-defined functions, you can have table-valued functions or scalar functions.

Table-valued functions can be Inline Table-Valued Functions or Multi-Statement Table-Valued Functions. It is well established that Inline Table-Valued Functions generally perform better than Multi-Statement Table-Valued Functions. If your code uses Multi-Statement Table-Valued Functions you could have a performance bottleneck, and the function can perform differently based on the SQL Server version.

Solution

We will walk through an example and show how the query plan information is different using different SQL Server compatibility levels.

Setup SQL Server Test Environment

I have SQL Server 2017 Developer Edition installed and I am using the AdventureWorks2017 database. After downloading the database, I restored it and then created the following table-valued function.

USE ADVENTUREWORKS2017
GO
CREATE OR ALTER FUNCTION TEST_MTVF(@dtOrderMonth datetime)
RETURNS @orderDetail TABLE
(
ProductID INT,
SalesOrderID INT,
SalesOrderNumber nvarchar(30),
CustomerID INT,
AccountNumber nvarchar(30),
OrderDate datetime,
ChrFlag char(1)
)
AS
BEGIN
INSERT INTO @orderDetail
select sod.ProductID,
soh.SalesOrderID,
soh.SalesOrderNumber,
soh.CustomerID,
soh.AccountNumber,
soh.OrderDate,
'N'
FROM Sales.SalesOrderHeader soh
inner join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderID
WHERE YEAR(soh.OrderDate) = YEAR(@dtOrderMonth)
UPDATE @orderDetail
SET ChrFlag = 'Y'
WHERE OrderDate < Cast(DATEADD(DAY,-1,GETUTCDATE()) AS DATE)
RETURN
END

This is the query we will use to test each execution using different compatibility levels to see how the query plan changes with each run.

SELECT tst.CustomerID, COUNT(*)
FROM Sales.Customer c
INNER JOIN dbo.TEST_MTVF('1/1/2014') tst on c.CustomerID = tst.CustomerID
INNER JOIN Production.Product prod on tst.ProductID = prod.ProductID
GROUP BY tst.CustomerID

I did not install each version of SQL Server, so what I will do is use SQL Server 2017 and then change the compatibility level for each execution, so we can see the differences.
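I changed the compatibility level through the database properties dialog in SSMS, as shown in the screenshots below; the same change can also be made with a single T-SQL statement, for example:

ALTER DATABASE AdventureWorks2017
SET COMPATIBILITY_LEVEL = 110;   -- 110 = SQL Server 2012, 120 = 2014, 130 = 2016, 140 = 2017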

Testing MTVF performance with compatibility level SQL Server 2012(110)

First, we will use compatibility level 110 which is SQL Server 2012 compatibility.



Now the AdventureWorks2017 database is working under SQL Server 2012 compatibility.

I will use the query above, which joins the Customer table to the sales detail returned by the MTVF and counts rows per customer. I will execute the code with the Include Actual Execution Plan option enabled, so we can look at the plan.


After many runs of the query, I was getting results in 0-1 seconds. In order to see the cardinality estimate, I hovered my mouse over the Table Scan [TEST_MTVF] operator.

We can see the estimated versus the actual rows: the Estimated Number of Rows = 1 and the Actual Number of Rows = 37339. This bad estimate occurs when using the MTVF under SQL Server 2012 compatibility, and the fixed estimate of 1 row can degrade performance.


Testing MTVFS performance with compatibility level SQL Server 2014(120)

Now, I am going to change database compatibility level to SQL Server 2014.



I ran the same query several times and used the Include Actual Execution Plan option.


Again, the query took 0-1 seconds. I hovered my mouse over the Table Scan [TEST_MTVF] operator.

Now we are getting the Estimated Number of Rows = 100 and the Actual Number of Rows = 37339. Again, we have a bad estimate.

In SQL Server 2014, a new Cardinality Estimator (CE) was introduced. Based on MSDN, I summarized the improvements:

SQL Server 2014 introduces a new CE which is active for all databases with a compatibility level of SQL Server 2014. The new CE calculates combined filter density/selectivity differently, and it treats ascending/descending key scenarios differently. There are significant changes in how column densities of different tables are evaluated in join situations. These changes in calculation can result in different plans for a query compared with the old cardinality estimation. Depending on the workload or the application used, more intensive testing of the new CE algorithms may be needed in order to analyze the impact on business processes.

Testing MTVFS performance with compatibility level SQL Server 2016(130)

Now, I am going to change the database compatibility level to SQL Server 2016.



I ran the same query several times and used the Include Actual Execution Plan option.


The query ran in 0-1 seconds. I hovered my mouse over the Table Scan [TEST_MTVF] operator.

The results are Estimated Number of Rows = 100 and Actual Number of Rows = 37339. We are still getting a bad estimate. In SQL Server 2016, Microsoft made many changes to improve cardinality estimation, but I still get the same results.

Testing MTVFS performance with compatibility level SQL Server 2017(140)

I am going to change the database compatibility level to SQL Server 2017.



Again, I ran this several times with the Include Actual Execution Plan option.


The query took 0-1 seconds to complete. Now I am going to check the cardinality estimate by hovering my mouse over the Table Scan [TEST_MTVF] operator.
We can see the Estimated Number of Rows = 37339 and the Actual Number of Rows = 37339. The numbers match. Under SQL Server 2017 compatibility, the interleaved execution feature for multi-statement table-valued functions executes the function first and feeds the actual row count back to the optimizer, which is why the estimate is finally accurate.
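Where possible, rewriting the function as an inline table-valued function avoids the estimation problem on older compatibility levels entirely, because the function body is expanded into the calling query. A rough sketch of an inline equivalent of TEST_MTVF (my own rewrite for illustration, not part of the original tip; the name TEST_ITVF is hypothetical and the ChrFlag update becomes a CASE expression):

CREATE OR ALTER FUNCTION TEST_ITVF(@dtOrderMonth datetime)
RETURNS TABLE
AS
RETURN
(
    SELECT sod.ProductID,
           soh.SalesOrderID,
           soh.SalesOrderNumber,
           soh.CustomerID,
           soh.AccountNumber,
           soh.OrderDate,
           -- same flag logic as the UPDATE in the multi-statement version
           CASE WHEN soh.OrderDate < CAST(DATEADD(DAY, -1, GETUTCDATE()) AS DATE)
                THEN 'Y' ELSE 'N' END AS ChrFlag
    FROM Sales.SalesOrderHeader soh
        INNER JOIN Sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
    WHERE YEAR(soh.OrderDate) = YEAR(@dtOrderMonth)
);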

Audit the SQL Server Schema?


We have a SQL Server 2008 Enterprise database with two different schemas, a locked one that we maintain and an open one that we allow outside development teams to add to and modify for their own needs. Usually this works out OK for us but one particular team likes to really muck it up and it is impacting everyone else. So 2 questions:

1. In hindsight I wish we had set up something robust from the outset, but we did not, just the default install. It would be nice to be able to see what has been done to the schema so far, even if it's as simple as 'User XYZ changed Procedure ABC on 07/12/2012 at 9:00 AM'. Is there anything built into SQL Server and enabled by default that tracks this that we might leverage, and if so where/how?

2. As far as a long term solution goes, what would you recommend for this? I've been reading up on DDL triggers a bit and that seems like a promising option. If you've used this approach, can you share a bit about how it worked and what you could do with it?

thank you

I've got a system that uses a DDL trigger for exactly this type of thing. It works well enough for my needs. It was originally developed on SQL Server 2005, and now lives on a SQL Server 2008 R2 system. It's similar to the one described by the link in Aaron Bertrand's comment.

Create a table similar to this one.

CREATE TABLE [Audit].[SchemaLog](
    [SchemaLogID] [int] IDENTITY(1,1) NOT NULL,
    [PostTimeUtc] [datetime] NOT NULL,
    [DatabaseUser] [nvarchar](128) NOT NULL,
    [Event] [nvarchar](128) NOT NULL,
    [Schema] [nvarchar](128) NULL,
    [Object] [nvarchar](128) NULL,
    [TSQL] [nvarchar](max) NOT NULL,
    [XmlEvent] [xml] NOT NULL,
    CONSTRAINT [PK_SchemaLog_1] PRIMARY KEY CLUSTERED
    (
        [SchemaLogID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Make sure everyone has insert permissions on the table, then create a DDL trigger similar to this.

CREATE TRIGGER [ddlDatabaseTriggerLog]
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @data XML;
    DECLARE @schema sysname;
    DECLARE @object sysname;
    DECLARE @eventType sysname;

    SET @data = EVENTDATA();
    SET @eventType = @data.value('(/EVENT_INSTANCE/EventType)[1]', 'sysname');
    SET @schema = @data.value('(/EVENT_INSTANCE/SchemaName)[1]', 'sysname');
    SET @object = @data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'sysname')

    IF @object IS NOT NULL
        PRINT '  ' + @eventType + ' - ' + @schema + '.' + @object;
    ELSE
        PRINT '  ' + @eventType + ' - ' + @schema;

    IF @eventType IS NULL
        PRINT CONVERT(nvarchar(max), @data);

    INSERT [Audit].[SchemaLog]
    (
        [PostTimeUtc],
        [DatabaseUser],
        [Event],
        [Schema],
        [Object],
        [TSQL],
        [XmlEvent]
    )
    VALUES
    (
        GETUTCDATE(),
        CONVERT(sysname, CURRENT_USER),
        @eventType,
        CONVERT(sysname, @schema),
        CONVERT(sysname, @object),
        @data.value('(/EVENT_INSTANCE/TSQLCommand)[1]', 'nvarchar(max)'),
        @data
    );
END;
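To get the kind of human-readable trail the question asks for ('User XYZ changed Procedure ABC on ... at ...'), you can then simply query the audit table; a minimal sketch against the table above:

SELECT PostTimeUtc,
       DatabaseUser,
       Event,
       CONCAT([Schema], '.', [Object]) AS ChangedObject,   -- CONCAT tolerates NULL schema/object values
       TSQL
FROM [Audit].[SchemaLog]
ORDER BY PostTimeUtc DESC;   -- most recent schema changes first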

Voting for SQLBits general sessions is now open



For those of you who have yet to attend SQLBits, I will do a quick recap. SQLBits is the largest SQL Server conference in Europe. Every year it's hosted in a different location in the UK. Over the years I've had the pleasure of going a lot. I've learnt new things and met amazing people.



Anyway, it was announced this week that voting for SQLBits general sessions is now open. That means that if you are registered on the SQLBits website, you can vote for the sessions you want to see at SQLBits here. This way you can have your say in which sessions the event holds.

You will need to scroll down a lot on the site this year as the list is impressive, and it can still grow due to last-minute entries before the deadline on 26th October. There is a large choice of submissions from speakers all over the world, and the variety is amazing. To give you an idea of how much variety there is, I'll do my own quick overview of submissions for the various tracks.


BI

With its 128 submissions, this is apparently the most popular track for speakers. There's a wide variety of BI submissions for this year's event, including topics like Power BI and other Azure-related offerings. In fact, if you are keen on Power BI you are going to be spoilt for choice on which sessions to vote for.

DBA

For this track there are some very interesting submissions with a lot of variety. The focus of a lot of these 113 submissions is on performance tuning and Azure, which highlights the way the future of DBAs seems to be moving.

Yours truly has submitted his session from his previous post 'Choose your own Database Adventure'. It'd be an honour for me to present this at SQLBits. If you want to see the session there, then feel free to vote for it.

Dev

Another track with a large amount of variety in it. I make it 83 submissions so far at the time of writing this post. There are a lot of sessions about the internals of SQL Server here, as well as plenty about Azure offerings.

Career Development

Now this track has some very useful sessions among its 13 submissions. A lot of them relate to soft skills: presentation skills and being a consultant. If you're thinking about becoming a future speaker, then it's worth voting for one of these sessions.

Final word

Well, that's a quick review from me. I'd like to thank you for taking the time to read my personal views on the submissions. However, these are my own views, so take your time and vote for the sessions you want to see. At the end of the day, your vote counts.
