
Lots of Scaffolding - T-SQL Tuesday #107 - The Death March

It’s an interesting topic for this month’s T-SQL Tuesday. I don’t think of this as strictly a SQL Server project, because SQL Server usually isn’t the problem when building complex software, but my death march does involve SQL Server.

This month’s invitation is from Jeff Mlakar and I really am looking forward to reading some of the entries.

Adopting OOP

If you’ve ever worked on a greenfield OOP project, you’ve probably felt like there is a lot of investment up front that builds a base for your application. For a bit, it feels like very little is getting done as you build classes and subclasses and experiment with the structures that make sense. Your users and managers may wonder what work is actually being completed in this stage.

Eventually this tips over and all of a sudden you have something to show for your efforts and your objects start becoming useful and visible to others.

This is my story.

Leaving FoxPro

Early in my career, I learned to program using FoxPro. This served me well and let me build a number of useful applications for various clients and employers. Eventually I was hired to manage a small IT department that had to maintain an existing FoxPro inventory and order entry system and the internal infrastructure, as well as manage a rewrite of this application into a more powerful, useful system.

Prior to my being brought on, the company had hired a consultant to handle the rewrite, while I managed another employee who maintained desktops, servers, printers, and more. The consultant had decided to move to Visual FoxPro backed by SQL Server. Since I had experience with FoxPro, SQL Server, and infrastructure, I ended up managing and participating in both sides of this environment.

Visual FoxPro

I had more experience with SQL Server and the consultant had more experience with Visual FoxPro, so I handled setting up the database and discussing some of the data modeling choices with him. He made all the OOP and VFP decisions, with the estimate that this would be an 8-10 month project. After all, we had an existing system, most of the business rules were already encoded, and we weren’t looking to add much functionality.

Over the first 2-3 months, regular meetings showed me code being written, with some basic scaffolding of the objects we’d use and the various properties and methods that were required. We agreed on an object-to-relational mapping, and things seemed to be going well.

Disaster Strikes

The first issue that set us back was a disk crash early in month 3. The consultant had been backing up work to a zip file on his machine, but not to the network. You can guess what happened. His machine crashed, and we were stuck. Fortunately, we discovered a copy of the code that was weeks old and managed to restart development.

At this point, we had no VCS in place and my boss wasn’t interested in buying VSS. I decided on a simple setup. We worked off file shares for all code, with coordination on which files we’d edit (we were across the hall from each other). We also scripted a simple method to zip up all code and copy it to a different folder for each day of the week. Rudimentary, but it gave us some control in the mid 90s.

Curiosity Rises

About 6 months in, my boss started to ask questions. Where were the prototypes? Where was something to test? We put him off, showing him some basic objects and explaining how the work was front-loaded with setting up new classes to better handle our business rules. We were coming from a procedural system, so much of the code had to be rewritten to work in an OOP environment.

I was told that we needed to move, so I started joining in with coding. Up to this point, I’d just done some reviews, and now I needed to brush off some of my OOP skills from college as well as learn more about VFP, which was different from FoxPro.

Over the next few months, I started pushing us to make decisions and move, rather than debate whether we needed to set parent methods that would be shared by multiple subclasses. I argued with our consultant, insisting we move on, even if we had to store some code in both subclasses. We needed to get work done.

A Large System

This was a fairly complex inventory and order entry system, unique to our business, and somewhat large for two people. At 8 months, we still didn’t have a working prototype, though we had certain items built that we could show to users.

We also had change orders coming in, with requests to alter some functionality that wasn’t as useful or relevant as it had been in the past. My ability to help rose and fell as other issues came up, so development was sporadic.

A New Server

Finally, at about 12 months, we were close enough for testing that we decided to order a production database server. It was a nice, large Compaq, I think a Pentium II, with 2GB of RAM and 5 or 6 internal SCSI drives. It came in multiple boxes, and we spent a day or two assembling pieces and installing the OS and SQL Server.

We built the database and ran a test migration that failed miserably, but we managed to tweak things across a few days and got most of the data moved to create what would be our QA environment. We were feeling good as we walked from the server room back to our desks, started the application, and checked on some data. Things worked, so it was time for QA.

Caching is horrible

We let some management know that we were close and one of the department heads volunteered one of his order entry people to do some testing. This young lady came over and sat at our desk, trying to process a few orders that she’d entered the week before. She entered a customer and clicked search.

And watched. It took maybe 15 seconds to load the information, which was waaaaaaayyyyyy slower than our FoxPro for DOS application. Still, she persisted. She entered a partial product and searched for inventory, and the results appeared after about a minute and a half. She was less than impressed.

We were slightly horrified. How could this server run slower than our workstations, which mostly had 64MB of RAM? Why wasn’t our 2GB server flying compared to the DOS app?

It took a little time, some research, and lots of questions to realize that our data was coming from disk, with SQL Server’s optimizer deciding how to search, compiling a plan, and reading from disk. This was a problem, with spinning disks not performing as well as we’d have liked. Things were better once we’d warmed the cache, but users were not thrilled with the early tests.

Auto-Start Procedures

We continued working, across a few more months to finish the application to the point where users could switch. At this point, we were about 16 months in and it felt like we’d never get there, but we got far enough that we could migrate data and switch, changing a few procedures for functionality that would take longer.

My boss and the owner, who were business people, were less than thrilled with the length of this project, but we had improved a lot of features along the way, and they didn’t fire us because of the delays. We cut over, and while slow in the first hour, the system improved over the day and once users got used to some new ergonomics, they were pleased.

One of the things that saved us was the use of some
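The heading above refers to SQL Server's startup ("auto-start") stored procedures. As a hedged illustration only (the actual procedures from this project aren't shown, and the database and table names here are invented), a cache warm-up routine registered to run at instance startup could look like this:

USE master;
GO
-- Hypothetical warm-up procedure: touch the hot tables so their pages are
-- in the buffer pool before the first user query arrives.
CREATE PROCEDURE dbo.WarmCache
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @dummy bigint;
    -- COUNT_BIG(*) forces a scan without returning rows to a client.
    SELECT @dummy = COUNT_BIG(*) FROM OrderEntry.dbo.Customers;  -- assumed database and table
    SELECT @dummy = COUNT_BIG(*) FROM OrderEntry.dbo.Inventory;  -- assumed database and table
END;
GO
-- Mark the procedure to run automatically every time the instance starts.
EXEC sp_procoption @ProcName = N'dbo.WarmCache',
                   @OptionName = 'startup',
                   @OptionValue = 'on';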

Review: Stellar Phoenix SQL Database Repair


Of the several issues encountered with SQL Server ― the server getting slower over time, report generation becoming tedious, SQL Server crashes, performance problems, and more ― the most troublesome for DBAs is a crash or failure of SQL Server, because the repercussion is a damaged or corrupt SQL Server database.

This is a huge setback, as the SQL database can no longer be accessed, thereby restricting access to the data stored in the SQL Server database as MDF (primary) and NDF (secondary) files. Consequently, it creates the need for SQL recovery software that can repair the damaged or corrupt SQL Server database (MDF and NDF) files easily and efficiently.

How does Stellar Repair for MS SQL Technician rescue a failed or crashed SQL Server?

This is what happens when a SQL Server database fails or crashes, and how the Stellar software deals with those situations.

The SQL Server database gets damaged or corrupt with the failure of SQL Server. This may happen due to hardware issues, bugs in SQL Server, OS malfunction, unexpected system shutdown, virus attacks, etc.

Irrespective of the reason, the software repairs the SQL server database.

A failed or crashed SQL Server makes MDF and NDF file inaccessible

The SQL database repair software repairs the damaged or corrupt SQL database MDF file (and associated NDF file) successfully.

When it is used, the result is a robust SQL Server database.

The SQL database objects also become inaccessible in situations of a failed or crashed SQL Server

The Stellar software recovers the following SQL Server objects.

Tables, Triggers, and Views
Stored Procedures, Collations, Synonyms, and Functions
Defaults and Default constraints
Primary Keys, Foreign Keys, and Unique Keys
Identity, Clustered, and Non-Clustered indexes
Check constraints, User-defined data types, and Null/Not Null
Predefined defaults, default values, and Rules
ROW and PAGE compressed data
Column Row GUID COL Property and Sp_addextended Property
Others: deleted records (optional), XML data types, file stream data types, and more.

Note: The software saves non-recovered views, queries, tables, stored procedures, etc. in a separate text file so that it is easy to identify the non-recovered objects.

A damaged SQL database displays different error codes and messages

The SQL database repair software fixes different SQL server database corruption errors such as:

Schema corruption
Consistency errors
Header corruption
Error codes 5171, 8942, 3414, etc.
Not a primary database file
SQL Server database in suspect mode
Clustered or non-clustered index corruption
And more.

Findings: SQL database repair software

After reviewing the SQL repair software, I found the following.

Option to save SQL database in XML, HTML, and CSV

Besides saving to a live or new SQL Server database, the software can save the recovered objects in other file formats such as XML, CSV, and HTML.

Thus, the recovered data of SQL Server can be accessed on multiple platforms.

Preview of SQL Server database objects

Displays all recoverable items of SQL Server database in a tree-like structure before saving. This helps to select and preview the recoverable data easily.

The benefit of the preview feature is that you can verify whether the recovered data would be intact and in its original form.

Allows to search the recovered Objects

The software provides an option to search Objects with the use of ‘Find Items’.

The advantage here is that users need not search the objects manually ─ which would take more time ─ from the list displayed after the scan process.

Plus, users can search for intended items based on ‘Match whole word’ or ‘Match case’ criteria.

Option to recover selective SQL Database objects

This option enables users to save only the required SQL database objects.

From the list of the recovered database objects, users can select the required objects and save them.

Saves recovered Objects at the desired location

The software offers to save the recovered SQL Objects at the user-defined location in addition to the default location.

Moreover, it allows to save the non-recovered Objects such as queries, views, stored procedures, etc. in a text file.

How to repair a SQL database with Stellar Repair for MS SQL Technician?

I found that the software works in three simple steps:

Select the damaged or corrupt SQL database MDF file,
Repair the corrupt or damaged MDF file, and
Save the recovered data of SQL Server.

You can also check the three simple steps in the main interface of the software that is displayed in the figure below:


[Figure: the main interface of the software, showing the three repair steps]

The software acts diligently when the ongoing SQL database repair process gets disrupted. It reconnects automatically and completes the repair process.

Next, the software automatically saves the scan information once the scan process is complete. The saved scan information can be used to repair the database file at a later stage without the need to scan the MDF file again, which saves time.

Conclusion

Whenever DBAs need to repair a damaged or corrupt SQL Server database, they can use this handy and self-explanatory tool. With its advanced GUI ― a ribbon in the software interface with various options including customizable Menu items ― the software is easy to use. The automatic saving of Scan information that avoids the need to re-scan, simple three-step repair process, selective recovery for SQL objects, preview of recoverable objects, saving of objects at desired location, option to save in multiple formats, recovery of SQL objects, etc. ─ all these make the software effective to deal with SQL database corruption. These multiple functionalities owe to the advanced algorithms of the software. It is comparatively better than its previous versions as well as many other SQL repair software.

Nonetheless, there is scope to increase the efficiency of the software so that it gives better results, which would make it ideal for DBAs handling SQL Server. Based on the features, benefits, and ease of use, I rate the software 8 out of 10.

References:

https://www.cybrary.it/0p3n/3-best-sql-database-repair-tool-database-administrators-dbas/

Simple way to Import XML Data into SQL Server with T-SQL

Problem

XML is a data format used to share data in a form that can be easily used and shared. There is often a need to import XML files into SQL Server, which can be done several ways; in this tip we will look at a simple way to do this using just T-SQL commands.

Solution

There are many possible ways to perform this type of import, and in this tip we will show how it can be done using T-SQL and OPENROWSET to read the XML data and load it into a SQL Server table. To help you understand it better, let’s walk through an example.

Step 1 - Create table to store imported data

Let’s create a simple table that’ll store the data of our customers.

USE mssqltips_db
GO
CREATE TABLE [CUSTOMERS_TABLE](
[ID] [int] IDENTITY(1,1) NOT NULL,
[DOCUMENT] [varchar](20) NOT NULL,
[NAME] [varchar](50) NOT NULL,
[ADDRESS] [varchar](50) NOT NULL,
[PROFESSION] [varchar](50) NOT NULL,
CONSTRAINT [CUSTOMERS_PK] PRIMARY KEY ([Id])
)
GO

Step 2 - Create Sample XML File

Below is sample XML data. You can use this as is or modify it for your own tests. I copied this data and stored it in a file named MSSQLTIPS_XML.xml.

<?xml version="1.0" encoding="utf-8"?>
<Customers>
<Customer>
<Document>000 000 000</Document>
<Name>Mary Angel</Name>
<Address>Your City, YC 1212</Address>
<Profession>Systems Analyst</Profession>
</Customer>
<Customer>
<Document>000 000 001</Document>
<Name>John Lenon</Name>
<Address>Your City, YC 1212</Address>
<Profession>Driver</Profession>
</Customer>
<Customer>
<Document>000 000 002</Document>
<Name>Alice Freeman</Name>
<Address>Your City, YC 1212</Address>
<Profession>Architect</Profession>
</Customer>
<Customer>
<Document>000 000 003</Document>
<Name>George Sands</Name>
<Address>Your City, YC 1212</Address>
<Profession>Doctor</Profession>
</Customer>
<Customer>
<Document>000 000 004</Document>
<Name>Mark Oliver</Name>
<Address>Your City, YC 1212</Address>
<Profession>Writer</Profession>
</Customer>
</Customers>

Step 3 - Importing the XML data file into a SQL Server Table

Now all we need is to make SQL Server read the XML file and import the data via the OPENROWSET function. This function is native to T-SQL and allows us to read data from many different file types through the BULK import feature, including XML.

Here is the code to read the XML file and to INSERT the data into a table.

INSERT INTO CUSTOMERS_TABLE (DOCUMENT, NAME, ADDRESS, PROFESSION)
SELECT
MY_XML.Customer.query('Document').value('.', 'VARCHAR(20)'),
MY_XML.Customer.query('Name').value('.', 'VARCHAR(50)'),
MY_XML.Customer.query('Address').value('.', 'VARCHAR(50)'),
MY_XML.Customer.query('Profession').value('.', 'VARCHAR(50)')
FROM (SELECT CAST(MY_XML AS xml)
FROM OPENROWSET(BULK 'C:\temp\MSSQLTIPS_XML.xml', SINGLE_BLOB) AS T(MY_XML)) AS T(MY_XML)
CROSS APPLY MY_XML.nodes('Customers/Customer') AS MY_XML (Customer);

The first thing we are doing is a simple INSERT into our table CUSTOMERS_TABLE. The columns in the SELECT are pulled from the alias we created, named MY_XML, and we are querying each element of the Customer node. The FROM clause uses OPENROWSET with the BULK and SINGLE_BLOB options to return the data from the XML file as a single column and row. The nodes() function, along with CROSS APPLY, allows navigation through the XML elements in order to get each Customer element properly encapsulated.

Step 4 - Check the Imported XML Data

After the insert, you can query the table to check the results:
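For instance, a quick check against the table created in Step 1 could be as simple as:

SELECT [ID], [DOCUMENT], [NAME], [ADDRESS], [PROFESSION]
FROM CUSTOMERS_TABLE
ORDER BY [ID];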


[Figure: query results showing the five imported customer rows]
Next Steps

Check out some other related tips:
Using OPENROWSET to read large files into SQL Server
Importing and Processing data from XML files into SQL Server tables
Importing XML documents using SQL Server Integration Services
SQL Server XML Bulk Loading Example

Last Update: 2018-10-10


About the author
Diogo Souza has been passionate about clean code, data manipulation, software design and development for almost ten years. View all my tips


A Case of the Actual Execution Plan COST "Deceiving" Us in SQL Server


On one system, after a support engineer deployed an upgrade script yesterday, a feature stopped working today and failed outright with a "Timeout expired" error. After tracking the problem down to the relevant stored procedure, I ran into a situation during the tuning analysis where the execution plan COST "deceived" us. I touched on this issue in my earlier post "Performance problems with scalar user-defined functions in SQL Server", but when tuning SQL we often habitually look at the cost percentages in the actual execution plan to decide which statement has the largest performance overhead. Most of the time that is indeed correct. Let's look at this case first; part of the stored procedure's actual execution plan is shown below (the full plan is far too long to show in its entirety):


[Figure: part of the stored procedure's actual execution plan]

We saved the actual execution plan as a .sqlplan file (Execution Plan Files) and opened it in Plan Explorer. As shown below, Est Cost% and Est CPU Cost% indicate that the first SQL statement is the most expensive one in the whole stored procedure. When we tested and verified it, however, that statement turned out not to be the most expensive one; in other words, the execution plan deceived us. In fact, the statement below with an Est Cost% of 13.3 was the one with the largest performance overhead.


[Figure: the plan opened in Plan Explorer, showing Est Cost% and Est CPU Cost% per statement]

Finding the statement with the longest elapsed time in the actual execution plan identified the SQL that was really hurting performance. Examining it showed that its WHERE clause used a user-defined scalar function (a business-logic change had added the function to filter data in the query predicate). You cannot see this problem from the actual execution plan either, because even if the scalar UDF is called hundreds of thousands of times, its cost simply does not show up in the plan. The specific reason is explained in the following passage, quoted from "Performance problems with scalar user-defined functions in SQL Server":

Translation:

Note again, though, that the execution plan is lying to you. First, it implies the UDF was called only once, which is not the case. Second, looking at the cost, you might assume the 0% is a rounding-down effect: a single execution of the function is so cheap that even 100,000 executions still cost very little. But if you inspect the properties of the function iterator in the plan, you will find that all of the operator costs and subtree costs are actually estimated at 0, which is the worst lie of all, because SQL Server is not just deceiving us, it is deceiving itself. The query optimizer genuinely believes that calling the function costs 0, so every plan it generates is built on the premise that calling the UDF is free. The result is that even when calling the scalar UDF is extremely expensive, the optimizer will never consider optimizing around it.
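To make the effect concrete, here is a minimal, hypothetical repro (the table, column, and function names are invented for illustration and are not from the system in this story): the scalar UDF in the WHERE clause runs once per row, yet the actual execution plan reports its cost as roughly zero.

-- Hypothetical scalar UDF used as a row-by-row filter
CREATE OR ALTER FUNCTION dbo.fn_IsActiveCustomer (@CustomerID int)
RETURNS bit
AS
BEGIN
    DECLARE @result bit = 0;
    IF EXISTS (SELECT 1 FROM dbo.CustomerStatus
               WHERE CustomerID = @CustomerID AND Status = 'A')
        SET @result = 1;
    RETURN @result;
END;
GO
-- The Compute Scalar that calls the UDF shows an estimated cost near 0,
-- even though the function executes once per row of dbo.Orders and
-- dominates the elapsed time.
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
WHERE dbo.fn_IsActiveCustomer(o.CustomerID) = 1;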

I am writing this up separately because we are all influenced by habit to some degree. Even though I have hit this kind of case many times before, during tuning I still instinctively used the cost percentages in the actual execution plan to locate the most expensive statements, until testing disproved that judgment and the statement with the longest elapsed time led me to the real culprit. So when tuning and optimizing, always approach the problem from multiple angles and verify your conclusions repeatedly; don't let habit and past experience lead you around by the nose!

Calculating Effective Rights For A SQL Server Principal


In my blog post Calculating a Security Principal’s Effective Rights, I built a view named Utility.EffectiveSecurity that you can query to fetch a security principal’s rights to objects in a database. In that post I tested the code and showed how it works. Now I have taken this to the extreme and expanded the view to include all of a user’s security, by finding all of their rights to all of the things that they get rights for.

The list of possible permissions you can fetch can be retrieved from:

SELECT DISTINCT class_desc FROM fn_builtin_permissions(default) ORDER BY class_desc;

This returns the following 26 types of things that can have permissions assigned and returned by the sys.fn_my_permissions function:
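As a small illustration of the function itself (a sketch I am adding here; the object name is just an example and is not from the original post), you can check the current principal's effective permissions like this:

-- Effective permissions of the current login at the server level
SELECT permission_name
FROM sys.fn_my_permissions(NULL, 'SERVER');

-- Effective permissions of the current user on a specific object
SELECT entity_name, subentity_name, permission_name
FROM sys.fn_my_permissions('dbo.SalesOrderHeader', 'OBJECT');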

Table Variables in SQL Server


Table variables are another type of temporary object for storing transient data. Please refer to Temporary Table objects for more details about temporary tables in SQL Server.

Differences between Table Variables and Temporary Tables

[Figure: comparison of table variables and temporary tables]

When do you use Temporary Tables over Table Variables, and vice versa?

So, now we should know enough to decide when to use a table variable versus a temporary table. I would like to reiterate that statistics are one of the most important factors. A table variable does not have statistics, whereas a temporary table has statistics maintained. Because the SQL Server optimizer uses cost-based optimization, statistics are a very important factor in identifying the best plan for your query. If your transient data is more than about 100 rows, I would suggest using temporary tables (with the right indexes) over table variables, to make use of the statistics and help SQL Server identify the best plan for us.

DECLARE @TabVariableTesting TABLE (
id INT PRIMARY KEY,
ScrambledData Varchar(7000)
)
INSERT INTO @TabVariableTesting
SELECT number, Replicate('SQLZEalot',100)
FROM master..spt_values WHERE NAME IS NULL
SELECT * FROM @TabVariableTesting WHERE id > 75
CREATE TABLE #TempTableTesting (
id INT PRIMARY KEY,
somecolumn Varchar(7000)
)
INSERT INTO #TempTableTesting
SELECT number, Replicate('SQLZEalot',100)
FROM master..spt_values WHERE NAME IS NULL
SELECT * FROM #TempTableTesting WHERE id > 75
DROP TABLE #TempTableTesting
[Figure: query results for the table variable and temporary table demos]

In other words, if you have very little data, then I would prefer table variables over temporary tables. If you need to store data in a user-defined function, table variables are currently the only way to go. The choice is not a hard rule; choose what is best for your needs and requirements.
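As a quick, hedged illustration of that last point (the function name and filter are hypothetical), a multi-statement table-valued function has to declare a table variable for its return value:

-- Table variables are the only option for the result of a
-- multi-statement table-valued function.
CREATE OR ALTER FUNCTION dbo.GetValuesAbove (@minId int)
RETURNS @result TABLE (id int PRIMARY KEY, somecolumn varchar(7000))
AS
BEGIN
    INSERT INTO @result (id, somecolumn)
    SELECT number, REPLICATE('SQLZealot', 100)
    FROM master..spt_values
    WHERE name IS NULL AND number > @minId;
    RETURN;
END;
GO
SELECT * FROM dbo.GetValuesAbove(75);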

Hope you enjoyed this post, please share your feedback and thoughts.

Send a mail from the SQL server with an attachment (attachment to for ...


I have to send a report to management every morning. For that I have created a schedule in SQL Server Agent with the following code.

EXEC msdb.dbo.sp_send_dbmail
    @recipients = '[email protected]',
    @body = 'Dear sir, <Br>Please find the attachment. <P>Regards<Br> <Br>IT Department',
    @subject = 'TOURISM-GL( Auto By System) ',
    @body_format = 'html',
    @profile_name = 'emailserver',
    @file_attachments = 'C:\PUMORI_NEW\001_TOURISMGL_(14072014)_(SOD).TXT'

Now the problem is that the file I need to send as an attachment is generated every day with a new name. The file name will be in this format:

001_TOURISMGL_(14072014)_(SOD).TXT

In the above file name only the date value will change. The date will be in ddmmyyyy format.

Now kindly suggest how I can achieve this. How can I send the mail automatically with the attachment?

Could you try,

DECLARE @pathname varchar(200) =
    'C:\PUMORI_NEW\001_TOURISMGL_('
    + REPLACE(CONVERT(varchar(10), GETDATE(), 103), '/', '')  -- style 103 (dd/mm/yyyy) with slashes removed gives ddmmyyyy
    + ')_(SOD).TXT';

EXEC msdb.dbo.sp_send_dbmail
    @recipients = '[email protected]',
    @body = 'Dear sir, <Br>Please find the attachment. <P>Regards<Br> <Br>IT Department',
    @subject = 'TOURISM-GL( Auto By System) ',
    @body_format = 'html',
    @profile_name = 'emailserver',
    @file_attachments = @pathname;

Resource Governor MAXDOP Setting Can Lead to Poor Plan Choices


Resource Governor can be used to enforce a hard cap on query MAXDOP , unlike the sp_configure setting. However, query plan compilation does not take such a MAXDOP limit into account. As a result, limiting MAXDOP through Resource Governor can lead to unexpected degradations in performance due to suboptimal query plan choices.
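For context, capping MAXDOP through Resource Governor looks roughly like this (a minimal sketch with made-up pool, group, and login names, not the author's actual setup):

USE master;
GO
-- Workload group capped at MAXDOP 1; the classifier routes one login into it.
CREATE RESOURCE POOL CappedPool;
CREATE WORKLOAD GROUP CappedGroup WITH (MAX_DOP = 1) USING CappedPool;
GO
CREATE FUNCTION dbo.RgClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE WHEN SUSER_SNAME() = N'erik' THEN N'CappedGroup' ELSE N'default' END;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.RgClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;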

Create Your Tables

We start with the not often seen here three table demo. I’d rather not explain how I came up with this sample data, so I’m not going to. I did my testing on a server with max server memory set to 10000 MB. The following tables take about half a minute to create and populate and only take up about 1.5 GB of space:

DROP TABLE IF EXISTS dbo.SMALL;
CREATE TABLE dbo.SMALL (ID_U NUMERIC(18, 0));

INSERT INTO dbo.SMALL WITH (TABLOCK)
SELECT TOP (100) 5 * ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 1);

DROP TABLE IF EXISTS dbo.MEDIUM;
CREATE TABLE dbo.MEDIUM (ID_A NUMERIC(18, 0));

INSERT INTO dbo.MEDIUM WITH (TABLOCK)
SELECT TOP (600000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);

DROP TABLE IF EXISTS dbo.LARGE;
CREATE TABLE dbo.LARGE (
    ID_A NUMERIC(18, 0),
    ID_U NUMERIC(18, 0),
    FILLER VARCHAR(100)
);

INSERT INTO dbo.LARGE WITH (TABLOCK)
SELECT 2 * (RN / 4), RN % 500, REPLICATE('Z', 100)
FROM (
    SELECT TOP (8000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
    CROSS JOIN master..spt_values t3
) q
OPTION (MAXDOP 1);

CREATE INDEX IA ON LARGE (ID_A);
CREATE INDEX IU ON LARGE (ID_U);

The Long-Awaited Demo

I thought up the theory behind this demo on a car ride back from a SQL Saturday, but wasn’t able to immediately figure out a way to get the query plan that I wanted. I ended up finally seeing it in a totally different context and am now happy to share it with you. Consider the following query:

SELECT LARGE.ID_U
FROM dbo.SMALL
INNER JOIN dbo.LARGE ON SMALL.ID_U = LARGE.ID_U
INNER JOIN dbo.MEDIUM ON LARGE.ID_A = MEDIUM.ID_A
OPTION (MAXDOP 1);

The MAXDOP 1 hint results in a serial plan with two hash joins:


[Figure: serial query plan with two hash joins]

This is a perfectly reasonable plan given the size and structure of the tables. There are no bitmap filters because row mode bitmap filters are only supported for parallel plans. Batch mode is not considered for this query because I’m testing on SQL Server 2017 and there isn’t a columnstore index on any of the tables referenced in the query. On my machine a single query execution uses 2422 ms of CPU time and 2431 ms of elapsed time.

A parallel plan at MAXDOP 4 is able to run more quickly but with a much higher CPU time. A single execution of the MAXDOP 4 query uses 5875 ms of CPU time and 1617 ms of elapsed time. There are multiple bitmap filters present. I zoomed in on the most interesting part of the plan because I haven’t figured out how images work with WordPress yet:


[Figure: detail of the parallel plan showing the index intersection and bitmap filters]

Instead of doing a scan of the LARGE table, SQL Server instead chooses an index intersection plan. The cost of the additional hash join is reduced by multiple bitmap filters. There are only 2648396 and 891852 rows processed on the build and probe side instead of 8 million for each side, which is a significant gain.

Worse Than A Teen Running for Governor

Some end users really can’t be trusted with the power to run parallel plans. I thought about making a joke about an “erik” end user but I would never subject my readers to the same joke twice. After enforcing a MAXDOP of 1 at the Resource Governor level, you will probably not be shocked to learn that the query with the explicit MAXDOP 1 hint gets the same query plan as before and runs with the same amount of CPU and elapsed time.

If you skipped or forgot the opening paragraph, you may be surprised to learn that the query with a MAXDOP 4 hint also gets the same query plan as before. The actual execution plan even has the parallel racing arrows. However, the query cannot execute in parallel. The parallelism and bitmap operators are skipped by the query processor and all of the rows are processed on one thread:


[Figure: actual plan showing the parallelism and bitmap operators skipped at runtime]

I uploaded the query plan here if you want to look at it. This type of scenario can happen even without Resource Governor. For example, a compiled parallel query may be downgraded all the way to MAXDOP 1 if it can’t get enough parallel threads.

The query performs significantly worse than before, which hopefully is not a surprise. A single execution took 12860 ms of CPU time and 13078 ms of elapsed time. Nearly all of the query’s time is spent on the hash join for the index intersection, with a tempdb spill and the processing of additional rows both playing a role. The tempdb spill occurs because SQL Server expected the build side of the hash join to be reduced to 1213170 rows. The bitmap filtering does not occur so 8 million rows were sent to the build side instead.

In this case, adding a MAXDOP 1 hint to the query will improve performance by about 5X. Larger differences in run times can be easily seen on servers with more memory than my desktop.

Final Thoughts

If you’re using Resource Governor to limit MAXDOP to 1, consider adding explicit MAXDOP 1 hints at the query level if you truly need the best possible performance. The MAXDOP 1 hint may at first appear to be redundant, but it gives the query optimizer additional information which can result in totally different, and sometimes significantly more efficient, query plans. I expect that this problem could be avoided if query plan caching worked on a Resource Governor workload group level. Perhaps that is one of those ideas that sounds simple on paper but would be difficult for Microsoft to implement. Batch mode for row store can somewhat mitigate this problem because batch mode bitmap filters operate even under MAXDOP 1, but you can still get classic row mode bitmaps even on SQL Server 2019. Thanks for reading!


Discovering New System Objects and Functions in SQL Server 2019

Problem

SQL Server 2019 CTP was recently released. There is some information posted about new features, such as this Microsoft document, What is new in SQL Server 2019, and this MSSQLTips article, What's New in the First Public CTP of SQL Server 2019. What other new features are there that could be useful to a DBA's daily operations?

Solution

As a SQL Server DBA, we are always excited about a new release of SQL Server. I once wrote a tip, Identify System Object Differences Between SQL Server Versions, and by using the script in that tip, I have explored the new objects and the changes to existing objects, such as new columns for views/tables or new parameters for functions/stored procedures.

Here are the detailed steps and some interesting findings in SQL Server 2019.

Step 1 Environment Setup

To find what’s new in SQL Server 2019, we need two versions of SQL Server. I will use SQL Server 2017 and SQL Server 2019. To follow along, you should have these two SQL Server instances installed.

Step 2 Collect Data

Follow the detailed instructions in the previous tip, Identify System Object Differences Between SQL Server Versions. Once this is done, we should have three inventory tables populated. I put all three tables in [TempDB] for convenience. The three inventory tables will be shown in the analysis scripts below.

Step 3 Analyze Data

Once step 2 is done, we can start to do some interesting exploration. We first take a look at the system object changes in SQL Server 2019 since SQL Server 2017.

-- 1. find new system objects
declare @tgt_ver int = 15; -- sql server 2019
declare @src_ver int = 14; -- sql server 2017
select [schema], [object], [type] from tempdb.dbo.allobject
where [version]=@tgt_ver
except
select [schema], [object], [type] from tempdb.dbo.allobject
where [version]=@src_ver
go

We get 53 new objects, the last few are shown below:


[Figure: the last few of the 53 new system objects in SQL Server 2019]
-- 2. find dropped system objects
declare @tgt_ver int = 15; -- sql server 2019
declare @src_ver int = 14; -- sql server 2017
select [schema], [object], [type] from tempdb.dbo.allobject
where [version]=@src_ver
except
select [schema], [object], [type] from tempdb.dbo.allobject
where [version]=@tgt_ver
go

The result is that two extended stored procedures were dropped:


[Figure: the two dropped extended stored procedures]

Now we take a look at the new columns added to system tables and views:

--3. new columns to sys views/tables
declare @tgt_ver int = 15; -- sql server 2019
declare @src_ver int = 14; -- sql server 2017
; with c as (
select distinct o.id, o.[schema], o.[object], o.[type], o.version--, p2.[object_type]
from dbo.AllObject o
inner join dbo.AllObject o2
on o.[schema]=o2.[schema]
and o.[object]=o2.[object]
and o.[version]=@tgt_ver
and o2.[version]=@src_ver
), w as (
select [Object]=ac.[schema]+'.'+ac.[Object]
, ac.[column], ac.column_type, ac.max_length, ac.[version]--, c.[version]
from dbo.AllColumn ac
inner join c
on c.[object]=ac.[object] and c.[schema]=ac.[schema] and c.[version]=ac.[version]
left join dbo.AllColumn ac2
on ac2.[schema]=ac.[schema] and ac2.object = ac.object
and ac2.[column]=ac.[column]
and ac2.[version]=@src_ver
where ac2.[column] is null)
select [object], [column], [column_type], max_length
from w
order by 1

We see 75 changes; a sample of the changes is listed as below:


[Figure: a sample of the new columns added to system views and tables]

Now we will check the parameter changes to functions and stored procedures:

--4. find new parameters added to SPs/functions
declare @tgt_ver int = 15;
declare @src_ver int = 14;
; with c as (
select distinct p1.id, p1.[schema], p1.[object], p1.[object_type]--, p2.[object_type]
from dbo.AllParam p1
inner join dbo.AllParam p2
on p1.[schema]=p2.[schema]
and p1.[object]=p2.[object]
and p1.[version]=@tgt_ver -- sql server 2019
and p2.[version]=@src_ver -- sql server 2017
)
select [Object]=p1.[schema]+'.'+p1.[Object]
, p1.[param], p1.param_type, p1.max_length,p1.is_output, p1.[version]
from dbo.AllParam p1
inner join c
on c.id=p1.id
left join dbo.AllParam p2
on p2.[schema]=p1.[schema] and p2.object = p1.object
and p2.[param]=p1.[param]
and p2.[version]=@src_ver
where p2.[param] is null;
go

We can see the following changes:


[Figure: new parameters added to system stored procedures and functions]
Step 4 Test new objects

Among the new stored procedures and functions, I am particularly interested in the three new extended stored procedures:

xp_copy_file
xp_copy_files
xp_delete_files

Before SQL Server 2019, for all copy and delete operations, we needed to rely on xp_cmdshell with embedded copy and delete commands, but with these three new SPs, we can do our work in a more T-SQL native way.

The basic syntax is:

-- No wildcard allowed
exec master.sys.xp_copy_file 'c:\test\a.txt' -- source
, 'c:\temp\a1.txt' -- destination
-- Wildcard allowed
exec master.sys.xp_copy_files 'c:\test\a*.txt' -- source
, 'c:\temp\' -- destination
-- Wildcard allowed
exec master.sys.xp_delete_files 'c:\test\a*.txt' -- source

There is another interesting new table-valued function, sys.dm_db_page_info. It has the following syntax:

sys.dm_db_page_info(<db_id>, <file_id>, <page_num>, '<option>'), where <option> can be 'limited' or 'detailed', and <page_num> starts from 0.

This view will surely be an important tool for internal storage troubleshooting. A quick example is like the following:

select * from sys.dm_db_page_info(db_id('master'), 1, 1, 'detailed')

I get the following:


[Figure: detailed page information for page 1 of the master database]
In theory, we can loop through each page of a database file via this view and then do analysis. For example, after an index rebuild, check how IAM s
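As a rough sketch of that looping idea (a query I am adding for illustration; the database, file, and page range are arbitrary), you can drive the function from a numbers source with CROSS APPLY:

-- Walk the first 100 pages of file 1 in master and list basic page info.
SELECT n.pg, p.page_type_desc, p.object_id, p.index_id
FROM (
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS pg
    FROM master..spt_values
) AS n
CROSS APPLY sys.dm_db_page_info(DB_ID('master'), 1, n.pg, 'LIMITED') AS p;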

Which is faster? IN (list) or IN (SELECT FROM Temp)


If you’ve done much with IN (list) then you’ve realized that it basically translates out to

col=val1 OR col=val2 OR ....

You’ve probably also realized that you can throw all the values into a temp table and do this

SELECT *
FROM tablename
WHERE col IN (SELECT col FROM #temp);

But which is faster? Well, let’s find out. First I’m going to create a table of 1,000,000 values to pull against. This way I will hopefully get some useful times and not one that takes 10ms and the other 9ms.

tl;dr: It was pretty inconclusive even at large numbers of values.

-- Set up the lookup table
USE Test;
GO
WITH x AS (
SELECT TOP (100)
CAST(ROW_NUMBER() OVER (ORDER BY val.[name]) AS INT) AS x
FROM [master]..spt_values val),
Nums AS (
SELECT
CAST(ROW_NUMBER() OVER (ORDER BY x.x) AS INT) AS Num
FROM x
CROSS JOIN x y
CROSS JOIN x z)
SELECT
REPLICATE('a',ABS(CHECKSUM(NEWID()) % 1000)) AS col1,
REPLICATE('a',ABS(CHECKSUM(NEWID()) % 1000)) AS col2,
REPLICATE('a',ABS(CHECKSUM(NEWID()) % 1000)) AS col3,
REPLICATE('a',ABS(CHECKSUM(NEWID()) % 1000)) AS col4,
DATEADD(minute,ABS(CHECKSUM(NEWID()) % 10000),'1/1/2000') AS DateCol
INTO ListTable
FROM Nums;
GO

I’m going to do some tests of various sizes. The first one will be a small test of just 10 values.

Small test (10 values)

-- Get a small list of values to look up
SELECT TOP 10 DateCol INTO #ListTemp
FROM ListTable
ORDER BY newid();
DECLARE @ListVar nvarchar(1000)
SELECT @ListVar =
STUFF((SELECT ', ' + QUOTENAME(CONVERT(varchar(30), DateCol, 121),'''')
FROM #ListTemp
FOR XML PATH(''),TYPE).value('.','VARCHAR(MAX)')
, 1, 2, '');
DECLARE @sql varchar(max)
SET @sql =
'SELECT * FROM ListTable
WHERE DateCol IN ('+@ListVar+');'
PRINT @sql;

Now using the query printed out:

SET STATISTICS TIME, IO ON
SELECT * FROM ListTable
WHERE DateCol IN ('2000-01-02 01:28:00.000', '2000-01-06 07:05:00.000', '2000-01-05 20:24:00.000', '2000-01-02 18:15:00.000', '2000-01-02 08:12:00.000', '2000-01-07 03:48:00.000', '2000-01-07 18:07:00.000', '2000-01-06 03:31:00.000', '2000-01-03 19:55:00.000', '2000-01-04 22:13:00.000');
SELECT * FROM ListTable
WHERE DateCol IN (SELECT DateCol FROM #ListTemp);

I ran this twice and discarded the first run because it was loading the data into memory. Yes, I realize that could be useful information, but I’m skipping that part of it for today.

SQL Server parse and compile time:
CPU time = 12 ms, elapsed time = 12 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(1049 row(s) affected)
Table 'ListTable'. Scan count 3, logical reads 289125, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 844 ms, elapsed time = 984 ms.
(1049 row(s) affected)
Table '#ListTemp___________________________________________________________________________________________________________000000000004'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ListTable'. Scan count 3, logical reads 289125, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 484 ms, elapsed time = 472 ms.

Ok, so the logical reads are about the same. More from the IN (SELECT) actually, but only by a fraction because of the read on the temp table and some work tables. The time, however, was almost exactly half using the temp table. Now, this was without any indexes. So now I’m going to run the same queries again but this time having put a non-clustered index on ListTable.

CREATE INDEX ix_ListTable ON ListTable(DateCol);

This is a pretty simple index, and I don’t have a clustered index, nor is this a covering index but honestly, it’s a SELECT * so a covering index isn’t likely anyway.

This time the results are:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
(1049 row(s) affected)
Table 'ListTable'. Scan count 10, logical reads 1091, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 264 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
(1049 row(s) affected)
Table 'ListTable'. Scan count 10, logical reads 1081, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ListTemp___________________________________________________________________________________________________________000000000004'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 208 ms.

Oddly enough, slightly fewer reads for the IN (SELECT) this time, and it's only slightly faster. I also tried adding an index on the temp table, but with no significant change.
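For reference, the temp table index I tried was something along these lines (a sketch; the exact definition isn't shown in the post):

CREATE CLUSTERED INDEX ix_ListTemp ON #ListTemp (DateCol);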

Summary (10 values)

With a small test (only 10 values) the IN (SELECT) was faster, with or without an index.

Large test (1000 values)

I’m going to use the exact same process as above but I’m not going to actually post the query because I feel like it will be too long for a blog post. Here is the code to generate the query though if you want to play along. I did have to make a few changes. The TOP became a TOP (1000) instead of a TOP (10) , the varchar(1000) became a varchar(max) and the PRINT became a SELECT because you get more information that way.

SELECT TOP 1000 DateCol INTO #ListTemp
FROM ListTable
ORDER BY newid();
DECLARE @ListVar varchar(MAX)
SELECT @ListVar =
STUFF((SELECT ', ' + QUOTENAME(CONVERT(varchar(30), DateCol, 121),'''')
FROM #ListTemp
FOR XML PATH(''),TYPE).value('.','VARCHAR(MAX)')
, 1, 2, '');
DECLARE @sql varchar(max)
SET @sql =
'SELECT * FROM ListTable
WHERE DateCol IN ('+@ListVar+');'
SELECT @sql;

Without the index: (I’m skipping IO this time because it didn’t seem to tell us much last time)

(96517 row(s) affected)
SQL Server Execution Times:
CPU time = 1704 ms, elapsed time = 5472 ms.
(96517 row(s) affected)
SQL Server Execution Times:
CPU time = 1250 ms, elapsed time = 6031 ms.

Hmm, faster for the IN (list) this time. Let’s try it with an index on ListTable.

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(95297 row(s) affected)
SQL Server Execution Times:
CPU time = 1375 ms, elapsed time = 5047 ms.
(95297 row(s) affected)
SQL Server Execution Times:
CPU time = 1250 ms, elapsed time = 5787 ms.

Still a bit faster for the IN (list). And again, I tried with an index on the temp table with no major difference.

Summary (1000 values)

This time the IN (list) was faster with and without indexes. By less than a second though.

Summary (10000 values)

I also did a test with 10,000 values. In this case, the IN (list) was still faster without an index, but once I put an index on the table the IN (SELECT) became faster, and with an index on both the table and the temp table it was a bit faster.

Over all summary

My results were really pretty inconclusive. I was surprised, since I truly expected the IN (SELECT) to be faster, particularly with an index. But in the end, the timing wasn't significantly different (a few seconds here or there), and I can't be certain that isn't something I've done wrong. So for most cases, I'm just going to use whichever version is more convenient. For cases where even a fraction of a second matters (100's of runs a second, anyone?) I'm going to have to test to see which works fastest in that case.

Interview Drew Furgiuele


I decided to start a series of blogs where I interview key people in the SQL Server community. Instead of me asking technical questions, I plan on asking about their outlook on the future, books they read (non-fiction and/or technical), and their overall thoughts on where technology (mainly SQL Server) is headed. You can find more interviews here .

Next up: Drew Furgiuele ( b | t ):


[Photo: Drew Furgiuele]

Mohammad: Do you think people who dismiss the cloud as a “fad” or just don’t take it serious enough to learn about it (i.e. Azure, AWS, etc), will be in a tough spot to find a job 5 years from now?

Drew: I think we’re well enough into the era of cloud computing that everyone should at least know not just what a cloud provider like Azure or AWS is, but also what different services they offer. It’s not enough anymore to say “oh yeah, I’ve worked with the cloud.” I think most people who want to explore this space should at least be able to say something like “I’ve stood up a database as a service in Azure” or “I’ve built a VM in EC2.” And if you can’t, there’s still plenty of time to start learning. Oh, and by the way, more and more services are coming to cloud providers each and every day. I don’t think that NOT knowing these things puts any kind of expiration date on your current job, but if you’re looking for new opportunities, you might start to find yourself at a disadvantage.

Mohammad: Do you ever see the traditional SQL Server DBA role being replaced/eliminated?

Drew: Nah, I don’t think DBA jobs are going away… but I *do* think the DBA role (as well as future DBA job descriptions) is changing. It’s not enough to know just how to take backups and apply indexes anymore. I think the DBA role needs things like automation experience (like PowerShell, or bash), and some exposure to DevOps. You can’t discount the need to learn new platforms either, like Linux. Suddenly, it’s not enough to know just how to restart a SQL Server instance anymore; you need to know how to restart a service on Windows AND Linux. We’re also in new territory with containers, too. It’s an exciting time for DBAs… if you’re not against setting aside time to learn.

Mohammad: What are you most proud of doing/accomplishing for the SQL Server community so far in your career?

Drew: I know I have a lot to be thankful for and proud of but honestly, I think I was at my best when I was working on the HASSP project . It was silly, stressful, and fun all at once and I got to work with some pretty amazing people on it. We’re not done, either, so if you want to catch up on what we’ve done so far that link will take you to the first post announcing the project, and I’ve got a running diary of blog posts detailing the journey (so far) .

Mohammad: What non-technical/non-fiction book/s would you recommend? If you only read technical books…what do you recommend?

Drew: I like a good technical book as much as the next person, but I really like historical, non-fiction books the most. I’ve recently finished a couple really good ones. First, it was “ The Hunter Killers ” by Dan Hampton, which tells the story of the Wild Weasel program in Vietnam, and the nerves of steel that those pilots and radar operators had to possess to basically be missile bait and put their lives on the line for other air crews. It’s got a nice mix of anecdotal references and historical storytelling to make it a fun read about a difficult time to be an airman. I also just finished “ The Taking of K-129 ” by Josh Dean, which talks about one of my favorite stories ever: Project Azorian, or the CIA’s attempt to steal a long-thought-lost Russian submarine from an impossible depth. It’s equal parts spy thriller and technical manual that talks about just how crazy of an idea it was, and the people who helped solve incredible technical problems using then-unproven technology. It’s harrowing and unbelievable, but they really did it (and just how successful were they? Or do we STILL not know the whole story? Check it out!)

Mohammad: For someone whose career focus has been on one aspect of SQL Server (i.e. the Database Engine), do you think it would be wise for them to become a “jack of all trades” by starting to learn SSRS/IS/Azure, etc., or to remain focused on their area of expertise? In other words, which would you say is more valuable: mile wide / inch deep, or inch wide / mile deep?

Drew: I think the answer lies somewhere in the middle. I know that for me, my career really took off once I started exploring automation. Just saying “I want to learn how to do ‘X’ with PowerShell” has opened a lot of avenues for me. For instance, my day job is “Senior DBA” but I recently joined a project team focused on a big data implementation, and one of the first things I’m doing for that team? Automation! I get to do something new and exciting, all while learning about big data pipelines and specifically, how to set up automation around getting data to a distributed system, training models, implementing the models, and then ultimately processing and consuming the data. I did have to step away from some of the day-to-day DBA things, but in the end I’m getting to apply skills I already have and learn a bunch of new things in the process, like Azure Databricks and Python. This is just a really long way of saying that just knowing SQL Server isn’t enough anymore, and you should absolutely diversify your skills portfolio.

Mohammad: Lastly, I really believe in not only learning from your mistakes but, if possible…learning from the mistakes of others. What is your biggest mistake? If you could go back in time, is there something that you regret not doing? And if so, what?

Drew: My biggest mistake was getting too comfortable. I was at my last job for eleven years, and while I can’t say with the utmost confidence that I would still be there had I stayed, there was plenty to work on, with lots and lots and lots of technical debt that needed solved. For all I know, it might still need solved. Regardless, my skills languished because I wasn’t getting to work on much new stuff, and I had this crazy idea that I wanted to try people management. I did get my shot at it, and it was fine… but I found myself missing the deep technical challenges I was used to solving. When I did finally muster up the courage to leave that role (and company), it turned out to be the best thing I ever did for my career and well-being. I have never been more energized, challenged, and ultimately fulfilled than I am now. I get to work with great people, solve tough problems, and still have a ton of fun. So my biggest regret is not having the courage to do it even sooner than I did. It all worked out, but it’s a fascinating “what if?”

Data science in SQL Server: pivoting and transposing data


In data science, understanding and preparing data is critical. In this article in the series, we’ll discuss understanding and preparing data by using SQL transpose and pivot techniques.

Transposing a matrix means reversing rows and columns. The data frames in R and Python are actually matrices, and therefore transposing is not a problem. A SQL Server table is a slightly different structure, where rows and columns are not equivalent and interchangeable. A row represents an entity, and a column an attribute of an entity. However, you might get data into SQL Server in matrix form from other systems and need to transpose it. Transposing data does not involve aggregations.

Pivoting is a similar operation. You need three columns for the operation. The first column gives the row groups. The second column is transposed; its values are used for the new columns’ names. The third column gives the values, which are aggregated over rows and columns.

T-SQL Pivot operator

As you are probably already used to in my data science articles, I will start with preparing some data. I am using the data from the AdventureWorksDW2017 database. Note that I switched from the SQL Server 2016 version of this database I used in the previous articles to the SQL Server 2017 version. Don’t worry, the structure of the database and the data is nearly the same.

USE AdventureWorksDW2017;
-- Data preparation
SELECT g.EnglishCountryRegionName AS Country,
       g.StateProvinceName AS State,
       g.EnglishCountryRegionName + ' ' + g.StateProvinceName AS CountryState,
       d.CalendarYear AS CYear,
       SUM(s.SalesAmount) AS Sales
INTO dbo.SalesGeoYear
FROM dbo.FactInternetSales s
INNER JOIN dbo.DimDate d ON d.DateKey = s.OrderDateKey
INNER JOIN dbo.DimCustomer c ON c.CustomerKey = s.CustomerKey
INNER JOIN dbo.DimGeography g ON g.GeographyKey = c.GeographyKey
WHERE g.EnglishCountryRegionName IN (N'Australia', N'Canada')
GROUP BY g.EnglishCountryRegionName, g.StateProvinceName, d.CalendarYear;
GO

You can make a quick overview of the data with the following two queries. Note the distinct years returned by the second query.

SELECT TOP 5 * FROM dbo.SalesGeoYear;
SELECT DISTINCT CYear FROM dbo.SalesGeoYear;

The distinct years with sales are from 2010 to 2014. Besides years, I aggregated sales over countries and states as well. I also added a combined column, CountryState, to the dbo.SalesGeoYear table, which I will use later in this article.

Let me start with the T-SQL PIVOT operator. The following query calculates the sum of the sales over countries and years. Please note the syntax for the PIVOT operator. The sales column is used for the aggregation, and the CYear column for the labels of the new pivoted columns. Grouping is implicit; all other columns, not used for pivoting or aggregation, are used in an implicit GROUP BY.

WITH PCTE AS
(
    SELECT Country, CYear, Sales
    FROM dbo.SalesGeoYear
)
SELECT Country, [2010], [2011], [2012], [2013], [2014]
FROM PCTE
    PIVOT (SUM(Sales) FOR CYear IN ([2010], [2011], [2012], [2013], [2014])) AS P;

Here is the result of the query.


[Figure: sales pivoted by country and year]

Of course, you can change the aggregate function. For example, the following query calculates the count of the sales over countries and years.

WITH PCTE AS
(
    SELECT Country, CYear, Sales
    FROM dbo.SalesGeoYear
)
SELECT Country, [2010], [2011], [2012], [2013], [2014]
FROM PCTE
    PIVOT (COUNT(Sales) FOR CYear IN ([2010], [2011], [2012], [2013], [2014])) AS P;

You probably noticed that I used a common table expression to prepare the rowset for pivoting; I am not using the table directly. This is due to the implicit grouping. Whoever defined the syntax for the PIVOT operator wanted to make the code shorter; however, because of that, you need to write more code to be on the safe side. Columns that are not used for pivoting and aggregating are used for grouping. What happens if you read one more column, like in the following query?

WITH PCTE AS
(
    SELECT Country, State, CYear, Sales
    FROM dbo.SalesGeoYear
)
SELECT Country, [2010], [2011], [2012], [2013], [2014]
FROM PCTE
    PIVOT (SUM(Sales) FOR CYear IN ([2010], [2011], [2012], [2013], [2014])) AS P;

I also read the State column in the CTE. I am not using it in the outer query. However, the result is quite different from the previous one.


[Figure: sales pivoted with implicit grouping over Country and State]

The query did implicit grouping over two columns, Country and State.

The PIVOT operator is not really intended for transposing the table. You always need to have an aggregate function. However, you can simulate transposing when you have a single value over rows and pivoted columns with the MIN() or MAX() aggregate functions. For example, the following query does not work:

WITH PCTE AS
(
    SELECT CountryState, CYear, Sales
    FROM dbo.SalesGeoYear
)
SELECT CountryState, [2010], [2011], [2012], [2013], [2014]
FROM PCTE
    PIVOT (Sales FOR CYear IN ([2010], [2011], [2012], [2013], [2014])) AS P;

But, as mentioned, it is easy to change it into a query that does work: one that just transposes the data, without aggregation. Or, to be precise, the aggregation still exists, with the MAX() function over a single value, returning the value itself.

WITH PCTE AS
(
SELECT CountryState, CYear, Sales
FROM dbo.SalesGeoYear
)
SELECT CountryState, [2010], [2011], [2012], [2013], [2014]
FROM PCTE
PIVOT (MAX(Sales) FOR CYear IN ([2010], [2011], [2012], [2013], [2014])) AS P;

The PIVOT operator is a proprietary T-SQL operator; it is not part of the ANSI SQL standard. You can write pivoting queries in ANSI-standard SQL as well, using the CASE expression, as the following query shows.

SELECT Country,
       SUM(CASE WHEN CYear = 2010 THEN Sales END) AS [2010],
       SUM(CASE WHEN CYear = 2011 THEN Sales END) AS [2011],
       SUM(CASE WHEN CYear = 2012 THEN Sales END) AS [2012],
       SUM(CASE WHEN CYear = 2013 THEN Sales END) AS [2013],
       SUM(CASE WHEN CYear = 2014 THEN Sales END) AS [2014]
FROM dbo.SalesGeoYear
GROUP BY Country;

Besides implicit grouping, there is another problem with the PIVOT operator: you can't get the list of distinct values of the pivoted column dynamically, with a subquery. You need to use dynamic SQL for this task. The following code shows how to create a pivoting query dynamically. Note that I build the concatenated list of pivoted column names with the STRING_AGG() function, which is new in SQL Server 2017.

DECLARE @stmtvar AS NVARCHAR(4000);
SET @stmtvar = N'
WITH PCTE AS
(
SELECT Country, CYear, Sales
FROM dbo.SalesGeoYear
)
SELECT *
FROM PCTE
PIVOT (SUM(Sales) FOR CYear IN (' +
    (SELECT STRING_AGG(CYear, N', ') WITHIN GROUP (ORDER BY CYear)
     FROM (SELECT DISTINCT QUOTENAME(CYear) AS CYear
           FROM dbo.SalesGeoYear) AS Y) +
    N')) AS P;';
EXEC sys.sp_executesql @stmt = @stmtvar;
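If you are on a version older than SQL Server 2017, STRING_AGG() is not available. A minimal sketch of the classic FOR XML PATH workaround for building the same column list (not part of the original article) is shown below; you would then concatenate @cols into the dynamic statement exactly as above.

-- Build the pivoted column list without STRING_AGG (pre-2017 sketch).
DECLARE @cols nvarchar(4000) =
    STUFF((SELECT N', ' + QUOTENAME(CAST(CYear AS nvarchar(4)))
           FROM (SELECT DISTINCT CYear FROM dbo.SalesGeoYear) AS y
           ORDER BY CYear
           FOR XML PATH('')), 1, 2, N'');

SELECT @cols;   -- e.g. [2010], [2011], [2012], [2013], [2014]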

That is enough T-SQL for this article; I am switching to R now.

Transposing and Pivoting in R

As always, I start by reading the data from SQL Server into an R data frame. I also use the View() function to show it immediately.

library(RODBC)
con <- odbcConnect("AWDW", uid = "RUser", pwd = "Pa$$w0rd")
SGY <- as.data.frame(sqlQuery(con,
  "SELECT Country, State, CountryState, CYear, Sales
   FROM dbo.SalesGeoYear;"),
  stringsAsFactors = TRUE)
close(con)
View(SGY)

The simplest way to transpose the data is with the t() function from the base package:

t(SGY)

The transposed matrix is not very readable. Here is the partial result.

SQL Server Always On Monitor The Redo_Queue_Size


Because sometimes you need to check why replication is slow or halted.

For example, during an IO storm (no hurricane pun intended). Say you have transactional replication running over an Always On (SQL Server Availability Groups) environment that is also a publisher.

use Jarvis
go
create or alter proc uspAlertRedoQueueSize
as
set nocount on;

declare @minrds int = 5242880,
        @rds bigint,
        @dbname nvarchar(128),
        @node varchar(128),
        @msg nvarchar(max),
        @to varchar(max) = 'tony.stark@avengers.com'

select top 1
       @rds = coalesce(max(drs.redo_queue_size), 0),
       @dbname = db_name(drs.database_id),
       @node = ar.replica_server_name
from sys.dm_hadr_availability_replica_states ars
left join sys.availability_replicas ar
       on ar.replica_id = ars.replica_id
left join sys.availability_groups ag
       on ag.group_id = ars.group_id
left join sys.dm_hadr_database_replica_states drs
       on drs.replica_id = ars.replica_id
where ars.role_desc = 'secondary'
  and drs.redo_queue_size > @minrds
group by db_name(drs.database_id), ar.replica_server_name
order by coalesce(max(drs.redo_queue_size), 0) desc;

if @rds >= 10485760
begin
    set @msg = null;
    select @msg = @node + '.' + @dbname + ': ' + convert(varchar(20), @rds);
    exec msdb.dbo.sp_send_dbmail
         @recipients = @to
        ,@subject = 'Crit: Max redo_queue_size >= 10GB'
        ,@body = @msg
        ,@importance = 'high'
end

if @rds between @minrds and 10485760
begin
    select @msg = @node + '.' + @dbname + ': ' + convert(varchar(20), @rds);
    exec msdb.dbo.sp_send_dbmail
         @recipients = @to
        ,@subject = 'Warn: Max redo_queue_size > 5GB'
        ,@body = @msg
end
go

To see it in the Dashboard, just add the column; this is what it looks like when there's a problem. If you're wondering: that number was 284.86 GB. Yup, it was a very busy day.


(screenshots: the redo_queue_size column in the Always On dashboard, and an awesome PowerShell core calculation)
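If you just want a quick ad-hoc look outside the dashboard, a simple sketch against the same DMVs (not part of the original post) could be:

-- Current redo queue size per database on the secondary replicas, in KB.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.redo_queue_size      AS redo_queue_size_kb,
       drs.redo_rate            AS redo_rate_kb_per_sec
FROM sys.dm_hadr_database_replica_states AS drs
INNER JOIN sys.availability_replicas AS ar
        ON ar.replica_id = drs.replica_id
INNER JOIN sys.dm_hadr_availability_replica_states AS ars
        ON ars.replica_id = drs.replica_id
WHERE ars.role_desc = N'SECONDARY'
ORDER BY drs.redo_queue_size DESC;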

So to automate things, add the alert monitor as a step in a recurring SQL Agent Job on all replicas, where it runs only if the node is Primary, and you'll get a nice email.

if exists (select 1
           from sys.dm_hadr_availability_replica_states as ars
           inner join sys.availability_group_listeners as agl
                   on ars.group_id = agl.group_id
           inner join sys.availability_replicas as arcn
                   on arcn.replica_id = ars.replica_id
           where ars.role_desc = 'PRIMARY'
             and ars.operational_state_desc = 'ONLINE'
             and agl.dns_name = 'STARK-LG'
             and arcn.replica_server_name = @@SERVERNAME)
begin
    --job step here:
    exec Jarvis.dbo.uspAlertRedoQueueSize
end
else
begin
    print 'Server is not Primary for LG.'
end

P.S. Go get ’em tiger.


Hiram

New Command in SQL Server 2019 ADD SENSITIVITY CLASSIFICATION


By: Aaron Bertrand | Related Tips: More > SQL Server 2019

Problem

Microsoft added a task in SQL Server Management Studio 17.5 called "Classify Data." The purpose is to help identify columns that are potentially sensitive in nature, and that may need to be protected in order to remain compliant under various regulations, including SOX, HIPAA, PCI, and GDPR. It does a pretty good job of identifying vulnerable columns and helping you classify them depending on the risk and the type of exposure they present. But aside from showing you the results of this classification, this feature does not raise any additional visibility into these sensitive columns, never mind any suspicious access to them.

Solution

For background on the functionality introduced in SQL Server 2017 and SSMS 17.5, imagine we have this table:

CREATE TABLE dbo.Contractors
(
ContractorID int,
FirstName sysname,
LastName sysname,
SSN char(9),
Email varchar(320),
PasswordHash varbinary(256),
HourlyRate decimal(6,2)
);

When I run the wizard, I get the following recommendations:


New Command in SQL Server 2019 ADD SENSITIVITY CLASSIFICATION

You might not agree with all of the classifications, so you can change them, and you can even add your own, specifying various information types and sensitivity levels. These are:


New Command in SQL Server 2019 ADD SENSITIVITY CLASSIFICATION

When you select the columns you want to classify, click Accept, and then click Save, SSMS adds extended properties to those columns. The report in SSMS uses those extended properties and displays them as follows:


New Command in SQL Server 2019 ADD SENSITIVITY CLASSIFICATION

You could of course write your own app that consumes them, and maybe there are already tools out there that do, but nothing really happens here inside of SQL Server, except these extended properties exist and they surface the relevant columns on the report.
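If you want to see exactly what the wizard stored, you can query the extended properties directly. A sketch, assuming the property names the SSMS Classify Data task uses (sys_information_type_name and sys_sensitivity_label_name):

-- Inspect the classification metadata stored as extended properties.
SELECT s.name AS [schema], o.name AS [table], c.name AS [column],
       ep.name AS property_name, ep.value AS property_value
FROM sys.extended_properties AS ep
INNER JOIN sys.objects AS o ON o.object_id = ep.major_id
INNER JOIN sys.schemas AS s ON s.schema_id = o.schema_id
INNER JOIN sys.columns AS c ON c.object_id = ep.major_id
                           AND c.column_id = ep.minor_id
WHERE ep.name IN (N'sys_information_type_name', N'sys_sensitivity_label_name');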

SQL Server SENSITIVITY CLASSIFICATION

In SQL Server 2019, there is a lot more automation built into the system, allowing you to easily add sensitive data labels to columns, which will get pulled into audits by default. Let's take the same table above, and add our own sensitivity classifications similar to the ones we added above with extended properties, using the ADD SENSITIVITY CLASSIFICATION command ( already available in Azure SQL Database ):

ADD SENSITIVITY CLASSIFICATION TO
dbo.Contractors.FirstName,
dbo.Contractors.LastName
WITH (LABEL = 'Confidential - GDPR', INFORMATION_TYPE = 'Contact Info');
ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.SSN
WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'National ID');
ADD SENSITIVITY CLASSIFICATION TO
dbo.Contractors.email,
dbo.Contractors.PasswordHash
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Credentials');
ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.HourlyRate
WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'Financial');

These don't create extended properties; instead, the classifications are visible in sys.sensitivity_classifications:


New Command in SQL Server 2019 ADD SENSITIVITY CLASSIFICATION
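A sketch of one way to read that catalog view back out, joining it to the object and column metadata (the column names follow the documented sys.sensitivity_classifications view):

SELECT SCHEMA_NAME(o.schema_id) AS [schema],
       o.name  AS [table],
       c.name  AS [column],
       sc.label,
       sc.information_type
FROM sys.sensitivity_classifications AS sc
INNER JOIN sys.objects AS o ON o.object_id = sc.major_id
INNER JOIN sys.columns AS c ON c.object_id = sc.major_id
                           AND c.column_id = sc.minor_id;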

If we are creating an audit, we don't have to do anything special to pick up these classifications, except be auditing the table (in other words, existing audits will simply start inheriting these classifications as they are added). So, if we have a server audit:

USE master;
GO
CREATE SERVER AUDIT GDPRAudit TO FILE (FILEPATH = 'C:\temp\Audit\');
GO
ALTER SERVER AUDIT GDPRAudit WITH (STATE = ON);
GO

Then a database audit that is monitoring read activity on the table:

USE HR;
GO
CREATE DATABASE AUDIT SPECIFICATION AuditContractors
  FOR SERVER AUDIT GDPRAudit
  ADD (SELECT ON dbo.Contractors BY dbo) WITH (STATE = ON);

With the audit enabled, if we run a couple of queries:

SELECT * FROM dbo.Contractors;
SELECT FirstName, LastName, HourlyRate FROM dbo.Contractors;

We can observe access to specific types of information this way, with the new column data_sensitivity_information that is included in the audit:

SELECT
session_server_principal_name,
event_time,
  [host_name],
[object] = [database_name] + '.' + [schema_name] + '.' + [object_name],
[statement],
data_sensitivity_information = CONVERT(xml, data_sensitivity_information)
FROM sys.fn_get_audit_file ('c:\temp\Audit\GDPRAudit_*.sqlaudit', default, default)
WHERE action_id = 'SL'; -- SELECT

Results:


New Command in SQL Server 2019 ADD SENSITIVITY CLASSIFICATION

You can click on any XML column value to see these results:

-- from the first row, with SELECT *:
<sensitivity_attributes>
<sensitivity_attribute label="Confidential - GDPR" information_type="Contact Info" />
<sensitivity_attribute label="Highly Confidential" information_type="National ID" />
<sensitivity_attribute label="Confidential" information_type="Credentials" />
<sensitivity_attribute label="Highly Confidential" information_type="Financial" />
</sensitivity_attributes>
-- from the second row, with specific columns:
<sensitivity_attributes>
<sensitivity_attribute label="Confidential - GDPR" information_type="Contact Info" />
<sensitivity_attribute label="Highly Confidential" information_type="Financial" />
</sensitivity_attributes>

It is interesting to note that FirstName and LastName combined to yield only one element in the XML. This means that whether you have one or 50 columns with a specific label and information type combination, it won’t add any unnecessary data to the audit. Anyway, for those of you that are XML-savvy, you can likely see how easy it would be to extract that information as nodes and then run queries against it, making it very easy to identify all the accesses to specific labels or label/type combinations.
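For example, a sketch of such a query (building on the audit query above) that shreds the XML into one row per accessed label and information type:

;WITH audit_data AS
(
    SELECT event_time, session_server_principal_name, [statement],
           CONVERT(xml, data_sensitivity_information) AS dsi
    FROM sys.fn_get_audit_file('c:\temp\Audit\GDPRAudit_*.sqlaudit', default, default)
    WHERE action_id = 'SL'
)
SELECT a.event_time,
       a.session_server_principal_name,
       x.attr.value('@label', 'nvarchar(256)')            AS label,
       x.attr.value('@information_type', 'nvarchar(256)') AS information_type
FROM audit_data AS a
CROSS APPLY a.dsi.nodes('/sensitivity_attributes/sensitivity_attribute') AS x(attr);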

I feel like there might be a market for a conversion tool that would translate these data classifications you may have already invested in (or built tools that consume) from extended properties to sensitivity classifications, or vice-versa. I'm not going to handle synchronization here, but I'll offer up a start, taking the extended properties and building dynamic statements to migrate them:

DECLARE @sql nvarchar(max) = N'';
SELECT @sql += N'ADD SENSITIVITY CLASSIFICATION TO '
+ QUOTENAME(s.name) + N'.' + QUOTENAME(o.name) + N'.' + QUOTENAME(c.name)
+ ' WITH (LABEL = '''
+ REPLACE(CONVERT(nvarchar(256), l.value), '''', '''''')
+ ''', INFORMATION_TYPE = '''
+ REPLACE(CONVERT(nvarchar(256), t.value), '''', '''''')
+ ''');' + CHAR(13) + CHAR(10)
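The tip appears to be cut off at this point; the FROM clause and the execution of the generated script are missing. A self-contained sketch of how the statement might be completed is shown below. The extended property names in the WHERE clause are assumptions based on what the SSMS Classify Data task creates, not something taken from the original tip.

DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'ADD SENSITIVITY CLASSIFICATION TO '
    + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name) + N'.' + QUOTENAME(c.name)
    + N' WITH (LABEL = '''
    + REPLACE(CONVERT(nvarchar(256), l.value), '''', '''''')
    + N''', INFORMATION_TYPE = '''
    + REPLACE(CONVERT(nvarchar(256), t.value), '''', '''''')
    + N''');' + CHAR(13) + CHAR(10)
FROM sys.extended_properties AS l
INNER JOIN sys.extended_properties AS t
        ON t.major_id = l.major_id
       AND t.minor_id = l.minor_id
INNER JOIN sys.objects AS o ON o.object_id = l.major_id
INNER JOIN sys.schemas AS s ON s.schema_id = o.schema_id
INNER JOIN sys.columns AS c ON c.object_id = l.major_id
                           AND c.column_id = l.minor_id
WHERE l.name = N'sys_sensitivity_label_name'   -- assumed property name
  AND t.name = N'sys_information_type_name';   -- assumed property name

PRINT @sql;                      -- review the generated statements
-- EXEC sys.sp_executesql @sql;  -- run them once you are happy with the output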

Select unique random rows from the SQL Server table but still duplicates


I am not the best with SQL, but I try my best to get my problems solved. I have a table "just" which is filled with the columns (ID (PK, Identity), Char, Serv, Random). Now I want to select a random row from this table and insert it into the table "f_WinGet". So far all my procedures take this step fine, but I always get duplicates in the second table.

First table: 84 rows. Second table: needs 35 random rows out of the 84.

I have tried many other ways, but I always get the same result. All my random procedures are bound to a button in a C# program. Everything is working fine so far, but I always end up with some duplicate rows in my table.

INSERT INTO f_TWinGet
SELECT TOP 1 PERCENT Char, Serv, Random
FROM (SELECT DISTINCT Char, Serv, Random FROM dbo.just) AS derived1
ORDER BY CHECKSUM(NEWID())

It would be nice if anyone has an idea how I can fix my problem. I am still trying, but I always get the same result.

With a table as small as yours you can use something like:

INSERT INTO f_TWinGet
SELECT TOP 1 j.Char, j.Serv, j.Random
FROM dbo.just j
LEFT JOIN f_TWinGet f
       ON f.Char = j.Char AND j.Serv = f.Serv AND j.Random = f.Random
WHERE f.Char IS NULL
ORDER BY NEWID()

This way you make sure that the values you are trying to insert are not already in the final table.
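If the goal is to pick all 35 rows in a single statement rather than one per button click, a sketch along the same lines (the TOP (35) value and column names are taken from the question) would be:

-- Insert 35 distinct random rows that are not already in the target table.
INSERT INTO f_TWinGet ([Char], Serv, [Random])
SELECT TOP (35) j.[Char], j.Serv, j.[Random]
FROM (SELECT DISTINCT [Char], Serv, [Random] FROM dbo.just) AS j
WHERE NOT EXISTS (SELECT 1
                  FROM f_TWinGet AS f
                  WHERE f.[Char]   = j.[Char]
                    AND f.Serv     = j.Serv
                    AND f.[Random] = j.[Random])
ORDER BY NEWID();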


Batch Mode part 4 (“Some of the limitations”)


This blog post is a part of the whole Batch Mode series, which can be found at the Nikoport Batch Mode page .

This post will focus on some of the initial limitations of the Batch Execution Mode on the Rowstore Indexes.

Please consider all of this as something that is absolutely expected to change in future releases, and even though in some cases I seriously ask myself whether the fixes will come rather late than soon, I still expect them all to be solved.

In any case, without jumping ahead to the conclusion, let's consider what Batch Execution Mode on Rowstore Indexes technically is. For me, it is a rather huge improvement over the previous hacks to inject Batch Mode into queries over non-Columnstore Indexes. And given that the infrastructure for processing the Hash Match, Window Aggregate, Hash Join, the different types of Sort, and the other operators supporting Batch Execution Mode already existed, the two essential missing pieces were:

Heuristics (take a look at Batch Mode part 3 ("Basic Heuristics and Analysis") for more details). Deciding when to kick off the Batch Execution Mode is absolutely essential, and this is the part that gives me a lot of fear about scenarios where it kicks in absolutely unnecessarily, killing the performance.

The Index Scan iterator, which would kick off in Batch Mode, potentially bringing a huge improvement to data processing should our tables have a huge number of rows.

Do not get me wrong, there is some stuff that I know and chose not to put here, and there must be tons of details that, unless we become programmers at Microsoft, we shall never find out, but I consider those two pieces to be the biggest blocks in the implementation.

We have already seen how Batch Mode can successfully turbo-charge queries and improve them. I have also shown, in Batch Mode part 2 ("Batch Mode on Rowstore in Basics"), an example of a TPCH query where Batch Mode over Rowstore Indexes performs significantly slower. In this blog post, let's consider a couple more scenarios where things do not go exactly the way we expect them to go, by default.

A good table

Let’s build a “good” table with a Clustered Primary Key on the Rowstore Indexes to which we shall load just 2 million rows:

DROP TABLE IF EXISTS dbo.RowstoreDataTable;
CREATE TABLE dbo.RowstoreDataTable (
C1 BIGINT NOT NULL,
Constraint PK_RowstoreDataTable_Rowstore PRIMARY KEY CLUSTERED (C1),
) WITH (DATA_COMPRESSION = PAGE);
INSERT INTO dbo.RowstoreDataTable WITH (TABLOCK)
SELECT t.RN
FROM
(
SELECT TOP (2000000) ROW_NUMBER()
OVER (ORDER BY (SELECT NULL)) RN
FROM sys.objects t1
CROSS JOIN sys.objects t2
CROSS JOIN sys.objects t3
CROSS JOIN sys.objects t4
CROSS JOIN sys.objects t5
) t
OPTION (MAXDOP 1);

Let's execute a rather dull aggregation over the only column the table contains, halving the result while converting it to decimal and grouping into chunks of 10:

SELECT SUM(C1)/2.
FROM dbo.RowstoreDataTable
GROUP BY C1 % 10

You will find the execution plan of this query below with all expected iterators running in the Batch Execution Mode:


Batch Mode   part 4 (“Some of the limitations”)

Everything is shiny. This is the standard we expect when running a similar calculation against a single table, right?

In-Memory

Let’s build an In-Memory table with the same 2 Million Rows and try running the very same query against it:

ALTER DATABASE Test
ADD FILEGROUP [Test_Hekaton]
CONTAINS MEMORY_OPTIMIZED_DATA
GO
ALTER DATABASE Test
ADD FILE(NAME = Test_HekatonDir,
FILENAME = 'C:\Data\TestXtp')
TO FILEGROUP [Test_Hekaton];
GO
DROP TABLE IF EXISTS dbo.HekatonDataTable;
CREATE TABLE dbo.HekatonDataTable (
C1 BIGINT NOT NULL,
Constraint PK_HekatonDataTable_Hekaton PRIMARY KEY NONCLUSTERED (C1)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
INSERT INTO dbo.HekatonDataTable
SELECT t.RN
FROM
(
SELECT TOP (2000000) ROW_NUMBER()
OVER (ORDER BY (SELECT NULL)) RN
FROM sys.objects t1
CROSS JOIN sys.objects t2
CROSS JOIN sys.objects t3
CROSS JOIN sys.objects t4
CROSS JOIN sys.objects t5
) t
OPTION (MAXDOP 1);

Running our test query against this table produces the following execution plan in the SQL Server 2019 CTP 2.0:

SELECT SUM(C1)/2.
FROM dbo.HekatonDataTable
GROUP BY C1 % 10
Batch Mode   part 4 (“Some of the limitations”)
While both the Compute Scalar and Hash Match iterators run in the Batch Execution Mode, the Table Scan of dbo.HekatonDataTable is executed in the Row Execution Mode, as if we had applied the old hack of joining an empty table with a Columnstore Index against it. The performance lost will be proportional to the amount of data stored in our In-Memory table, and it does not really matter whether the table is persisted or schema-only; to my understanding, we cannot escape the traditionally slow Row Execution Mode for extracting data from the In-Memory table, and we shall have to pass through the adapter that converts rows into batches for the later Batch Mode execution. This penalty is nothing extraordinary in its nature, but it is still a rather disappointing situation, given, quite honestly, the lack of love and investment that the In-Memory technology has received after its initial two releases. I do not expect this fix to appear fast, but in the end, for consistency, there will be a need for it.
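For reference, a minimal sketch of the kind of old hack just mentioned (the dummy table name is mine, not from the original post): an empty table carrying a Clustered Columnstore Index, left-joined in on a condition that never matches, just so the optimizer considers Batch Mode for the rest of the plan.

-- Empty dummy table with a Clustered Columnstore Index.
CREATE TABLE dbo.BatchModeDummy
(
    Dummy int NULL,
    INDEX CCI_BatchModeDummy CLUSTERED COLUMNSTORE
);

-- The join never matches and adds no rows, but its presence makes
-- Batch Mode available to the aggregation part of the plan.
SELECT SUM(h.C1) / 2.
FROM dbo.HekatonDataTable AS h
LEFT OUTER JOIN dbo.BatchModeDummy AS d
    ON 1 = 0
GROUP BY h.C1 % 10;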

LOBs

As we all know, LOBs are almost always among the last features to be implemented and supported (think Online Rebuilds); even Clustered Columnstore Indexes only got them in SQL Server 2017, while Nonclustered Columnstore Indexes still do not support them at all. This probably has to do with the amount of changes, the complexity, and the storage requirements, and the Batch Execution Mode is no exception here. Consider the following example, where I create a new table that is almost a copy of the original "good" one, but with a second column C2, defined as VARCHAR(MAX), in which we shall simply store NULLs:

DROP TABLE IF EXISTS dbo.RowstoreWithLOBDataTable;
CREATE TABLE dbo.RowstoreWithLOBDataTable (
C1 BIGINT NOT NULL,
C2 VARCHAR(MAX) NULL,
Constraint PK_RowstoreWithLOBDataTable_RowstoreWithLOB PRIMARY KEY CLUSTERED (C1),
) WITH (DATA_COMPRESSION = PAGE);
INSERT INTO dbo.RowstoreWithLOBDataTable WITH (TABLOCK)
(C1)
SELECT t.RN
FROM
(
SELECT TOP (2000000) ROW_NUMBER()
OVER (ORDER BY (SELECT NULL)) RN
FROM sys.objects t1
CROSS JOIN sys.objects t2
CROSS JOIN sys.objects t3
CROSS JOIN sys.objects t4
CROSS JOIN sys.objects t5
) t
OPTION (MAXDOP 1);

The only thing that will change is that we shall insert a predicate searching for the NOT NULL rows in C2:

SELECT SUM(C1)/2.
FROM dbo.RowstoreWithLobDataTable
WHERE C2 IS NOT NULL
GROUP BY C1 % 10

The execution plan tells the whole story about how this affects the query; the point to make here is that not a single iterator in this case runs in the Batch Execution Mode:


Batch Mode   part 4 (“Some of the limitations”)

One of the very important points here is that it is enough just to reference the LOB column in a search predicate to lose the Batch Execution Mode; you do not have to aggregate on it. This will be a very big factor for some of the installations I know, where just a simple message column defined with the wrong data type will do huge damage by not allowing the Batch Execution Mode to kick in.

I do not expect this one to be solved quickly, and it worries me much more than the In-Memory tables.

XML

XML? What on planet Earth do you mean by that, Niko?

Now, in the age of JSON, who cares about XML?

Well … a lot of applications around still do, and they will … :)

Let's build another test table, dbo.RowstoreXMLDataTable, with an additional XML column C2 that contains a primitive copy of the C1 row number surrounded by the delightful <root> tag:

DROP TABLE IF EXISTS dbo.RowstoreXMLDataTable;
CREATE TABLE dbo.RowstoreXMLDataTable (
C1 BIGINT NOT NULL,
C2 XML NOT NULL
Constraint PK_RowstoreXMLDataTable PRIMARY KEY CLUSTERED (C1),
INDEX IX_RowstoreXMLDataTable_C1 NONCLUSTERED (C1) WHERE C1 > 500000
) WITH (DATA_COMPRESSION = PAGE);
INSERT INTO dbo.RowstoreXMLDataTable WITH (TABLOCK)
SELECT t.RN, ' <root> ' + CAST(t.RN as VARCHAR(10)) + ' </root>'
FROM
(
SELECT TOP (2000000) ROW_NUMBER()
OVER (ORDER BY (SELECT NULL)) RN
FROM sys.objects t1
CROSS JOIN sys.objects t2
CROSS JOIN sys.objects t3
CROSS JOIN sys.objects t4
CROSS JOIN sys.objects t5
) t
OPTION (MAXDOP 1);

Running the summing query over our XML column will result in the following execution plan:

SELECT SUM(C2.value('(/root)[1]', 'bigint' ) )/2.
FROM dbo.RowstoreXMLDataTable
GROUP BY C1 % 10
Batch Mode   part 4 (“Some of the limitations”)

The execution plan has got it all: Table Valued Functions, UDX, Stream Aggregates, etc. The only thing missing is the Batch Execution Mode, and it ain't coming back home.

While I am not the biggest fan of XML, it is still a very much needed format and its support should be implemented. It is not as high on the priority list as the LOBs, but it is still a pretty neat thing to have.

Spatial

You did not expect this one, did you?

DROP TABLE IF EXISTS dbo.SpatialDataTable;
CREATE TABLE dbo.SpatialDataTable
(
c1 int primary key,
c2 geometry
);
CREATE SPATIAL INDEX IX_SpatialDataTable_c2
ON dbo.SpatialDataTable(c2)
WITH
(
BOUNDING_BOX = ( xmin=-16, ymin=16, xmax=-9, ymax=21 ),
GRIDS = ( LEVEL_3 = HIGH, LEVEL_2 = HIGH )
);
INSERT INTO dbo.SpatialDataTable WITH (TABLOCK)
(c1)
SELECT t.RN
FROM
(
SELECT TOP (2000000) ROW_NUMBER()
OVER (ORDER BY (SELECT NULL)) RN
FROM sys.objects t1
CROSS JOIN sys.objects t2
CROSS JOIN sys.objects t3
CROSS JOIN sys.objects t4
CROSS JOIN sys.objects t5
) t
OPTION (MAXDOP 1);

DECLARE @g GEOMETRY = 'POINT(-112.33 65.4332)';
SELECT SUM( c1 )/2.
FROM dbo.SpatialDataTable WITH(INDEX(IX_SpatialDataTable_c2))
WHERE c2.STDistance(@g) <=(30 * 1000)
GROUP BY c1 % 10;

Which is a pity, given that it is a nice feature, totally underused by Data Professionals:


Batch Mode   part 4 (“Some of the limitations”)

We still get some iterators executing in Batch Mode, but the scan of the Spatial Index is slow RBAR. :)

Given the complexity of the task and demand for it, I do not expect this one to get fixed until 2038. :)

Interestingly, here we can actually see the last_optimization_level property for the Batch Mode Heuristics showing a real value of 2 ...

Cursors

Nope. Like in the case with Columnstore Indexes.

Full Text

Ha!

Bitmap Filters

At this point we are back to the TPCH drawing board, and query 19 is still here to stay, giving us some of the most incredible headaches; and not because Batch Mode is not enabled for it (it is).

Below, again, are the Batch Mode over Rowstore and the Row Mode over Rowstore versions of TPCH Query 19:

SELECT SUM(L_EXTENDEDPRICE* (1 - L_DISCOUNT)) AS REVENUE
FROM LINEITEM, PART
WHERE (P_PARTKEY = L_PARTKEY AND P_BRAND = 'Brand#12' AND P_CONTAINER IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') AND L_QUANTITY >= 1 AND L_QUANTITY <= 1 + 10 AND P_SIZE BETWEEN 1 AND 5
AND L_SHIPMODE IN ('AIR', 'AIR REG') AND L_SHIPINSTRUCT = 'DELIVER IN PERSON')
OR (P_PARTKEY = L_PARTKEY AND P_BRAND ='Brand#23' AND P_CONTAINER IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') AND L_QUANTITY >=10 AND L_QUANTITY <=10 + 10 AND P_SIZE BETWEEN 1 AND 10
AND L_SHIPMODE IN ('AIR', 'AIR REG') AND L_SHIPINSTRUCT = 'DELIVER IN PERSON')
OR (P_PARTKEY = L_PARTKEY AND P_BRAND = 'Brand#34' AND P_CONTAINER IN ( 'LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') AND L_QUANTITY >=20 AND L_QUANTITY <= 20 + 10 AND P_SIZE BETWEEN 1 AND 15
AND L_SHIPMODE IN ('AIR', 'AIR REG') AND L_SHIPINSTRUCT = 'DELIVER IN PERSON')
SELECT SUM(L_EXTENDEDPRICE* (1 - L_DISCOUNT)) AS REVENUE
FROM LINEITEM, PART
WHERE (P_PARTKEY = L_PARTKEY AND P_BRAND = 'Brand#12' AND P_CONTAINER IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') AND L_QUANTITY >= 1 AND L_QUANTITY <= 1 + 10 AND P_SIZE BETWEEN 1 AND 5
AND L_SHIPMODE IN ('AIR', 'AIR REG') AND L_SHIPINSTRUCT = 'DELIVER IN PERSON')
OR (P_PARTKEY = L_PARTKEY AND P_BRAND ='Brand#23' AND P_CONTAINER IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') AND L_QUANTITY >=10 AND L_QUANTITY <=10 + 10 AND P_SIZE BETWEEN 1 AND 10
AND L_SHIPMODE IN ('AIR', 'AIR REG') AND L_SHIPINSTRUCT = 'DELIVER IN PERSON')
OR (P_PARTKEY = L_PARTKEY AND P_BRAND = 'Brand#34' AND P_CONTAINER IN ( 'LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') AND L_QUANTITY >=20 AND L_QUANTITY <= 20 + 10 AND P_SIZE BETWEEN 1 AND 15
AND L_SHIPMODE IN ('AIR', 'AIR REG') AND L_SHIPINSTRUCT = 'DELIVER IN PERSON')
OPTION (USE HINT('DISALLOW_BATCH_MODE'));

This is the plan for the Batch Execution Mode query:


Batch Mode   part 4 (“Some of the limitations”)

while this is the plan for the faster query using the Row Execution Mode:


Batch Mode   part 4 (“Some of the limitations”)

The problem in the query above is that the Bitmap Filters are applied at a very late stage, after the regular filters have been processed, with all rows being read directly from the dbo.part and dbo.lineitem tables; this is contrary to the Rowstore Indexes with the Row Execution Mode, and different from the situation with the Columnstore Indexes.

Strings are "kind of evil" and the Batch Mode is not their best friend, especially right now. If you are looking to test your workload right away with CTP 2.0 while working with a lot of strings (and oh so many Data Warehouses do), please consider waiting a little bit before making a final decision; you will not regret it.

I have been told by Microsoft that this is one of the issues that is being worked on and that it is very much expected to be resolved before the RTM of the SQL Server 2019.

Final Thoughts

The limitations of the Batch Execution Mode on non-Columnstore Indexes are mostly related to non-Rowstore data sources and their scans. There are some notable exceptions, such as LOBs, and they should be kept very much in mind.

Mostly those limitations are delighters, and not getting an extra snack is not the same as not getting almost the entire one.

Given that Microsoft expects the String Filters to get a significant improvement before the RTM, I feel pretty much comfortable looking into the future.

While of course, those perky LOBs, disabling the Batch Execution Mode are nothing short of being fine ... But I am a believer that this limitation will fade away in a relatively near future. (I have no such feedback/information from Microsoft, but I see a huge potential in the massive text processing that will eventually gain its space on SQL Server & Azure SQL Database).

The bottom line is to remember that the feature is called Batch Mode on Rowstore (and if you are not using Row Storage, you are most probably out of luck).

to be continued ...

End of support for SQL Server 2008 & 2008 R2


If you are running SQL Server 2008 or SQL Server 2008 R2, what does July 9th, 2019 mean for you? With both of these versions of SQL Server reaching the end of their support lifecycle together, you will no longer be able to get critical security updates. This can cause serious security and compliance issues for your organization.

When these versions of SQL Server were released, they came with 10 years of support; 5 years of Mainstream Support and 5 years of Extended Support. If your organization still has SQL Server 2008/2008 R2 in production, how is your organization planning to address the risk? For organizations that are heavily regulated, this is a big concern.

You need to choose how you’re going to migrate and where you’re going to migrate to, and then make sure you’re not going to hit any roadblocks along the way.

Migration Assessment Tools

If you are planning an upgrade from SQL Server 2008/2008 R2, Microsoft has made things much easier to test and validate your environment. Numerous tools exist that can assist with migration assessments and even handle migration tasks, and they’re all slightly different. These tools include:

Data Migration Assistant
Microsoft Assessment and Planning Toolkit
Azure Database Migration Service
Database Experimentation Assistant

The Data Migration Assistant helps you to upgrade to a modern data platform. It does this by detecting compatibility issues that can impact functionality on the newer version of SQL Server and makes recommendations for performance and reliability improvements for the new environment. Your source can be SQL Server 2005+ with a target of SQL 2012+ and Azure SQL Database.

The Microsoft Assessment and Planning Toolkit has been around for many years and is often referred to as the MAP Tool. It’s great for doing an inventory of your current environment to find where SQL Server (and other applications) exist.

The Azure Database Migration Service integrates some of the functionality of existing tools and services to provide customers with a comprehensive solution for migrating to Azure. The tool generates assessment reports that provide recommendations to guide you through any changes required prior to performing a migration. This service currently requires a VPN or Express Route.

Finally, the Database Experimentation Assistant is a new A/B testing solution for SQL Server Upgrades and it’s a tool you should become familiar with. It leverages Distributed Replay to capture a workload and replay it against a target SQL Server. This can be used to test hardware changes or version differences of SQL Server. You can capture workloads from SQL Server 2005 and up.

Migration Options

On-premises upgrade: One of the easiest migration methods is to upgrade to a newer version of SQL Server. In this case, you have SQL Server 2012, 2014, 2016, or 2017 to pick from. I encourage clients to upgrade to the latest version that they can. SQL Server 2012 is already out of Mainstream Support, and SQL Server 2014 goes out of Mainstream Support on July 9th, 2019. Upgrading can be very time consuming and costly for organizations due to all the planning and testing involved, so moving to the latest version can increase the time before the next upgrade. There are also numerous performance and functionality improvements in SQL Server 2016 and 2017 that make migrating to SQL Server 2012 or 2014 a very poor choice at this time.

A common approach for on-premises upgrades is to build new and migrate, regardless of a physical or virtual environment. By building new, you can restore your databases and conduct numerous rounds of testing and validation to make sure everything works as expected before moving production.

Upgrade and migrate to an Azure VM:For organizations that are looking to migrate to the cloud, Azure Infrastructure as a Service (IaaS) is a great option. Running SQL Server on an Azure VM is much like on-premises. You specify the size of the VM (number of vCPUs and memory) and configure your storage for your I/O and size requirements. You are still responsible for supporting the OS and SQL Server for configuration and patching. Azure IaaS gives you the ability to easily scale your workloads by scaling the size of your virtual machine up or down as your workload needs change, as well as take advantage of Azure Active Directory integration, threat detection, and many other Azure benefits.

Migrate to Azure SQL Database:Another option you have is to migrate to Azure SQL Database. Azure SQL Database can be thought of as a Database as a Service and is part of Microsoft’s Platform as a Service (PaaS). Azure SQL Database functionality is database scoped, which means certain things such as cross database queries, SQL Server Agent, Database Mail, and more are not available. However, many customers that have applications that utilize a single database have been able to migrate to Azure SQL Database with minimal effort. You can quickly test for compatibility with Azure SQL Database by using the Data Migration Assistant. With Azure SQL Database, you can size your databases by DTU (Database Transaction Units) or vCores individually, or group databases into an Elastic Pool. Azure SQL Database allows you to scale your resources up and down with minimal effort and downtime.

Migrate to Azure SQL Managed Instance: A new option (as of 2018) is to migrate to Azure SQL Managed Instance. This is a new product that became generally available on October 1st for the General Purpose tier. Managed Instance was built using the instance-level programming model. This means that functionality we are used to with the full version of SQL Server is supported. The goal of Managed Instance is to have 100% surface area compatibility with on-premises. All databases in the instance are on the same server, so cross-database queries are supported, as are Database Mail, SQL Server Agent, Service Broker, and much more. There are two pricing tiers: General Purpose, which includes a non-readable secondary for HA, and Business Critical, which has two non-readable secondaries and a readable secondary. Managed Instance is part of Microsoft's PaaS offering, so you get all the built-in features and functionality of PaaS.

Move as-is to Azure Virtual Machines:Microsoft is offering three years of Extended Security Updates at no additional charge if you move your SQL 2008/SQL 2008 R2 instances to an Azure VM. The goal is to give you a bit more time to upgrade to a newer version of SQL Server when you are ready.

Pay to Stay:This isn’t a migration option, but you do have an option to purchase up to three years of Extended Security Updates. There are restrictions around this option. You must have active Software Assurance for those instances or Subscription licenses under an Enterprise Agreement. If this applies to you, then this option can buy you more time to plan and migrate off of SQL Server 2008/2008 R2.

Migration Best Practices

When performing any migration or upgrade, there are certain things you need to be aware of. First, you need baselines, and I can't stress this enough. Anytime you make a change to an environment, you need to be able to measure how that change impacts the environment. Knowing key performance metrics for your environment can help you when troubleshooting any perceived impact. You can manually collect these metrics using perfmon and DMVs or invest in a performance monitoring platform. I wrote about both techniques in more detail in a previous post, and right now you can get an extended, 45-day evaluation of SentryOne. Having baseline metrics for things like CPU utilization, memory consumption, disk metrics, and more can quickly let you know if things look better or worse after an upgrade or migration.

You should also note the configuration options within your instance. Many times, I've been asked to look at a SQL Server instance after an upgrade or migration and found that most of the default settings are in use. If the old system is still available, I can query it, get the previous non-default values that were in place, and apply those to the new environment to bring it back to a known configuration. It is always good to review sys.configurations on your production server and consider making similar changes in your new environment (cost threshold for parallelism, max degree of parallelism, optimize for ad hoc workloads, and more). Notice I wrote 'consider': if the core count or memory is different on the new server, you need to configure the settings taking the new server's size into account.
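A quick sketch of capturing the settings mentioned above from the old instance, so you can compare them against the new one:

SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN (N'cost threshold for parallelism',
               N'max degree of parallelism',
               N'optimize for ad hoc workloads')
ORDER BY name;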

What is your backout plan if things go wrong? Do you have proper backups you can go back to? In most cases with an upgrade or migration, you are moving to a new VM or physical server. Your failback may be to move back to the old server. If you have had data change in the new version of SQL Server, your failback is much more complicated. You cannot restore a SQL Server database backup from a newer version of SQL Server to an older version.

Conclusion

If you are still using SQL Server 2008 or SQL Server 2008 R2, you have a few options available to you to stay in compliance after July 9th, 2019. To stay on SQL Server 2008 or SQL Server 2008 R2, you can purchase Extended Security Updates or move to an Azure virtual machine if you qualify. If you can upgrade, you can migrate to a supported version of SQL Server on-premises or on an Azure VM or consider migrating to a managed solution such as Azure SQL Database or Azure SQL Managed Instance.

Group t-sql by category and get the best values


Imagine I have this table:

Month | Person | Value
----------------------
Jan   | P1     | 1
Jan   | P2     | 2
Jan   | P3     | 3
Feb   | P1     | 5
Feb   | P2     | 4
Feb   | P3     | 3
Feb   | P4     | 2
...

How can I build a t-sql query to get the top 2 value rows and a third with the sum of others?

Something like this:

RESULT:
Month | Person | Value
----------------------
Jan   | P3     | 3
Jan   | P2     | 2
Jan   | Others | 1   -- (sum of the bottom value - in this case (Jan, P1, 1))
Feb   | P1     | 5
Feb   | P2     | 4
Feb   | Others | 5   -- (sum of the bottom values - in this case (Feb, P3, 3) and (Feb, P4, 2))

Thanks

Assuming you are using SQL Server 2005 or higher, a CTE will do the trick.

;WITH Months (Month, Person, Value) AS
(
    SELECT 'Jan', 'P1', 1 UNION ALL
    SELECT 'Jan', 'P2', 2 UNION ALL
    SELECT 'Jan', 'P3', 3 UNION ALL
    SELECT 'Feb', 'P1', 5 UNION ALL
    SELECT 'Feb', 'P2', 4 UNION ALL
    SELECT 'Feb', 'P3', 3 UNION ALL
    SELECT 'Feb', 'P4', 2
),
q AS
(
    SELECT Month, Person, Value,
           RowNumber = ROW_NUMBER() OVER (PARTITION BY Month ORDER BY Value DESC)
    FROM Months
)
SELECT Month, Person, Value
FROM (
    SELECT Month, Person, Value, RowNumber
    FROM q
    WHERE RowNumber <= 2
    UNION ALL
    SELECT Month, Person = 'Others', SUM(Value), MAX(RowNumber)
    FROM q
    WHERE RowNumber > 2
    GROUP BY Month
) q
ORDER BY Month DESC, RowNumber

Kudos go to Andriy for teaching me some new tricks.

DbForge Query Builder for SQL Server Getting Started: Analyzing SQL Queries


[Download dbForge Query Builder for SQL Server]

You can use the Query Profiler tool to debug, troubleshoot, monitor, and measure your application's SQL statements and stored procedures. If your application has performance problems that you suspect are caused by particularly long-running queries, you can analyze query duration.

In this article, we will show how to profile a simple query. As an example, we will use Microsoft's AdventureWorks2012 test database.

Analyzing a SQL Statement

We will select all people from the Person table whose first name is "Robin".

1. On the "Start" page, click "Query Profiler". A new SQL document window opens.

2. In the text editor, type the following script:


(screenshot: the query typed in the SQL editor)
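The script itself only appears as a screenshot in the original article; a minimal sketch of what it likely contains, assuming the standard Person.Person table in AdventureWorks2012, would be:

USE AdventureWorks2012;
GO
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person
WHERE FirstName = N'Robin';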

3. Click "Execute". The Plan Diagram window opens.


(screenshot: the Plan Diagram window)

Note that the Select icon includes a warning mark that flags a shortcoming. When you hover the mouse pointer over the "Select" icon, a tooltip is displayed automatically. The bottom of the tooltip contains a warning message telling us that an index is missing. To add the index, execute the following script:


(screenshot: the index creation script)
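Again, the index script is only shown as a screenshot; a sketch of the kind of index the warning suggests (the index name is mine) might be:

CREATE NONCLUSTERED INDEX IX_Person_FirstName
ON Person.Person (FirstName);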

Now we can go back to Query Profiler and click the Get New Results button.

Each time you get profiling results for an executed query, they are shown in the tree view as a new node with the query's execution time and date. When you edit a query, you want to know whether your change shortens the execution time. Query Profiler lets you quickly compare profiling results. To compare results, hold down the CTRL key and select two plan diagrams.


(screenshot: comparing two plan diagrams)

Note: When you save a query file after obtaining profiling results, the latter are automatically stored in a *.design file.


Best Approach to View Corrupt MDF File in SQL Server


“I want to view my MDF file, but when I tried to retrieve the data from the MDF file I got SQL errors. I think my master database file got corrupted, because my SQL Server shut down abnormally a few days earlier. I am not sure whether I am right about the MDF file corruption. Please tell me how to view a corrupt MDF file in SQL Server; I also want to know the reasons behind MDF file corruption.”

MDF file corruption is one of the worst scenarios any user faces, as all the data of a SQL database is stored in the MDF file. It stores all the database components such as functions, tables, triggers, and so on. But what if it gets corrupted? The data in the MDF file has a high chance of being lost if it is not repaired properly.

If your MDF file is corrupted and you are unable to access it, read on to learn how you can view a corrupt MDF file. Before proceeding to the solution, you are probably wondering about the reason for this disaster. Let us look at the possible reasons a SQL MDF file gets corrupted.

How Does a SQL MDF File Get Corrupted?

A SQL MDF file is highly prone to corruption, so you must keep regular backups of this file. As for the causes, there can be a number of reasons behind MDF file corruption.

Some of them are listed below:

Defective drivers
Storage media corruption
Header corruption
Sudden shutdown of SQL Server
Network failure while the database is in use
Malicious attack
Bugs in the server
Best Approach to View Corrupt MDF File in SQL Server

One of the major causes of SQL database corruption is the I/O subsystem. It is estimated that 99% of corruption in MDF files is caused by the I/O subsystem.

Beware of these causes! Protect your MDF file from corruption by taking regular, recent backups!

How to View a Corrupt MDF File?

We have discussed the various possible reasons why an MDF file gets corrupted. Now let us discuss how you can view your corrupted SQL MDF file.

I created a SAMPLE2 database and corrupted it. As you can see in the image, when I tried to retrieve records from the SAMPLE2 database it gave me corruption error 5172.


Best Approach to View Corrupt MDF File in SQL Server

Now I am using the SysTools SQL Recovery Tool to view the corrupt MDF file. Here is how:

1. Download and launch the MDF Repair Tool and click Open.
2. Load the corrupted MDF file into the software. Browse to the location of your corrupt MDF file and load the SAMPLE2.mdf file.
3. You will see two scanning options, Quick Scan and Advance Scan. Choose Advance Scan, as it can repair severely corrupted files. Hit Auto Detect if you don't know the version of your database file, or choose it manually, then click OK.
4. You will get a scan report of all the database objects the SAMPLE2.mdf file contains, including the number of records per object.
5. The software shows a brief preview of all recovered database objects. You can see all the database objects (tables, functions, views, etc.) that SAMPLE2.mdf contains. Just click an object on the right side to get a brief preview.

Hence, you are able to view all the recovered database objects that the corrupt SAMPLE2.mdf file contains.

Frequently Asked Questions

1. I am using SQL Server 2014 and I don't have the log file of my database. Will I be able to view the corrupt MDF file of my database?

Sol: Yes, the MDF file repair tool allows you to preview .mdf file data even if you don't have the log file. Just load the file into the software and you will be able to preview all the data.

2. I ran the DBCC CHECKDB command with the REPAIR_ALLOW_DATA_LOSS option to repair my MDF file so that I could view all of its details. But when I executed the repair statement, it failed to repair my .mdf file. Will I be able to view my MDF file data?

Sol: Yes, you can repair your .mdf file by simply loading it into the software. The software scans and recovers MDF file data for SQL Server 2017, 2016, and all earlier versions.

3. I am using SQL Server 2017 and I am getting error Msg 823. Is it possible to recover and view a corrupt MDF file with this software while getting SQL error 823?

Sol: Yes, the software repairs your MDF file if you are getting Msg 823, 824, or 825 in SQL Server. Not only that, the tool repairs MDF files from all types of corruption, such as page-level corruption, metadata corruption, etc.

Conclusion

All users who are looking for a way to view a corrupt MDF file in SQL Server are welcome to find the solution here. The possible reasons for MDF corruption have also been discussed, along with some user queries on the same topic.
