Quantcast
Channel: CodeSection,代码区,SQL Server(mssql)数据库 技术分享 - CodeSec
Viewing all articles
Browse latest Browse all 3160

SQL Server Always On Monitor The Redo_Queue_Size

$
0
0

Cause sometimes you need to check why replication is slow or halted.

For example, like during an IO storm no hurricane pun intended. Say you have transnational replication over an Always On (SQL Server Availability Groups) environment that’s a publisher.

use Jarvis go create or alter proc uspAlertRedoQueueSize as set nocount on; declare @minrds int = 5242880, @rds bigint, @dbname nvarchar(128), @node varchar(128), @msg nvarchar(max), @to varchar(max) = 'tony.stark@avengers.com' select top 1 @rds = coalesce(max(drs.redo_queue_size), 0), @dbname = db_name(drs.database_id), @node = ar.replica_server_name from sys.dm_hadr_availability_replica_states ars left join sys.availability_replicas ar on ar.replica_id = ars.replica_id left join sys.availability_groups ag on ag.group_id = ars.group_id left join sys.dm_hadr_database_replica_states drs on drs.replica_id = ars.replica_id where ars.role_desc = 'secondary' and drs.redo_queue_size > @minrds group by db_name(drs.database_id), ar.replica_server_name order by coalesce(max(drs.redo_queue_size), 0) desc; if @rds >= 10485760 begin set @msg = null; select @msg = @node + '.' + @dbname + ': ' + convert(varchar(20), @rds); exec msdb.dbo.sp_send_dbmail @recipients = @to ,@subject = 'Crit: Max redo_queue_size >= 10GB' ,@body = @msg ,@importance = 'high' end if @rds between @minrds and 10485760 begin select @msg = @node + '.' + @dbname + ': ' + convert(varchar(20), @rds); exec msdb.dbo.sp_send_dbmail @recipients = @to ,@subject = 'Warn: Max redo_queue_size > 5GB' ,@body = @msg end go

To see in the Dashboard just add the column, this is what it looks like when there’s a problem. If you’re wonder that number was 284.86GBs yup, it was a very busy day.


SQL Server Always On   Monitor The Redo_Queue_Size
Awesome PowerShell core calculation
SQL Server Always On   Monitor The Redo_Queue_Size

So to automate things, add the alert monitor as a step to a recurring SQL Agent Job to all replicas where it runs only if the node is Primary and you’ll get a nice email.

if exists (select 1 from sys.dm_hadr_availability_replica_states as ars inner join sys.availability_group_listeners as agl on ars.group_id = agl.group_id inner join sys.availability_replicas as arcn on arcn.replica_id = ars.replica_id where ars.role_desc = 'PRIMARY' and ars.operational_state_desc = 'ONLINE' and agl.dns_name = 'STARK-LG' and arcn.replica_server_name = @@SERVERNAME) begin --job step here: exec Jarvis.dbo.uspAlertRedoQueueSize end else begin print 'Server is not Primary for LG.' end

P.S. Go get ’em tiger.

emoji

Hiram


Viewing all articles
Browse latest Browse all 3160

Trending Articles