6

Upgrading SQL Server using Availability Groups – Checklist

 3 years ago
source link: https://thelonedba.wordpress.com/2019/10/17/upgrading-sql-server-using-availability-groups-checklist/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

This is a checklist put together and followed for an upgrade of a pair of physical SQL Server 2012 servers which hosted a single availability group of several terabytes of data with minimal downtime.

The availability group was configured with synchronous commit and automatic failover.

List of Steps

  • If possible, stop any ETL processes that put data into the affected databases – less traffic here means less traffic to catch up with when you resume the AG HADR.
  • Make sure all AGs hosted by the servers are on the same server
  • Put servers, clusters, listeners, everything into Maintenance Mode in SCOM / SQLMonitor / Solarwinds / whatever other monitoring system you use.
  • Disable Transaction Log backup jobs on all servers hosting the Availability Group(s)
  • Take Log backups
  • Configure the Availability Group(s) to manual failover
  • Suspend data movement for each database on the secondary server
  • Backup the system databases on the secondary server
  • Upgrade the secondary server and reboot if required (it might not be required, but it is recommended to do this anyway)
  • Resume data movement on the secondary, and allow the servers to synchronise.
  • Failover the availability group(s) to the upgraded server with no data loss
  • Check that data movement is suspended to the new secondary / unupgraded server due to version incompatibility
  • Upgrade other server and then reboot
  • Resume data movement on secondary, and allow to synchronise
  • (optional) failover AG with no data loss to test all is OK (this isn’t really required, but can be useful for extra peace of mind)
  • Reconfigure availability group(s) back to automatic failover (if that’s how it was)
  • Re-enable transaction log backup jobs
  • Take servers out of maintenance mode in SCOM/SQLMonitor/Solarwinds/Whatever

A couple of useful SQL queries

Suspend / Resume HADR for all databases on the server

This is the suspend version – change suspend to resume and rerun later

DECLARE @sql VARCHAR(MAX) = '';
SELECT @sql = @sql + 'ALTER DATABASE ' + QUOTENAME(DB_NAME(hars.database_id)) + ' SET HADR SUSPEND;' + CHAR(10) + CHAR(13)
FROM       sys.dm_hadr_database_replica_states AS hars
INNER JOIN sys.dm_hadr_availability_group_states AS hags
ON hags.group_id = hars.group_id
WHERE hars.is_local = 1
ORDER BY DB_NAME(hars.database_id);
PRINT @sql; -- optional
EXEC (@sql);

Check health of replicas in the AG (run on primary)

SELECT ag.name,
ar.replica_server_name,
hars.is_local,
hars.role_desc,
hars.operational_state_desc,
hars.connected_state_desc,
hars.recovery_health_desc,
hars.synchronization_health_desc,
hars.last_connect_error_number,
hars.last_connect_error_description,
hars.last_connect_error_timestamp,
ag.failure_condition_level,
ag.health_check_timeout,
ar.replica_server_name,
ar.owner_sid,
ar.endpoint_url,
ar.availability_mode_desc,
ar.failover_mode_desc,
ar.session_timeout,
ar.primary_role_allow_connections_desc,
ar.secondary_role_allow_connections_desc,
ar.backup_priority,
ag.automated_backup_preference_desc,
ar.read_only_routing_url
FROM sys.dm_hadr_availability_replica_states AS hars
INNER JOIN sys.availability_groups AS ag
ON ag.group_id = hars.group_id
INNER JOIN sys.availability_replicas AS ar
ON ar.group_id = ag.group_id
AND ar.replica_id = hars.replica_id;

Monitor progress of catchup/synchronisation (run on primary)

SELECT ag.name,
adc.database_name,
ar.replica_server_name,
hdrs.is_local,
hdrs.synchronization_state,
hdrs.synchronization_state_desc,
hdrs.is_commit_participant,
hdrs.synchronization_health_desc,
hdrs.database_state_desc,
hdrs.is_suspended,
hdrs.suspend_reason,
hdrs.suspend_reason_desc,
hdrs.last_sent_time,
hdrs.last_received_time,
hdrs.last_hardened_time,
hdrs.last_redone_time,
hdrs.log_send_queue_size,
hdrs.log_send_rate,
hdrs.redo_queue_size,
hdrs.redo_rate,
hdrs.filestream_send_rate,
CEILING(hdrs.redo_queue_size / hdrs.redo_rate) / 60.0 AS est_redo_completion_minutes, -- approximately
hdrs.last_commit_time,
hdrs.low_water_mark_for_ghosts,
hdrs.secondary_lag_seconds -- for SQL 2016 and above
FROM sys.dm_hadr_database_replica_states AS hdrs
INNER JOIN sys.availability_groups AS ag
ON ag.group_id = hdrs.group_id
INNER JOIN sys.availability_databases_cluster AS adc
ON adc.group_database_id = hdrs.group_database_id
INNER JOIN sys.availability_replicas AS ar
ON ar.replica_id = hdrs.replica_id
WHERE hdrs.log_send_queue_size <> 0
OR hdrs.redo_queue_size <> 0;

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK