6
Upgrading SQL Server using Availability Groups – Checklist
source link: https://thelonedba.wordpress.com/2019/10/17/upgrading-sql-server-using-availability-groups-checklist/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
This is a checklist put together and followed for an upgrade of a pair of physical SQL Server 2012 servers which hosted a single availability group of several terabytes of data with minimal downtime.
The availability group was configured with synchronous commit and automatic failover.
List of Steps
- If possible, stop any ETL processes that put data into the affected databases – less traffic here means less traffic to catch up with when you resume the AG HADR.
- Make sure all AGs hosted by the servers are on the same server
- Put servers, clusters, listeners, everything into Maintenance Mode in SCOM / SQLMonitor / Solarwinds / whatever other monitoring system you use.
- Disable Transaction Log backup jobs on all servers hosting the Availability Group(s)
- Take Log backups
- Configure the Availability Group(s) to manual failover
- Suspend data movement for each database on the secondary server
- Backup the system databases on the secondary server
- Upgrade the secondary server and reboot if required (it might not be required, but it is recommended to do this anyway)
- Resume data movement on the secondary, and allow the servers to synchronise.
- Failover the availability group(s) to the upgraded server with no data loss
- Check that data movement is suspended to the new secondary / unupgraded server due to version incompatibility
- Upgrade other server and then reboot
- Resume data movement on secondary, and allow to synchronise
- (optional) failover AG with no data loss to test all is OK (this isn’t really required, but can be useful for extra peace of mind)
- Reconfigure availability group(s) back to automatic failover (if that’s how it was)
- Re-enable transaction log backup jobs
- Take servers out of maintenance mode in SCOM/SQLMonitor/Solarwinds/Whatever
A couple of useful SQL queries
Suspend / Resume HADR for all databases on the server
This is the suspend version – change suspend to resume and rerun later
DECLARE
@sql
VARCHAR
(
MAX
) =
''
;
SELECT
@sql = @sql +
'ALTER DATABASE '
+ QUOTENAME(DB_NAME(hars.database_id)) +
' SET HADR SUSPEND;'
+
CHAR
(10) +
CHAR
(13)
FROM
sys.dm_hadr_database_replica_states
AS
hars
INNER
JOIN
sys.dm_hadr_availability_group_states
AS
hags
ON
hags.group_id = hars.group_id
WHERE
hars.is_local = 1
ORDER
BY
DB_NAME(hars.database_id);
PRINT @sql;
-- optional
EXEC
(@sql);
Check health of replicas in the AG (run on primary)
SELECT
ag.
name
,
ar.replica_server_name,
hars.is_local,
hars.role_desc,
hars.operational_state_desc,
hars.connected_state_desc,
hars.recovery_health_desc,
hars.synchronization_health_desc,
hars.last_connect_error_number,
hars.last_connect_error_description,
hars.last_connect_error_timestamp,
ag.failure_condition_level,
ag.health_check_timeout,
ar.replica_server_name,
ar.owner_sid,
ar.endpoint_url,
ar.availability_mode_desc,
ar.failover_mode_desc,
ar.session_timeout,
ar.primary_role_allow_connections_desc,
ar.secondary_role_allow_connections_desc,
ar.backup_priority,
ag.automated_backup_preference_desc,
ar.read_only_routing_url
FROM
sys.dm_hadr_availability_replica_states
AS
hars
INNER
JOIN
sys.availability_groups
AS
ag
ON
ag.group_id = hars.group_id
INNER
JOIN
sys.availability_replicas
AS
ar
ON
ar.group_id = ag.group_id
AND
ar.replica_id = hars.replica_id;
Monitor progress of catchup/synchronisation (run on primary)
SELECT
ag.
name
,
adc.database_name,
ar.replica_server_name,
hdrs.is_local,
hdrs.synchronization_state,
hdrs.synchronization_state_desc,
hdrs.is_commit_participant,
hdrs.synchronization_health_desc,
hdrs.database_state_desc,
hdrs.is_suspended,
hdrs.suspend_reason,
hdrs.suspend_reason_desc,
hdrs.last_sent_time,
hdrs.last_received_time,
hdrs.last_hardened_time,
hdrs.last_redone_time,
hdrs.log_send_queue_size,
hdrs.log_send_rate,
hdrs.redo_queue_size,
hdrs.redo_rate,
hdrs.filestream_send_rate,
CEILING(hdrs.redo_queue_size / hdrs.redo_rate) / 60.0
AS
est_redo_completion_minutes,
-- approximately
hdrs.last_commit_time,
hdrs.low_water_mark_for_ghosts,
hdrs.secondary_lag_seconds
-- for SQL 2016 and above
FROM
sys.dm_hadr_database_replica_states
AS
hdrs
INNER
JOIN
sys.availability_groups
AS
ag
ON
ag.group_id = hdrs.group_id
INNER
JOIN
sys.availability_databases_cluster
AS
adc
ON
adc.group_database_id = hdrs.group_database_id
INNER
JOIN
sys.availability_replicas
AS
ar
ON
ar.replica_id = hdrs.replica_id
WHERE
hdrs.log_send_queue_size <> 0
OR
hdrs.redo_queue_size <> 0;
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK