
Delete Data from Large SQL Server Tables with Minimal Logging


By: Jim Evans | Last Updated: 2019-02-14 | Related Tips: More > Database Administration

Problem

As SQL Server DBAs or developers, we are periodically tasked with purging data from a very large table. However, typical delete methods can cause issues with large transaction logs and contention, especially when purging a production system. My recent challenge was to purge a log table that had over 650 million records and retain only the latest 1 million rows.

Solution

My goal is to build a strategic technique to purge a very large table while retaining only a fraction of its data. On a typical SQL Server, 1 million records is not a massive amount, and in my recent production challenge it was only 1/650 of the current table size. To do a fast data purge we will use a little-used feature of SQL Server, Partition Switching. Every table in SQL Server has at least one partition, and the ALTER TABLE SWITCH command can quickly move all of the data from one table to another as a metadata-only operation. This leaves the original table fully intact, but empty. Then we can move the small subset of data we want to keep from the new table back to the original. I chose Partition Switching over renaming the table because I believe it is a less intrusive method: the object and index definitions are not altered or destroyed during the operation. On many database systems it is hard to predict how renaming objects will affect the application or cached plans.
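
At a high level the approach looks like the sketch below. The table names and the "keep" criteria here are placeholders for illustration only; the actual demo objects are built step by step in the sections that follow.

--High-level sketch of the purge pattern (placeholder names, not the demo objects)
--1. Switch all rows out of the big table into an identically structured, empty table (metadata-only, near instant).
ALTER TABLE dbo.BigLogTable SWITCH TO dbo.BigLogTable_Switch;
 
--2. Copy back only the rows you want to keep.
INSERT dbo.BigLogTable (LogDate, LogText)
SELECT LogDate, LogText
FROM dbo.BigLogTable_Switch
WHERE LogDate >= DATEADD(DAY, -30, GETDATE()); --example "keep" criteria
 
--3. Drop the switch table, discarding the old rows with minimal logging.
DROP TABLE dbo.BigLogTable_Switch;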

Note: it is best to do this during off-peak hours! According to Microsoft, "Executing a SWITCH prevents new transactions from starting and might significantly affect the workload throughput and temporarily delay access to the underlying table."
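
If you are concerned about the SWITCH waiting on, and in turn blocking, other activity, one option (not part of the original steps) is to bound the wait with a lock timeout. A minimal sketch, using the demo table names created below:

--Sketch: fail fast instead of waiting indefinitely for the schema modification lock
SET LOCK_TIMEOUT 5000;  --wait at most 5 seconds (value is in milliseconds)
ALTER TABLE [dbo].[Test1] SWITCH TO [dbo].[Test1_SWITCH];
SET LOCK_TIMEOUT -1;    --restore the default (wait indefinitely)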

Note: In this scenario the source table is not referenced by any foreign keys. If other tables reference the source table through a foreign key, you will have a multi-table scenario that must be handled so as not to break referential integrity. This article is meant for a simple scenario, such as a large log table without foreign key constraints.
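
As a quick sanity check, you can query the system catalog to confirm that nothing references the table before attempting the switch. A sketch using the demo table created below:

--Sketch: list any foreign keys that reference the source table; the result should be empty
SELECT fk.name AS foreign_key_name,
       OBJECT_NAME(fk.parent_object_id) AS referencing_table
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.Test1');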

SQL Server Fast Delete from Large Table Demo

In this demo we will create a starting table and populate it with sample data, then create an identical 'switch' table including indexes. Next, we will move the records to the new table using the ALTER TABLE SWITCH command and show that the records move (very quickly) and the original table is left empty. After that we will move a small set of data back to the original table. Last, we drop the unneeded switch table.

Review the requirements and restrictions for SWITCH at Microsoft: https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-transact-sql?view=sql-server-2017
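
One of those requirements is that the source and target must be on the same filegroup. It also helps to remember that every non-partitioned table is itself a single partition, which is what lets this technique work without any partitioning setup. A sketch for inspecting the partitions and row counts of the demo tables created below:

--Sketch: row counts per partition for the source and switch tables
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.partition_number,
       p.rows
FROM sys.partitions AS p
WHERE p.object_id IN (OBJECT_ID(N'dbo.Test1'), OBJECT_ID(N'dbo.Test1_SWITCH'))
  AND p.index_id IN (0, 1); --heap or clustered index only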

Step 1: Create a Test Table

For this step we will create a test table: dbo.Test1.

--1. Create a Test Table
CREATE TABLE [dbo].[Test1]  (
   Col1_id INT IDENTITY(1,1),
   Col2_D DATETIME DEFAULT GETDATE(),
   Col3_C NCHAR(20) DEFAULT 'My test desc!',
   Col4_T NVARCHAR(MAX) DEFAULT REPLICATE('0123456789', 100)
);
GO

Step 2: Load Test Data

For this limited example we will only load 5000 rows and create 2 indexes.  This also works for tables with millions of records.

--2. Load Test Data

SET NOCOUNT ON;
INSERT [dbo].[Test1]  (Col2_D, Col3_C, Col4_T)
VALUES (DEFAULT, 'My test desc!', DEFAULT);
GO 5000 
 
--Check it.
SELECT TOP 10 * FROM [dbo].[Test1];
SELECT COUNT(*) As 'Test1' FROM [dbo].[Test1];
GO
 
--Create Indexes
CREATE CLUSTERED INDEX [ClusteredIndex_Test1_Col2_D] ON [dbo].[Test1]
([Col2_D] ASC) 
GO
 
CREATE UNIQUE NONCLUSTERED INDEX [NonClusteredIndex_Test1_Col1_ID] ON [dbo].[Test1]
([Col1_id] ASC) 
GO

The check queries should return the top 10 rows and confirm 5000 rows loaded.


Step 3: Create a Duplicate Table

Create a duplicate with new names for the table and indexes. You might script the original table from SSMS and do a find and replace on the table names. Be sure to script all table attributes and indexes. You do not need to include Foreign Keys in the switch table!

--3. Now let's duplicate the original schema and change the table and index names
 
CREATE TABLE [dbo].[Test1_SWITCH] (
   Col1_ID INT IDENTITY(1,1),
   Col2_D DATETIME DEFAULT GETDATE(),
   Col3_C NCHAR(20) DEFAULT 'My test desc!',
   Col4_T NVARCHAR(MAX) DEFAULT REPLICATE('0123456789', 100) --match the original table definition
);
 
CREATE CLUSTERED INDEX [ClusteredIndex_Test1_SWITCH_Col2_D] ON [dbo].[Test1_SWITCH]
([Col2_D] ASC) 
GO
 
CREATE UNIQUE NONCLUSTERED INDEX [NonClusteredIndex_Test1_SWITCH_Col1_ID] ON [dbo].[Test1_SWITCH]
([Col1_id] ASC) 
GO

--Check it
SELECT COUNT(*) AS 'Test1' FROM [dbo].[Test1];
SELECT COUNT(*) AS 'Test1_SWITCH' FROM [dbo].[Test1_SWITCH];

The counts show 5000 rows in the original table and 0 rows in the new table.

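Because SWITCH requires the source and target to have identical structures, a quick column-level comparison can catch copy-and-paste mistakes before the switch. A sketch; any rows returned indicate a mismatch:

--Sketch: columns that differ between the two tables (an empty result means the columns match)
SELECT c.name, c.system_type_id, c.max_length, c.precision, c.scale, c.is_nullable, c.is_identity
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.Test1')
EXCEPT
SELECT c.name, c.system_type_id, c.max_length, c.precision, c.scale, c.is_nullable, c.is_identity
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.Test1_SWITCH');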

Step 4: The SWITCH!

Run the Alter Table SWITCH command and check the counts.

--4. Switch the data to a new table
ALTER TABLE [dbo].[Test1] SWITCH TO [dbo].[Test1_SWITCH];
 
SELECT COUNT(*) AS 'Test1' FROM [dbo].[Test1];
SELECT COUNT(*) AS 'Test1_SWITCH' FROM [dbo].[Test1_SWITCH];

Voilà! The records quickly moved to the new table and the original table is now empty!


Step 5: Move Back!

Now we can move the small set of records back to the original table. In this case the table has an identity column, so we must use IDENTITY_INSERT to preserve the original values. This may not apply in your real-world scenario.

--5. Now we can load the small subset of data back to the table.
SET NOCOUNT ON;

SET IDENTITY_INSERT [dbo].[Test1] ON;
INSERT dbo.Test1 (Col1_ID, Col2_D, Col3_C, Col4_T)
SELECT Col1_ID, Col2_D, Col3_C, Col4_T FROM [dbo].[Test1_SWITCH] WHERE Col1_ID <= 1000;
SET IDENTITY_INSERT [dbo].[Test1] OFF;
 
SELECT COUNT(*) AS 'Test1' FROM [dbo].[Test1];
SELECT COUNT(*) AS 'Test1_SWITCH' FROM [dbo].[Test1_SWITCH];

The results show 1000 rows back in the original table.

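One additional check, which is my own suggestion rather than part of the original steps: because the rows were re-inserted with IDENTITY_INSERT, confirm the identity seed on the original table is still ahead of the retained values so future inserts do not collide.

--Sketch: report the current identity value without changing it
DBCC CHECKIDENT ('dbo.Test1', NORESEED);
 
--If needed, reseed to the highest retained value (hypothetical value shown)
--DBCC CHECKIDENT ('dbo.Test1', RESEED, 1000);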

Step 6: Eliminate the Original Data!

Let's drop the switch table, which is no longer needed. Note that the DROP writes far less to the transaction log than deleting the same rows in bulk!

--6. Finally drop the newly created switch table. 
DROP TABLE [dbo].[Test1_SWITCH];
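
If you want to measure that difference yourself, one option is to capture transaction log usage before and after the purge. A sketch using a standard DMV (available in SQL Server 2012 and later), run in the context of the database being purged:

--Sketch: current transaction log usage for the current database
SELECT total_log_size_in_bytes,
       used_log_space_in_bytes,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;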

Last: Clean Up the Demo

For this demo we are done with the original table, so go ahead and delete it.

--X. Clean Up the Demo
DROP TABLE [dbo].[Test1];

Wrap up of how to delete millions of records in SQL Server

This concludes the demo. Obviously, your production situation will be significantly larger than this 5000-row demo. If you find yourself in a situation where you need to purge a very large table, consider this technique to optimize the run time and minimize transaction log growth. Remember to target off-peak hours to avoid contention, and be sure to test in a lab prior to running in production. Most important, be sure you have a recoverable backup before running any data purge in production!


About the author

Jim Evans is an IT Manager with Stericycle who has managed DBA, BI Developer, and Data Management teams over the past 10 years.

