Archive Data by Date

Tables with a large amount of data can become unwieldy. As a result, it’s not uncommon to archive data by date into an archive table that contains older rows we care about, but don’t need for day-to-day operations. Typically, we might run a daily job to move rows from the main table into the archive table. However, the first time you run the archive job, you may need to move many millions or billions of rows. If you move that many rows in a single batch, the SQL Server transaction log may grow out of control, causing a variety of problems.

The Music Lesson

The Music Lesson

The code below shows how to efficiently archive rows based on a datetime column. The archival process moves rows, in small batches, from the main table to the archive table. If the database uses the SIMPLE recovery model, we run a CHECKPOINT operation between each batch to prevent the transaction log from growing out of control. If the database uses the FULL or BULK-LOGGED recovery model, we take a log backup to the NUL device and CHECKPOINT the database, which also prevents unreasonable log growth. Be aware, if your database uses the FULL or BULK-LOGGED recovery model, those log backups taken to the NUL device will essentially break the log backup chain. Once this script has archived all the desired rows, you should immediately take a full database backup, and a log backup to a permanent location for disaster recovery purposes.

The Setup

This first bit of code creates the dbo.Transactions and dbo.Transactions_Archived tables as part of a minimal, complete, and verifiable example. The dbo.Transactions table is populated with ~130,000 rows, with one row per minute starting 2019-01-01:

The Archive Script

Now that we have some data in the dbo.Transactions table, lets archive some of it! We’ll archive all data that is older than 180 days, into the dbo.Transactions_Archived table. The code uses DELETE with the OUTPUT ... INTO ... clause to ensure rows that are deleted from the main table are inserted into the archive table in a single, efficient operation.

The code executes two loops, the outer loop is per-day, and the inner loop is per-batch-sized-number-of-rows. Output is displayed showing the progress. The code can be stopped and re-started, and will automatically continue where it left off, allowing you to run this code over the course of several outage windows, if required.

Results

The following code summarizes the contents of each table:

You can control the number of rows that are archived per batch, by adjusting the @BatchSize parameter. I have it set to 100 because it makes a nice demo, however you probably want it set to 1,000 or 10,000 rows, depending on the size of each row, how large your transaction log is, how fast your disk subsystem is, and whether you want to prevent lock escalation.

When I ran the code on August 15th, 2019, the output looked like:

╔═══════════════════════════╦═════════════════════════╦═════════════════════════╦═════════╗
║         TableName         ║        StartDate        ║         EndDate         ║ NumRows ║
╠═══════════════════════════╬═════════════════════════╬═════════════════════════╬═════════╣
║ dbo.Transactions_Archived ║ 2019-01-01 00:00:00.000 ║ 2019-02-14 23:59:00.000 ║   64800 ║
╚═══════════════════════════╩═════════════════════════╩═════════════════════════╩═════════╝


╔══════════════════╦═════════════════════════╦═════════════════════════╦═════════╗
║    TableName     ║        StartDate        ║         EndDate         ║ NumRows ║
╠══════════════════╬═════════════════════════╬═════════════════════════╬═════════╣
║ dbo.Transactions ║ 2019-02-15 00:00:00.000 ║ 2019-03-31 23:59:00.000 ║   64800 ║
╚══════════════════╩═════════════════════════╩═════════════════════════╩═════════╝

Summary

As you can see above, half of the rows were moved to the dbo.Transactions_Archived table, while the most recent half were retained in the dbo.Transactions table. If you run this code in a different date, you’ll see different row counts in the output above, reflecting the fact that the archive script archives rows older than 180 days. I hope this script proves useful when you need to archive data by date. Let me know if you have any questions about it.

Check out the rest of our tools for SQL Server.