Efficiently Update Data from OLTP to OLAP

OLTP to OLAP: Acronym Time!

Online Transaction Processing, or OLTP, serves data to clients for line-of-business applications, enabling core business functionality. Online Analytical Processing, or OLAP, consists of analyzing OLTP data with the intent of providing reporting and business intelligence to management. One of our most important tasks is providing highly up-to-date data from the OLTP system into the OLAP system. Fresh data in the OLAP system allows business management to react quickly, whereas stale data in the OLAP system prevents quick, accurate decision making.

copying data from oltp to olap

      Copy all the things from OLTP to OLAP!

The last thing you want to do to a large, busy, OLTP system is to start running reporting against it. Even though this might provide business management with the freshest data possible, it will have the unfortunate side-effect of slowing down customer interaction. That might even have a negative effect on core business functionality, such as sales via a website, etc.

Get to the point!

This post presents a method that allows you to efficiently copy rows from OLTP tables into OLAP tables without needing to compare rows to see what changes have been made.

The code below creates a set of test tables; dbo.OLTP which will be the source data, and dbo.OLAP which we’ll efficiently copy fresh data into.

Note, the dbo.OLTP table has a column named rv, that is of type rowversion1. This column is solely used to track changes made to rows, and allows us to efficiently seek to new rows via the unique, non-clustered, index named OLTP_rv.

This code inserts 50,000,000 rows into the OLTP table:

Here, I create a couple of tables to use for analytics, or OLAP:

The 2nd table, dbo.OLAP_Updates, supports multiple source and destination tables, one per row, and is used to store the maximum row version we’ve migrated from the source OLTP table into the destination OLAP table.

These statements show the state of the OLTP and OLAP tables:

Initially, the output looks like this, since we haven’t copied any rows into the OLAP table:

╔════════════════╦══════════╗
║   OLTP         ║   Rows   ║
╠════════════════╬══════════╣
║ AAAAAAAAAAA... ║ 50000000 ║
╚════════════════╩══════════╝

╔════════════════╦══════════╗
║   OLAP         ║   Rows   ║
╠════════════════╬══════════╣
║                ║          ║
╚════════════════╩══════════╝

The Fun Bit!

This code migrates new rows, as well as modified rows, in an efficient manner, using the MIN_ACTIVE_ROWVERSION()2 function:

The MIN_ACTIVE_ROWVERSION() function captures the minimum rowversion value currently active in the database. Capturing that value ensures we’ll never miss rows that have been updated in the OLTP data.

After the first run of the migrate code, we see the OLAP table contains all the rows from the OLTP table; the initial run will take a fair amount of resources if your source table has lots of pre-existing data:

╔════════════════╦══════════╗
║   OLTP         ║   Rows   ║
╠════════════════╬══════════╣
║ AAAAAAAAAAA... ║ 50000000 ║
╚════════════════╩══════════╝


╔════════════════╦══════════╗
║   OLAP         ║   Rows   ║
╠════════════════╬══════════╣
║ AAAAAAAAAAA... ║ 50000000 ║
╚════════════════╩══════════╝

The OLTP table implements a rowversion column that is automatically updated whenever a row is updated or inserted. The rowversion number is unique at the database level, and increments monotonically for all transactions that take place within the context of that database. The dbo.OLTP_Updates table is used to store the minimum row version available inside the transaction used to copy data from the OLTP table into the OLAP table. Each time this code runs it captures incremental changes. This is far more efficient than comparing all the rows in both tables using a hashing function since this method doesn’t require reading any data other than the source data that is either new, or has changed.

Note the code above uses the temporary clustered columnstore index, #cci, to prompt the query processor to use batch mode for certain operations; this is only available in SQL Server 2014+.

Results Time!

In the code below, we’re updating a portion of the 50 million rows, simulating the OLTP system making changes to data as a result of client interaction. 1 in every 1000 rows are updated to B..., 1 in every 400 rows are updated to ‘C…’, then 1 in every 201 rows are updated to D.... The results shown reflect running the rowversion-based migration code after each update.

╔════════════════╦══════════╗
║   OLTP         ║   Rows   ║
╠════════════════╬══════════╣
║ AAAAAAAAAAA... ║ 49950000 ║
║ BBBBBBBBBBB... ║ 50000    ║
╚════════════════╩══════════╝


╔════════════════╦══════════╗
║   OLAP         ║   Rows   ║
╠════════════════╬══════════╣
║ AAAAAAAAAAA... ║ 49950000 ║
║ BBBBBBBBBBB... ║ 50000    ║
╚════════════════╩══════════╝

╔════════════════╦══════════╗
║   OLTP         ║   Rows   ║
╠════════════════╬══════════╣
║ AAAAAAAAAAA... ║ 49850000 ║
║ BBBBBBBBBBB... ║ 25000    ║
║ CCCCCCCCCCC... ║ 125000   ║
╚════════════════╩══════════╝


╔════════════════╦══════════╗
║   OLAP         ║   Rows   ║
╠════════════════╬══════════╣
║ AAAAAAAAAAA... ║ 49850000 ║
║ BBBBBBBBBBB... ║ 25000    ║
║ CCCCCCCCCCC... ║ 125000   ║
╚════════════════╩══════════╝

╔════════════════╦══════════╗
║   OLTP         ║   Rows   ║
╠════════════════╬══════════╣
║ AAAAAAAAAAA... ║ 49601989 ║
║ BBBBBBBBBBB... ║ 24876    ║
║ CCCCCCCCCCC... ║ 124379   ║
║ DDDDDDDDDDD... ║ 248756   ║
╚════════════════╩══════════╝


╔════════════════╦══════════╗
║   OLAP         ║   Rows   ║
╠════════════════╬══════════╣
║ AAAAAAAAAAA... ║ 49601989 ║
║ BBBBBBBBBBB... ║ 24876    ║
║ CCCCCCCCCCC... ║ 124379   ║
║ DDDDDDDDDDD... ║ 248756   ║
╚════════════════╩══════════╝

Consider implementing an ON DELETE trigger on the OLTP table that captures the pertinent primary key values into a third table, if your OLTP system deletes rows. Something like this should work:

This code will remove the deleted rows from the dbo.OLAP table that have been deleted in the dbo.OLTP table:

Gotchas!

Ensure there is enough CPU and memory in the SQL Server instance to support both the OLTP and the OLAP load. Running both loads in the same SQL Server instance can be difficult. If resource contention on the OLTP system is a primary concern, you may want to use a linked server on the OLAP system to separate resources into two separate SQL Servers. Run the migrate code above on the OLAP system, while monitoring for resource consumption on the OLTP system so you have an understanding of the potential impact.

This method is presented as an efficient alternative to copying rows based on the hash of columns in the source table against a hash of the columns in the destination table. For extreme systems, it may be required to use SSIS or some other bulk-load method of migrating the data from the OLTP system into a staging system to perform the actual rowversion comparison work. If you do that, you’ll need to ensure the destination table in the staging system uses a binary(8) data type for the rv column to ensure the rowversion values from the OLTP system are preserved in staging.

Wrap-up!

Instead of comparing rows column-by-column, use the rowversion data type to efficiently identify rows that have been modified since the last time you looked at them.

Let me know if you have any comments or questions about this code!


1rowversion T-SQL data type
2MIN_ACTIVE_ROWVERSION() T-SQL function