Deleting a Column: Meta-data-only operation

Microsoft’s SQL Server development team is constantly working to improve performance. One optimization approach that has been prominent in SQL Server for a long time is to make as many schema changes as possible meta-data-only operations. Instead of modifying every page in a table when the table definition changes, SQL Server changes only the definition itself. Each page is updated later, when it has to be rewritten for some other reason. Meta-data-only operations show up in several places: adding a nullable column to a table, adding a non-nullable column that has a default value, or, as this post shows, dropping an existing column from a table.

Greek Girls on the Shore, by Joaquin Sorolla, 1895

Dropping a column that is not referenced by any other object lets the storage engine simply mark the column definition as no longer present. Removing the column's metadata invalidates cached plans that reference the table, so any query that subsequently references the table is recompiled. The recompiled plan can only return columns that still exist in the table definition. As a result, the storage engine skips the bytes stored in each row for the dropped column, as if the column no longer existed.

If the table is rebuilt, the rebuild operation ignores the bytes allocated to the dropped column. When a DML operation deletes a row or modifies a value, the affected page is rewritten without the deallocated bytes.

This effectively spreads the cost of physically removing the column's data over time. The result is far less noticeable than rewriting every page at once.
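The behavior described above can be sketched as a toy in-memory model in Python. This is not how SQL Server lays out pages. The names `Table`, `Column`, `drop_column`, `update`, `rebuild` and `rows_rewritten` are hypothetical, and each physical row is assumed to be a dict mapping column id to value. In the model, dropping a column only flips a flag in the definition. Reads skip values that belong to dropped columns, and a row is physically rewritten only when it is modified or when the table is rebuilt.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    col_id: int
    name: str
    dropped: bool = False  # set by drop_column; no stored row is touched


@dataclass
class Table:
    columns: list
    # Toy physical rows: column id -> stored value. A row may still hold
    # values for columns that have since been dropped.
    rows: list = field(default_factory=list)
    rows_rewritten: int = 0  # counts physical row rewrites (hypothetical metric)

    def live(self):
        """Columns present in the current table definition."""
        return [c for c in self.columns if not c.dropped]

    def insert(self, **values):
        self.rows.append({c.col_id: values.get(c.name) for c in self.live()})

    def drop_column(self, name):
        # Meta-data-only: change the definition, leave every stored row alone.
        for c in self.columns:
            if c.name == name and not c.dropped:
                c.dropped = True
                return
        raise KeyError(name)

    def select(self):
        # Reads follow the current definition and skip dropped columns' values.
        return [{c.name: row.get(c.col_id) for c in self.live()} for row in self.rows]

    def _rewrite(self, i):
        # Physically rewrite one row without the values of dropped columns.
        live_ids = {c.col_id for c in self.live()}
        self.rows[i] = {k: v for k, v in self.rows[i].items() if k in live_ids}
        self.rows_rewritten += 1

    def update(self, i, **changes):
        # A row being modified anyway is rewritten without the dropped bytes.
        by_name = {c.name: c.col_id for c in self.live()}
        for name, value in changes.items():
            self.rows[i][by_name[name]] = value
        self._rewrite(i)

    def rebuild(self):
        # A rebuild rewrites every row, discarding dropped columns' values.
        for i in range(len(self.rows)):
            self._rewrite(i)
```

Consider a three-column table with two rows. After one column is dropped, `rows_rewritten` stays at 0 and `select()` no longer returns the dropped column, even though its values are still stored. Updating the first row rewrites only that row, and `rebuild()` rewrites every row.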

Caveat Emptor

There are circumstances where you cannot drop a column, such as when the column is included in an index, or when you’ve manually created a statistics object for the column. A related example shows the error raised when you attempt to alter a column that has a manually created statistics object. The same semantics apply when dropping a column: if any other object references the column, the column cannot simply be dropped. You must first alter or drop the referencing object so that it no longer references the column. Only then can the column be dropped.
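The rule above can be illustrated with a small Python sketch. It is a toy model, not SQL Server's catalog. The class `DependencyError`, the methods `create_dependent`, `drop_dependent` and `drop_column`, and the statistics name `st_note` are all hypothetical. A drop is refused while any dependent object, such as an index or a manually created statistics object, still references the column. The drop succeeds once that object has been removed.

```python
class DependencyError(Exception):
    """Hypothetical error: a column still has dependent objects (toy model)."""


class Table:
    def __init__(self, columns):
        self.columns = set(columns)
        # Dependent object name -> set of columns it references
        # (stands in for indexes and manually created statistics).
        self.dependents = {}

    def create_dependent(self, obj_name, cols):
        missing = set(cols) - self.columns
        if missing:
            raise KeyError(sorted(missing))
        self.dependents[obj_name] = set(cols)

    def drop_dependent(self, obj_name):
        del self.dependents[obj_name]

    def drop_column(self, col):
        # Refuse the drop while any dependent object still references the column.
        blockers = sorted(n for n, cols in self.dependents.items() if col in cols)
        if blockers:
            raise DependencyError(
                f"cannot drop {col!r}: referenced by {', '.join(blockers)}"
            )
        self.columns.remove(col)


if __name__ == "__main__":
    t = Table(["id", "note"])
    t.create_dependent("st_note", ["note"])  # hypothetical statistics object
    try:
        t.drop_column("note")
    except DependencyError as e:
        print(e)
    t.drop_dependent("st_note")  # remove the dependency first...
    t.drop_column("note")        # ...then the drop succeeds
    print(sorted(t.columns))
```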

The Script

The code below shows how meta-data-only operations can be validated by looking at the transaction log. It uses the undocumented fn_dblog function to list the records written to the transaction log.

It creates a table, inserts a row, and displays the transaction log records for those operations. It then drops a column and shows the transaction log again. Finally, it modifies the existing row after the column has been dropped to show how pages are rewritten on an as-needed basis.

This post is part of our series on Database Internals.

Remus Rusanu has a great blog post about reading and understanding the transaction log with fn_dblog.