How to Corrupt a Database

2019-07-29 · by Hannah Vernon · in tools

Microsoft goes to great lengths to ensure data integrity remains its paramount concern with SQL Server data. After all, ACID compliance doesn’t mean much if the underlying data is corrupt. Having said that, there are certainly points-of-failure in every computer system that make corruption inevitable given enough activity over enough time. This post talks about how to corrupt a database, so you’re prepared when it happens in production.

The Hopes of the Party, by James Gillray. No Corruption Visible Here.

Learning how to deal with a corrupt database can be one of the most challenging tasks a DBA will ever undertake. Typically, corrupt databases aren’t a big deal outside of production – you simply restore a new masked-data-copy from production and continue on. But, what if it’s your business’ primary production database that suddenly has corruption, and you’d rather not have to restore a backup? The best way to be prepared for something like that is to have an intentionally corrupted database that you can use for learning purposes. This post shows you how to corrupt a database so you can discover what works best for you before you’re in a Critical Moment™.

The Setup

First things first; let’s create a database:

USE master;
GO
IF DB_ID(N'corrupt_db') IS NOT NULL
BEGIN
    ALTER DATABASE corrupt_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE corrupt_db;
END
CREATE DATABASE corrupt_db
ON (
    NAME = N'corrupt_db_system'
    , FILENAME = N'C:\data\corrupt_db_system.mdf'
    , SIZE = 10MB
    , MAXSIZE = 100MB
    , FILEGROWTH = 10MB
    )
LOG ON (
    NAME = N'corrupt_db_log'
    , FILENAME = N'C:\data\corrupt_db_log.ldf'
    , SIZE = 10MB
    , MAXSIZE = 100MB
    , FILEGROWTH = 10MB
    );

ALTER DATABASE corrupt_db
SET RECOVERY SIMPLE;

ALTER DATABASE corrupt_db
SET PAGE_VERIFY CHECKSUM;

BACKUP DATABASE corrupt_db
TO DISK = N'NUL:';
GO

USE master;

IF DB_ID(N'corrupt_db') IS NOT NULL

BEGIN

ALTER DATABASE corrupt_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

DROP DATABASE corrupt_db;

END

CREATE DATABASE corrupt_db

ON (

NAME = N'corrupt_db_system'

, FILENAME = N'C:\data\corrupt_db_system.mdf'

, SIZE = 10MB

, MAXSIZE = 100MB

, FILEGROWTH = 10MB

)

LOG ON (

NAME = N'corrupt_db_log'

, FILENAME = N'C:\data\corrupt_db_log.ldf'

, SIZE = 10MB

, MAXSIZE = 100MB

, FILEGROWTH = 10MB

);

ALTER DATABASE corrupt_db

SET RECOVERY SIMPLE;

ALTER DATABASE corrupt_db

SET PAGE_VERIFY CHECKSUM;

BACKUP DATABASE corrupt_db

TO DISK = N'NUL:';

Since we need some data to corrupt, let’s create a clustered index and add some data to it. In fact, we’ll add 2,000 pages or 16MB of data.

USE corrupt_db;

CREATE TABLE dbo.corrupt_table
(
    rownum int NOT NULL
        CONSTRAINT corrupt_table_pk
        PRIMARY KEY
        CLUSTERED
        WITH (
              DATA_COMPRESSION = NONE
            , FILLFACTOR = 100
            , PAD_INDEX = OFF
            )
        IDENTITY(1,1)
    , rowdata1 char(8000) NULL
    , rowdata2 char(48) NULL
) ON [PRIMARY];

;WITH source_rows AS
(   /* generate 2,000 rows */
    SELECT [number] = v1.n * 1000
                    + v2.n * 100
                    + v3.n * 10
                    + v4.n + 1
    FROM           (VALUES (0), (1)) v1(n)
        CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v2(n)
        CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v3(n)
        CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v4(n)
)
INSERT INTO dbo.corrupt_table (rowdata1, rowdata2)
SELECT REPLICATE(CHAR(0x11), 8000)
    , REPLICATE(CHAR(0x22), 48)
FROM source_rows;
GO

USE corrupt_db;

CREATE TABLE dbo.corrupt_table

(

rownum int NOT NULL

CONSTRAINT corrupt_table_pk

PRIMARY KEY

CLUSTERED

WITH (

DATA_COMPRESSION = NONE

, FILLFACTOR = 100

, PAD_INDEX = OFF

)

IDENTITY(1,1)

, rowdata1 char(8000) NULL

, rowdata2 char(48) NULL

) ON [PRIMARY];

;WITH source_rows AS

( /* generate 2,000 rows */

SELECT [number] = v1.n * 1000

+ v2.n * 100

+ v3.n * 10

+ v4.n + 1

FROM (VALUES (0), (1)) v1(n)

CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v2(n)

CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v3(n)

CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v4(n)

)

INSERT INTO dbo.corrupt_table (rowdata1, rowdata2)

SELECT REPLICATE(CHAR(0x11), 8000)

, REPLICATE(CHAR(0x22), 48)

FROM source_rows;

Let’s Corrupt It!

SQL Server includes a built-in undocumented command purpose-built for writing to database pages directly, DBCC WRITEPAGE. That command can be used with equal efficacy to corrupt a page, or to fix corruption on a page.

The syntax for DBCC WRITEPAGE is:

DBCC WRITEPAGE(<database_id | database_name>, <file_id>, <page_id>, <offset>, <length:, <value>, <write_direct>);

1	DBCC WRITEPAGE(<database_id \| database_name>, <file_id>, <page_id>, <offset>, <length:, <value>, <write_direct>);

The “offset” parameter indicates the starting address on the page where we should start writing data. The “length” parameter indicates how much data we’ll be writing. The “value” parameter is a hexadecimal value to write into the page; the length of this parameter must match the “length” parameter. The “write_direct” parameter should be 0 if you want DBCC WRITEPAGE to write to a buffer in memory, or 1 if you can put the database into single-user-mode, and have DBCC WRITEPAGE write directly to the data on-disk.

If you attempt to use “write_direct” with a value of 1, but don’t have the database in single user mode, you’ll receive the following error:

Msg 7919, Level 16, State 5, Line 91
Repair statement not processed. Database needs to be in single user mode.

This next piece of code will thoroughly corrupt the dbo.corrupt_table by placing the database into single user mode, then running DBCC WRITEPAGE to randomly alter a single byte on each page of the table. We’re inserting the list of pages used by the table into a temporary table since we won’t be able to directly access the corrupt_db database while using DBCC WRITEPAGE. We then iterate a cursor over the temporary table, calling DBCC WRITEPAGE once against each page. The DBCC WRITEPAGE writes a randomly generated single byte to a randomly chosen location within the page.

DROP TABLE IF EXISTS #table_pages;
DECLARE @dbid int = DB_ID(N'corrupt_db');
SELECT dpa.allocated_page_file_id
    , dpa.allocated_page_page_id
INTO #table_pages
FROM corrupt_db.sys.schemas s  
    INNER JOIN corrupt_db.sys.objects o ON o.schema_id = s.schema_id
CROSS APPLY sys.dm_db_database_page_allocations(@dbid, o.object_id, NULL, NULL, 'DETAILED') dpa
WHERE o.name = N'corrupt_table'
    AND s.name = N'dbo'
    AND dpa.page_type_desc = N'DATA_PAGE';

USE master;
ALTER DATABASE corrupt_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO

DECLARE @dbid int = DB_ID(N'corrupt_db');
DECLARE @fileid int;
DECLARE @pageid int;
DECLARE @offset int;
DECLARE @value varbinary(1);
DECLARE cur CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY
FOR
SELECT dpa.allocated_page_file_id
    , dpa.allocated_page_page_id
FROM #table_pages dpa
OPEN cur;
FETCH NEXT FROM cur INTO @fileid, @pageid;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @offset = CONVERT(int, CRYPT_GEN_RANDOM(2)) % 8192;
    SET @value = CRYPT_GEN_RANDOM(1);
    DBCC WRITEPAGE (@dbid, @fileid, @pageid, @offset, 1, @value, 1) WITH NO_INFOMSGS;
    FETCH NEXT FROM cur INTO @fileid, @pageid;
END
CLOSE cur;
DEALLOCATE cur;
GO

ALTER DATABASE corrupt_db SET MULTI_USER;
GO

DROP TABLE IF EXISTS #table_pages;

DECLARE @dbid int = DB_ID(N'corrupt_db');

SELECT dpa.allocated_page_file_id

, dpa.allocated_page_page_id

INTO #table_pages

FROM corrupt_db.sys.schemas s

INNER JOIN corrupt_db.sys.objects o ON o.schema_id = s.schema_id

CROSS APPLY sys.dm_db_database_page_allocations(@dbid, o.object_id, NULL, NULL, 'DETAILED') dpa

WHERE o.name = N'corrupt_table'

AND s.name = N'dbo'

AND dpa.page_type_desc = N'DATA_PAGE';

USE master;

ALTER DATABASE corrupt_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

DECLARE @dbid int = DB_ID(N'corrupt_db');

DECLARE @fileid int;

DECLARE @pageid int;

DECLARE @offset int;

DECLARE @value varbinary(1);

DECLARE cur CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY

FOR

SELECT dpa.allocated_page_file_id

, dpa.allocated_page_page_id

FROM #table_pages dpa

OPEN cur;

FETCH NEXT FROM cur INTO @fileid, @pageid;

WHILE @@FETCH_STATUS = 0

BEGIN

SET @offset = CONVERT(int, CRYPT_GEN_RANDOM(2)) % 8192;

SET @value = CRYPT_GEN_RANDOM(1);

DBCC WRITEPAGE (@dbid, @fileid, @pageid, @offset, 1, @value, 1) WITH NO_INFOMSGS;

FETCH NEXT FROM cur INTO @fileid, @pageid;

END

CLOSE cur;

DEALLOCATE cur;

ALTER DATABASE corrupt_db SET MULTI_USER;

What Happened?

The last statement in the code above brings the database back into multi user mode, so we can query the table:

USE corrupt_db;
SELECT *
FROM dbo.corrupt_table;

USE corrupt_db;

SELECT *

FROM dbo.corrupt_table;

The first page the query attempts to read is detected as corrupt, via the checksum calculation that is performed when each page is read into the buffer pool:

Msg 824, Level 24, State 2, Line 135
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x00caae6e; actual: 0x80caae26). It occurred during a read of page (1:328) in database ID 15 at offset 0x00000000290000 in file 'C:\data\corrupt_db_system.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

DBCC CHECKDB, super-hero?

Next, lets run a DBCC CHECKDB against the database, as instructed in the error message:

DBCC CHECKDB(N'corrupt_db') WITH NO_INFOMSGS, ALL_ERRORMSGS;

1	DBCC CHECKDB(N'corrupt_db') WITH NO_INFOMSGS, ALL_ERRORMSGS;

Hmmm… seems there are a boat-load of errors that need fixing. As detailed in the last line of the output, repair_allow_data_loss is the minimum level of repair that will bring the database back to a consistent state. Let’s try that, but first we need to put the database into single user mode:

USE master;
ALTER DATABASE corrupt_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB(N'corrupt_db', repair_allow_data_loss) WITH NO_INFOMSGS, ALL_ERRORMSGS;
ALTER DATABASE corrupt_db SET MULTI_USER;
GO

USE master;

ALTER DATABASE corrupt_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

DBCC CHECKDB(N'corrupt_db', repair_allow_data_loss) WITH NO_INFOMSGS, ALL_ERRORMSGS;

ALTER DATABASE corrupt_db SET MULTI_USER;

Ok, the output reports a bunch of pages were fixed. Below is a portion of the output showing items that were repaired in the database.

Repair: The page (1:2354) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data).
Repair: The page (1:2355) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data).
Repair: The page (1:2356) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data).
Repair: The page (1:2357) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data).
Repair: The page (1:2358) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data).
Msg 8945, Level 16, State 1, Line 140
Table error: Object ID 565577053, index ID 1 will be rebuilt.
        The error has been repaired.
Msg 8928, Level 16, State 1, Line 140
Object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data): Page (1:328) could not be processed.  See other errors for details.
        The error has been repaired.
Msg 8939, Level 16, State 98, Line 140
Table error: Object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data), page (1:328). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 2057 and -4.
        The error has been repaired.

Now let’s look at the table, with a simple SELECT statement:

USE corrupt_db;
SELECT *
FROM dbo.corrupt_table;

USE corrupt_db;

SELECT *

FROM dbo.corrupt_table;

Ok, so some rows still exist, but a whole bunch have been deleted by the DBCC CHECKDB with repair_allow_data_loss option. Since we’re randomly choosing an offset and value to write to each page, the effects of that corruption will vary each time you run the test code.

In the above example, where we lost data, you might need to choose between restoring the entire database or restoring a temporary copy of the database somewhere, so you can copy data from the temporary copy into the production copy. That may or may not work for the business, but it should definitely be a business decision.

Summary

Using DBCC WRITEPAGE in a variety of ways to intentionally corrupt a database is an important learning exercise. Determining a plan for dealing with corruption, and executing the plan to see the outcome, is priceless experience.

Check out the rest of our tools, and let me know if you have any questions or suggestions about this post. Thanks for reading, and please share this post if you found it useful!

How to Corrupt a Database

The Setup

Let’s Corrupt It!

What Happened?

DBCC CHECKDB, super-hero?

Summary

Related

One Response

Search

Categories

Pages

Meta

How to Corrupt a Database

The Setup

Let’s Corrupt It!

What Happened?

DBCC CHECKDB, super-hero?

Summary

Share this:

Related

One Response

Search

Categories

Pages

Meta