AI-Powered Backup Status Audit Across Your Fleet

Every DBA knows the sinking feeling: a production database needs a point-in-time restore, and you discover the last transaction log backup was six hours ago — not the fifteen minutes you assumed. Backup monitoring is one of those areas where “I’m pretty sure we’re covered” isn’t good enough.

I needed a comprehensive backup audit script that could run across my entire fleet and flag anything suspicious. Not just “is there a recent backup” but the harder questions: are diff backups happening on schedule? Are any databases in SIMPLE recovery that shouldn’t be? Are backups landing in unexpected locations?

Here’s the prompt I started with:

What the Agent Produced

The agent gave me a solid first draft using msdb.dbo.backupset with OUTER APPLY subqueries to pull the latest backup of each type. The compression ratio calculation used compressed_backup_size and backup_size from backupset — straightforward. It joined backupmediafamily for the physical path and used sys.databases as the base to catch databases with zero backup history.
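The post doesn't reproduce the draft verbatim, but the skeleton of that approach looks roughly like this (the msdb column names are real; the aliases and exact shape are my reconstruction, not the agent's output):

```sql
SELECT
    d.[name]              AS database_name,
    d.recovery_model_desc,
    lf.backup_finish_date AS last_full,
    ld.backup_finish_date AS last_diff,
    ll.backup_finish_date AS last_log,
    bmf.physical_device_name
FROM sys.databases AS d                               -- base table catches zero-history databases
OUTER APPLY (SELECT TOP (1) bs.backup_finish_date, bs.media_set_id
             FROM msdb.dbo.backupset AS bs
             WHERE bs.database_name = d.[name] AND bs.[type] = 'D'
             ORDER BY bs.backup_finish_date DESC) AS lf   -- latest full
OUTER APPLY (SELECT TOP (1) bs.backup_finish_date
             FROM msdb.dbo.backupset AS bs
             WHERE bs.database_name = d.[name] AND bs.[type] = 'I'
             ORDER BY bs.backup_finish_date DESC) AS ld   -- latest differential
OUTER APPLY (SELECT TOP (1) bs.backup_finish_date
             FROM msdb.dbo.backupset AS bs
             WHERE bs.database_name = d.[name] AND bs.[type] = 'L'
             ORDER BY bs.backup_finish_date DESC) AS ll   -- latest log
LEFT JOIN msdb.dbo.backupmediafamily AS bmf
    ON bmf.media_set_id = lf.media_set_id;            -- striped sets return one row per family
```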

What I liked: it properly handled databases with no rows in backupset (left joins, ISNULL wrappers). It included sys.dm_hadr_database_replica_states to identify AG databases and filtered secondary replicas out of the “missing backup” alerts — because a secondary might legitimately have no local backup history.
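The replica-role check might look something like this minimal sketch (my reconstruction of the pattern, assuming SQL Server 2014+ where is_primary_replica exists):

```sql
SELECT d.[name],
       rs.is_primary_replica                     -- NULL for databases outside any AG
FROM sys.databases AS d
LEFT JOIN sys.dm_hadr_database_replica_states AS rs
    ON rs.database_id = d.database_id
   AND rs.is_local = 1                           -- only the local replica's row
WHERE ISNULL(rs.is_primary_replica, 1) = 1;      -- keep primaries and non-AG databases
```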

What needed fixing:

  • The age thresholds were hardcoded. I asked for 25 hours (full backups) and 30 minutes (log backups), but in production you want these parameterized. Different environments have different SLAs.
  • It missed copy_only backups. The initial query counted copy_only full backups as the “last full backup,” which can mask a missing scheduled backup. I asked the agent to filter those out with is_copy_only = 0.
  • Compression ratio was inverted. The agent calculated compressed / uncompressed — which gives you a number less than 1. I prefer the inverse (uncompressed / compressed) so you can read it as “3.2x compression.”
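All three fixes can be sketched in one fragment (parameter names and the STALE flag are mine, purely illustrative):

```sql
DECLARE @full_max_hours int   = 25;   -- SLA-driven; parameterize per environment
DECLARE @log_max_minutes int  = 30;

SELECT TOP (1)
    bs.backup_finish_date,
    CASE WHEN bs.backup_finish_date
              < DATEADD(HOUR, -@full_max_hours, SYSDATETIME())
         THEN 'STALE' END AS full_backup_status,
    -- uncompressed / compressed reads naturally as "3.2x compression"
    CAST(bs.backup_size AS decimal(18, 2))
        / NULLIF(bs.compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = DB_NAME()
  AND bs.[type] = 'D'
  AND bs.is_copy_only = 0              -- don't let ad-hoc copy-only backups mask a gap
ORDER BY bs.backup_finish_date DESC;
```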

The Iteration: Recovery Model Audit

This is where the conversational workflow pays off. After reviewing the first result set, I followed up:

The agent added the warning column and the state filter. I tweaked the size threshold — 1 GB was a reasonable starting point, but in my environment 500 MB made more sense since we have several small but critical reference databases.
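A hedged sketch of the check that emerged (the 500 MB threshold, ONLINE filter, and warning text come from the discussion above; the column aliases and the system-database exclusion are my own):

```sql
SELECT
    d.[name],
    d.recovery_model_desc,
    CASE WHEN d.recovery_model_desc = 'SIMPLE'
          AND sz.size_mb >= 500                   -- tuned down from the 1 GB starting point
         THEN 'WARNING: sizeable database in SIMPLE recovery'
    END AS recovery_warning
FROM sys.databases AS d
CROSS APPLY (SELECT SUM(mf.size) * 8 / 1024 AS size_mb   -- 8 KB pages to MB
             FROM sys.master_files AS mf
             WHERE mf.database_id = d.database_id) AS sz
WHERE d.state_desc = 'ONLINE'                      -- the state filter mentioned above
  AND d.database_id > 4;                           -- skip system databases (my addition)
```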

The Final Script

Note the d_size.[page_count] alias — I’m using sys.master_files via the d_size CROSS APPLY to estimate database size without connecting to each database. The source_database_id IS NULL check excludes database snapshots, which legitimately have no backups.
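That fragment might look something like this (a reconstruction of the pattern, not the verbatim script):

```sql
SELECT
    d.[name],
    d_size.[page_count] * 8 / 1024 AS size_mb   -- 8 KB pages to megabytes
FROM sys.databases AS d
CROSS APPLY (
    SELECT SUM(mf.size) AS [page_count]         -- sys.master_files sizes are in 8 KB pages
    FROM sys.master_files AS mf
    WHERE mf.database_id = d.database_id
) AS d_size
WHERE d.source_database_id IS NULL;             -- exclude database snapshots
```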

AG replica caveat: If this instance hosts AG secondary replicas, filter them out of the backup gap alerts using sys.fn_hadr_backup_is_preferred_replica(). Secondaries where backups are delegated to the primary will have no local backup history, producing false positives. Add AND (sys.fn_hadr_backup_is_preferred_replica(d.[name]) = 1 OR sys.fn_hadr_backup_is_preferred_replica(d.[name]) IS NULL) to the outer WHERE clause to suppress those.

What I Validated

Before deploying this across the fleet, I verified a few things manually:

  1. The copy_only filter matters. One of our instances had a copy_only full backup taken two days ago for a migration test. Without the filter, that masked the fact that the scheduled full backup had silently failed.
  2. AG secondary behavior. Databases on a secondary replica where backups are preferred on the primary correctly showed no backup history without triggering false alarms — because the script checks the replica role.
  3. Compression ratio sanity. Highly compressible databases (staging environments full of repeated test data) showed ratios above 8x. Encrypted or already-compressed data showed ratios near 1.0x. Both expected.

To run this across the fleet, wrap it in the PowerShell pattern from Post 6 — iterate through your CMS server list, execute the query on each instance, and aggregate the results.
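A minimal sketch of that fan-out, assuming the SqlServer PowerShell module, a CMS instance named CMS01, and the audit query saved as backup-audit.sql (all three names are my placeholders):

```powershell
# Pull the registered-server list from the CMS, then run the audit on each instance.
$servers = Invoke-Sqlcmd -ServerInstance 'CMS01' -Database 'msdb' -Query @"
SELECT DISTINCT server_name
FROM dbo.sysmanagement_shared_registered_servers;
"@

$results = foreach ($s in $servers) {
    Invoke-Sqlcmd -ServerInstance $s.server_name -InputFile '.\backup-audit.sql' |
        Select-Object @{ n = 'Instance'; e = { $s.server_name } }, *
}

$results | Export-Csv -Path '.\backup-audit.csv' -NoTypeInformation
```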

Try This Yourself

Start with the simplest version: run the query on a single instance and compare the results against what you think your backup situation looks like. Most DBAs find at least one surprise — a database that fell out of the maintenance plan, a log backup job that stopped after an AG failover, or a dev database someone switched to SIMPLE “temporarily” two years ago.

Then iterate. Ask the agent to add RPO violation checks specific to your SLAs. Ask it to flag backups going to local disk instead of your backup share. Ask it to highlight databases where the backup chain is broken (no full backup since the last RESTORE or detach/attach). Each refinement takes a follow-up prompt and a few minutes of review.
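As one example of such a follow-up, a broken-chain check could start from msdb's restore history. This is a hypothetical sketch, not part of the script above:

```sql
-- Flag databases restored more recently than their last scheduled full backup.
SELECT d.[name], rh.last_restore, fb.last_full
FROM sys.databases AS d
OUTER APPLY (SELECT MAX(r.restore_date) AS last_restore
             FROM msdb.dbo.restorehistory AS r
             WHERE r.destination_database_name = d.[name]) AS rh
OUTER APPLY (SELECT MAX(bs.backup_finish_date) AS last_full
             FROM msdb.dbo.backupset AS bs
             WHERE bs.database_name = d.[name]
               AND bs.[type] = 'D' AND bs.is_copy_only = 0) AS fb
WHERE rh.last_restore IS NOT NULL
  AND (fb.last_full IS NULL OR fb.last_full < rh.last_restore);
```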

The goal isn’t a perfect script on the first try — it’s getting to a comprehensive audit faster than building it from scratch, then hardening it with your knowledge of what actually matters in your environment.


Part of the ALTER DBA ADD AGENT series.