Wait Stats, Deadlocks, and Blocking Chains: AI-Assisted Diagnosis

It’s 2 PM on a Tuesday. The application team says “the database is slow.” You pull up wait stats and see CXPACKET through the roof. Or you get paged at midnight because the monitoring system flagged a deadlock. Or you open your inbox to find a blocking chain screenshot from a developer who doesn’t know what they’re looking at.

This is the troubleshooting workflow every DBA knows. The AI agent doesn’t replace your diagnostic skills — but it can speed up the analysis phase when you’re under pressure.

One important baseline note: wait stats should be captured over a specific interval, not read blindly since the last service restart. If you’re not already doing differential captures, that’s step zero.
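A differential capture is just two snapshots and a subtraction. A minimal sketch (the one-minute interval and the benign-wait filter list are starting points, not gospel):

```sql
-- Snapshot wait stats, wait a defined interval, then diff.
SELECT wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count
INTO #baseline
FROM sys.dm_os_wait_stats;

WAITFOR DELAY '00:01:00';  -- capture interval: 1 minute

SELECT TOP (15)
    w.wait_type,
    w.wait_time_ms        - b.wait_time_ms        AS wait_ms_delta,
    w.signal_wait_time_ms - b.signal_wait_time_ms AS signal_ms_delta,
    w.waiting_tasks_count - b.waiting_tasks_count AS waits_delta
FROM sys.dm_os_wait_stats AS w
JOIN #baseline AS b ON b.wait_type = w.wait_type
WHERE w.wait_time_ms - b.wait_time_ms > 0
  -- filter out waits that are normal background noise
  AND w.wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                          N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
                          N'XE_TIMER_EVENT', N'REQUEST_FOR_DEADLOCK_SEARCH',
                          N'DIRTY_PAGE_POLL', N'BROKER_TO_FLUSH')
ORDER BY wait_ms_delta DESC;

DROP TABLE #baseline;
```

Paste the delta output, not the raw DMV, into the agent; the interval is what makes the numbers interpretable.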

Wait Stats Analysis

Wait stats are the first thing most DBAs check when diagnosing performance problems. The data itself is easy to collect — the interpretation is where experience matters.

The agent works through the top waits one by one:

- CXPACKET: parallel query overhead. Though on modern SQL Server, CXPACKET alone isn't actionable; you need to check CXCONSUMER and correlate with actual query plans, DOP settings, and parallelism skew.
- PAGEIOLATCH_SH: physical page read waits that could indicate scan-heavy plans, missing indexes, memory pressure, or actual storage latency. Correlate with sys.dm_io_virtual_file_stats to distinguish.
- ASYNC_NETWORK_IO: usually a client-side consumption issue (row-by-row fetching from the app tier), not a SQL Server problem.
- LCK_M_IX: lock contention worth investigating.
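The PAGEIOLATCH correlation the agent suggests looks like this in practice: per-file average latency from sys.dm_io_virtual_file_stats. High PAGEIOLATCH_SH plus high read latency here points at storage (or sheer scan volume); low latency points back at the query plans. Note these counters are also cumulative since restart, so a differential capture applies here too.

```sql
-- Per-file I/O latency, computed from cumulative stall counters.
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    CASE WHEN vfs.num_of_reads = 0 THEN 0
         ELSE vfs.io_stall_read_ms / vfs.num_of_reads END  AS avg_read_latency_ms,
    vfs.num_of_writes,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
```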

More importantly, it suggests next steps: check sys.dm_exec_query_stats for the highest-CPU queries driving the parallelism, review MAXDOP and cost threshold for parallelism settings, look at Query Store for plan regressions.
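The first of those next steps is a one-query check. This sketch pulls the top CPU consumers from the plan cache (numbers are cumulative since each plan was cached, so recompiles and restarts reset them):

```sql
-- Top CPU consumers from the plan cache.
SELECT TOP (10)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    (qs.total_worker_time / qs.execution_count) / 1000 AS avg_cpu_ms,
    qs.max_dop,   -- SQL Server 2016+: the parallelism the query actually got
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```

The max_dop column is useful for the CXPACKET conversation specifically: a query running at DOP 8 with heavy skew is a very different problem from a hundred queries each running at DOP 2.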

Is the agent always right? No. But it helps me work through the decision tree faster, especially when I’m fielding Teams messages at the same time.

Deadlock XML Analysis

Deadlock graphs are powerful diagnostic data wrapped in an impenetrable XML format. Most DBAs can read them, but it takes time — and at 2 AM, time is the one thing you don’t have.

The agent parses the XML, identifies the resources, and explains the cycle: Process 1 holds an exclusive key lock on OrderDetails and wants a lock on Inventory; Process 2 holds an exclusive key lock on Inventory and wants a lock on OrderDetails. It identifies the access pattern and suggests fixes: ensure both code paths acquire locks in the same order, reduce the transaction scope, or consider whether a missing index is extending lock duration.
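The lock-ordering fix is easiest to see side by side. This is an illustrative sketch using the table names from the deadlock graph; the column names and the surrounding procedure are made up:

```sql
-- Both code paths now touch the tables in the same order:
-- OrderDetails first, then Inventory. Neither path can hold a lock
-- on one resource while waiting for the other in the reverse order,
-- so the cycle can't form.
CREATE OR ALTER PROCEDURE dbo.usp_AdjustOrder  -- hypothetical proc
    @OrderDetailID INT,
    @ProductID     INT
AS
BEGIN
    SET XACT_ABORT ON;
    BEGIN TRAN;
        UPDATE dbo.OrderDetails           -- always first
        SET    Quantity = Quantity + 1
        WHERE  OrderDetailID = @OrderDetailID;

        UPDATE dbo.Inventory              -- always second
        SET    StockLevel = StockLevel - 1
        WHERE  ProductID = @ProductID;
    COMMIT;
END;
```

The discipline matters more than the specific order chosen; pick one order, document it, and enforce it in review.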

A caveat: the inputbuf only shows the waiting statement, not the full transaction history. The reason each process touched the second table might be a trigger, a foreign key enforcement, or an earlier statement not shown in the XML. The agent’s explanation is a plausible hypothesis, not a certainty — you may need the full execution stack from Extended Events to confirm.

One point this post would be incomplete without: your application should have retry logic for error 1205 (deadlock victim). Deadlocks happen in any concurrent system. The question isn't whether they'll occur; it's whether your app handles them gracefully. If you don't have retry logic, that's a more important fix than the lock ordering.
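The retry pattern belongs in the application's data layer, but the T-SQL version shows the shape. The retry count, backoff, and procedure name here are illustrative:

```sql
-- Retry loop for deadlock victims (error 1205).
DECLARE @retry INT = 3;

WHILE @retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRAN;
            EXEC dbo.usp_DoTheWork;       -- hypothetical unit of work
        COMMIT;
        SET @retry = 0;                   -- success: stop retrying
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK;

        IF ERROR_NUMBER() = 1205 AND @retry > 1
        BEGIN
            SET @retry -= 1;
            WAITFOR DELAY '00:00:00.200'; -- brief backoff before retrying
        END
        ELSE
            THROW;                        -- not a deadlock, or out of retries
    END CATCH;
END;
```

The whole transaction must be retried, not just the failing statement; the victim's work was rolled back to the transaction start.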

Note: this is a simplified deadlock graph for illustration. Real-world deadlock XML is usually messier — proc names in the execution stack, parameter values, hostname/login context, and lock modes beyond the clean X/X pattern. The agent handles messy XML just as well; I’ve cleaned this one up so the cycle is easy to follow.

Blocking Chain Triage

Long blocking chains during business hours are one of the most stressful DBA scenarios. You need to identify the head blocker, understand what it’s doing, and decide whether to kill it — fast.

The agent identifies session 55 as the head blocker — and critically, notes that it’s sleeping with an open transaction, not actively executing. That’s a common pattern: an application opened a transaction, did some work, and never committed — possibly waiting on an external service call or a user interaction. Session 55 holds an exclusive lock from its UPDATE, and everything else — the S locks from the report queries, the IX lock from the audit INSERT — is incompatible with that X lock and has to wait.
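Finding the head blocker yourself is a set-difference query: sessions that are blocking others but are not themselves blocked. A sketch; a sleeping head with an open transaction surfaces here with status 'sleeping' and open_transaction_count > 0:

```sql
-- Chain heads: blockers that are not themselves blocked.
SELECT
    s.session_id,
    s.status,                    -- 'sleeping' + open tran = the classic pattern
    s.open_transaction_count,
    s.host_name,
    s.program_name,
    t.text AS most_recent_sql
FROM sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_exec_connections AS c
       ON c.session_id = s.session_id
OUTER APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS t
WHERE s.session_id IN (SELECT blocking_session_id
                       FROM sys.dm_exec_requests
                       WHERE blocking_session_id <> 0)
  AND s.session_id NOT IN (SELECT session_id
                           FROM sys.dm_exec_requests
                           WHERE blocking_session_id <> 0);
```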

Before reaching for KILL, check the rollback cost. If session 55 has been running a large UPDATE, killing it means SQL Server has to roll back all that work, which can take longer than the original statement and keeps the locks held the entire time. Check the transaction's log usage before deciding: sys.dm_tran_active_transactions tells you when the transaction started, but the log bytes live in sys.dm_tran_database_transactions, joined back to the session through sys.dm_tran_session_transactions.
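The rollback-cost check as a query; lots of log bytes used means an expensive rollback:

```sql
-- Log generated by the suspect session's open transaction.
SELECT
    st.session_id,
    at.transaction_begin_time,
    dt.database_transaction_log_bytes_used,
    dt.database_transaction_log_bytes_reserved
FROM sys.dm_tran_session_transactions  AS st
JOIN sys.dm_tran_active_transactions   AS at
  ON at.transaction_id = st.transaction_id
JOIN sys.dm_tran_database_transactions AS dt
  ON dt.transaction_id = st.transaction_id
WHERE st.session_id = 55;   -- the suspected head blocker

-- If you do decide to kill it, track rollback progress with:
-- KILL 55 WITH STATUSONLY;
```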

This is the kind of analysis I’d do myself — but having it done instantly means I can focus on the decision rather than the data gathering. One rule I follow: never let the agent make kill/no-kill decisions unreviewed. It can recommend, but the KILL command comes from me.

Try This Yourself: The next time you run sp_WhoIsActive during a blocking event, copy the output and paste it into the agent. Compare its analysis to your own.

Generating Targeted Fixes

Once you’ve identified the problem, the agent helps you build the fix.

The agent generates all three options with qualitative trade-off analysis. It can’t give you precise write-performance estimates without knowing your table size, DML rates, and existing index count — but it can tell you “a wide covering index on a write-heavy table will hurt more than a narrow filtered index.” Your job is to evaluate which option fits your actual workload.
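The covering-versus-filtered trade-off is concrete enough to sketch. These index definitions are purely illustrative (the table and columns are made up), but they show the shape of the decision:

```sql
-- Option A: wide covering index. Every INSERT/UPDATE on the table
-- maintains all of these columns, on every row.
CREATE NONCLUSTERED INDEX IX_Orders_Status_Covering
ON dbo.Orders (Status, OrderDate)
INCLUDE (CustomerID, TotalDue, ShipMethodID);

-- Option B: narrow filtered index. Maintained only for rows matching
-- the predicate, so the write penalty is limited to that slice --
-- ideal when queries only ever chase the 'Open' rows.
CREATE NONCLUSTERED INDEX IX_Orders_OpenOnly
ON dbo.Orders (OrderDate)
WHERE Status = N'Open';
```

If most of your reads target a small, well-defined slice of a large write-heavy table, option B often wins; if reads span the whole table, the filtered index never gets used and option A is back on the table.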

The Agent as a Calm Second Opinion

During an incident, the most valuable thing the agent provides isn’t speed — it’s a checklist under pressure. It helps me avoid tunnel vision on the first hypothesis. It summarizes evidence while I’m handling the incident channel. It doesn’t skip steps because someone is asking “is the database down?” for the third time.

One tip that’s easy to overlook: you can paste screenshots directly into the agent’s chat interface. That SSMS execution plan with the fat arrows? Paste it. The Activity Monitor showing a wall of blocked sessions? Paste it. The agent reads the image and incorporates what it sees into its analysis. It’s not a substitute for actual plan XML or DMV output — text data is always more reliable — but during an incident when you need a quick read on what you’re looking at, a screenshot gets you a faster first pass than exporting and formatting.

For follow-up evidence gathering, point the agent at Query Store, Extended Events, or the blocked process report — these give it the data it needs to move beyond initial triage into root cause.
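The blocked process report needs to be switched on first; it's off by default. A sketch of the server-side setup (5 seconds is a common starting threshold, tune to your tolerance):

```sql
-- Enable the blocked process report; threshold is in seconds.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 5;
RECONFIGURE;
-- Then capture the sqlserver.blocked_process_report event in an
-- Extended Events session; the event payload is the XML you hand
-- to the agent.
```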

For ongoing monitoring between incidents, SqlServerAgMonitor tracks AG health continuously — the kind of baseline data that helps you spot drift before it becomes a 2 AM page.

Deep Dives

Want to go deeper? These companion posts walk through specific scenarios in detail:


Next up: Incident Response: Root Cause Analysis with an AI Partner — using AI to build post-mortems and RCA documents from diagnostic data.


Part of the ALTER DBA ADD AGENT series — Previous: Legacy Code