Deadlock Graph Analysis with AI

Deadlock graphs contain everything you need to diagnose the problem. The resource list tells you which objects are involved. The process list tells you what each session was doing. The victim list tells you who lost. It’s all there — buried in XML that takes 10 minutes to parse manually at 2 AM when you’re getting paged.

This is a good use case for an AI agent: structured data that requires pattern recognition and explanation. You’re not asking the agent to fix anything. You’re asking it to read the XML faster than you can and explain what happened in plain language.

Collecting Deadlock Graphs

Before you can analyze a deadlock, you need to capture it. Three main approaches:

The system_health session. SQL Server captures deadlock graphs automatically in the system_health Extended Events session. This is always running — you don’t need to configure anything. The catch: it only retains a limited number of events, so if you don’t check regularly, older deadlocks roll off.

A dedicated Extended Events session. For production systems where deadlocks are frequent, I set up a dedicated session that writes to a file target with longer retention. This gives you a complete history.

Monitoring tools. If you’re running a third-party monitoring tool, it’s likely already capturing deadlock graphs. Check there first.

Feeding the Graph to the Agent

Here’s a realistic deadlock scenario. Two stored procedures update the same two tables but in opposite order — the classic deadlock pattern.

What the Agent Produces

The agent breaks down the deadlock clearly:

Resources. Two exclusive key locks — one on PK_OrderDetails (held by usp_UpdateShipment, wanted by usp_ProcessReturn) and one on PK_Orders (held by usp_ProcessReturn, wanted by usp_UpdateShipment). Both are exclusive (X) locks on clustered index keys.

The cycle. usp_ProcessReturn updated dbo.Orders first (acquiring an X lock), then tried to update dbo.OrderDetails. Meanwhile, usp_UpdateShipment updated dbo.OrderDetails first, then tried to update dbo.Orders. Classic opposite-order access pattern on the same OrderID.

The victim. Process process2d8a1c8a8 (usp_UpdateShipment) was chosen as the victim because it had lower logused (3,856 bytes vs. 4,120 bytes) — SQL Server picks the least expensive transaction to roll back by default.

The fix. The agent suggests three approaches, ranked by preference:

  1. Consistent access order. Refactor both procedures to always access dbo.Orders before dbo.OrderDetails. This eliminates the cycle entirely.
  2. Shorter transactions. If the procedures are doing other work between the two updates, restructure so the conflicting updates happen as close together as possible.
  3. Read committed snapshot isolation (RCSI). This wouldn’t help here since both processes are taking X locks for writes, but the agent correctly notes this. I’ve seen agents incorrectly suggest RCSI for write-write deadlocks — always validate.

What I Validated

The agent’s analysis was accurate. The root cause was obvious once explained: two procedures, same tables, opposite order. But I checked two things the agent couldn’t:

First, I looked at the full procedure definitions — not just the lines shown in the deadlock graph. usp_ProcessReturn had an explicit BEGIN TRANSACTION wrapping six statements when only two needed to be atomic. Narrowing the transaction scope reduced the lock hold time.

Second, I checked the deadlock frequency. This was happening 40–50 times per day during peak hours, which meant it wasn’t a rare race condition — it was a systematic design problem that needed the access-order fix, not just retry logic.

The Fixed Pattern

The key change: both usp_ProcessReturn and usp_UpdateShipment now access dbo.Orders before dbo.OrderDetails. Same order, no cycle, no deadlock.

Try This Yourself

  1. Pull a recent deadlock graph from the system_health session using the query above.
  2. Paste the full XML into your AI agent and ask for a human-readable analysis.
  3. Verify the agent’s findings against the actual procedure definitions and table schemas.
  4. Check whether the fix the agent suggests is the right fix for your situation — consistent access order isn’t always possible when third-party code is involved.
  5. After applying the fix, monitor for deadlock recurrence using an Extended Events session with a file target.

The agent reads XML faster than you do. But the decision about which fix to apply — refactor procedures, add retry logic, change isolation levels — requires understanding your application’s transaction semantics. That’s your job.

For more on AI-assisted troubleshooting workflows, see Post 8: Wait Stats, Deadlocks, and Blocking Chains. For using AI during active incidents, see Post 9: Incident Response and Root-Cause Analysis.


Part of the ALTER DBA ADD AGENT series.