Monitoring Long-Running Transactions with AI-Built Alerts

Every DBA has a story about the transaction that ate the log drive. Someone opened an explicit transaction in SSMS at 9 AM, ran an UPDATE, got pulled into a meeting, and by 11 AM the transaction log had grown to fill the disk. The query finished in milliseconds — the transaction was open for two hours, holding locks, preventing log truncation, and silently making everything worse.

Long-running queries get caught by monitoring tools. Long-running transactions often don’t — because the session looks idle. The monitoring dashboard shows green; the log drive is screaming.

I used an AI agent to build a monitoring solution that catches these before they become incidents. Here’s the prompt I started with:
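Paraphrased from memory rather than verbatim — the exact wording evolved as I iterated:

```
Write a T-SQL monitoring query that finds transactions open longer than a
threshold. Join the transaction DMVs to the session and connection DMVs,
show whether each session is actively running or idle, count the locks it
holds, and report how much transaction log it has generated. I'll run it
from a SQL Server Agent job and alert via Database Mail.
```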

What the Agent Produced

The first draft was structurally sound — it correctly joined sys.dm_tran_active_transactions, sys.dm_tran_session_transactions, and sys.dm_exec_sessions, and pulled most_recent_sql_handle from sys.dm_exec_connections. It included a LEFT JOIN to sys.dm_exec_requests to detect whether the session was actively running or idle, and aggregated lock counts from sys.dm_tran_locks.
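The full script is longer than what's worth reproducing here, but a minimal sketch of that structure — using the standard DMV column names — looks like this:

```sql
DECLARE @ThresholdMinutes int = 5;

SELECT
    s.session_id,
    s.login_name,
    s.host_name,
    s.[program_name],
    s.[status]                      AS session_status,
    ISNULL(r.[status], N'sleeping') AS request_status,   -- NULL request = idle session
    tat.transaction_begin_time,
    DATEDIFF(MINUTE, tat.transaction_begin_time, SYSDATETIME()) AS open_minutes,
    lk.lock_count,
    txt.[text]                      AS most_recent_batch
FROM sys.dm_tran_active_transactions AS tat
JOIN sys.dm_tran_session_transactions AS st
    ON st.transaction_id = tat.transaction_id
JOIN sys.dm_exec_sessions AS s
    ON s.session_id = st.session_id
JOIN sys.dm_exec_connections AS c
    ON c.session_id = s.session_id
LEFT JOIN sys.dm_exec_requests AS r          -- present only while a request is running
    ON r.session_id = s.session_id
OUTER APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS txt
OUTER APPLY (SELECT COUNT(*) AS lock_count
             FROM sys.dm_tran_locks AS l
             WHERE l.request_session_id = s.session_id) AS lk
WHERE st.is_user_transaction = 1
  AND DATEDIFF(MINUTE, tat.transaction_begin_time, SYSDATETIME()) >= @ThresholdMinutes
ORDER BY open_minutes DESC;
```

Treat this as a skeleton, not the finished alert — the fixes described next (log-space aggregation and maintenance exclusions) still apply.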

Two things needed fixing. First, the log space calculation was pulling from sys.dm_tran_database_transactions without aggregating — if a transaction spans multiple databases, you get duplicate rows. I asked the agent to fix that:
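The fix amounts to aggregating per transaction before joining back — a sketch, not the full script:

```sql
-- One row per transaction, even when the transaction spans multiple databases
SELECT dt.transaction_id,
       SUM(dt.database_transaction_log_bytes_used)     AS log_bytes_used,
       SUM(dt.database_transaction_log_bytes_reserved) AS log_bytes_reserved
FROM sys.dm_tran_database_transactions AS dt
GROUP BY dt.transaction_id;
```

Joined back to the main query on transaction_id, this replaces the raw per-database rows that were causing the duplicates.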

Second, I realized the initial query would fire on every open transaction — including perfectly normal ones. Backup jobs, index rebuilds, and DBCC operations all hold transactions open for extended periods. I followed up:
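Again paraphrased:

```
Exclude known maintenance activity -- backup jobs, index rebuilds, DBCC
operations -- by program name, and make the duration threshold a variable
I can tune without editing the query body.
```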

The agent updated both correctly. It used s.[program_name] NOT LIKE patterns with appropriate wildcards and added a DECLARE @ThresholdMinutes int variable at the top.
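In sketch form, the exclusions slot into the WHERE clause — the specific patterns below are examples, not an exhaustive list; tune them to your environment:

```sql
DECLARE @ThresholdMinutes int = 5;   -- tune per workload

-- ...added to the monitoring query's WHERE clause:
--   AND s.[program_name] NOT LIKE 'SQLAgent - %'    -- Agent jobs: backups, index maintenance
--   AND s.[program_name] NOT LIKE 'DatabaseMail%'   -- Database Mail internal sessions
```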

What I Validated and Changed

I tested against a development instance where I deliberately opened an explicit transaction in SSMS. The query caught it within one polling cycle. Good. Then I ran it on production during a maintenance window and confirmed:

  • Index rebuild transactions showed up until I added the program_name exclusion
  • The most_recent_sql_handle text was the last batch on the connection, not the statement that opened the transaction — an important distinction I added as a comment in the script
  • Log space numbers from sys.dm_tran_database_transactions reflect per-transaction log usage, which is a different metric from DBCC SQLPERF(LOGSPACE) (total per-database log space). The per-transaction view shows how much log this transaction has generated; the SQLPERF view shows total log file utilization. Both are useful, but they measure different things — don’t expect them to match.
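
The two measurements that last bullet contrasts, side by side:

```sql
-- Total log file utilization, per database (all transactions combined)
DBCC SQLPERF(LOGSPACE);

-- Log generated by each individual open transaction (a different metric)
SELECT dt.transaction_id,
       DB_NAME(dt.database_id) AS database_name,
       dt.database_transaction_log_bytes_used
FROM sys.dm_tran_database_transactions AS dt;
```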

I also added the query text from sys.dm_exec_sql_text via the connection’s most_recent_sql_handle, since knowing what the session last ran is critical for deciding whether to kill it or wait.
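That addition is a single OUTER APPLY — sketched here in isolation:

```sql
-- Pull the last batch text for each connection.
-- NOTE: this is the most recent batch on the connection, NOT necessarily
-- the statement that opened the transaction.
SELECT c.session_id,
       txt.[text] AS last_batch
FROM sys.dm_exec_connections AS c
OUTER APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS txt;
```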

The Final Solution

The key insight: session status matters more than duration alone. A sleeping session with an open transaction for 10 minutes is almost certainly an abandoned SSMS window. An executing session with a transaction open for 10 minutes might be a legitimate batch job. The status column in the alert gives you that context instantly.
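One way to surface that context directly in the alert output — an illustrative column, not the script's exact wording:

```sql
SELECT s.session_id,
       CASE WHEN r.session_id IS NULL
            THEN 'sleeping - open tran, likely abandoned'
            ELSE 'running - possibly a legitimate batch'
       END AS triage_hint
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_tran_session_transactions AS st
    ON st.session_id = s.session_id
LEFT JOIN sys.dm_exec_requests AS r
    ON r.session_id = s.session_id
WHERE st.is_user_transaction = 1;
```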

Scheduling It

Create a SQL Server Agent job with a single T-SQL step containing the script above. Schedule it every 5 minutes. For the threshold, 5 minutes is a reasonable starting point for OLTP workloads — adjust upward if you have legitimate long-running batch processes that hold transactions open. If you need more granular exclusions, add specific login names or host names to the WHERE clause.
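A sketch of the job setup, assuming the monitoring script is pasted into @command — the job and schedule names are placeholders:

```sql
USE msdb;
GO
EXEC dbo.sp_add_job         @job_name = N'Monitor - Long-Running Transactions';
EXEC dbo.sp_add_jobstep     @job_name = N'Monitor - Long-Running Transactions',
                            @step_name = N'Check open transactions',
                            @subsystem = N'TSQL',
                            @command   = N'/* paste the monitoring script here */';
EXEC dbo.sp_add_jobschedule @job_name = N'Monitor - Long-Running Transactions',
                            @name = N'Every 5 minutes',
                            @freq_type = 4,             -- daily
                            @freq_interval = 1,
                            @freq_subday_type = 4,      -- units of minutes
                            @freq_subday_interval = 5;  -- every 5 minutes
EXEC dbo.sp_add_jobserver   @job_name = N'Monitor - Long-Running Transactions',
                            @server_name = N'(local)';
```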

Don’t be tempted to check every 30 seconds — you’ll overwhelm Database Mail during a real incident, and a transaction that’s only been open for 30 seconds rarely needs intervention. Five minutes gives you reasonable detection time without alert flood.

Try This Yourself

  1. Open SSMS, connect to a test instance, and run BEGIN TRAN; SELECT 1; — don’t commit.
  2. Run the monitoring query manually. Your session should appear with a sleeping status.
  3. Iterate with the agent: ask it to add the databases involved in the transaction, or to flag transactions holding more than 1000 locks.
  4. Deploy as an Agent job and tune the threshold over a week — watch what fires and adjust.
  5. Apply the common tuning pattern: start strict (5 minutes), then loosen for specific programs or logins that legitimately hold long transactions.
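
Two of the step-3 iterations sketched — the second one assumes SQL Server 2017+ for STRING_AGG:

```sql
-- Sessions holding more than 1000 locks
SELECT l.request_session_id AS session_id,
       COUNT(*)             AS lock_count
FROM sys.dm_tran_locks AS l
GROUP BY l.request_session_id
HAVING COUNT(*) > 1000;

-- Databases touched by each open transaction
SELECT st.session_id,
       STRING_AGG(DB_NAME(dt.database_id), ', ') AS databases_touched
FROM sys.dm_tran_session_transactions AS st
JOIN sys.dm_tran_database_transactions AS dt
    ON dt.transaction_id = st.transaction_id
GROUP BY st.session_id;
```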

For the full pattern on building custom monitoring queries and tuning alert thresholds, see Post 13: Building Custom Monitoring Queries and Alerts. For how this fits into a broader monitoring strategy alongside purpose-built tools, see Post 12: AI-Native Monitoring.


Part of the ALTER DBA ADD AGENT series.