Unreported SQL Server Agent Job Failures

SQL Server Agent provides an easy-to-configure framework for notifying DBAs when jobs fail to run for any reason. Unfortunately, there are circumstances where job failures may not result in SQL Server sending a notification. For instance, the job may have been misconfigured, or perhaps the email server was down when the job failed. Either way, you think everything is fine, but that critical backup job hasn’t ran in 3 days and you’re about to need a résumé update.

Sad DBA…

Instead of thinking everything is fine, run these queries against your SQL Server instances on a regular basis to ensure you’re not missing notifications for jobs that have failed, or have been misconfigured.

This quick query shows jobs that haven’t been configured to notify the operator via email. You’ll want to look at these jobs and ensure they have been configured properly.

SELECT ServerName = @@SERVERNAME
    , JobName = sj.name
FROM msdb.dbo.sysjobs sj
WHERE sj.notify_level_email = 0 OR sj.notify_email_operator_id = 0
ORDER BY sj.name;

This query shows jobs that have failed over the past 30 days, where no notification was sent, even though the job is configured to notify the operator via email.

SELECT ServerName = jh.server  
    , JobName = j.name  
    , JobStep = js.step_id  
    , JobStepName = js.step_name  
    , SQLSeverity = jh.sql_severity  
    , JobCommand = js.command  
    , NumberOfErrors = COUNT(*)  
FROM msdb.dbo.sysjobhistory jh  
    INNER JOIN msdb.dbo.sysjobs j ON jh.job_id = j.job_id  
    INNER JOIN msdb.dbo.sysjobsteps js ON j.job_id = js.job_id AND jh.step_id = js.step_id  
WHERE jh.run_date > CONVERT(int, CONVERT(varchar(255), GETDATE(), 112)) - 30  
    AND (jh.operator_id_emailed = 0 AND j.notify_level_email > 0 AND j.notify_email_operator_id > 0)  
    AND jh.run_status <> 1 /* Success */ 
    AND jh.run_status <> 2 /* Will Retry */ 
GROUP BY jh.server  
    , j.name  
    , js.step_id  
    , js.step_name  
    , jh.sql_severity  
    , js.command  
ORDER BY j.name, js.step_id;

I’ve seen a surprising number of unreported job failures across a variety of SQL Server versions, that can’t be reliably proven to be a result of the email server being unavailable, so I’m not certain what is causing properly-configured jobs to fail without an email being sent.

In case you haven’t seen it, Microsoft has great documentation for setting up SQL Server Agent Job Notifications.

This post is part of our series on SQL Server Maintenance