Performance Counters Over Time: Making sys.dm_os_performance_counters Tell the Truth

Run this query on any SQL Server and look at the number it gives you for Batch Requests/sec:
|
1 2 3 4 5 6 7 |
SELECT [counter_name] , [cntr_value] FROM [sys].[dm_os_performance_counters] WHERE [counter_name] = N'Batch Requests/sec'; |
On a server that has been up for a few weeks you will get something like 9,000,000,000. Nine billion batch requests per second is a fun number to put on a status report, but it is not happening. That value is not a rate at all. It is a running total that has been climbing since the service started, and the “/sec” in the name is a promise the DMV does not keep on its own. To turn it into an actual rate you have to sample it twice and do the arithmetic yourself.
sys.dm_os_performance_counters is one of the most useful and most misread views in SQL Server. It exposes the same SQLServer:* counters you would otherwise pull from Performance Monitor, but it hands them to you raw, with no formatting and no hint about what kind of number each row contains. Read it naively and you will publish nonsense. Read the cntr_type column first and it becomes a precise, scriptable monitoring source you can capture over time without leaving T-SQL. This post is about doing the second thing.
The DMV That Replaced sysperfinfo
If you have been doing this a while you may remember sys.sysperfinfo. It was deprecated and replaced by sys.dm_os_performance_counters, which is the supported way to read SQL Server’s performance counters from inside the engine.[1] Every row is one counter, and the columns that matter are:
object_name– the counter category, likeSQLServer:Buffer ManagerorSQLServer:SQL Statistics. On a named instance the prefix changes toMSSQL$InstanceName:instead ofSQLServer:, which matters the moment you start filtering by it.counter_name– the counter itself, likePage life expectancy. Be warned: these are stored in a fixed-width column and come back padded with trailing spaces, so an unqualified equality comparison can quietly miss.instance_name– the sub-instance the counter applies to, such as a database name for the Databases object, or an empty string for global counters.cntr_value– the raw value.cntr_type– the WMI performance counter type, and the single most important column in the view. It tells you how to interpretcntr_value, and without it the value is just a number with no units.
Reading the view requires the VIEW SERVER STATE permission, the same permission most diagnostic DMVs need. Grant that to a monitoring login rather than handing it sysadmin.
The cntr_type Column Is the Whole Story
There are only a handful of cntr_type values you will meet in practice, and each one demands a different calculation. Here is the decoder ring:
| cntr_type | Meaning | How to read it | Example counter |
|---|---|---|---|
| 65792 | Instantaneous value | Use cntr_value as-is. It is already the current reading. |
Page life expectancy, User Connections, Processes blocked |
| 272696576 | Cumulative running total | Sample twice; the rate is (value2 - value1) / seconds_between_samples. |
Batch Requests/sec, Page reads/sec, Transactions/sec |
| 272696320 | Cumulative running total (32-bit variant) | Same as above: diff two samples over elapsed seconds. | Various “/sec” counters |
| 537003264 | Fraction (numerator of a ratio) | Divide by its paired base and multiply by 100 for a percentage. | Buffer cache hit ratio |
| 1073874176 | Average (numerator of an average) | (num2 - num1) / (base2 - base1) across two samples. |
Average Latch Wait Time (ms), Average Wait Time (ms) |
| 1073939712 | Base (divisor only) | Never report this on its own. It exists solely to pair with a fraction or average counter. | Buffer cache hit ratio base |
The lesson is that the same view contains at least three completely different kinds of number. A 65792 row is a snapshot you can trust immediately. A 272696576 row is a counter that only goes up, so a single read tells you almost nothing; you need two reads and the clock. And a fraction or average row is useless until you find its partner.
The Base-Counter Trap
The fraction (537003264) and average (1073874176) counters each have a companion base row, type 1073939712, that holds the denominator. Buffer cache hit ratio is the famous example: the counter named Buffer cache hit ratio is the numerator, and a second row named Buffer cache hit ratio base is the denominator. The percentage everyone quotes is numerator / base * 100.
You would hope the base row’s name was always the numerator’s name plus a tidy suffix. It is not. Pulling the fraction, average, and base rows from a live SQL Server 2019 instance shows just how inconsistent the naming is:
|
1 2 3 4 5 6 7 8 9 |
counter_name cntr_type Buffer cache hit ratio 537003264 Buffer cache hit ratio base 1073939712 <- lowercase "base" Average Latch Wait Time (ms) 1073874176 Average Latch Wait Time Base 1073939712 <- "Base", and the numerator's "(ms)" is dropped Average Wait Time (ms) 1073874176 Average Wait Time Base 1073939712 <- same pattern, per lock resource Time delete FileTable item BASE 1073939712 <- all caps "BASE" Avg. Length of Batched Writes BS 1073939712 <- abbreviated "BS" |
Lowercase “base”, title-case “Base”, uppercase “BASE”, the abbreviation “BS”, and numerators that carry an “(ms)” or “/Fetch” suffix that the base row throws away. There is no single string rule that pairs all of them correctly. This is why trying to auto-join every fraction to every base across the whole view is a trap: you will mismatch rows and compute confident, wrong percentages.
The reliable approach is to pair the specific counters you actually care about, by name, explicitly. You will only ever watch a handful of ratio counters, so spell out each pairing and move on.
Capturing Counters Over Time
Because the rate and average counters are meaningless from a single read, the smallest useful capture is two snapshots a few seconds apart. The script below takes a snapshot, waits, takes a second, and then interprets every counter according to its cntr_type: instantaneous values pass through, running totals become per-second rates, the buffer cache hit ratio is computed against its base, and the average latch wait is computed across both samples.
Notice the two ratio counters are paired by name in an explicit CASE, exactly because of the naming mess above, and the object filter uses LIKE '%:Object Name' so the same script works on a default instance (SQLServer:) and a named instance (MSSQL$Instance:) without edits.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 |
SET NOCOUNT ON; IF OBJECT_ID(N'tempdb..#perf', N'U') IS NOT NULL BEGIN DROP TABLE #perf; END; CREATE TABLE #perf ( [sample_id] tinyint NOT NULL , [sample_time] datetime2(3) NOT NULL , [object_name] nvarchar(128) NOT NULL , [counter_name] nvarchar(128) NOT NULL , [instance_name] nvarchar(128) NOT NULL , [cntr_value] bigint NOT NULL , [cntr_type] int NOT NULL ); /* ---- snapshot 1 ---- */ INSERT INTO #perf ( [sample_id], [sample_time], [object_name], [counter_name] , [instance_name], [cntr_value], [cntr_type] ) SELECT 1 , SYSDATETIME() , RTRIM([object_name]) , RTRIM([counter_name]) , RTRIM([instance_name]) , [cntr_value] , [cntr_type] FROM [sys].[dm_os_performance_counters]; WAITFOR DELAY '00:00:10'; /* ---- snapshot 2 ---- */ INSERT INTO #perf ( [sample_id], [sample_time], [object_name], [counter_name] , [instance_name], [cntr_value], [cntr_type] ) SELECT 2 , SYSDATETIME() , RTRIM([object_name]) , RTRIM([counter_name]) , RTRIM([instance_name]) , [cntr_value] , [cntr_type] FROM [sys].[dm_os_performance_counters]; /* ---- interpret each counter by its cntr_type ---- */ DECLARE @elapsed_seconds decimal(18, 3) = ( SELECT DATEDIFF(millisecond, MIN([sample_time]), MAX([sample_time])) / 1000.0 FROM #perf ); ;WITH [s1] AS ( SELECT [object_name], [counter_name], [instance_name], [cntr_value], [cntr_type] FROM #perf WHERE [sample_id] = 1 ) , [s2] AS ( SELECT [object_name], [counter_name], [instance_name], [cntr_value], [cntr_type] FROM #perf WHERE [sample_id] = 2 ) SELECT [object_name] = [s2].[object_name] , [counter_name] = [s2].[counter_name] , [instance_name] = [s2].[instance_name] , [interpreted] = CASE WHEN [s2].[cntr_type] = 65792 THEN CONVERT(decimal(18, 3), [s2].[cntr_value]) WHEN [s2].[cntr_type] IN (272696576, 272696320) THEN CONVERT(decimal(18, 3), ([s2].[cntr_value] - [s1].[cntr_value]) / NULLIF(@elapsed_seconds, 0)) WHEN [s2].[cntr_type] = 537003264 THEN CONVERT(decimal(18, 3), [s2].[cntr_value] * 100.0 / NULLIF([b2].[cntr_value], 0)) WHEN [s2].[cntr_type] = 1073874176 THEN CONVERT(decimal(18, 3), ([s2].[cntr_value] - [s1].[cntr_value]) * 1.0 / NULLIF([b2].[cntr_value] - [b1].[cntr_value], 0)) END FROM [s2] INNER JOIN [s1] ON [s1].[object_name] = [s2].[object_name] AND [s1].[counter_name] = [s2].[counter_name] AND [s1].[instance_name] = [s2].[instance_name] LEFT JOIN [s2] AS [b2] ON [s2].[cntr_type] IN (537003264, 1073874176) AND [b2].[cntr_type] = 1073939712 AND [b2].[object_name] = [s2].[object_name] AND [b2].[instance_name] = [s2].[instance_name] AND [b2].[counter_name] = CASE WHEN [s2].[counter_name] = N'Buffer cache hit ratio' THEN N'Buffer cache hit ratio base' WHEN [s2].[counter_name] = N'Average Latch Wait Time (ms)' THEN N'Average Latch Wait Time Base' END LEFT JOIN [s1] AS [b1] ON [b1].[cntr_type] = 1073939712 AND [b1].[object_name] = [b2].[object_name] AND [b1].[counter_name] = [b2].[counter_name] AND [b1].[instance_name] = [b2].[instance_name] WHERE [s2].[object_name] LIKE N'%:SQL Statistics' OR [s2].[object_name] LIKE N'%:Buffer Manager' OR ([s2].[object_name] LIKE N'%:Databases' AND [s2].[instance_name] = N'_Total') OR [s2].[object_name] LIKE N'%:Latches' OR [s2].[object_name] LIKE N'%:General Statistics' ORDER BY [s2].[object_name] , [s2].[counter_name]; DROP TABLE #perf; |
Run against a real instance, the interesting rows come back already interpreted. These numbers are from a SQL Server 2019 box that was deliberately churned during the ten-second window:
|
1 2 3 4 5 6 7 8 9 10 11 12 |
object_name counter_name instance_name interpreted ------------------------- ----------------------------- -------------- ----------- SQLServer:Buffer Manager Buffer cache hit ratio 100.000 SQLServer:Buffer Manager Page life expectancy 58.000 SQLServer:Buffer Manager Page reads/sec 3157.156 SQLServer:Databases Transactions/sec _Total 10.694 SQLServer:General Statis. Processes blocked 0.000 SQLServer:General Statis. User Connections 9.000 SQLServer:Latches Average Latch Wait Time (ms) 0.514 SQLServer:SQL Statistics Batch Requests/sec 4.192 SQLServer:SQL Statistics SQL Compilations/sec 5.390 SQLServer:SQL Statistics SQL Re-Compilations/sec 1.882 |
Every value there is now something you can actually put on a chart. Page reads/sec is a true rate, not a lifetime total. Page life expectancy of 58 is the live reading, and it is low because the workload was hammering the buffer pool. Buffer cache hit ratio came out of its base pairing as a clean 100 percent. None of those would have been correct from a single naive SELECT.
Logging to a Table for the Long Haul
Two snapshots are enough to read a counter once. To watch counters over hours or days, write each snapshot to a permanent table and let SQL Server Agent run the capture on a schedule. The structure is deliberately narrow so it stays cheap to insert into and easy to query later:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 |
CREATE DATABASE [DBA_Monitoring]; GO USE [DBA_Monitoring]; GO CREATE TABLE [dbo].[perf_counter_history] ( [perf_counter_history_id] bigint NOT NULL IDENTITY(1, 1) CONSTRAINT [PK_perf_counter_history] PRIMARY KEY CLUSTERED , [collected_at] datetime2(3) NOT NULL CONSTRAINT [DF_perf_counter_history_collected_at] DEFAULT (SYSDATETIME()) , [object_name] nvarchar(128) NOT NULL , [counter_name] nvarchar(128) NOT NULL , [instance_name] nvarchar(128) NOT NULL , [cntr_value] bigint NOT NULL , [cntr_type] int NOT NULL ) WITH (DATA_COMPRESSION = PAGE); GO CREATE NONCLUSTERED INDEX [IX_perf_counter_history_lookup] ON [dbo].[perf_counter_history] ( [counter_name] , [instance_name] , [collected_at] ) INCLUDE ( [cntr_value] , [cntr_type] ) WITH ( DATA_COMPRESSION = PAGE ); GO CREATE PROCEDURE [dbo].[capture_perf_counters] AS BEGIN SET NOCOUNT ON; INSERT INTO [dbo].[perf_counter_history] ( [object_name] , [counter_name] , [instance_name] , [cntr_value] , [cntr_type] ) SELECT RTRIM([object_name]) , RTRIM([counter_name]) , RTRIM([instance_name]) , [cntr_value] , [cntr_type] FROM [sys].[dm_os_performance_counters]; END; GO |
Point an Agent job at EXEC [dbo].[capture_perf_counters]; on whatever cadence you need, every minute for a focused investigation, every fifteen for a long-term baseline. Capturing every counter keeps the collector dead simple and defers all the interpretation to query time, where the same cntr_type rules from the table above turn the stored rows into rates and ratios. The page compression and the narrow row keep the storage cost modest even at a one-minute cadence. Add a retention job that deletes rows older than your baseline window so the table does not grow without bound.
One caution worth stating plainly: a running-total counter resets to zero when the SQL Server service restarts. When you diff two stored samples and the second is smaller than the first, you have spanned a restart, and the right move is to discard that interval rather than report a large negative rate.
Availability Group Counters You Cannot Get From a DMV
If you run Availability Groups, two object categories in this same view are worth capturing on a schedule: SQLServer:Availability Replica and SQLServer:Database Replica. They expose replica health, and the useful insight is knowing which of their counters duplicate a DMV and which do not.
Some of what they report is available elsewhere. You can read log_send_queue_size and redo_queue_size straight from sys.dm_hadr_database_replica_states, so for a point-in-time queue depth a DMV is fine. The throughput and transport counters are the ones with no DMV equivalent. SQLServer:Availability Replica\Bytes Sent to Transport/sec, Bytes Received from Replica/sec, and Flow Control Time (ms/sec) live only in the performance counters, and on the database side SQLServer:Database Replica\Redone Bytes/sec and Log Bytes Received/sec are the rates that tell you how fast a secondary is actually draining its queues. The redo queue depth you can read from a DMV; how quickly it is shrinking you cannot.
That distinction is a good rule for deciding what to capture in general: favour the counters that expose something no DMV gives you. A queue depth is easy to get from sys.dm_hadr_database_replica_states at any moment, but a Bytes Sent to Transport/sec trend, or a Flow Control Time (ms/sec) that climbs under load, is the kind of signal only a captured-over-time counter will show. Everything above still applies, because these are ordinary performance counters: capture them on the same schedule, watch the trend rather than a single reading, and a stalling send rate or a redo throughput that cannot keep pace with the incoming log becomes a leading indicator of failover risk instead of a surprise you discover during an incident.[2]
The Gotchas Worth Repeating
- Trailing spaces.
counter_name,object_name, andinstance_nameare fixed-width and padded.RTRIMthem on the way in, or compare withLIKE, so an exact match does not silently fail on invisible whitespace. - The object prefix depends on the instance. A default instance uses
SQLServer:; a named instance usesMSSQL$InstanceName:. Filter on the part after the colon withLIKE '%:Buffer Manager'if you want one script for both. - Service restarts reset the totals. Running-total counters start over at zero on restart. Guard your diffs against negative results.
- Never report a base counter by itself. A type 1073939712 row is a denominator with no standalone meaning.
The Takeaway
sys.dm_os_performance_counters is not hard to use, but it is easy to misuse, because it hands you running totals, instantaneous readings, and ratio numerators in the same shape and trusts you to know the difference. The cntr_type column is where that difference lives. Read it first, diff the totals across two samples, pair the ratios with their bases by name, and capture the whole set on a schedule, and you have a precise performance baseline built entirely from T-SQL, no external tooling required.
How do you baseline your instances? Rolling your own capture table like this, leaning on a third-party monitor, or living in Performance Monitor? I would like to hear what has worked for you. Find me on Bluesky or LinkedIn.
References
- sys.dm_os_performance_counters (Transact-SQL) – Microsoft Learn. The supported replacement for
sys.sysperfinfo, its column definitions, and the permission required to read it. ↩ - Use SQL Server Objects – Microsoft Learn. The catalog of SQLServer:* performance objects, including the Availability Replica and Database Replica objects and their counters. ↩
- How to capture performance counters over a period of time – answer by Kin Shah on Database Administrators Stack Exchange, which noted the deprecation of
sys.sysperfinfoin favor ofsys.dm_os_performance_countersand prompted this deeper write-up. ↩