Performance Counters Over Time: Making sys.dm_os_performance_counters Tell the Truth

A database administrator holding a stopwatch reads a monitoring panel: a climbing odometer-style counter shows a running total while a needle gauge shows a live value and a strip-chart traces the trend over time.

Run this query on any SQL Server and look at the number it gives you for Batch Requests/sec:

On a server that has been up for a few weeks you will get something like 9,000,000,000. Nine billion batch requests per second is a fun number to put on a status report, but it is not happening. That value is not a rate at all. It is a running total that has been climbing since the service started, and the “/sec” in the name is a promise the DMV does not keep on its own. To turn it into an actual rate you have to sample it twice and do the arithmetic yourself.

sys.dm_os_performance_counters is one of the most useful and most misread views in SQL Server. It exposes the same SQLServer:* counters you would otherwise pull from Performance Monitor, but it hands them to you raw, with no formatting and no hint about what kind of number each row contains. Read it naively and you will publish nonsense. Read the cntr_type column first and it becomes a precise, scriptable monitoring source you can capture over time without leaving T-SQL. This post is about doing the second thing.

The DMV That Replaced sysperfinfo

If you have been doing this a while you may remember sys.sysperfinfo. It was deprecated and replaced by sys.dm_os_performance_counters, which is the supported way to read SQL Server’s performance counters from inside the engine.[1] Every row is one counter, and the columns that matter are:

  • object_name – the counter category, like SQLServer:Buffer Manager or SQLServer:SQL Statistics. On a named instance the prefix changes to MSSQL$InstanceName: instead of SQLServer:, which matters the moment you start filtering by it.
  • counter_name – the counter itself, like Page life expectancy. Be warned: these are stored in a fixed-width column and come back padded with trailing spaces, so an unqualified equality comparison can quietly miss.
  • instance_name – the sub-instance the counter applies to, such as a database name for the Databases object, or an empty string for global counters.
  • cntr_value – the raw value.
  • cntr_type – the WMI performance counter type, and the single most important column in the view. It tells you how to interpret cntr_value, and without it the value is just a number with no units.

Reading the view requires the VIEW SERVER STATE permission, the same permission most diagnostic DMVs need. Grant that to a monitoring login rather than handing it sysadmin.

The cntr_type Column Is the Whole Story

There are only a handful of cntr_type values you will meet in practice, and each one demands a different calculation. Here is the decoder ring:

cntr_type Meaning How to read it Example counter
65792 Instantaneous value Use cntr_value as-is. It is already the current reading. Page life expectancy, User Connections, Processes blocked
272696576 Cumulative running total Sample twice; the rate is (value2 - value1) / seconds_between_samples. Batch Requests/sec, Page reads/sec, Transactions/sec
272696320 Cumulative running total (32-bit variant) Same as above: diff two samples over elapsed seconds. Various “/sec” counters
537003264 Fraction (numerator of a ratio) Divide by its paired base and multiply by 100 for a percentage. Buffer cache hit ratio
1073874176 Average (numerator of an average) (num2 - num1) / (base2 - base1) across two samples. Average Latch Wait Time (ms), Average Wait Time (ms)
1073939712 Base (divisor only) Never report this on its own. It exists solely to pair with a fraction or average counter. Buffer cache hit ratio base

The lesson is that the same view contains at least three completely different kinds of number. A 65792 row is a snapshot you can trust immediately. A 272696576 row is a counter that only goes up, so a single read tells you almost nothing; you need two reads and the clock. And a fraction or average row is useless until you find its partner.

The Base-Counter Trap

The fraction (537003264) and average (1073874176) counters each have a companion base row, type 1073939712, that holds the denominator. Buffer cache hit ratio is the famous example: the counter named Buffer cache hit ratio is the numerator, and a second row named Buffer cache hit ratio base is the denominator. The percentage everyone quotes is numerator / base * 100.

You would hope the base row’s name was always the numerator’s name plus a tidy suffix. It is not. Pulling the fraction, average, and base rows from a live SQL Server 2019 instance shows just how inconsistent the naming is:

Lowercase “base”, title-case “Base”, uppercase “BASE”, the abbreviation “BS”, and numerators that carry an “(ms)” or “/Fetch” suffix that the base row throws away. There is no single string rule that pairs all of them correctly. This is why trying to auto-join every fraction to every base across the whole view is a trap: you will mismatch rows and compute confident, wrong percentages.

The reliable approach is to pair the specific counters you actually care about, by name, explicitly. You will only ever watch a handful of ratio counters, so spell out each pairing and move on.

Capturing Counters Over Time

Because the rate and average counters are meaningless from a single read, the smallest useful capture is two snapshots a few seconds apart. The script below takes a snapshot, waits, takes a second, and then interprets every counter according to its cntr_type: instantaneous values pass through, running totals become per-second rates, the buffer cache hit ratio is computed against its base, and the average latch wait is computed across both samples.

Notice the two ratio counters are paired by name in an explicit CASE, exactly because of the naming mess above, and the object filter uses LIKE '%:Object Name' so the same script works on a default instance (SQLServer:) and a named instance (MSSQL$Instance:) without edits.

Run against a real instance, the interesting rows come back already interpreted. These numbers are from a SQL Server 2019 box that was deliberately churned during the ten-second window:

Every value there is now something you can actually put on a chart. Page reads/sec is a true rate, not a lifetime total. Page life expectancy of 58 is the live reading, and it is low because the workload was hammering the buffer pool. Buffer cache hit ratio came out of its base pairing as a clean 100 percent. None of those would have been correct from a single naive SELECT.

Logging to a Table for the Long Haul

Two snapshots are enough to read a counter once. To watch counters over hours or days, write each snapshot to a permanent table and let SQL Server Agent run the capture on a schedule. The structure is deliberately narrow so it stays cheap to insert into and easy to query later:

Point an Agent job at EXEC [dbo].[capture_perf_counters]; on whatever cadence you need, every minute for a focused investigation, every fifteen for a long-term baseline. Capturing every counter keeps the collector dead simple and defers all the interpretation to query time, where the same cntr_type rules from the table above turn the stored rows into rates and ratios. The page compression and the narrow row keep the storage cost modest even at a one-minute cadence. Add a retention job that deletes rows older than your baseline window so the table does not grow without bound.

One caution worth stating plainly: a running-total counter resets to zero when the SQL Server service restarts. When you diff two stored samples and the second is smaller than the first, you have spanned a restart, and the right move is to discard that interval rather than report a large negative rate.

Availability Group Counters You Cannot Get From a DMV

If you run Availability Groups, two object categories in this same view are worth capturing on a schedule: SQLServer:Availability Replica and SQLServer:Database Replica. They expose replica health, and the useful insight is knowing which of their counters duplicate a DMV and which do not.

Some of what they report is available elsewhere. You can read log_send_queue_size and redo_queue_size straight from sys.dm_hadr_database_replica_states, so for a point-in-time queue depth a DMV is fine. The throughput and transport counters are the ones with no DMV equivalent. SQLServer:Availability Replica\Bytes Sent to Transport/sec, Bytes Received from Replica/sec, and Flow Control Time (ms/sec) live only in the performance counters, and on the database side SQLServer:Database Replica\Redone Bytes/sec and Log Bytes Received/sec are the rates that tell you how fast a secondary is actually draining its queues. The redo queue depth you can read from a DMV; how quickly it is shrinking you cannot.

That distinction is a good rule for deciding what to capture in general: favour the counters that expose something no DMV gives you. A queue depth is easy to get from sys.dm_hadr_database_replica_states at any moment, but a Bytes Sent to Transport/sec trend, or a Flow Control Time (ms/sec) that climbs under load, is the kind of signal only a captured-over-time counter will show. Everything above still applies, because these are ordinary performance counters: capture them on the same schedule, watch the trend rather than a single reading, and a stalling send rate or a redo throughput that cannot keep pace with the incoming log becomes a leading indicator of failover risk instead of a surprise you discover during an incident.[2]

The Gotchas Worth Repeating

  • Trailing spaces. counter_name, object_name, and instance_name are fixed-width and padded. RTRIM them on the way in, or compare with LIKE, so an exact match does not silently fail on invisible whitespace.
  • The object prefix depends on the instance. A default instance uses SQLServer:; a named instance uses MSSQL$InstanceName:. Filter on the part after the colon with LIKE '%:Buffer Manager' if you want one script for both.
  • Service restarts reset the totals. Running-total counters start over at zero on restart. Guard your diffs against negative results.
  • Never report a base counter by itself. A type 1073939712 row is a denominator with no standalone meaning.

The Takeaway

sys.dm_os_performance_counters is not hard to use, but it is easy to misuse, because it hands you running totals, instantaneous readings, and ratio numerators in the same shape and trusts you to know the difference. The cntr_type column is where that difference lives. Read it first, diff the totals across two samples, pair the ratios with their bases by name, and capture the whole set on a schedule, and you have a precise performance baseline built entirely from T-SQL, no external tooling required.

How do you baseline your instances? Rolling your own capture table like this, leaning on a third-party monitor, or living in Performance Monitor? I would like to hear what has worked for you. Find me on Bluesky or LinkedIn.

References

  1. sys.dm_os_performance_counters (Transact-SQL) – Microsoft Learn. The supported replacement for sys.sysperfinfo, its column definitions, and the permission required to read it.
  2. Use SQL Server Objects – Microsoft Learn. The catalog of SQLServer:* performance objects, including the Availability Replica and Database Replica objects and their counters.
  3. How to capture performance counters over a period of time – answer by Kin Shah on Database Administrators Stack Exchange, which noted the deprecation of sys.sysperfinfo in favor of sys.dm_os_performance_counters and prompted this deeper write-up.