IS DISTINCT FROM: Comparing NULLs Without the Headache

A data engineer watches glowing data cards move along a conveyor toward a comparison gate; translucent NULL cards slip through a gap while a navy gate catches and routes them safely into the bin.

It usually shows up in a code review. Someone writes a filter that is supposed to exclude voided rows, it looks completely reasonable, and yet a tester swears that records are going missing. The query is short enough to fit on a sticky note:

Read it out loud and it sounds airtight: give me every payment whose status is not VOID. Run it against real data and it quietly drops a pile of rows that are very obviously not voided. The reviewer leans over and says “just wrap it in COALESCE, that fixes it,” and they are right, the rows come back. But a second reviewer says “no, never put a function on the column, you just killed your index.” They are also right. Both pieces of advice are correct, they appear to contradict each other, and the thing that actually explains the whole mess is one small idea about how SQL Server treats NULL in a comparison.

This post walks through that idea, the COALESCE reflex and when it genuinely costs you, and the clean modern answer that SQL Server finally grew in 2022: IS DISTINCT FROM. PostgreSQL folks have had it for years, so they get a knowing nod along the way. Everything is shown against a tiny throwaway table you can paste into a scratch database, and the only real infrastructure that appears anywhere is fictional.

The NULL Trap in a Plain Comparison

Here is the whole world for this post: a small payments table with a status column that is allowed to be NULL, because sometimes a payment lands before anyone has classified it.

Six rows. One is genuinely VOID (payment 102). Two have no status yet (payments 104 and 106). Now run the sticky-note query that asks for everything that is not VOID:

You expect five rows back: everything except payment 102. You get three.

Payments 104 and 106 vanished, and they are exactly the rows whose status is NULL. That is not a bug in SQL Server, it is three-valued logic doing precisely what the standard says. A comparison against NULL does not return TRUE or FALSE, it returns UNKNOWN. So NULL <> N'VOID' is UNKNOWN, and the WHERE clause only keeps rows where the predicate is TRUE. UNKNOWN is not TRUE, so those rows are filtered out. The query did not lie to you, it answered a slightly different question than the one in your head: it returned rows that are known to not be VOID, and a row whose status is unknown is not known to be anything.

This is the part that bites people who think of NULL as “blank” or “empty string.” It is neither. It is the absence of a value, and the absence of a value refuses to compare equal or unequal to anything, including another absence. In the six-row table that is two missing rows, easy to spot. In the two-million-row version I tested later in this post, the same innocent-looking filter silently dropped 280,000 rows whose status simply had not been recorded yet. That is the kind of thing a tester finds before you do.

The COALESCE Reflex, and When It Actually Costs You

The most common fix, the one the first reviewer reached for, is to give NULL a stand-in value before the comparison happens:

Now NULL becomes an empty string, the empty string is not equal to N'VOID', the predicate is TRUE, and the missing rows come back. Five rows, as expected. Problem solved, and for an exclusion filter like this one there is genuinely nothing wrong with it.

So why does the second reviewer flinch? Because of a habit that is usually correct: wrapping a column in a function makes the predicate non-sargable. “Sargable” is just a compact way of saying the optimizer can use an index seek to jump straight to the rows it needs, instead of reading the whole table and testing every row.[3][4][5] The moment you write COALESCE([payment_status], N''), the optimizer can no longer reason about the bare [payment_status] column, so any index on that column becomes useless for seeking on this predicate.

Here is the nuance that the “never put a function on a column” rule glosses over, and it is the whole reason both reviewers are right at the same time. Add an index and watch two different predicates:

On the six-row table above the optimizer scans no matter what you write, because reading six rows is cheaper than consulting an index. The difference only appears once the table is big enough for a seek to pay off. I loaded the same shape into a two-million-row table, with most rows SETTLED, VOID a deliberately rare 0.1 percent, and about 280,000 rows left NULL, then captured the logical reads for each of the four queries:

query access path logical reads
(1) [payment_status] = N'VOID' Index Seek 9
(2) COALESCE([payment_status], N'') = N'VOID' Index Scan 6,417
(3) [payment_status] <> N'VOID' Index Scan 5,968
(4) COALESCE([payment_status], N'') <> N'VOID' Index Scan 6,417

Nine logical reads versus six thousand four hundred is the entire argument in two numbers. Query (1) seeks straight to the VOID rows. Query (2) wraps the column in COALESCE, the seek disappears, and the engine reads the whole index instead. That is the case the second reviewer is protecting against, and the cost is real.

Queries (3) and (4) are the surprise, and the reads give it away: both scan, at roughly the same cost as the COALESCE equality. An inequality like <> N'VOID' matches almost the entire table, so the optimizer was never going to seek for it. Wrapping the column in COALESCE on an exclusion therefore costs you nothing you had not already surrendered the moment you wrote <>. The function is “free” here, not because functions are free, but because there was no seek to lose.

So the honest rule of thumb is narrower than “never put a function on a column”:

  • On a positive equality against an indexed column, a function wrap can destroy a real seek. Keep the column bare.
  • On an inequality or exclusion, the predicate was non-sargable already, so a function wrap changes nothing about the access path.

Those numbers came from SQL Server 2019, and your own reads will vary with row counts, index design, and data distribution. The shape holds regardless: a function on the column trades a seek for a scan only where a seek was possible to begin with. The point is not to memorize a verdict, it is to know which question to ask: “was a seek ever possible here?” If the answer is no, the COALESCE was never the villain.

The Clean Answer: IS DISTINCT FROM

Even when COALESCE is harmless for performance, it is still a workaround. You are inventing a sentinel value, N'', and quietly hoping no real row ever legitimately contains an empty string, because if one does, your sentinel and your data collide. What you actually wanted was a comparison that treats NULL as a normal, comparable value, so that NULL and VOID are simply considered different and NULL and NULL are considered the same.

That is exactly what IS DISTINCT FROM does, and as of SQL Server 2022 (16.x) it is built in.[1]

This returns the five rows you wanted from the start, including the two NULL rows, with no function on the column and no invented sentinel. It reads cleanly too: give me the rows whose status is distinct from VOID, and a NULL status is distinct from VOID, so it stays.

The operator comes as a matched pair. IS DISTINCT FROM is the NULL-safe “not equal,” and IS NOT DISTINCT FROM is the NULL-safe “equal.” Microsoft’s documentation lays out the full truth table, which is worth internalizing because it is the entire behavior in four rows:[1]

A B A = B A IS NOT DISTINCT FROM B
0 0 True True
0 1 False False
0 NULL Unknown False
NULL NULL Unknown True

Look at the last two rows. Plain = returns Unknown the moment NULL is involved, which is what dropped your data earlier. IS NOT DISTINCT FROM never returns Unknown: it treats NULL as a value that equals only itself. That is the property you wanted all along.

The NULL-safe “equal” side is handy for the mirror-image problem, finding the rows that are unset:

One caution before you sprinkle it everywhere. IS DISTINCT FROM on an exclusion behaves like the <> case above: it matches most of the table, so do not expect it to suddenly produce a seek. IS NOT DISTINCT FROM is the equality-shaped sibling, and whether the optimizer can seek on it against an indexed column can depend on your SQL Server build, whether you compare to a literal or a nullable variable, and the cardinality of the data. If you are leaning on it in a hot path, check the actual execution plan rather than assuming. It buys you correct NULL handling and clearer intent for certain; the access path deserves a look on real data.

The PostgreSQL Aside

If you work in both worlds, none of this is new on the PostgreSQL side. IS DISTINCT FROM and IS NOT DISTINCT FROM are SQL-standard comparison predicates, and PostgreSQL has carried them for a very long time.[2] The same query you just wrote runs there essentially unchanged:

This is one of those small features where SQL Server spent years as the odd one out, and where a PostgreSQL DBA reviewing a SQL Server 2019 codebase would be genuinely puzzled that everyone is writing it the long way. Which brings us to the long way.

Still on SQL Server 2019 or Earlier?

If IS DISTINCT FROM throws a syntax error, you are on a build older than 2022, and you have a couple of portable options. The most direct is to spell out the NULL handling by hand. For an exclusion against a non-NULL constant like N'VOID', that reduces to:

That is the everyday form, and it is what most pre-2022 code does. On the two-million-row table it returns the same 1,998,000 rows as IS DISTINCT FROM would, at the same cost: an index scan of 6,417 logical reads, identical to the COALESCE version. That is expected, since this is an exclusion and exclusions do not seek. You are matching the modern operator’s results and its access path, just with more typing. The fully general equivalent, the one that also handles the case where both sides might be NULL, is the exact expression Microsoft’s engine uses internally when it has to translate IS DISTINCT FROM for a linked server that cannot parse the syntax:[1]

There is also a tidy trick that has worked on every modern version and reads surprisingly well once you have seen it. A single-row INTERSECT compares using NULL-safe equality, so you can borrow it:

INTERSECT and EXCEPT both treat two NULLs as equal, which is the same NULL-as-a-value semantics that IS [NOT] DISTINCT FROM gives you. It is more verbose and the optimizer treats it as a semi-join, so as always, look at the plan before committing it to a hot path. But it is correct, it is portable back to old versions, and it spares you from reasoning through a three-line boolean expression every time.

If EXCEPT reads more naturally to you than NOT EXISTS, it gives you the same NULL-safe inequality while keeping a plain EXISTS. EXCEPT returns the left row only when it is distinct from the right, so the test maps straight onto “distinct from”:

On the two-million-row table this costs the same 6,417 logical reads off the same index scan as every other exclusion here, so it changes nothing about the access path. The plan shapes do differ in a small way that is worth knowing. The NOT EXISTS plus INTERSECT form collapses to a single anti-semi-join. The EXISTS plus EXCEPT form expands to a semi-join sitting on top of an anti-semi-join, because the engine realizes the EXCEPT itself as the anti-semi-join and then needs a semi-join for the outer EXISTS. The plain OR [payment_status] IS NULL is simpler than both, a single filtered index scan with no join at all. All three do the same work and read the same number of pages on an exclusion, so choose the one that is clearest to you; the optimizer charges the same either way.

The Takeaway

The rows did not really go missing. They were filtered out by a comparison that returned UNKNOWN, because NULL refuses to compare equal or unequal to anything. COALESCE papers over that by inventing a sentinel value, and on an exclusion filter it costs you nothing because the predicate was never going to seek anyway. On a positive equality it can cost you a real index seek, which is why the blanket “no functions on columns” advice exists.

IS DISTINCT FROM is the version of this you actually wanted: NULL treated as a comparable value, no sentinel, no function wrap, and intent that reads the way you think. If you are on SQL Server 2022 or later, reach for it. If you are still on 2019, the OR ... IS NULL form and the INTERSECT trick get you the same correctness until you upgrade. Either way, the lesson underneath is the one worth keeping: when rows disappear from a WHERE clause that looks obviously correct, suspect NULL first.

Have you been bitten by the disappearing-rows version of this, or found a use for IS DISTINCT FROM that goes beyond tidying up a filter? I would like to hear about it. Find me on Bluesky or LinkedIn.

References

  1. IS [NOT] DISTINCT FROM (Transact-SQL) – Microsoft Learn. Availability (SQL Server 2022 and later, Azure SQL Database, Azure SQL Managed Instance, Microsoft Fabric), the truth table, and the linked-server decode used as the pre-2022 equivalent.
  2. Comparison Functions and Operators – PostgreSQL Documentation, section 9.2. IS DISTINCT FROM and IS NOT DISTINCT FROM as SQL-standard NULL-safe comparison predicates.
  3. Sargability: Why %String% Is Slow – Brent Ozar. A practical walkthrough of why wrapping a column, or leading a LIKE with a wildcard, defeats an index seek and forces the optimizer into a full scan.
  4. Learn T-SQL With Erik: Expression SARGability – Erik Darling. How expressions buried in CTEs, derived tables, and function-wrapped columns stay non-sargable, and how a computed column can restore an index seek.
  5. Sargable – Wikipedia. The origin of the term, a contraction of “Search ARGument ABLE,” and a reference list of which operators are and are not sargable.