Part 2: When the AI Code Reviewer Finds Bugs the Optimizer Didn’t
In Part 1, I walked through an interactive session with GitHub Copilot CLI where we optimized a production T-SQL function — cutting memory grants by 80% and logical reads by 73%. I thought we were done.
Then I asked Copilot to review its own work.
What followed was a genuinely useful code review that surfaced three critical logic bugs — bugs that existed in the original function for years, survived the optimization, and would never show up in an execution plan. This post documents that entire process: the prompt I used, the review output, how I triaged the findings, and the fixes we applied.
Setting Up the Review
After publishing the optimized function from Part 1, I asked Copilot to spawn a separate task acting as a T-SQL expert performing a code review. Here’s the exact prompt I used:
```text
You are a senior T-SQL expert and SQL Server MVP performing a thorough code review.

Read the file C:\temp\is_primary.sql and review it as if it were submitted as a
pull request.

Context about this function:
- It's an inline table-valued function in a server_maintenance database
- It's called by every backup job, DBCC check, and maintenance task to determine
  if the current server should act on a given database
- It handles standalone databases, local Availability Groups, and Distributed
  Availability Groups
- @for_backup = 0 means "am I the primary?" and @for_backup = 1 means
  "should I run backups?"
- It was recently optimized: @database_name was pushed into the CTE, ROW_NUMBER()
  was replaced with TOP (1) ... ORDER BY, and duplicated branch logic was
  factored out

Please review for:

1. Correctness — Are there any logic bugs, edge cases, or semantic differences
   from standard AG behavior?
2. Performance — Any remaining inefficiencies, missing optimizations, or
   concerns about plan quality?
3. Style and readability — Naming, formatting, commenting, maintainability
4. Robustness — Edge cases around NULL handling, collation, empty result sets,
   non-ONLINE states
5. Potential issues — Anything that would make you push back on a PR

Be critical and specific. Reference line numbers where relevant. If something is
good, say so briefly, but focus your energy on things that could be improved.

Output your review in a structured format with severity levels
(Critical / Warning / Suggestion / Praise).
```
A few things about this prompt that mattered:
- Context about the function’s role — without knowing this gates every backup job, the reviewer can’t assess severity
- The optimization history — so it knows what changed and can look for regressions
- Explicit review dimensions — correctness, performance, style, robustness, not just “review this”
- Structured output format — severity levels make triage practical
The Review Results
The review came back with 3 Critical, 4 Warning, 4 Suggestion, and 6 Praise findings. Here’s what it found.
🔴 Critical #1: NOT EXISTS blocks all secondary backup scenarios
The function had this at the top of its WHERE clause:
```sql
WHERE NOT EXISTS (
          SELECT 1
          FROM [src] [s2]
          WHERE (
                    [s2].[local_ag_role] = N'SECONDARY'
                    OR [s2].[distributed_ag_role] = N'SECONDARY'
                )
      )
  AND (
          (
              @for_backup = 0
              AND /* primary check */ ...
          )
          OR
          (
              @for_backup = 1
              AND /* backup preference check */ ...
          )
      )
```
The NOT EXISTS is evaluated before the @for_backup branching. On any secondary replica, the CTE returns a row with local_ag_role = N'SECONDARY', NOT EXISTS evaluates to FALSE, and the entire WHERE clause short-circuits. The @for_backup = 1 branch is dead code on every secondary replica.
This means if your AG is configured with “Prefer Secondary” or “Secondary Only” backup preference, backups would never run on the secondary — silently.
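To make the short-circuit concrete, here's a small Python model of the original WHERE clause's evaluation order (column names mirror the CTE; the row dict is a hypothetical single-row result, not real DMV output — this illustrates the boolean logic, it is not the T-SQL itself):

```python
def original_where(row, for_backup):
    # NOT EXISTS (SELECT 1 FROM src WHERE role = 'SECONDARY' ...):
    # becomes False the moment this replica is a secondary anywhere,
    # vetoing the whole clause before @for_backup is even considered.
    not_exists = not (row["local_ag_role"] == "SECONDARY"
                      or row["distributed_ag_role"] == "SECONDARY")
    primary_check = for_backup == 0 and (row["local_ag_role"] or "PRIMARY") == "PRIMARY"
    backup_check = for_backup == 1 and row["backup_preferred_replica"] == 1
    return not_exists and (primary_check or backup_check)

secondary = {"local_ag_role": "SECONDARY", "distributed_ag_role": None,
             "backup_preferred_replica": 1}

# This secondary IS the preferred backup replica, yet no row comes back:
print(original_where(secondary, for_backup=1))  # False
```

The veto happens in `not_exists` before `backup_check` is ever consulted, which is exactly why the backup branch is dead code on secondaries.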
🔴 Critical #2: “Any Replica” (NONE) backup preference never matches
The backup preference matching logic was:
```sql
COALESCE([src].[local_ag_role], N'PRIMARY') =
    COALESCE(UPPER(REPLACE([src].[local_automated_backup_preference], N'_only', N'')), N'PRIMARY')
```
When automated_backup_preference_desc = N'NONE' (which means “Any Replica”):
- Right side: UPPER(REPLACE(N'NONE', N'_only', N'')) → N'NONE'
- Left side: N'PRIMARY' or N'SECONDARY'
- Neither equals N'NONE' → no match, no backups
🔴 Critical #3: “Prefer Secondary” can’t fall back to primary
The REPLACE approach treats N'SECONDARY' (Prefer Secondary, allow primary fallback) identically to N'SECONDARY_ONLY' (secondary required, never fallback). Both produce N'SECONDARY' after the replace. When all secondaries are down, the primary should take over for “Prefer Secondary” — but the role check forces N'PRIMARY' = N'SECONDARY', which fails. Backups silently stop.
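Both crit-2 and crit-3 fall out of the same expression, which is easy to see in a quick Python sketch of the REPLACE-based matching (a model of the comparison only, not the T-SQL):

```python
def replace_match(local_ag_role, preference):
    # Mirrors: COALESCE(role, 'PRIMARY') =
    #          COALESCE(UPPER(REPLACE(pref, '_only', '')), 'PRIMARY')
    left = local_ag_role or "PRIMARY"
    right = preference.upper().replace("_ONLY", "") if preference else "PRIMARY"
    return left == right

# crit-2: NONE ("Any Replica") survives the REPLACE untouched,
# so no role can ever equal it:
print(replace_match("PRIMARY", "NONE"))    # False
print(replace_match("SECONDARY", "NONE"))  # False

# crit-3: SECONDARY and SECONDARY_ONLY both reduce to 'SECONDARY',
# so "Prefer Secondary" loses its primary fallback:
print(replace_match("PRIMARY", "SECONDARY"))       # False -- fallback denied
print(replace_match("PRIMARY", "SECONDARY_ONLY"))  # False -- correct here, but
                                                   # indistinguishable from above
```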
Important context: these were pre-existing bugs
All three critical issues existed in the original function, not just our optimized version. The NOT EXISTS was present in both branches of the original’s duplicated logic. The REPLACE-based preference matching was unchanged. Our optimization preserved the bugs faithfully.
This is worth noting because it demonstrates something interesting: a performance optimization pass followed by a correctness review catches more than either alone. The performance work forced us to understand the function deeply enough to refactor it, and the refactoring made the logical structure clear enough for the reviewer to spot the issues.
Triaging the Full Review
Setting aside the six praise items, the review produced 11 actionable findings. Here’s how I triaged them:
| ID | Severity | Finding | Disposition |
|---|---|---|---|
| crit-1 | 🔴 Critical | NOT EXISTS blocks @for_backup=1 on secondaries | Fixed |
| crit-2 | 🔴 Critical | NONE (Any Replica) preference never matches | Fixed |
| crit-3 | 🔴 Critical | SECONDARY vs SECONDARY_ONLY fallback broken | Fixed |
| warn-4 | 🟡 Warning | Replace NOT EXISTS with direct row-level predicate | Fixed (solved crit-1 too) |
| warn-5 | 🟡 Warning | Missing is_distributed = 1 guard on DAG join | Fixed |
| warn-6 | 🟡 Warning | NOT EXISTS is logically redundant for ~1 row | Fixed (solved with warn-4) |
| warn-7 | 🟡 Warning | Three-way OR fragile during transitional AG states | Won’t fix — safe-by-default is intentional |
| sug-8 | 💡 Suggestion | Nested COALESCE can be flattened | Fixed |
| sug-9 | 💡 Suggestion | Use CREATE OR ALTER | Won’t fix — SSDT handles this |
| sug-10 | 💡 Suggestion | NULL @for_backup has no guard | Won’t fix — silent no-rows is safe |
| sug-11 | 💡 Suggestion | Typo: “compatability” | Fixed |
Three findings resolved together: warn-4’s suggested fix (replace NOT EXISTS with direct predicates) simultaneously fixed crit-1 (secondary blocking) and warn-6 (redundant CTE re-expansion). One code change addressed three review items and removed the last CTE re-expansion, giving us a further performance improvement as a side effect.
The Fixes
Fix 1: Replace NOT EXISTS with scoped direct predicates (crit-1 + warn-4 + warn-6)
The reviewer suggested that since the CTE returns ~1 row per database, the NOT EXISTS (SELECT 1 FROM [src] ...) indirection is equivalent to checking the columns directly. And by scoping the secondary exclusion to @for_backup = 0 only, we fix the backup blocking:
```sql
-- Before:
WHERE NOT EXISTS (
          SELECT 1
          FROM [src] [s2]
          WHERE (
                    [s2].[local_ag_role] = N'SECONDARY'
                    OR [s2].[distributed_ag_role] = N'SECONDARY'
                )
      )

-- After:
WHERE (
          /* for non-backup callers, exclude databases where this server is a secondary replica */
          @for_backup = 1
          OR (
                 COALESCE([src].[local_ag_role], N'PRIMARY') <> N'SECONDARY'
                 AND COALESCE([src].[distributed_ag_role], N'PRIMARY') <> N'SECONDARY'
             )
      )
```
This is both a correctness fix and a performance win — it eliminates the second CTE expansion entirely. The CTE’s 7-way join now executes exactly once.
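As a sanity check, the rewritten guard can be modeled the same way (hypothetical row dicts, Python standing in for the T-SQL predicate):

```python
def scoped_guard(row, for_backup):
    # Backup callers skip the secondary exclusion entirely; the backup
    # preference logic later in the WHERE clause decides for them.
    if for_backup == 1:
        return True
    # Non-backup callers keep the safe-by-default veto on secondaries.
    return ((row["local_ag_role"] or "PRIMARY") != "SECONDARY"
            and (row["distributed_ag_role"] or "PRIMARY") != "SECONDARY")

secondary = {"local_ag_role": "SECONDARY", "distributed_ag_role": None}
print(scoped_guard(secondary, for_backup=1))  # True  -- backups can run here now
print(scoped_guard(secondary, for_backup=0))  # False -- still not "the primary"
```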
Fix 2: Rewrite backup preference matching (crit-2 + crit-3)
The original single-expression REPLACE approach couldn’t handle all four preference values correctly. The fix separates them into two categories:
```sql
-- Before:
AND COALESCE([src].[local_ag_role], N'PRIMARY') =
    COALESCE(UPPER(REPLACE([src].[local_automated_backup_preference], N'_only', N'')), N'PRIMARY')

-- After:
AND (
        /* NONE (Any Replica) or SECONDARY (Prefer Secondary): trust fn_hadr_backup_is_preferred_replica */
        [src].[local_automated_backup_preference] IN (N'NONE', N'SECONDARY')
        OR
        /* PRIMARY or SECONDARY_ONLY: require exact role match */
        COALESCE([src].[local_ag_role], N'PRIMARY') =
            UPPER(REPLACE([src].[local_automated_backup_preference], N'_only', N''))
    )
```
The logic for each preference value:
- PRIMARY → IN check is FALSE → falls through to role match: require local_ag_role = PRIMARY ✓
- SECONDARY_ONLY → IN check is FALSE → falls through to role match: require local_ag_role = SECONDARY ✓
- SECONDARY (Prefer Secondary) → IN check is TRUE → passes, trusts fn_hadr_backup_is_preferred_replica which handles primary fallback ✓
- NONE (Any Replica) → IN check is TRUE → passes, trusts fn_hadr_backup_is_preferred_replica ✓
- NULL (standalone, no AG) → IN check is FALSE → COALESCE(NULL, 'PRIMARY') = COALESCE(NULL, 'PRIMARY') → TRUE ✓
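The same walkthrough can be checked mechanically with a small Python model of the rewritten predicate (NULLs coalesced to 'PRIMARY' on both sides, as in the walkthrough above; hypothetical values, not live DMV data):

```python
def fixed_match(local_ag_role, preference):
    # NONE (Any Replica) / SECONDARY (Prefer Secondary): defer to
    # fn_hadr_backup_is_preferred_replica, which handles fallback.
    if preference in ("NONE", "SECONDARY"):
        return True
    # PRIMARY / SECONDARY_ONLY / NULL (standalone): exact role match.
    left = local_ag_role or "PRIMARY"
    right = preference.upper().replace("_ONLY", "") if preference else "PRIMARY"
    return left == right

for role, pref, expected in [
    ("PRIMARY",   "PRIMARY",        True),   # primary-preferred, on primary
    ("SECONDARY", "PRIMARY",        False),  # primary-preferred, on secondary
    ("PRIMARY",   "SECONDARY",      True),   # Prefer Secondary: fallback allowed
    ("SECONDARY", "SECONDARY_ONLY", True),   # secondary required, on secondary
    ("PRIMARY",   "SECONDARY_ONLY", False),  # secondary required, on primary
    ("PRIMARY",   "NONE",           True),   # Any Replica
    (None,        None,             True),   # standalone database
]:
    assert fixed_match(role, pref) is expected
```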
Fix 3: Add is_distributed guard (warn-5)
```sql
-- Before:
LEFT JOIN [sys].[availability_groups] [dist_ag]
    ON [dist_ar].[group_id] = [dist_ag].[group_id]

-- After:
LEFT JOIN [sys].[availability_groups] [dist_ag]
    ON [dist_ar].[group_id] = [dist_ag].[group_id]
    AND [dist_ag].[is_distributed] = 1
```
Prevents false matches if a server happens to be named identically to a local AG — unlikely but possible during metadata inconsistencies.
Fix 4: Flatten nested COALESCE (sug-8)
```sql
-- Before:
COALESCE(COALESCE([dist_ag].[name], [local_ag].[name]), @@SERVERNAME)

-- After:
COALESCE([dist_ag].[name], [local_ag].[name], @@SERVERNAME)
```
Final Execution Plan Comparison
After applying all fixes, I deployed the function and compared execution plans one more time:
| Metric | Original | After Part 1 (perf only) | After Part 2 (perf + correctness) |
|---|---|---|---|
| Est. Cost | 67.4 | 1.02 | 0.39 |
| Operators | 179 | 158 | 94 |
| Memory Granted | 64,440 KB (63 MB) | 12,608 KB (12 MB) | 7,168 KB (7 MB) |
| Memory Used | 224 KB | 488 KB | 256 KB |
| Sort Operators | 3 (52K est. rows) | 1 (1 est. row) | 1 (1 est. row) |
| Total Logical Reads | 133 | 36 | 24 |
| CTE Expansions | 2 (NOT EXISTS) | 2 (NOT EXISTS) | 1 |
The correctness fixes gave us a further performance improvement as a bonus. Replacing NOT EXISTS with direct predicates eliminated the last CTE re-expansion, dropping operators from 158 to 94 and the memory grant from 12 MB to 7 MB.
Lessons Learned
Ask for a code review after optimizing, not just before. Performance work restructures code, and restructuring reveals logic that was previously hidden in duplication. The three critical bugs in this function lived inside duplicated NOT EXISTS / REPLACE blocks for years. Factoring that logic out made the issues visible.
The AI reviewer found real bugs, not just style nits. I was expecting formatting complaints and naming suggestions. Instead, all three criticals were genuine logic errors that would affect production AG backup behavior. The key was providing enough context in the prompt — without knowing the function gates backup jobs and handles AG preferences, the reviewer couldn’t have assessed the NONE/SECONDARY edge cases.
Correctness fixes can improve performance. This is counterintuitive but makes sense: the NOT EXISTS was both logically wrong (too aggressive) and a performance drag (CTE re-expansion). Fixing the logic by switching to direct predicates simultaneously eliminated the re-expansion. Sometimes the “correct” code is also the “fast” code.
Prompt engineering matters for code review. The structured review (severity levels, specific dimensions, explicit request to be critical) produced actionable output. A vague “review this function” prompt would have returned generic feedback. Telling it to act as a “senior T-SQL expert performing a PR review” set the right expectations for depth and specificity.
Not every finding needs fixing. Three items were triaged as “won’t fix” with documented rationale — the CREATE OR ALTER suggestion (deployment framework handles it), the NULL parameter guard (silent no-rows is safe), and the transitional AG state concern (safe-by-default is intentional). A good reviewer accepts that and moves on.
The Final Function
Here’s the complete function after both the performance optimization (Part 1) and the correctness fixes (Part 2):
```sql
CREATE FUNCTION [dbo].[is_primary]
(
    @database_name sysname
  , @for_backup bit
)
RETURNS TABLE
AS
RETURN
(
    WITH [src] AS
    (
        SELECT
            [database_name] = [d].[name] COLLATE SQL_Latin1_General_CP1_CI_AS
          , [recovery_model] = [d].[recovery_model_desc] COLLATE SQL_Latin1_General_CP1_CI_AS
          , [backup_folder_name] = COALESCE([dist_ag].[name], [local_ag].[name], @@SERVERNAME) COLLATE SQL_Latin1_General_CP1_CI_AS
          , [local_ag_name] = [local_ag].[name] COLLATE SQL_Latin1_General_CP1_CI_AS
          , [local_ag_role] = [local_ar_states].[role_desc] COLLATE SQL_Latin1_General_CP1_CI_AS
          , [local_automated_backup_preference] = [local_ag].[automated_backup_preference_desc] COLLATE SQL_Latin1_General_CP1_CI_AS
          , [distributed_ag_name] = [dist_ag].[name] COLLATE SQL_Latin1_General_CP1_CI_AS
          , [backup_preferred_replica] = COALESCE([sys].[fn_hadr_backup_is_preferred_replica]([d].[name]), 1)
          , [distributed_ag_role] = [dist_ar_states].[role_desc] COLLATE SQL_Latin1_General_CP1_CI_AS
          , [distributed_ag_state] = [dist_ar_states].[operational_state_desc] COLLATE SQL_Latin1_General_CP1_CI_AS
          , [dist_automated_backup_preference] = [dist_ag].[automated_backup_preference_desc] COLLATE SQL_Latin1_General_CP1_CI_AS
        FROM [sys].[databases] [d]
        LEFT JOIN [sys].[availability_databases_cluster] [adc]
            ON [d].[group_database_id] = [adc].[group_database_id]
        LEFT JOIN [sys].[availability_groups] [local_ag]
            ON [adc].[group_id] = [local_ag].[group_id]
        LEFT JOIN [sys].[availability_replicas] [local_ar]
            ON [local_ag].[group_id] = [local_ar].[group_id]
            AND [local_ar].[replica_server_name] = @@SERVERNAME
        LEFT JOIN [sys].[dm_hadr_availability_replica_states] [local_ar_states]
            ON [local_ag].[group_id] = [local_ar_states].[group_id]
            AND [local_ar].[replica_id] = [local_ar_states].[replica_id]
        LEFT JOIN [sys].[availability_replicas] [dist_ar]
            ON [local_ag].[name] = [dist_ar].[replica_server_name]
        LEFT JOIN [sys].[availability_groups] [dist_ag]
            ON [dist_ar].[group_id] = [dist_ag].[group_id]
            AND [dist_ag].[is_distributed] = 1
        LEFT JOIN [sys].[dm_hadr_availability_replica_states] [dist_ar_states]
            ON [dist_ag].[group_id] = [dist_ar_states].[group_id]
            AND [dist_ar].[replica_id] = [dist_ar_states].[replica_id]
        WHERE [d].[name] = @database_name
          AND [d].[state_desc] = N'ONLINE'
    )
    SELECT TOP (1)
        [src].[database_name]
      , [src].[backup_folder_name]
      , [src].[recovery_model]
      , [src].[local_ag_name]
      , [src].[local_ag_role]
      , [src].[local_automated_backup_preference]
      , [src].[distributed_ag_name]
      , [src].[backup_preferred_replica]
      , [src].[distributed_ag_role]
      , [src].[distributed_ag_state]
      , [src].[dist_automated_backup_preference]
    FROM [src]
    WHERE (
              /* for non-backup callers, exclude databases where this server is a secondary replica */
              @for_backup = 1
              OR (
                     COALESCE([src].[local_ag_role], N'PRIMARY') <> N'SECONDARY'
                     AND COALESCE([src].[distributed_ag_role], N'PRIMARY') <> N'SECONDARY'
                 )
          )
      AND (
              (
                  [src].[local_ag_name] IS NOT NULL
                  AND [src].[distributed_ag_name] IS NULL
                  AND [src].[distributed_ag_role] IS NULL
              )
              OR
              (
                  [src].[local_ag_name] IS NOT NULL
                  AND [src].[distributed_ag_name] IS NOT NULL
                  AND [src].[distributed_ag_role] IS NOT NULL
              )
              OR
              (
                  [src].[local_ag_name] IS NULL
                  AND [src].[distributed_ag_name] IS NULL
                  AND [src].[distributed_ag_role] IS NULL
                  AND [src].[local_ag_role] IS NULL
              )
          )
      AND COALESCE([src].[distributed_ag_role], N'PRIMARY') = N'PRIMARY'
      AND (
              (
                  /* return a row only if we're running on the primary instance, and the caller is asking for access to the database */
                  @for_backup = 0
                  AND COALESCE([src].[local_ag_role], N'PRIMARY') = N'PRIMARY'
              )
              OR
              (
                  /* return a row only if we're running on the instance matching the backup preference */
                  @for_backup = 1
                  AND [src].[backup_preferred_replica] = 1
                  AND (
                          /* NONE (Any Replica) or SECONDARY (Prefer Secondary): trust fn_hadr_backup_is_preferred_replica */
                          [src].[local_automated_backup_preference] IN (N'NONE', N'SECONDARY')
                          OR
                          /* PRIMARY or SECONDARY_ONLY: require exact role match */
                          COALESCE([src].[local_ag_role], N'PRIMARY') =
                              UPPER(REPLACE([src].[local_automated_backup_preference], N'_only', N''))
                      )
              )
          )
    ORDER BY [src].[backup_folder_name]
);
```
This post was produced through a continuation of the interactive GitHub Copilot CLI session from Part 1. Copilot was asked to spawn a sub-agent acting as a T-SQL expert code reviewer, then iteratively applied fixes based on the review findings, re-deployed to a live SQL Server instance, and verified the execution plans after each change. The review prompt, triage decisions, and all code changes shown here are from that real session.