Can GitHub Copilot Optimize Your T-SQL? I Put It to the Test
I’ve been curious about how well AI coding assistants handle real-world T-SQL optimization — not textbook examples, but production code with system DMVs, Availability Groups, and windowed functions. So I pointed GitHub Copilot CLI at one of my own functions and asked it to analyze, critique, and optimize it.
What followed was an iterative process with some surprises — including an “optimization” that made the estimated cost worse, a memory grant that was 300× larger than needed, and a final version that cut logical reads by 73% and memory grants by 80%.
Here’s the full journey.
The Original Function: is_primary()
This inline table-valued function is the gating mechanism in my server_maintenance database. Every backup job, DBCC check, and maintenance task calls it to answer one question: “Should I run on this server, for this database?”
It handles three scenarios:
- The server is the PRIMARY replica for a database in an Availability Group
- The database isn’t in any AG (standalone)
- The server matches the AG’s automated backup preference (when
@for_backup = 1)
The function joins seven system DMVs/catalog views to resolve both local and distributed Availability Group topologies:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 |
CREATE FUNCTION [dbo].[is_primary] ( @database_name sysname , @for_backup bit ) RETURNS TABLE AS RETURN ( WITH [src] AS ( SELECT [database_name] = [d].[name] COLLATE SQL_Latin1_General_CP1_CI_AS , [recovery_model] = [d].[recovery_model_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [backup_folder_name] = COALESCE(COALESCE([dist_ag].[name], [local_ag].[name]), @@SERVERNAME) COLLATE SQL_Latin1_General_CP1_CI_AS , [local_ag_name] = [local_ag].[name] COLLATE SQL_Latin1_General_CP1_CI_AS , [local_ag_role] = [local_ar_states].[role_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [local_automated_backup_preference] = [local_ag].[automated_backup_preference_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [distributed_ag_name] = [dist_ag].[name] COLLATE SQL_Latin1_General_CP1_CI_AS , [backup_preferred_replica] = COALESCE([sys].[fn_hadr_backup_is_preferred_replica]([d].[name]), 1) , [distributed_ag_role] = [dist_ar_states].[role_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [distributed_ag_state] = [dist_ar_states].[operational_state_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [dist_automated_backup_preference] = [dist_ag].[automated_backup_preference_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [row] = ROW_NUMBER() OVER (PARTITION BY [d].[name] ORDER BY COALESCE([dist_ag].[name], [local_ag].[name], [d].[name])) FROM [sys].[databases] [d] LEFT JOIN [sys].[availability_databases_cluster] [adc] ON [d].[group_database_id] = [adc].[group_database_id] LEFT JOIN [sys].[availability_groups] [local_ag] ON [adc].[group_id] = [local_ag].[group_id] LEFT JOIN [sys].[availability_replicas] [local_ar] ON [local_ag].[group_id] = [local_ar].[group_id] AND [local_ar].[replica_server_name] = @@SERVERNAME LEFT JOIN [sys].[dm_hadr_availability_replica_states] [local_ar_states] ON [local_ag].[group_id] = [local_ar_states].[group_id] AND [local_ar].[replica_id] = [local_ar_states].[replica_id] LEFT JOIN [sys].[availability_replicas] [dist_ar] ON [local_ag].[name] = [dist_ar].[replica_server_name] LEFT JOIN [sys].[availability_groups] [dist_ag] ON [dist_ar].[group_id] = [dist_ag].[group_id] LEFT JOIN [sys].[dm_hadr_availability_replica_states] [dist_ar_states] ON [dist_ag].[group_id] = [dist_ar_states].[group_id] AND [dist_ar].[replica_id] = [dist_ar_states].[replica_id] WHERE [d].[state_desc] = N'ONLINE' ) SELECT [src].[database_name] , [src].[backup_folder_name] , [src].[recovery_model] , [src].[local_ag_name] , [src].[local_ag_role] , [src].[local_automated_backup_preference] , [src].[distributed_ag_name] , [src].[backup_preferred_replica] , [src].[distributed_ag_role] , [src].[distributed_ag_state] , [src].[dist_automated_backup_preference] FROM [src] WHERE [src].[database_name] = @database_name AND [src].[row] = 1 AND ( ( @for_backup = 0 AND COALESCE([src].[local_ag_role], N'PRIMARY') = N'PRIMARY' AND COALESCE([src].[distributed_ag_role], N'PRIMARY') = N'PRIMARY' AND ( ( [src].[local_ag_name] IS NOT NULL AND [src].[distributed_ag_name] IS NULL AND [src].[distributed_ag_role] IS NULL ) OR ( [src].[local_ag_name] IS NOT NULL AND [src].[distributed_ag_name] IS NOT NULL AND [src].[distributed_ag_role] IS NOT NULL ) OR ( [src].[local_ag_name] IS NULL AND [src].[distributed_ag_name] IS NULL AND [src].[distributed_ag_role] IS NULL AND [src].[local_ag_role] IS NULL ) ) AND NOT EXISTS ( SELECT 1 FROM [src] WHERE [src].[database_name] = @database_name AND ( [src].[local_ag_role] = N'SECONDARY' OR [src].[distributed_ag_role] = N'SECONDARY' ) ) ) OR ( @for_backup = 1 AND [src].[backup_preferred_replica] = 1 AND COALESCE([src].[local_ag_role], N'PRIMARY') = COALESCE(UPPER(REPLACE([src].[local_automated_backup_preference], N'_only', N'')), N'PRIMARY') AND COALESCE([src].[distributed_ag_role], N'PRIMARY') = N'PRIMARY' AND ( /* ...same three-way topology check repeated... */ ) AND NOT EXISTS ( /* ...same NOT EXISTS repeated... */ ) ) ) ); |
I asked Copilot: “Do you think it is efficient?”
Stage 1: Copilot’s Initial Analysis
Copilot identified three concerns, in order of significance:
1. The CTE is expanded twice (NOT EXISTS re-evaluation)
The NOT EXISTS (SELECT 1 FROM [src] ...) subquery appears in both the @for_backup = 0 and @for_backup = 1 branches. In SQL Server, CTEs are not materialized — they’re re-expanded inline each time they’re referenced. That means the entire 7-way join across the system DMVs executes twice: once for the main SELECT, and again for each NOT EXISTS check.
2. Duplicated branch logic (~80% identical)
The @for_backup = 0 and @for_backup = 1 branches share nearly all their logic — the three-way AG topology check, the NOT EXISTS subquery, and the distributed AG role check. The only real difference is one line that checks backup_preferred_replica and matches the AG role to the backup preference. This duplication is a maintenance risk.
3. Late predicate filtering
The @database_name filter is applied in the outer SELECT (WHERE [src].[database_name] = @database_name), not inside the CTE. This means the CTE’s 7-way join processes all online databases before filtering down to the one we care about.
Copilot’s verdict: “Efficient enough for its purpose” since the system DMVs are small in-memory metadata. But it had ideas for improvement.
Stage 2: The First Rewrite — Windowed Functions and Predicate Pushdown
Copilot’s first rewrite made three structural changes:
Change 1: Push @database_name into the CTE
|
1 2 3 4 5 6 7 |
-- Before (in outer SELECT): WHERE [src].[database_name] = @database_name -- After (in CTE WHERE clause): WHERE [d].[name] = @database_name AND [d].[state_desc] = N'ONLINE' |
This lets the optimizer filter to a single database before performing the 7-way join.
Change 2: Replace NOT EXISTS with a windowed flag
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
-- Before: NOT EXISTS (SELECT 1 FROM [src] WHERE ... role = N'SECONDARY') -- After (added to CTE SELECT): , [has_secondary] = MAX( CASE WHEN [local_ar_states].[role_desc] = N'SECONDARY' OR [dist_ar_states].[role_desc] = N'SECONDARY' THEN 1 ELSE 0 END ) OVER (PARTITION BY [d].[name]) -- And in outer WHERE: AND [src].[has_secondary] = 0 |
This eliminates the second CTE expansion entirely — the secondary check is computed once as a window aggregate.
Change 3: Factor out shared conditions
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
-- Shared conditions, written once: WHERE [src].[row] = 1 AND [src].[has_secondary] = 0 AND (/* three-way AG topology check */) AND COALESCE([src].[distributed_ag_role], N'PRIMARY') = N'PRIMARY' -- Only the branch-specific difference remains: AND ( ( @for_backup = 0 AND COALESCE([src].[local_ag_role], N'PRIMARY') = N'PRIMARY' ) OR ( @for_backup = 1 AND [src].[backup_preferred_replica] = 1 AND COALESCE([src].[local_ag_role], N'PRIMARY') = COALESCE(UPPER(REPLACE([src].[local_automated_backup_preference], N'_only', N'')), N'PRIMARY') ) ) |
This looked like a clean improvement on paper. Then I asked Copilot to actually run it and compare execution plans.
Stage 3: The Surprise — Estimated Cost Goes UP
I had Copilot deploy both versions to my local SQL Server instance and capture actual execution plans with SET STATISTICS XML ON and SET STATISTICS IO ON.
The results were… unexpected:
| Metric | Original | v2 (Rewrite) |
|---|---|---|
| Estimated Cost | 67.4 | 114.4 ⚠️ |
| Operators | 179 | 104 |
| CPU time | 15–16 ms | 0 ms |
| Elapsed time | 6–7 ms | 3–4 ms |
| Total Logical Reads | 133 | 26 |
| Memory Granted | 64,440 KB | 72,888 KB |
| Memory Used | 224 KB | 224 KB |
The estimated cost nearly doubled, yet actual performance was better across every real metric. The optimizer was wrong.
Why? The MAX() OVER() windowed aggregate and ROW_NUMBER() require Sort operators. The optimizer priced these Sorts based on its estimate of 52,000 rows flowing through the CTE (it can’t predict DMV cardinality accurately). In reality, with @database_name pushed into the CTE, only 1 row flows through.
The predicate pushdown slashed logical reads by ~5×, but the Sort operators inflated the cost estimate. This is a textbook example of why estimated cost is not execution time.
Stage 4: Tackling the Real Problem — The Memory Grant
Both versions shared one glaring issue: excessive memory grants. The original requested 64 MB but used only 224 KB — a 300× overallocation. Every call to this function (and it’s called for every database on every maintenance job) needlessly reserves tens of megabytes from the workspace memory pool.
The root cause was the windowed functions:
ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)requires a Sort operatorMAX(...) OVER (PARTITION BY ...)requires a Segment + Sequence Project- The optimizer estimates these will process 52K rows, so it requests memory proportional to 52K × row_size
The fix was to eliminate all windowed functions:
Replace ROW_NUMBER with TOP 1
|
1 2 3 4 5 6 7 8 9 10 11 12 |
-- Before (CTE + outer filter): , [row] = ROW_NUMBER() OVER (PARTITION BY [d].[name] ORDER BY COALESCE(...)) ... WHERE [src].[row] = 1 -- After (outer SELECT): SELECT TOP 1 ... FROM [src] WHERE ... ORDER BY [src].[backup_folder_name] |
A TOP 1 ... ORDER BY uses a Top N Sort, which only maintains a buffer of N rows (in this case, 1). Its memory footprint is negligible regardless of the estimated input size.
Replace the windowed has_secondary flag with NOT EXISTS
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
-- Before (windowed aggregate in CTE): , [has_secondary] = MAX(CASE WHEN ... THEN 1 ELSE 0 END) OVER (PARTITION BY [d].[name]) ... WHERE [src].[has_secondary] = 0 -- After (NOT EXISTS in outer WHERE): WHERE NOT EXISTS ( SELECT 1 FROM [src] [s2] WHERE ( [s2].[local_ag_role] = N'SECONDARY' OR [s2].[distributed_ag_role] = N'SECONDARY' ) ) |
Wait — didn’t the original use NOT EXISTS too? Yes, but the original’s CTE processed all databases, so re-expanding it was expensive. Now that @database_name is inside the CTE, re-expanding it only processes one database’s worth of rows through the 7-way join. The cost is trivial.
This gave us zero windowed functions, zero full Sort operators, and only a single cheap Top N Sort.
The Final Version
Here’s the complete optimized function:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 |
CREATE FUNCTION [dbo].[is_primary] ( @database_name sysname , @for_backup bit ) RETURNS TABLE AS RETURN ( WITH [src] AS ( SELECT [database_name] = [d].[name] COLLATE SQL_Latin1_General_CP1_CI_AS , [recovery_model] = [d].[recovery_model_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [backup_folder_name] = COALESCE(COALESCE([dist_ag].[name], [local_ag].[name]), @@SERVERNAME) COLLATE SQL_Latin1_General_CP1_CI_AS , [local_ag_name] = [local_ag].[name] COLLATE SQL_Latin1_General_CP1_CI_AS , [local_ag_role] = [local_ar_states].[role_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [local_automated_backup_preference] = [local_ag].[automated_backup_preference_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [distributed_ag_name] = [dist_ag].[name] COLLATE SQL_Latin1_General_CP1_CI_AS , [backup_preferred_replica] = COALESCE([sys].[fn_hadr_backup_is_preferred_replica]([d].[name]), 1) , [distributed_ag_role] = [dist_ar_states].[role_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [distributed_ag_state] = [dist_ar_states].[operational_state_desc] COLLATE SQL_Latin1_General_CP1_CI_AS , [dist_automated_backup_preference] = [dist_ag].[automated_backup_preference_desc] COLLATE SQL_Latin1_General_CP1_CI_AS FROM [sys].[databases] [d] LEFT JOIN [sys].[availability_databases_cluster] [adc] ON [d].[group_database_id] = [adc].[group_database_id] LEFT JOIN [sys].[availability_groups] [local_ag] ON [adc].[group_id] = [local_ag].[group_id] LEFT JOIN [sys].[availability_replicas] [local_ar] ON [local_ag].[group_id] = [local_ar].[group_id] AND [local_ar].[replica_server_name] = @@SERVERNAME LEFT JOIN [sys].[dm_hadr_availability_replica_states] [local_ar_states] ON [local_ag].[group_id] = [local_ar_states].[group_id] AND [local_ar].[replica_id] = [local_ar_states].[replica_id] LEFT JOIN [sys].[availability_replicas] [dist_ar] ON [local_ag].[name] = [dist_ar].[replica_server_name] LEFT JOIN [sys].[availability_groups] [dist_ag] ON [dist_ar].[group_id] = [dist_ag].[group_id] LEFT JOIN [sys].[dm_hadr_availability_replica_states] [dist_ar_states] ON [dist_ag].[group_id] = [dist_ar_states].[group_id] AND [dist_ar].[replica_id] = [dist_ar_states].[replica_id] WHERE [d].[name] = @database_name AND [d].[state_desc] = N'ONLINE' ) SELECT TOP 1 [src].[database_name] , [src].[backup_folder_name] , [src].[recovery_model] , [src].[local_ag_name] , [src].[local_ag_role] , [src].[local_automated_backup_preference] , [src].[distributed_ag_name] , [src].[backup_preferred_replica] , [src].[distributed_ag_role] , [src].[distributed_ag_state] , [src].[dist_automated_backup_preference] FROM [src] WHERE NOT EXISTS ( SELECT 1 FROM [src] [s2] WHERE ( [s2].[local_ag_role] = N'SECONDARY' OR [s2].[distributed_ag_role] = N'SECONDARY' ) ) AND ( ( [src].[local_ag_name] IS NOT NULL AND [src].[distributed_ag_name] IS NULL AND [src].[distributed_ag_role] IS NULL ) OR ( [src].[local_ag_name] IS NOT NULL AND [src].[distributed_ag_name] IS NOT NULL AND [src].[distributed_ag_role] IS NOT NULL ) OR ( [src].[local_ag_name] IS NULL AND [src].[distributed_ag_name] IS NULL AND [src].[distributed_ag_role] IS NULL AND [src].[local_ag_role] IS NULL ) ) AND COALESCE([src].[distributed_ag_role], N'PRIMARY') = N'PRIMARY' AND ( ( /* return a row only if we're running on the primary instance, and the caller is asking for access to the database */ @for_backup = 0 AND COALESCE([src].[local_ag_role], N'PRIMARY') = N'PRIMARY' ) OR ( /* return a row only if we're running on the instance matching the backup preference */ @for_backup = 1 AND [src].[backup_preferred_replica] = 1 AND COALESCE([src].[local_ag_role], N'PRIMARY') = COALESCE(UPPER(REPLACE([src].[local_automated_backup_preference], N'_only', N'')), N'PRIMARY') ) ) ORDER BY [src].[backup_folder_name] ); |
Final Results
| Metric | Original | Optimized | Improvement |
|---|---|---|---|
| Memory Granted | 64,440 KB (63 MB) | 12,608 KB (12 MB) | 80% reduction |
| Memory Used | 224 KB | 488 KB | — |
| Sort Operators | 3 (52K est. rows) | 1 (1 est. row) | Top N Sort only |
| Total Operators | 179 | 158 | 12% fewer |
| Total Logical Reads | 133 | 36 | 73% reduction |
| Estimated Cost | 67.4 | 1.0 | 98% reduction |
| Elapsed Time | 5–7 ms | 5–6 ms | Comparable |
| Duplicated Logic | Yes (2 branches) | No (factored out) | Easier to maintain |
The remaining 12 MB memory grant comes from Hash Match operators the optimizer uses for the LEFT JOINs — that’s inherent to the join strategy and harder to eliminate without fundamentally changing the query shape.
Three Optimizations, Summarized
- Push
@database_nameinto the CTE — The biggest single win. Reduces the 7-way join from processing all online databases to processing one. This cut logical reads from 133 to 36. -
Replace
ROW_NUMBER()withTOP 1 ... ORDER BY— Eliminates full Sort operators. A Top N Sort buffers only 1 row regardless of estimated cardinality, slashing the memory grant. -
Factor out shared WHERE conditions — The three-way AG topology check,
NOT EXISTS, and distributed role check were duplicated across both@for_backupbranches. Now they’re written once, with only the two-line branch difference inside a finalOR.
Lessons Learned
Estimated cost is not execution time. The intermediate v2 rewrite had an estimated cost of 114 (vs 67 for the original) because the optimizer overprices windowed functions on DMVs. Yet it was measurably faster. If I’d trusted estimated cost alone, I would have rejected a genuine improvement.
Windowed functions have hidden costs. ROW_NUMBER() and MAX() OVER() are elegant, but they require Sort operators that request memory proportional to estimated row counts. When the optimizer overestimates (as it does with DMVs), you get 300× memory overallocation.
CTEs are not temp tables. Every reference to a CTE re-expands the entire query. The original function’s NOT EXISTS (SELECT 1 FROM [src] ...) caused the 7-way join to execute twice. This is fine when the CTE is cheap (as it became after pushing @database_name in), but it was wasteful when the CTE scanned all databases.
AI assistants can optimize T-SQL iteratively. Copilot didn’t nail it on the first try — the v2 rewrite actually increased the memory grant. But by deploying to a real instance, capturing actual execution plans, and analyzing the results, it identified the Sort operators as the culprit and arrived at a version that’s better by every measurable metric. The key was treating it as a conversation, not a one-shot prompt.
The next post details the process of doing a code review with Github Copilot.
This post was produced through an interactive session with GitHub Copilot CLI (powered by Claude). The optimization was performed iteratively: Copilot analyzed the original function, proposed changes, deployed them to a live SQL Server instance, captured actual execution plans with SET STATISTICS XML ON and SET STATISTICS IO ON, identified where its first approach fell short, and refined the solution. All code in this post is from that real session.