Idempotency Is a Contract: Making Retries Safe from the Database Side
The previous post closed out the first half of this series, the part of the contract you can see: column names, types, result-set shapes, the behavior of a single call. The rest of the series is about the promises that do not appear anywhere in a column list, the ones that span calls and consumers, and the first of those is the one a distributed system leans on hardest: sending this twice is safe.
It is leaned on hard because retries are not an edge case. They are the normal operating condition of any system that talks over a network. A request times out and the client cannot tell whether the work happened, so it sends it again. A message queue guarantees at-least-once delivery, which is a polite way of saying it will sometimes deliver twice. A user double-clicks Submit. In every one of these cases the same logical operation arrives at your database more than once, and whether that results in one charge or two is decided in the data layer. Idempotency is the promise that it results in one, and it is a promise only the database can actually keep.

Exactly-Once Is a Story; At-Least-Once Plus Idempotent Is the Truth
It is worth being precise about what is achievable, because a lot of grief comes from chasing the impossible version. You cannot get exactly-once delivery over an unreliable network. What you can get is at-least-once delivery combined with idempotent processing, and the combination is indistinguishable from exactly-once to anyone watching the data. The delivery layer is allowed to be sloppy and send the message twice; the database refuses to apply it twice; the net effect is once.
That shifts the entire burden onto one question: when the same operation arrives a second time, how does the database recognize it and decline to do the work again? The answer the industry has converged on is the idempotency key.
The Idempotency Key
An idempotency key is a unique token the client generates for one logical operation and sends with every attempt at it. The first attempt does the work and records the key alongside the result. Every later attempt with the same key finds that record and returns the stored result instead of doing the work again. This is exactly how Stripe describes its own implementation: it saves the status code and response body of the first request for a given key, and replays that saved response for any subsequent request carrying the same key, whether the first one succeeded or failed.[1]
The crucial part, the part that makes it actually safe rather than approximately safe, is that recording the key and performing the effect happen in the same transaction. If they are separate, there is a window where the work commits and the key does not, or the reverse, and the whole guarantee evaporates. One transaction, one atomic outcome:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 |
CREATE TABLE [dbo].[idempotency_key] ( [idempotency_key] char(36) NOT NULL CONSTRAINT [pk_idempotency_key] PRIMARY KEY , [response_status] int NOT NULL , [response_body] nvarchar(max) NULL , [created_utc] datetime2(3) NOT NULL CONSTRAINT [df_idempotency_created] DEFAULT (SYSUTCDATETIME()) ); GO CREATE PROCEDURE [api].[charge_create] ( @idempotency_key char(36) , @customer_id int , @amount decimal(19, 4) ) AS BEGIN SET NOCOUNT ON; SET XACT_ABORT ON; /* Fast path: the key has been seen, replay the stored result and stop. */ DECLARE @status int, @body nvarchar(max); SELECT @status = [response_status] , @body = [response_body] FROM [dbo].[idempotency_key] WHERE [idempotency_key] = @idempotency_key; IF @status IS NOT NULL BEGIN SELECT @status AS [response_status], @body AS [response_body]; RETURN; END; BEGIN TRY BEGIN TRANSACTION; /* The effect and the key are recorded together, atomically. */ INSERT INTO [dbo].[charge] ([customer_id], [amount]) VALUES (@customer_id, @amount); INSERT INTO [dbo].[idempotency_key] ([idempotency_key], [response_status], [response_body]) VALUES (@idempotency_key, 201, N'{"status":"created"}'); COMMIT TRANSACTION; SELECT 201 AS [response_status], N'{"status":"created"}' AS [response_body]; END TRY BEGIN CATCH IF @@TRANCOUNT > 0 BEGIN ROLLBACK TRANSACTION; END; /* Two callers raced with the same key; the unique constraint caught the loser. Treat it as a duplicate and replay the winner's result. */ IF ERROR_NUMBER() IN (2627, 2601) BEGIN SELECT [response_status] , [response_body] FROM [dbo].[idempotency_key] WHERE [idempotency_key] = @idempotency_key; RETURN; END; THROW; END CATCH; END; |
Notice what is doing the real work. The opening SELECT is an optimization, the common case where the retry arrives well after the first attempt committed. It is not the guard. The guard is the primary key on idempotency_key and the CATCH block that handles error 2627. When two attempts arrive close enough together that both pass the opening SELECT, exactly one of them wins the insert and the other takes a unique-constraint violation, falls into the catch, and replays the winner’s stored result. The constraint is the arbiter, because a constraint is the one thing in the database that two concurrent transactions cannot both satisfy.
The Upsert Race, and Why the Index Is the Real Guard
This same lesson – that the unique index, not the procedural logic, is what actually enforces uniqueness under concurrency – is the thing most “check then insert” code gets wrong. The naive idempotency check, “select to see if the key exists, and if not, insert it,” has a race built into it: two callers both select, both see nothing, both insert. Without a unique constraint, both inserts succeed and you have done the work twice, which is the precise failure idempotency was supposed to prevent.
It is the upsert problem wearing a different hat, and Michael J. Swart’s survey of SQL Server upsert patterns and antipatterns is the clearest treatment of getting it right: you need both a unique index and the correct locking hints, or you need to lean on the constraint and handle the violation, because the optimistic check alone is not safe under load.[2] It is also why reaching for MERGE as a tidy one-liner here is a trap. MERGE performs its match and its insert as separate operations under the covers, so it is subject to the same race conditions as the hand-written version and has carried a long tail of bugs besides; Aaron Bertrand’s standing advice to use caution with it applies directly.[3] The dull, reliable answer is the one above: try the insert, let the unique constraint be the judge, and handle the loser gracefully.
What Belongs in the Key, and How Long It Lives
A few practical decisions turn the pattern from a sketch into something you can run.
The client owns the key. It has to, because only the client knows that its second request is a retry of the first. A key the server generates per request defeats the entire purpose. A GUID per logical operation is the usual choice.
The key scopes one operation, not one session. It identifies “this particular charge the client is trying to make,” so a genuinely new charge gets a new key and a retry of an existing one reuses it. Getting this boundary right is a contract conversation with the consumers, which is the recurring theme of this series: the meaning of the key is part of the promise.
Keys do not live forever. The table would grow without bound, so keys age out, and the retention window is itself part of the contract. If you keep keys for 24 hours, you are promising that a retry within 24 hours is deduplicated and one after that may be reprocessed. Consumers need to know that number, the same way they would need to know how long an API honors an idempotency key.
A Promise You Cannot See in the Schema
Look at the schema and you will see a table with a primary key and a couple of columns. Nothing about that says “this system is safe to retry.” The safety lives in the discipline around it: the key recorded in the same transaction as the effect, the constraint trusted as the arbiter, the violation handled instead of avoided, the retention window agreed with the people who depend on it. That is what makes idempotency a contract rather than a column – an invisible promise that the whole distributed system above the database is quietly relying on.
The next post is about the other half of reliable messaging: not “what happens when the same message arrives twice,” but “how do I publish a message at all without a distributed transaction tearing my data and my queue out of sync.” That is the transactional outbox, and it leans on everything here.
How does your team handle retries and deduplication today? I would like to hear about it on Bluesky or LinkedIn.
References and Footnotes
- Idempotent requests – Stripe API reference. How a client-supplied idempotency key lets a request be retried safely: the first response (status and body) is saved for the key and replayed for subsequent requests carrying it. ↩
- SQL Server UPSERT Patterns and Antipatterns – Michael J. Swart. Why a check-then-insert is unsafe under concurrency and how to combine a unique index with correct locking to make insert-or-update atomic. ↩
- Use Caution with SQL Server’s MERGE Statement – Aaron Bertrand, MSSQLTips. The race-condition and correctness pitfalls of
MERGE, which performs its insert, update, and delete as separate operations under the covers. ↩