Your Database Is an API: Treating the Schema as a Versioned Contract

A developer renames a column. The rename is clean, the build is green, the unit tests pass, and the pull request merges on a Tuesday afternoon. On Thursday a finance report that nobody on the team has ever opened returns an error, a nightly export to a partner drops a field, and a five-year-old SSIS package starts logging warnings that page someone at 2 a.m. None of those consumers were in the repository. None of them were in the test suite. All of them were reading that column directly.

This is the moment most of us learn, the hard way, that the database is not a private data structure. It is an interface, and other people’s code is built on top of it.

The trouble is that the database does almost nothing to remind you of that. A class in your application has a public keyword that signals “be careful, other code depends on this.” A REST API has a version number in the URL and a changelog and, if you are lucky, an angry product manager who will stop you from breaking it. A table has none of that. It looks exactly the same whether one stored procedure reads it or four hundred report definitions, three ETL jobs, and a partner integration you have never heard of read it. The schema is a contract being honored by everyone and signed by no one.

This series is about taking that contract seriously: writing it down, drawing a boundary around it, versioning it, and changing it on purpose instead of by accident. This first post makes the case for the idea itself. The rest of the series gets concrete.

A woman engineer shaking hands with a database, sealing a versioned schema contract stamped with version numbers

Your Database Has an API, Whether You Designed One or Not

When Stripe writes about API versioning, they open with a line that applies word for word to a database: “an API represents a contract for communication that can’t be changed without considerable cooperation and effort.”[1] They have maintained backward compatibility with every version of their API since 2011, because code written against an interface has expectations baked into it: a field is always present, a value is never null, a list comes back in a particular order. Break one of those expectations and you break the caller, even if your change looked harmless from where you were standing.

Every table that something outside your codebase queries is an endpoint on an API you are publishing. The column names are the field names. The data types are the schema. The nullability, the collation, the default values, the implicit ordering that a careless SELECT * consumer is secretly relying on – all of it is part of the contract, including the parts you never intended to promise. That last clause is where most production incidents live. Nobody decided that the rows would always come back ordered by insertion; a consumer just observed that they did and built on it.

Martin Fowler draws a distinction that is worth more to a DBA than the entire public-versus-private debate it sits next to: the line between a public interface and a published one.[2] A public method can still be changed cheaply, because every caller lives in the same codebase and a refactoring tool can rename them all at once. A published interface is one used by code you cannot see and cannot edit. You can rename a public thing on a whim. You cannot rename a published thing without a migration, a deprecation window, and a conversation.

A table that only your own stored procedures touch is public. A table that a partner’s nightly job selects from is published. They look identical in SQL Server Management Studio. The difference is the blast radius, and the database will not tell you which kind you are about to change.

This Is a Different Question From “Where Should the Logic Live?”

I have written at length about where business logic should live – whether your rules belong in stored procedures, in the application tier, or split between them. That series is about placement. This one is not. Wherever you decided your logic should run, the database still presents a surface to everything that reads and writes it, and that surface is a contract.

You can put every business rule in C# and still hand a partner a view they query directly. You can put everything in T-SQL and still let a reporting tool bind to your physical tables. The contract exists independently of where the logic sits. So treat the two questions separately: decide where the behavior belongs, and also decide what interface you are willing to stand behind over time. This series is about the second decision.

What the Contract Actually Covers

When people first accept that the schema is an interface, they tend to think it means “don’t rename columns.” It means a great deal more than that. Here is a partial inventory of the promises a database makes, most of them implicitly:

  • Names – tables, columns, the name of a stored procedure, the names of its parameters. Rename any of them and the binding breaks.
  • Types and sizes – widening varchar(50) to varchar(100) is usually safe; narrowing it is a breaking change waiting for the one row that no longer fits. Changing int to bigint can break a consumer that parses into a 32-bit integer.
  • Nullability – a column that was never null and suddenly can be will break every caller that assumed it was always populated. The reverse, making a nullable column required, breaks every writer that left it empty.
  • Result-set shape – the columns a stored procedure returns, in the order it returns them. A consumer that reads result columns by ordinal position is coupled to that order even though you never wrote the order down.
  • Cardinality and uniqueness – “there is exactly one row per customer” is a promise. The day a duplicate appears, every consumer that did SELECT TOP (1) or assumed a one-to-one join gets a different answer.
  • Lookup and enum values – the set of allowed status codes is a contract. Adding a new status can break a consumer with a hard-coded CASE that has no ELSE.
  • Behavior – defaults, computed values, triggers, the error number a stored procedure raises. An application that catches error 50001 is coupled to that number as surely as to any column name.

Most of these are never written down anywhere. They are discovered at runtime, in production, by the consumer that depended on the part you forgot you were promising.

Draw a Boundary: Published Surface vs Private Implementation

The reason a REST API can change its internals freely is that it has a seam. The URL and the JSON it returns are the contract; the service code, the caching, and the database behind it are implementation that the caller never sees. A database can have the same seam, and the tools to build it have shipped in SQL Server for decades: views and stored procedures.

The pattern is to treat one set of objects as the published interface and another as private implementation:

  • Published interface – a set of views and stored procedures, ideally in their own schema, that you commit to keeping stable. This is what outside consumers are allowed to touch.
  • Private implementation – the physical tables, their indexes, their partitioning, the columns you split or merged last quarter. You reserve the right to reshape these as long as the published interface keeps returning the same answers.

A simple convention makes the boundary visible. Put the implementation in dbo (or wherever it lives now) and put the published surface in a schema named for what it is:

Now the physical column is legal_name but the published column is customer_name. If next year the legal team insists the physical column be split into legal_first and legal_last, the table changes and the view absorbs the difference. Consumers of [api].[customer] never notice. The view is the contract; the table is free.

This is not free of cost. An extra layer of views has to be maintained, and a view that does something expensive can hide a performance problem behind a tidy name. The point is not that every table needs a view in front of it. The point is that the objects an outside consumer is allowed to touch should be a deliberate, named, defensible subset, not “whatever happens to be in the database.” Decide which objects are published. Everything else is yours to refactor.

Versioning: Borrow the Rules That Already Exist

Once you have a published surface, you can version it, and you do not need to invent the rules. Semantic versioning already classifies changes by their effect on the caller: a major version for incompatible changes, a minor version for backward-compatible additions, a patch for backward-compatible fixes.[3] Map that onto a schema and the discipline writes itself:

  • Additive changes are minor and safe. A new nullable column, a new table, a new stored procedure, a new optional parameter with a default. Existing consumers do not notice and do not break. Adding is almost always safe; the one trap is SELECT * consumers that materialize into a fixed structure, which is one more reason to discourage SELECT * across a contract boundary.
  • Breaking changes are major and expensive. Renames, drops, type narrowing, new NOT NULL columns without a default, changing the meaning of an existing value. These require a migration window, not a deploy.

The category that gets teams into trouble is the breaking change that looks additive. Adding a new required column to a table is a breaking change for every existing INSERT that does not supply it. Adding a value to a status enum is a breaking change for every consumer that switched on the old set without a default branch. The fix is not to avoid the change; it is to recognize which category it falls into and to treat the breaking ones with the ceremony they deserve.

Breaking Changes Without a Flag Day: Expand, Migrate, Contract

The technique for making a backward-incompatible change safely has a name. Fowler calls it parallel change, also known as expand and contract, and it breaks a breaking change into three phases that never require every consumer to switch at the same instant.[4] Applied to a database, suppose you need to rename a column from email to email_address on the published interface:

  1. Expand – add the new shape alongside the old one. The view exposes both email and email_address, mapped to the same underlying column. Nothing is broken because nothing was removed.
  2. Migrate – move consumers over to the new name on their own schedules. You announce a deprecation date for the old column. Each consuming team updates when it can, not when you deploy.
  3. Contract – once nothing reads the old name, remove it from the view. This is the only step that is breaking, and by the time you take it, nothing is left to break.

The deprecation comment is doing real work: it is the written-down version of the promise, with a date attached. When a partner asks “is it safe to keep using email?” the answer is in the object definition, not in someone’s memory. This is the database equivalent of Stripe keeping every API version alive while it nudges integrators forward, and it is exactly the kind of small, reversible, version-controlled migration that evolutionary database design is built on.[5]

The Hard Part: Finding Out Who Your Consumers Are

All of this assumes you know who depends on the contract, and the uncomfortable truth is that you usually do not. SQL Server can tell you about dependencies inside the database. sys.dm_sql_referencing_entities will list the views, procedures, and functions that reference an object, which is genuinely useful and badly underused.[6] It cannot tell you about the application three teams over that holds the connection string, the Power BI dataset someone built last year, or the partner’s integration that you will only hear about when it breaks.

The engine does push back when something is bound tightly enough. Try to alter a column that a schema-bound view or an index depends on and SQL Server refuses, with the familiar message that one or more objects access this column – a topic I have written about before. That refusal is the database doing exactly what this whole post is asking for: protecting a contract. The lesson is not to fight the error; it is to extend that same protection deliberately, by routing consumers through a published surface that you control, instead of leaving them to bind to physical tables you would like to be able to change.

A schema-bound view is one way to make that protection explicit. WITH SCHEMABINDING tells SQL Server that the view is a contract: the underlying columns it names cannot be dropped or altered out from under it without dropping the view first.[7] That is a feature, not an obstacle. It turns an implicit promise into one the engine will enforce.

Where This Series Goes Next

This post argued for the idea. The rest of the series turns it into practice:

  • Reads are part of the contract too – what an ORM or a hand-written query couples itself to, and how to keep the read surface stable without starving developers of the access they need.
  • Putting the contract under CI/CD – the database belongs in the same pipeline as the application, building on the version control and CI/CD foundations and adding the contract-testing angle.
  • Testing database code – how you prove the contract still holds before a change ships.
  • Behavioral contractsidempotency, the outbox pattern, temporal history and change data capture, and multi-tenant isolation – the promises a database makes that are not visible in a column list at all.

If you would rather start from the other end – which logic belongs in the database in the first place – read the companion Where Does Business Logic Live? series. And if versioning the database the way you version application code is new to you, the Git for DBAs series is the place to build that muscle.

The single idea underneath all of it is the one this post started with: the schema is not a private data structure. It is a published interface with consumers you cannot see, and the difference between a calm Thursday and a 2 a.m. page is whether you treated it like one.

I would like to hear how your team draws the line between the database’s published surface and its private implementation. Find me on Bluesky or LinkedIn.

References and Footnotes

  1. APIs as infrastructure: future-proofing Stripe with versioning – Stripe. The source of the framing that an API is a contract that cannot be changed without cooperation and effort, and a description of how a provider maintains backward compatibility across many versions.
  2. PublishedInterface – Martin Fowler. The distinction between a public interface (changeable within one codebase) and a published one (used by code you cannot edit), which Fowler argues matters more than public versus private.
  3. Semantic Versioning 2.0.0 – Tom Preston-Werner. The rules that classify changes as major (incompatible), minor (backward-compatible additions), or patch (backward-compatible fixes), mapped here onto schema changes.
  4. ParallelChange – Martin Fowler / Danilo Sato. The expand-migrate-contract pattern for making backward-incompatible changes to an interface without requiring all consumers to switch at once.
  5. Evolutionary Database Design – Pramod Sadalage and Martin Fowler. The practices behind small, version-controlled, reversible database migrations and close DBA-developer collaboration.
  6. sys.dm_sql_referencing_entities (Transact-SQL) – Microsoft Learn. The dynamic management function that lists the objects within a database that reference a given object, useful for mapping in-database dependencies on a contract.
  7. CREATE VIEW (Transact-SQL) – Microsoft Learn. The reference for views, including WITH SCHEMABINDING, which binds a view to the schema of its underlying tables and prevents changes that would break it.