Database Seeding: When to Hit the Table vs. When to Hit the API
You’ve got a new environment to stand up. Maybe it’s a fresh QA instance, maybe it’s a training environment someone just spun up, or maybe production got rebuilt after a disaster recovery test. Either way, the database is empty and the application doesn’t work until it has reference data — status codes, lookup tables, user accounts, configuration records.
The question nobody writes a blog post about is: how do you get that data in there?
Most teams start with a SQL script. INSERT INTO ..., run it, done. That works right up until it doesn’t — which is usually the moment you discover that half your reference data has to go through an API because the application layer enforces business rules, generates audit trails, or creates cascading records that a raw INSERT would skip.
Welcome to the dual-channel seeding problem.

The Two Channels
In any system with both a database and a service layer, there are entities that can be inserted directly into tables and entities that must go through the API. The line between them isn’t always obvious, but here’s a reliable heuristic:
Direct SQL is appropriate when:
- The data is truly reference/lookup data with no business logic
- No triggers, computed columns, or cascading operations depend on the insert path
- The application doesn’t maintain audit trails for this entity
- The entity must exist before the API is even available (chicken-and-egg data)
API calls are appropriate when:
- The service layer validates relationships or enforces constraints beyond what the schema does
- The insert creates side effects — audit records, linked entities, notification events
- The entity participates in a business workflow that the database schema alone doesn’t capture
- The API generates identifiers, timestamps, or computed values that downstream systems depend on
What This Looks Like in Practice
Imagine an order management platform with a PostgreSQL back end and a REST API. Seeding a new environment might involve 15 or so operations split across both channels:
Direct SQL (~8 scripts):
- System users and service accounts (must exist before the API can authenticate anyone)
- Roles and permission grants
- Order status codes and allowed status transitions
- Warehouse locations
- Currency codes and exchange rates
- Shipping carriers and service levels
- Notification templates
API calls (~7 scripts):
- Product catalog entries (the catalog service validates SKU formats and image URLs)
- Inventory allocations (the inventory service enforces warehouse capacity rules)
- Pricing tiers (the pricing engine computes volume breakpoints)
- Shipping zone rules (the shipping service validates geographic mappings)
- Discount policies (the promotion engine enforces stacking and exclusion rules)
- Tax jurisdiction configurations (the tax service computes nexus)
- Fulfillment routing rules (the routing engine validates carrier compatibility)
The direct SQL scripts handle data that the application doesn’t have opinions about — it just reads it. The API scripts handle data where the service layer does real work: validating that a pricing tier’s breakpoints don’t overlap, ensuring shipping zones cover all postal codes without gaps, or computing effective dates for tax jurisdiction changes.
The Orchestration Problem
The tricky part isn’t either channel in isolation. It’s the ordering between them.
In this order management platform, the full seeding flow looks something like this:
```
Master Orchestrator
├── Phase 1: Database Scripts (direct SQL)
│   ├── Clean existing data
│   ├── System users (must exist first)
│   ├── Roles, permissions, status codes
│   ├── Warehouses, carriers, currencies
│   └── Notification templates
│
├── Phase 2: Authenticate via OAuth
│
├── Phase 3: API Scripts (ordered)
│   ├── Product catalog (no dependencies)
│   ├── Pricing tiers (no dependencies)
│   ├── Inventory allocations (depends on products + warehouses)
│   ├── Shipping zones
│   │
│   ├── *** Database interlude ***
│   │   └── Update system config table with
│   │       default shipping zone IDs
│   │       (needs zone IDs created by the API)
│   │
│   ├── Fulfillment routing rules
│   │   (depends on system config update)
│   ├── Discount policies
│   ├── Tax jurisdictions
│   └── Fulfillment routing overrides
```
Notice the database interlude in the middle of the API phase. After the API creates shipping zones, the database needs a configuration update before the next API script can create fulfillment routing rules. The two channels aren’t sequential — they’re interleaved.
This is the kind of dependency that bites you at 2 AM when you’re seeding a new environment and the fulfillment script fails because nobody documented that it depends on a database update that runs between the shipping and routing scripts.
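The fix is to make the ordering executable instead of tribal knowledge. Here is a minimal PowerShell sketch of what the orchestrator's spine might look like; every script name, endpoint, and config column below is a hypothetical stand-in, and the SQL side assumes the SqlServer module's Invoke-Sqlcmd:

```powershell
# Minimal orchestrator spine (a sketch; script names and endpoints are hypothetical).
param(
    [Parameter(Mandatory)] [int]    $TenantId,
    [Parameter(Mandatory)] [string] $SqlServer,
    [Parameter(Mandatory)] [string] $ApiBaseUrl
)

$sqlArgs = @{ ServerInstance = $SqlServer; Database = 'orders' }

# Phase 1: direct SQL scripts, in dependency order.
foreach ($script in '01_clean.sql', '02_system_users.sql', '03_roles_status.sql',
                    '04_warehouses_carriers.sql', '05_notification_templates.sql') {
    Invoke-Sqlcmd @sqlArgs -InputFile $script -Variable "tenant_id=$TenantId"
}

# Phase 2: OAuth client-credentials flow. The secret comes from the
# environment (populated by vault tooling), never from the repo.
$token = (Invoke-RestMethod -Method Post -Uri "$ApiBaseUrl/oauth/token" -Body @{
    grant_type    = 'client_credentials'
    client_id     = $env:SEED_CLIENT_ID
    client_secret = $env:SEED_CLIENT_SECRET
}).access_token
$headers = @{ Authorization = "Bearer $token" }

# Phase 3: API scripts, with the database interlude encoded in sequence.
& ./06_products.ps1      -BaseUrl $ApiBaseUrl -Headers $headers -TenantId $TenantId
& ./07_pricing_tiers.ps1 -BaseUrl $ApiBaseUrl -Headers $headers -TenantId $TenantId
& ./08_inventory.ps1     -BaseUrl $ApiBaseUrl -Headers $headers -TenantId $TenantId
$zoneIds = & ./09_shipping_zones.ps1 -BaseUrl $ApiBaseUrl -Headers $headers -TenantId $TenantId

# *** Database interlude: persist the zone IDs the API just generated ***
Invoke-Sqlcmd @sqlArgs -Query ("UPDATE [config].[system_config] " +
    "SET [default_zone_ids] = N'$($zoneIds -join ',')' WHERE [tenant_id] = $TenantId")

& ./10_routing_rules.ps1     -BaseUrl $ApiBaseUrl -Headers $headers -TenantId $TenantId
& ./11_discounts.ps1         -BaseUrl $ApiBaseUrl -Headers $headers -TenantId $TenantId
& ./12_tax_jurisdictions.ps1 -BaseUrl $ApiBaseUrl -Headers $headers -TenantId $TenantId
```

The database interlude is now a line of code between two API calls rather than a step in someone's memory, which is exactly where you want it at 2 AM.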
Multi-Environment Targeting
Production systems rarely have just one environment to seed. You might have development, SIT, UAT, training, QA, and pre-production — each isolated by a tenant identifier. The seeding scripts need to handle this without becoming a maintenance nightmare.
The pattern that works well: pass the environment identifier as a parameter, and let each script use it as a filter. In SQL Server syntax (the same pattern ports directly to PostgreSQL), that might look like this:
```sql
DECLARE @tenant_id   int          = 5;
DECLARE @system_user nvarchar(50) = N'svc_seed_uat';

INSERT INTO [config].[order_status] (
      [tenant_id]
    , [status_code]
    , [status_name]
    , [is_terminal]
    , [created_by]
    , [created_date]
)
VALUES
      (@tenant_id, N'NEW',       N'New Order',  0, @system_user, SYSUTCDATETIME())
    , (@tenant_id, N'PICKED',    N'Picked',     0, @system_user, SYSUTCDATETIME())
    , (@tenant_id, N'SHIPPED',   N'Shipped',    0, @system_user, SYSUTCDATETIME())
    , (@tenant_id, N'DELIVERED', N'Delivered',  1, @system_user, SYSUTCDATETIME())
    , (@tenant_id, N'CANCELLED', N'Cancelled',  1, @system_user, SYSUTCDATETIME())
    , (@tenant_id, N'RETURNED',  N'Returned',   1, @system_user, SYSUTCDATETIME());
```
For the API scripts, the same tenant identifier gets passed as a query parameter or header to every endpoint. A master orchestrator script maps tenant IDs to system user accounts, database servers, and API URLs — one place to change when you add environment number 11 to the fleet.
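A minimal sketch of that mapping, assuming made-up server names and URLs:

```powershell
# One place to describe every environment (all values here are hypothetical).
$environments = @{
    2 = @{ Name = 'dev';      SqlServer = 'sql-dev01'; ApiBaseUrl = 'https://api.dev.example.com'; SystemUser = 'svc_seed_dev' }
    5 = @{ Name = 'uat';      SqlServer = 'sql-uat01'; ApiBaseUrl = 'https://api.uat.example.com'; SystemUser = 'svc_seed_uat' }
    9 = @{ Name = 'training'; SqlServer = 'sql-trn01'; ApiBaseUrl = 'https://api.trn.example.com'; SystemUser = 'svc_seed_trn' }
}

$target = $environments[$TenantId]
if (-not $target) { throw "Unknown tenant_id $TenantId; add it to the map before seeding." }
```

Adding environment number 11 then becomes a one-line change to the map.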
The Idempotency Question
Here’s where most seeding scripts go wrong: they assume a clean target. DELETE FROM ... WHERE tenant_id = @id followed by INSERT INTO ... works fine the first time. It also works the tenth time — as long as nobody has added records through the application in between.
The moment real data coexists with seed data, the delete-and-reinsert pattern becomes dangerous. You’re deleting records that users created, orders that were processed, and audit trails that compliance needs.
Better patterns, in order of increasing sophistication:
- MERGE / ON CONFLICT DO NOTHING — insert if missing, skip if present. Simple and safe, but you can't update existing records.
- MERGE with an UPDATE clause / ON CONFLICT DO UPDATE (upsert) — insert or update. Good when seed values might change between releases. Requires a clear conflict target (usually the primary key or a unique constraint).
- Temp table with merge logic — load seed data into a temp table, compare against the target, insert missing rows, update changed rows, and optionally report what was done. More code, but it gives you a full audit of what the script actually changed.
The temp-table approach pays for itself the first time someone asks “what did the seed script actually do to the UAT environment?” Instead of “it ran successfully,” you can say “it inserted 3 new status codes, updated 1 existing one, and skipped 8 that were already current.”
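As a sketch of the upsert option, here is the order_status seed from earlier rewritten as a T-SQL MERGE, run through the same Invoke-Sqlcmd pattern as the orchestrator above. The OUTPUT clause is what turns "it ran successfully" into an actual change report; the column names match the earlier INSERT, and everything else is an assumption to adapt:

```powershell
# Upsert sketch for the order_status table, reporting what actually changed.
# $(tenant_id) is a sqlcmd scripting variable, substituted via -Variable below.
$merge = @'
DECLARE @tenant_id   int          = $(tenant_id);
DECLARE @system_user nvarchar(50) = N'svc_seed_uat';

MERGE [config].[order_status] AS t
USING (VALUES
      (@tenant_id, N'NEW',     N'New Order', 0)
    , (@tenant_id, N'SHIPPED', N'Shipped',   0)
) AS s ([tenant_id], [status_code], [status_name], [is_terminal])
    ON t.[tenant_id] = s.[tenant_id] AND t.[status_code] = s.[status_code]
WHEN MATCHED AND (t.[status_name] <> s.[status_name]
               OR t.[is_terminal] <> s.[is_terminal]) THEN
    UPDATE SET [status_name] = s.[status_name]
             , [is_terminal] = s.[is_terminal]
WHEN NOT MATCHED THEN
    INSERT ([tenant_id], [status_code], [status_name], [is_terminal], [created_by], [created_date])
    VALUES (s.[tenant_id], s.[status_code], s.[status_name], s.[is_terminal], @system_user, SYSUTCDATETIME())
OUTPUT $action AS [action], inserted.[status_code] AS [status_code];
'@

# Each inserted or updated row comes back with its $action, so the script
# can summarize exactly what it did (rows that were already current emit nothing).
Invoke-Sqlcmd @sqlArgs -Query $merge -Variable "tenant_id=$TenantId" |
    Group-Object -Property action |
    ForEach-Object { "order_status: $($_.Count) row(s) $($_.Name.ToLower())" }
```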
Credential Handling
Seeding scripts are dangerous because they write data and they need credentials to do it. Two rules:
Never store credentials in the script or the repository. Not in a config file, not in a .env, not in a comment that says “TODO: remove before commit.” Use secure prompts (Read-Host -AsSecureString in PowerShell), environment variables from a vault, or an OAuth client credentials flow where the secret lives in a credential store.
Separate database credentials from API credentials. They’re different attack surfaces. The database user that runs seed scripts should not be the same identity that authenticates to the API. The database user needs write access to reference tables. The API user needs OAuth scopes. Conflating them means a leaked API token gives someone database access, or vice versa.
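In PowerShell terms, both rules together might look something like this sketch (the account and variable names are the hypothetical ones used earlier):

```powershell
# Rule 1: prompt at run time or pull from a vault; nothing lands in the repo.
$dbPassword = Read-Host -Prompt 'Password for svc_seed_uat (database)' -AsSecureString
$dbCred     = [pscredential]::new('svc_seed_uat', $dbPassword)

# Rule 2: the API secret is a separate identity with its own lifecycle,
# injected by vault tooling rather than typed or committed.
if (-not $env:SEED_CLIENT_SECRET) {
    throw 'SEED_CLIENT_SECRET is not set; fetch it from the credential store first.'
}
```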
Lessons Learned
After seeding dozens of environments across multiple releases, a few patterns have stuck:
Version your seed data like code. Put the data files, scripts, and orchestrator in a Git repository. When someone asks “what shipping zones were in UAT last month?” you can answer with git log.
Think about your data file format. CSV looks like the obvious choice, but product names contain commas, addresses contain commas, and description fields contain commas. Consider a delimiter that doesn’t appear in your data — tab-separated, pipe-delimited, or even JSON lines. Pick one and stick with it.
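If you settle on tab-separated files, for example, PowerShell's Import-Csv already accepts an arbitrary delimiter, so no custom parsing is needed (the file path and columns below are hypothetical):

```powershell
# Tab-separated sidesteps the embedded-comma problem without a custom parser.
$products = Import-Csv -Path './data/products.tsv' -Delimiter "`t"
$products | Select-Object -First 3 | Format-Table sku, name, description
```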
Number your scripts explicitly. seed_01, seed_02, seed_03. Not seed_products, seed_pricing, seed_shipping. When execution order matters — and it always matters — sequential numbering makes the dependencies visible. You can see at a glance that 04 runs after 03.
Log what you did, not just whether it succeeded. “Seeding completed successfully” tells you nothing when something goes wrong three days later. “Inserted 23 products, created 6 pricing tiers, configured 4 tax jurisdictions for tenant 5” tells you exactly what state the environment is in.
Keep a verification checklist. After seeding, what should you check? Can a user log in? Can they place an order? Does the shipping calculation return the right rate? Write the checks down. Run them every time. Automate them if you can.
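A minimal automated version might look like the sketch below, reusing $sqlArgs, $headers, $ApiBaseUrl, and $TenantId from the orchestrator sketch; the endpoints and expected counts are assumptions to replace with your own:

```powershell
# Post-seed smoke checks; each entry is a name plus a scriptblock that
# returns $true or $false. Endpoints and counts are hypothetical.
$checks = @(
    @{ Name = 'API health';      Test = { (Invoke-RestMethod "$ApiBaseUrl/health").status -eq 'ok' } }
    @{ Name = 'Status codes';    Test = { (Invoke-Sqlcmd @sqlArgs -Query "SELECT COUNT(*) AS n FROM [config].[order_status] WHERE [tenant_id] = $TenantId").n -ge 6 } }
    @{ Name = 'Products seeded'; Test = { @(Invoke-RestMethod "$ApiBaseUrl/products?tenant=$TenantId" -Headers $headers).Count -gt 0 } }
)

foreach ($check in $checks) {
    try   { $ok = & $check.Test }
    catch { $ok = $false }
    '{0,-16} {1}' -f $check.Name, $(if ($ok) { 'PASS' } else { 'FAIL' })
}
```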
Try This Yourself
If your team doesn’t have a formalized seeding process yet, start small:
- Pick one environment that you rebuild frequently (dev or QA).
- List every piece of reference data it needs to function — you’ll be surprised how long the list is.
- For each item, decide: can this go directly into the database, or does it need to go through the API?
- Write one script per entity, numbered in dependency order.
- Write a master orchestrator that runs them all in sequence with parameterized environment targeting.
You’ll have a repeatable, version-controlled seeding process in a day or two. The first time you hand a new team member the repo and say “run this to set up your dev environment,” you’ll wonder how you ever lived without it.
What does your database seeding process look like? Still running a single SQL script and hoping for the best, or have you evolved into something more structured? I’d love to hear what’s working — and what’s broken — in your environment. Find me on Bluesky or LinkedIn, or drop a comment below.