Nobody puts "test data management" on their resume. But more of my "failed test" investigations end in bad data than in real bugs. It's the quiet skill that decides whether your suite is trustworthy.
The Three Data Sins
- Shared mutable data: two tests use the same account; one logs it out, the other fails.
- Stale data: a test depends on an order that was deleted last week.
- Hidden data: a test "just works" because of a row someone added by hand months ago and forgot.
All three produce failures that look like product bugs and waste hours.
Isolation Is the Fix
The rule I follow: each test creates the data it needs and cleans up after itself. No test should depend on data another test left behind.
beforeEach: create a fresh user via API, not the UI
test: act on that user
afterEach: delete the user
Creating data through the API instead of clicking through the UI is faster, and the setup step doesn't break when the UI changes.
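Here is a minimal Python sketch of that outline, using the standard unittest module. FakeUserApi and its methods are hypothetical stand-ins for whatever API your product exposes; a real suite would call an HTTP client in the same places.

```python
import itertools
import unittest


class FakeUserApi:
    """In-memory stand-in for a real user API (hypothetical, for illustration)."""

    def __init__(self):
        self._users = {}
        self._ids = itertools.count(1)

    def create_user(self, name):
        user_id = next(self._ids)
        self._users[user_id] = {"name": name, "logged_in": True}
        return user_id

    def log_out(self, user_id):
        self._users[user_id]["logged_in"] = False

    def is_logged_in(self, user_id):
        return self._users[user_id]["logged_in"]

    def delete_user(self, user_id):
        self._users.pop(user_id, None)

    def user_count(self):
        return len(self._users)


API = FakeUserApi()  # the service is shared; the users inside it never are


class AccountTests(unittest.TestCase):
    def setUp(self):
        # Fresh user for every test, created through the API rather than the UI.
        self.user_id = API.create_user("test-user")
        # Registered cleanups run even when the test body fails.
        self.addCleanup(API.delete_user, self.user_id)

    def test_logout(self):
        API.log_out(self.user_id)
        self.assertFalse(API.is_logged_in(self.user_id))

    def test_new_user_is_logged_in(self):
        # Order-independent: test_logout acted on a different user.
        self.assertTrue(API.is_logged_in(self.user_id))
```

I register the delete with addCleanup rather than relying on a tearDown method because cleanups still run if a later step in setUp raises, so a half-built fixture doesn't leave data behind.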
Deterministic, Not Random
"Random" test data feels thorough but produces failures you can't reproduce. I use controlled variation โ a fixed seed, or explicit boundary values โ so a failure today fails again tomorrow.
Sensitive Data Never Comes from Production
Copying a production database into test is a breach waiting to happen. I use synthetic data that matches the shape of real data: realistic names, valid-format emails on a domain I control, fake but well-formed card numbers from the test ranges payment providers publish.
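As an illustration, a synthetic customer factory might look something like this. The domain qa.example.com is a placeholder for one you control, the name lists are invented, and 4242424242424242 is the test Visa number Stripe publishes; substitute the test numbers your own provider documents.

```python
import itertools

TEST_EMAIL_DOMAIN = "qa.example.com"   # placeholder for a domain you control
TEST_CARD_NUMBER = "4242424242424242"  # Stripe's published test Visa number

FIRST_NAMES = ["Ana", "Bao", "Chidi", "Dana"]   # invented sample names
LAST_NAMES = ["Okafor", "Nguyen", "Silva", "Novak"]

_counter = itertools.count(1)


def luhn_valid(number):
    """Check the Luhn checksum that well-formed card numbers satisfy."""
    total = 0
    for index, ch in enumerate(reversed(number)):
        digit = int(ch)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def synthetic_customer():
    """Build one customer record with the shape of real data and none of its content."""
    n = next(_counter)
    first = FIRST_NAMES[n % len(FIRST_NAMES)]
    last = LAST_NAMES[n % len(LAST_NAMES)]
    return {
        "name": f"{first} {last}",
        # The counter keeps each email unique, so records never collide.
        "email": f"{first.lower()}.{last.lower()}.{n}@{TEST_EMAIL_DOMAIN}",
        "card_number": TEST_CARD_NUMBER,
    }
```

The Luhn check is there as a guard: if someone pastes in a card number that is not well-formed, the factory's own tests can catch it before the payment flow does.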
The Payoff
When data is isolated and deterministic, a red test means a real bug. That's the whole point: a suite you can't trust is worse than no suite, because it trains the team to ignore failures.
