
How do I recover from data loss in production?


Production data loss is one of the most serious incidents a software team can face. The response in the first hour determines how much data is recoverable. Here’s how to approach it.

Stop the Bleeding Immediately

Before attempting recovery, stop whatever is causing the loss. If a background job is deleting records, stop the job. If a migration is running destructively, halt it. If a deploy introduced code that’s overwriting data, roll it back.

Every minute of continued data loss makes recovery harder. Containment takes priority over diagnosis.

Assess What You’re Dealing With

Data loss incidents come in several forms, each with different recovery options:

Accidental deletion - Records deleted by a bug, a bad migration, or a manual operation that ran against production instead of staging. Usually the most recoverable case, provided you act quickly.

Data corruption - Records overwritten with incorrect values. Recovery depends heavily on whether you have a point-in-time snapshot from before the corruption occurred.

Cascading delete - A foreign key with ON DELETE CASCADE, or an ORM-level dependent destroy, wiped associated records when a parent was deleted. Often partially recoverable from backups.

Data sent to wrong tenant - In multi-tenant applications, data written to the wrong tenant’s scope. The data exists but is in the wrong place.

Infrastructure failure - Disk failure, storage volume corruption. Recovery depends entirely on your backup and replication configuration.

Recovery Options

Point-in-Time Database Restore

If your database provider supports point-in-time recovery (PostgreSQL on RDS, Heroku Postgres, and most managed database services do), you can restore the database to its state at a moment just before the data loss occurred. This is the cleanest recovery path.
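On Amazon RDS, a point-in-time restore creates a new database instance rather than rewinding the existing one, so production keeps running while you inspect the restored copy. As a rough sketch, the helper below prints the AWS CLI call for such a restore without running it. The pitr_command function name, the instance identifiers, and the timestamp are illustrative assumptions; other providers use different commands.

```shell
#!/bin/sh
# Print (do not run) the AWS CLI call that restores an RDS instance
# to a point in time. All argument values are hypothetical examples.
pitr_command() {
  source_id="$1"     # existing production instance
  target_id="$2"     # name for the NEW instance the restore creates
  restore_time="$3"  # UTC timestamp just before the loss, ISO 8601
  printf 'aws rds restore-db-instance-to-point-in-time'
  printf ' --source-db-instance-identifier %s' "$source_id"
  printf ' --target-db-instance-identifier %s' "$target_id"
  printf ' --restore-time %s\n' "$restore_time"
}

pitr_command prod-db prod-db-recovery 2026-04-10T13:55:00Z
```

Printing the command first gives a second person the chance to check the restore time and target name before anything is created.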

Considerations:

Extracting from Backups

If you have regular database backups (you should), restore the most recent backup taken before the loss to a separate environment and extract the specific records that were lost. This is slower than point-in-time recovery, but because you copy only the missing records back, production keeps everything written after the backup.

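# Create an empty recovery database and restore the Postgres dump into it
createdb recovery_db
pg_restore -d recovery_db backup_20260410_0200.dump

# Query the rows as they existed at backup time. The 02:00 backup predates
# the loss, so select rows that were still live rather than a later deleted_at.
psql recovery_db -c "SELECT * FROM orders WHERE deleted_at IS NULL"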
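The records to extract are the ones present in the backup but absent from production. Here is a minimal sketch of that comparison, assuming the primary keys of each database have already been exported to text files with one id per line. The missing_ids helper, the file names, and the ids are all hypothetical.

```shell
#!/bin/sh
# Print ids that appear in the backup export but not in the production export.
# Both inputs are plain text files with one id per line (hypothetical exports).
missing_ids() {
  backup_file="$1"
  prod_file="$2"
  sorted_backup=$(mktemp)
  sorted_prod=$(mktemp)
  # comm requires both inputs sorted in the same collation order
  LC_ALL=C sort -u "$backup_file" > "$sorted_backup"
  LC_ALL=C sort -u "$prod_file" > "$sorted_prod"
  # -23 hides lines unique to production and lines common to both,
  # leaving only the lines unique to the backup
  LC_ALL=C comm -23 "$sorted_backup" "$sorted_prod"
  rm -f "$sorted_backup" "$sorted_prod"
}

# Demo with made-up ids: 102 and 104 exist only in the backup
demo_dir=$(mktemp -d)
printf '101\n102\n103\n104\n' > "$demo_dir/backup_ids.txt"
printf '101\n103\n105\n' > "$demo_dir/prod_ids.txt"
missing_ids "$demo_dir/backup_ids.txt" "$demo_dir/prod_ids.txt"
rm -rf "$demo_dir"
```

One way to produce the id files is psql with the -At flags, which prints unaligned rows without headers. Note that the output is sorted as text, not numerically.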

Application-Level Recovery

Some data loss can be partially reconstructed from:

This is labor-intensive and rarely complete, but can fill gaps when database recovery isn’t possible.

Soft Delete Pattern

If you use a soft delete pattern (deleted_at timestamp rather than hard deletes), accidentally deleted records are recoverable with a simple update. If you don’t have soft deletes and data loss is a recurring risk, adding them is worthwhile.
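As an illustration of that update, the sketch below prints SQL that clears deleted_at for rows soft-deleted at or after a given time. It assumes a table with a deleted_at timestamp column. The undelete_sql helper, the orders table, and the cutoff time are hypothetical, and nothing is executed against a database.

```shell
#!/bin/sh
# Print SQL that un-deletes rows soft-deleted at or after a given time.
# Nothing is executed here; review the output before running it.
undelete_sql() {
  table="$1"
  since="$2"
  # Both values are interpolated into SQL, so reject anything unexpected
  case "$table" in
    ''|*[!A-Za-z0-9_]*) echo "invalid table name: $table" >&2; return 1 ;;
  esac
  case "$since" in
    ''|*"'"*) echo "invalid timestamp: $since" >&2; return 1 ;;
  esac
  cat <<EOF
BEGIN;
UPDATE $table SET deleted_at = NULL WHERE deleted_at >= '$since';
-- Check the reported row count, then run COMMIT; or ROLLBACK;
EOF
}

undelete_sql orders '2026-04-10 14:00:00'
```

Running the printed statements in an interactive psql session keeps the transaction open, so the row count can be checked against the expected number of lost records before committing.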

Communicating With Affected Users

If customer data was lost, they need to know. The communication should include:

Customers respond better to proactive, honest communication about data loss than to discovering the loss themselves.

Preventing Future Data Loss

We Can Help

Data loss recovery under time pressure benefits from experience. If you’re dealing with production data loss now, contact us immediately. If you’ve recovered from an incident and want to ensure your backup and recovery systems are solid, we offer that review as well.

Learn about our emergency software services.