Automate retention jobs and TTLs

Keep only what you need, no longer than necessary. Turn retention rules into code with time-bound deletes, legal holds, and tamper-evident audits across databases, object storage, logs, and analytics.

Automate Retention Jobs and TTLs

Strategy and governance

Define a retention schedule per dataset: purpose, system, selector, duration, action (delete, anonymize, aggregate).
Prefer event time over ingestion time for retention cutoffs.
Enforce exceptions: legal hold, disputes, fraud investigations, accounting requirements.

-- Retention registry
CREATE TABLE retention_policies (
  id serial PRIMARY KEY,
  dataset text UNIQUE,          -- e.g., "app.sessions", "s3://prod-uploads"
  system text,                  -- postgres, mongo, s3, bigquery, es
  selector text,                -- SQL or path pattern
  duration interval,            -- e.g., '90 days'
  action text CHECK (action IN ('delete','anonymize','aggregate')),
  legal_basis text,             -- purpose or citation
  enabled boolean DEFAULT true
);

-- Dataset-level holds
CREATE TABLE legal_holds (
  id uuid PRIMARY KEY,
  dataset text,
  subject_id text,              -- optional per-subject hold
  reason text,
  created_at timestamptz DEFAULT now(),
  released_at timestamptz
);

Relational stores (Postgres example)

Prefer partitioning by time and drop whole partitions. Fall back to targeted deletes with small batches.

-- Partitioned table
CREATE TABLE events (
  id bigserial,
  occurred_at timestamptz NOT NULL,
  user_id uuid,
  data jsonb
) PARTITION BY RANGE (occurred_at);

-- Monthly partitions
CREATE TABLE events_2025_09 PARTITION OF events
  FOR VALUES FROM ('2025-09-01') TO ('2025-10-01');

-- Scheduled function: drop partitions older than 180 days, unless on hold
CREATE OR REPLACE PROCEDURE enforce_events_retention()
LANGUAGE plpgsql AS $$
DECLARE part record;
BEGIN
  FOR part IN
    SELECT inhrelid::regclass AS p
    FROM pg_inherits
    WHERE inhparent = 'events'::regclass
  LOOP
    -- Derive partition month start from name, or inspect min(occurred_at)
    IF to_date(regexp_replace(part.p::text, '.*_(\d{4})_(\d{2})', '\1-\2-01'), 'YYYY-MM-DD')
       < date_trunc('month', now() - interval '180 days')
    THEN
      -- Skip if any legal_hold matches dataset or subject scope
      IF NOT EXISTS (SELECT 1 FROM legal_holds WHERE dataset = 'events' AND released_at IS NULL) THEN
        EXECUTE format('DROP TABLE IF EXISTS %s CASCADE', part.p);
        INSERT INTO retention_audit(dataset, action, details)
        VALUES ('events','drop_partition', jsonb_build_object('partition', part.p::text));
      END IF;
    END IF;
  END LOOP;
END$$;

-- Simple non-partitioned delete in batches
DELETE FROM sessions
WHERE created_at < now() - interval '90 days'
AND NOT EXISTS (
  SELECT 1 FROM legal_holds h
  WHERE h.dataset = 'sessions' AND h.released_at IS NULL
)
LIMIT 10000;

Document stores

MongoDB TTL index

// Expire 30 days after "createdAt"
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 30 * 24 * 3600 });

// Disable per-subject via a hold flag
db.sessions.updateMany({ userId: u, hold: true }, { $unset: { createdAt: "" } }); // or move to a hold collection

Redis

# Set TTLs at write time and refresh only when policy allows
SET session:123 value EX 1800

Object storage lifecycle

Amazon S3

{
  "Rules": [
    { "ID": "uploads-365d-delete",
      "Filter": { "Prefix": "uploads/" },
      "Status": "Enabled",
      "Expiration": { "Days": 365 },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 } }
  ]
}

Google Cloud Storage

{ "rule": [
  { "action": { "type": "Delete" },
    "condition": { "age": 180, "matchesPrefix": ["logs/"] } }
]}

Logs and search

Elasticsearch/OpenSearch ILM

{
  "policy": {
    "phases": {
      "hot":   { "actions": { "rollover": { "max_age": "7d", "max_size": "50gb" } } },
      "warm":  { "min_age": "7d", "actions": { "shrink": { "number_of_shards": 1 } } },
      "delete":{ "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}

Analytics warehouses

BigQuery

-- Partitioned table with expiration
CREATE TABLE events_partitioned
PARTITION BY DATE(occurred_at)
OPTIONS (partition_expiration_days = 365) AS
SELECT * FROM staging_events;

-- Per-partition override
ALTER TABLE events_partitioned
SET OPTIONS (partition_expiration_days = 180);

Snowflake

Use tasks to purge old rows based on event time.
Set Time Travel and Fail-safe appropriately to avoid retaining data beyond policy.

CREATE TASK purge_events_90d
  WAREHOUSE = ETL_XS
  SCHEDULE = 'USING CRON 15 2 * * * America/New_York'
AS
DELETE FROM EVENTS WHERE OCCURRED_AT < DATEADD(day, -90, CURRENT_TIMESTAMP())
AND NOT EXISTS (SELECT 1 FROM LEGAL_HOLDS WHERE DATASET='EVENTS' AND RELEASED_AT IS NULL);

Orchestration and safety

Run deletes off-hours with small batches and backoff.
Use idempotent jobs keyed by dataset and window.
Add a dry-run mode that reports counts and sampled IDs before execution.
Maintain a quarantine path for mistaken deletes and ensure restores re-apply retention rules.

-- Minimal audit
CREATE TABLE retention_audit (
  at timestamptz DEFAULT now(),
  dataset text,
  action text,             -- dry_run, delete, drop_partition, lifecycle_set
  window text,             -- e.g., "< 2025-06-01"
  affected bigint,
  details jsonb
);

Coordination with rights requests

Deletion pipelines should mark records with delete_by <timestamp> so retention jobs can clean stragglers.
Retention jobs must respect per-subject legal holds and regulator-mandated retention.

ALTER TABLE users ADD COLUMN delete_by timestamptz;
DELETE FROM users
WHERE (created_at < now() - interval '24 months' OR delete_by <= now())
AND NOT EXISTS (SELECT 1 FROM legal_holds h WHERE h.dataset='users' AND h.subject_id = users.id AND h.released_at IS NULL);

Monitoring and SLAs

Track rows deleted per run, runtime, errors, and skipped due to holds.
Alert on job failures, zero-activity anomalies, or unexpected spikes.
Prove enforcement with monthly retention reports from retention_audit.

Quick retention checklist

Single source of truth for retention policies and legal holds
Time-based partitions or TTL indexes wherever possible
Batch deletes with dry-run and audit logs
Storage lifecycle rules for buckets and indices
Warehouse tasks tied to event time, not load time
Per-subject holds and coordination with deletion requests
Metrics, alerts, and monthly evidence reports

Conclusion

Retention is a control, not a note in a policy. By codifying durations, automating TTLs and partition drops, honoring legal holds, and auditing every run, you reduce risk, control costs, and align with GDPR and CPRA storage limitation requirements.

Sanitize backups and test data

Store and track consent effectively