Developer Implementation Guide

Provide Data Portability Exports

Give people a complete, portable copy of their data in a structured, commonly used, machine-readable format. Build a repeatable pipeline with strong verification, clear schemas, secure delivery, and proofs of what was sent.

Scope and format

  • Include data the person provided and data observed from their use of the service, where feasible.

  • Exclude secrets, internal risk scores, and other users’ personal data. Redact third-party identifiers or replace with pseudonyms.

  • Prefer JSON Lines for records and CSV for simple tables. Bundle large media as files. Provide a data dictionary.

  • Use ISO-8601 UTC timestamps, stable IDs, and UTF-8 everywhere.

export_2025-09-03/
  manifest.json
  README.txt
  dictionary.md
  profile.json
  settings.json
  sessions.jsonl
  orders.jsonl
  messages.jsonl
  files/
    f_001.png
    f_002.pdf
  vendors/
    email_service_subscriptions.csv
// manifest.json (example)
{
  "export_id": "exp_01J9ABCDXYZ",
  "subject_id": "dsid_7b8c...",
  "created_at": "2025-09-03T21:10:22Z",
  "records": [
    {"path":"profile.json","count":1,"sha256":"..."},
    {"path":"messages.jsonl","count":1245,"sha256":"..."},
    {"path":"files/f_001.png","bytes":384221,"sha256":"..."}
  ],
  "generator": {"version":"2.4.1","host":"exporter-3"}
}

Intake, verification, and status

  • Provide self-service and admin tools to request an export.

  • Re-verify identity and require MFA for high-risk accounts.

  • Use idempotency keys and a simple state machine: received → verified → building → ready → delivered → deleted.

POST /privacy/exports
{ "type":"portability", "dsid":"hash...", "idempotency_key":"uuid" }

GET  /privacy/exports/:id/status
GET  /privacy/exports/:id/download?token=one_time_token
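
One way to back these endpoints is a small forward-only state machine plus an idempotency check at intake. A minimal sketch, assuming an exports table with a unique index on idempotency_key; the db helper and column names are illustrative.

// Sketch: idempotent intake and forward-only state machine (db helper and schema are illustrative)
import crypto from "crypto";

const STATES = ["received", "verified", "building", "ready", "delivered", "deleted"];

// Allow only single forward steps through the lifecycle.
function canTransition(from, to) {
  return STATES.indexOf(to) === STATES.indexOf(from) + 1;
}

async function createExportRequest(db, { dsid, idempotencyKey }) {
  // Same idempotency key => return the existing request instead of creating a duplicate.
  const existing = await db.get(
    "SELECT id, state FROM exports WHERE idempotency_key = ?", [idempotencyKey]);
  if (existing) return existing;
  const id = "exp_" + crypto.randomUUID();
  await db.run(
    "INSERT INTO exports (id, dsid, idempotency_key, state) VALUES (?, ?, ?, 'received')",
    [id, dsid, idempotencyKey]);
  return { id, state: "received" };
}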

Export design

  • Create per-domain extractors with consistent field naming and null handling.

  • Stream large tables to JSONL to avoid memory spikes. Paginate by primary key.

  • Add joins by reference only. Do not inline other users’ PII.

  • Include attachments from object storage and rewrite URLs to relative file paths.

-- Stable selector pattern
SELECT id, created_at, updated_at, email, name
FROM users WHERE dsid = :dsid;

-- Messages without leaking other users’ PII
SELECT id, sender_pseudo_id, recipient_pseudo_id, sent_at, text
FROM messages WHERE dsid_owner = :dsid;

// Node sketch: stream table to JSONL and hash each file
import fs from "fs";
import crypto from "crypto";

// Write rows from an async cursor to a JSONL file and return its SHA-256 for the manifest.
async function writeJsonl(cursor, outPath) {
  const out = fs.createWriteStream(outPath);
  for await (const row of cursor) out.write(JSON.stringify(row) + "\n");
  out.end();
  await new Promise(r => out.on("finish", r));
  return sha256File(outPath);
}

// Hash a finished file for the manifest's per-file checksums.
function sha256File(p) {
  return crypto.createHash("sha256").update(fs.readFileSync(p)).digest("hex");
}
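
The cursor passed to writeJsonl can be any async iterable, so one simple way to honor the "paginate by primary key" rule above is a keyset-pagination generator. A sketch assuming a query(sql, params) helper; the table and column names follow the messages example.

// Sketch: keyset pagination by primary key (query() helper is an assumption)
async function* paginateByPk(query, dsid, pageSize = 1000) {
  let lastId = "";
  for (;;) {
    const rows = await query(
      "SELECT id, sender_pseudo_id, recipient_pseudo_id, sent_at, text " +
      "FROM messages WHERE dsid_owner = ? AND id > ? ORDER BY id LIMIT ?",
      [dsid, lastId, pageSize]);
    if (rows.length === 0) return;        // no more pages
    yield* rows;                          // hand rows to the JSONL writer one at a time
    lastId = rows[rows.length - 1].id;    // resume after the last key seen
  }
}

// Usage: await writeJsonl(paginateByPk(query, dsid), "messages.jsonl");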

Delivery and security

  • Package the export as an encrypted ZIP (for example, AES-256) or provide an expiring, single-use download link to object storage.

  • Require authenticated session plus one-time token. Expire links within 7 days. Auto-delete export artifacts after expiry.

  • Sign the manifest and store checksums for later verification.

# Example: short-lived S3 link
aws s3 presign s3://exports/exp_01J9ABCDXYZ.zip --expires-in 3600
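
For the "sign the manifest" step, one option is a detached Ed25519 signature over manifest.json using Node's crypto module. A minimal sketch; in practice the private key would live in a KMS or HSM rather than an environment variable.

// Sketch: detached Ed25519 signature over manifest.json (key storage is an assumption)
import fs from "fs";
import crypto from "crypto";

const manifest = fs.readFileSync("export_2025-09-03/manifest.json");
const privateKey = crypto.createPrivateKey(process.env.EXPORT_SIGNING_KEY_PEM); // PEM key, illustrative source
const signature = crypto.sign(null, manifest, privateKey); // Ed25519 takes no digest algorithm
fs.writeFileSync("export_2025-09-03/manifest.sig", signature.toString("base64"));

// Later verification with the matching public key:
// crypto.verify(null, manifest, publicKey, Buffer.from(sigBase64, "base64")) === true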

Redaction and third-party references

  • Replace other users’ identifiers with consistent pseudonyms per export.

  • Truncate or mask risky fields by default, with a clear explanation in the dictionary.

  • Omit server logs and debug traces unless they are already privacy-safe.
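
One way to get consistent per-export pseudonyms is an HMAC keyed with a secret generated for that export: the same counterparty maps to the same pseudonym within the export, but pseudonyms cannot be linked across exports. A sketch with illustrative names.

// Sketch: per-export pseudonyms via HMAC (export-scoped secret is an assumption)
import crypto from "crypto";

function makePseudonymizer(exportSecret) {
  const cache = new Map(); // same identifier -> same pseudonym within this export
  return function pseudonym(userId) {
    if (!cache.has(userId)) {
      const mac = crypto.createHmac("sha256", exportSecret).update(String(userId)).digest("hex");
      cache.set(userId, "u_" + mac.slice(0, 16));
    }
    return cache.get(userId);
  };
}

// const pseudo = makePseudonymizer(crypto.randomBytes(32));
// row.sender_pseudo_id = pseudo(row.sender_id);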

Vendors and processors

  • Pull subject data you store in vendor systems via their APIs where contracts allow.

  • Include vendor source labels and timestamps. Keep the raw vendor receipts in the audit trail.

{ "vendor":"email_service_x", "dataset":"subscriptions",
  "fetched_at":"2025-09-03T21:06:10Z", "records": 3 }

Retention and cleanup

  • Keep export artifacts only as long as needed to deliver, commonly 7–14 days.

  • Record delivery events and delete the package. Never retain a permanent copy.
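
A minimal cleanup sketch for the retention rule above, assuming the exports table from the intake sketch, an object-storage client with a delete method, and an audit_log table; all names are illustrative.

// Sketch: scheduled deletion of expired export artifacts (schema and storage client are assumptions)
async function cleanupExpiredExports(db, storage, maxAgeDays = 14) {
  const cutoff = new Date(Date.now() - maxAgeDays * 24 * 60 * 60 * 1000).toISOString();
  const expired = await db.all(
    "SELECT id, artifact_key FROM exports WHERE state IN ('ready','delivered') AND created_at < ?",
    [cutoff]);
  for (const exp of expired) {
    await storage.delete(exp.artifact_key); // remove the package from object storage
    await db.run("UPDATE exports SET state = 'deleted' WHERE id = ?", [exp.id]);
    await db.run(
      "INSERT INTO audit_log (export_id, event, at) VALUES (?, 'artifact_deleted', ?)",
      [exp.id, new Date().toISOString()]); // keep the delivery/deletion record, never the data
  }
}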

Monitoring and SLAs

  • Track median time to build and deliver, failure rates by dataset, and size distributions.

  • Alert on stalled exports, token reuse attempts, or checksum mismatches.
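
For the checksum-mismatch alert, the manifest written at build time can be re-verified just before delivery. A sketch that reuses sha256File from the exporter sketch above; the alerting hook is an assumption.

// Sketch: re-verify per-file checksums from manifest.json before delivery (alert hook is an assumption)
function verifyManifest(exportDir) {
  const manifest = JSON.parse(fs.readFileSync(`${exportDir}/manifest.json`, "utf8"));
  const mismatches = manifest.records.filter(r => sha256File(`${exportDir}/${r.path}`) !== r.sha256);
  if (mismatches.length > 0) {
    // In practice: page on-call and block the delivery step.
    console.error("checksum mismatch:", mismatches.map(r => r.path).join(", "));
  }
  return mismatches.length === 0;
}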

Data dictionary snippet

### messages.jsonl
- id: string, stable message ID
- sender_pseudo_id: string, per-export pseudonym of sender
- recipient_pseudo_id: string, per-export pseudonym of recipient
- sent_at: RFC3339 timestamp UTC
- text: string, message body with URLs left intact

Quick portability checklist

  • Stable DSID lookup and strong re-verification

  • Structured JSONL/CSV plus media, with schemas and a dictionary

  • Streaming exporters with pagination and memory safety

  • Redaction of other users’ data and risky fields

  • Tamper-evident manifest with per-file checksums

  • Secure delivery with short-lived, single-use links or encrypted ZIPs

  • Vendor pulls with receipts and clear source labels

  • Auto-deletion of export artifacts and full audit logging

Conclusion

A clear, secure export pipeline gives people control of their data and gives you proof of compliance. With consistent schemas, safe redaction, streaming builders, and verifiable delivery, you make data portability reliable for users and low risk for engineering and compliance.