Developer Implementation Guide

Provide Data Portability Exports

Give people a complete, portable copy of their data in a structured, commonly used, machine-readable format. Build a repeatable pipeline with strong verification, clear schemas, secure delivery, and proofs of what was sent.

Scope and format

  • Include data the person provided and data observed from their use of the service, where feasible.

  • Exclude secrets, internal risk scores, and other users’ personal data. Redact third-party identifiers or replace with pseudonyms.

  • Prefer JSON Lines for records and CSV for simple tables. Bundle large media as files. Provide a data dictionary.

  • Use ISO-8601 UTC timestamps, stable IDs, and UTF-8 everywhere.

export_2025-09-03/
  manifest.json
  README.txt
  dictionary.md
  profile.json
  settings.json
  sessions.jsonl
  orders.jsonl
  messages.jsonl
  files/
    f_001.png
    f_002.pdf
  vendors/
    email_service_subscriptions.csv
// manifest.json (example)
{
  "export_id": "exp_01J9ABCDXYZ",
  "subject_id": "dsid_7b8c...",
  "created_at": "2025-09-03T21:10:22Z",
  "records": [
    {"path":"profile.json","count":1,"sha256":"..."},
    {"path":"messages.jsonl","count":1245,"sha256":"..."},
    {"path":"files/f_001.png","bytes":384221,"sha256":"..."}
  ],
  "generator": {"version":"2.4.1","host":"exporter-3"}
}

Intake, verification, and status

  • Provide self-service and admin tools to request an export.

  • Re-verify identity and require MFA for high-risk accounts.

  • Use idempotency keys and a simple state machine: received → verified → building → ready → delivered → deleted.

POST /privacy/exports
{ "type":"portability", "dsid":"hash...", "idempotency_key":"uuid" }

GET  /privacy/exports/:id/status
GET  /privacy/exports/:id/download?token=one_time_token
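
One way to back these endpoints is a small forward-only state machine plus an idempotency check at intake. A minimal sketch, assuming an exports table with a unique index on idempotency_key; the db helper and column names are illustrative.

// Sketch: idempotent intake and forward-only state machine (db helper and schema are illustrative)
import crypto from "crypto";

const STATES = ["received", "verified", "building", "ready", "delivered", "deleted"];

// Allow only single forward steps through the lifecycle.
function canTransition(from, to) {
  return STATES.indexOf(to) === STATES.indexOf(from) + 1;
}

async function createExportRequest(db, { dsid, idempotencyKey }) {
  // Same idempotency key => return the existing request instead of creating a duplicate.
  const existing = await db.get(
    "SELECT id, state FROM exports WHERE idempotency_key = ?", [idempotencyKey]);
  if (existing) return existing;
  const id = "exp_" + crypto.randomUUID();
  await db.run(
    "INSERT INTO exports (id, dsid, idempotency_key, state) VALUES (?, ?, ?, 'received')",
    [id, dsid, idempotencyKey]);
  return { id, state: "received" };
}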

Export design

  • Create per-domain extractors with consistent field naming and null handling.

  • Stream large tables to JSONL to avoid memory spikes. Paginate by primary key.

  • Add joins by reference only. Do not inline other users’ PII.

  • Include attachments from object storage and rewrite URLs to relative file paths.

-- Stable selector pattern
SELECT id, created_at, updated_at, email, name
FROM users WHERE dsid = :dsid;

-- Messages without leaking other users’ PII
SELECT id, sender_pseudo_id, recipient_pseudo_id, sent_at, text
FROM messages WHERE dsid_owner = :dsid;

// Node sketch: stream table to JSONL and hash each file
import fs from "fs";
import crypto from "crypto";

// Write rows from an async cursor to a JSONL file and return its SHA-256 for the manifest.
async function writeJsonl(cursor, outPath) {
  const out = fs.createWriteStream(outPath);
  for await (const row of cursor) out.write(JSON.stringify(row) + "\n");
  out.end();
  await new Promise(r => out.on("finish", r));
  return sha256File(outPath);
}

// Hash a finished file for the manifest's per-file checksums.
function sha256File(p) {
  return crypto.createHash("sha256").update(fs.readFileSync(p)).digest("hex");
}
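
The cursor passed to writeJsonl can be any async iterable, so one simple way to honor the "paginate by primary key" rule above is a keyset-pagination generator. A sketch assuming a query(sql, params) helper; the table and column names follow the messages example.

// Sketch: keyset pagination by primary key (query() helper is an assumption)
async function* paginateByPk(query, dsid, pageSize = 1000) {
  let lastId = "";
  for (;;) {
    const rows = await query(
      "SELECT id, sender_pseudo_id, recipient_pseudo_id, sent_at, text " +
      "FROM messages WHERE dsid_owner = ? AND id > ? ORDER BY id LIMIT ?",
      [dsid, lastId, pageSize]);
    if (rows.length === 0) return;        // no more pages
    yield* rows;                          // hand rows to the JSONL writer one at a time
    lastId = rows[rows.length - 1].id;    // resume after the last key seen
  }
}

// Usage: await writeJsonl(paginateByPk(query, dsid), "messages.jsonl");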

Delivery and security

  • Package the export as an encrypted ZIP (for example, AES-256) or provide an expiring, single-use download link to object storage.

  • Require authenticated session plus one-time token. Expire links within 7 days. Auto-delete export artifacts after expiry.

  • Sign the manifest and store checksums for later verification.

# Example: short-lived S3 link
aws s3 presign s3://exports/exp_01J9ABCDXYZ.zip --expires-in 3600
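
For the "sign the manifest" step, one option is a detached Ed25519 signature over manifest.json using Node's crypto module. A minimal sketch; in practice the private key would live in a KMS or HSM rather than an environment variable.

// Sketch: detached Ed25519 signature over manifest.json (key storage is an assumption)
import fs from "fs";
import crypto from "crypto";

const manifest = fs.readFileSync("export_2025-09-03/manifest.json");
const privateKey = crypto.createPrivateKey(process.env.EXPORT_SIGNING_KEY_PEM); // PEM key, illustrative source
const signature = crypto.sign(null, manifest, privateKey); // Ed25519 takes no digest algorithm
fs.writeFileSync("export_2025-09-03/manifest.sig", signature.toString("base64"));

// Later verification with the matching public key:
// crypto.verify(null, manifest, publicKey, Buffer.from(sigBase64, "base64")) === true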

Redaction and third-party references

  • Replace other users’ identifiers with consistent pseudonyms per export.

  • Truncate or mask risky fields by default, with a clear explanation in the dictionary.

  • Omit server logs and debug traces unless they are already privacy-safe.
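
One way to get consistent per-export pseudonyms is an HMAC keyed with a secret generated for that export: the same counterparty maps to the same pseudonym within the export, but pseudonyms cannot be linked across exports. A sketch with illustrative names.

// Sketch: per-export pseudonyms via HMAC (export-scoped secret is an assumption)
import crypto from "crypto";

function makePseudonymizer(exportSecret) {
  const cache = new Map(); // same identifier -> same pseudonym within this export
  return function pseudonym(userId) {
    if (!cache.has(userId)) {
      const mac = crypto.createHmac("sha256", exportSecret).update(String(userId)).digest("hex");
      cache.set(userId, "u_" + mac.slice(0, 16));
    }
    return cache.get(userId);
  };
}

// const pseudo = makePseudonymizer(crypto.randomBytes(32));
// row.sender_pseudo_id = pseudo(row.sender_id);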

Vendors and processors

  • Pull subject data you store in vendor systems via their APIs where contracts allow.

  • Include vendor source labels and timestamps. Keep the raw vendor receipts in the audit trail.

{ "vendor":"email_service_x", "dataset":"subscriptions",
  "fetched_at":"2025-09-03T21:06:10Z", "records": 3 }

Retention and cleanup

  • Keep export artifacts only as long as needed to deliver, commonly 7–14 days.

  • Record delivery events and delete the package. Never retain a permanent copy.
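
A minimal cleanup sketch for the retention rule above, assuming the exports table from the intake sketch, an object-storage client with a delete method, and an audit_log table; all names are illustrative.

// Sketch: scheduled deletion of expired export artifacts (schema and storage client are assumptions)
async function cleanupExpiredExports(db, storage, maxAgeDays = 14) {
  const cutoff = new Date(Date.now() - maxAgeDays * 24 * 60 * 60 * 1000).toISOString();
  const expired = await db.all(
    "SELECT id, artifact_key FROM exports WHERE state IN ('ready','delivered') AND created_at < ?",
    [cutoff]);
  for (const exp of expired) {
    await storage.delete(exp.artifact_key); // remove the package from object storage
    await db.run("UPDATE exports SET state = 'deleted' WHERE id = ?", [exp.id]);
    await db.run(
      "INSERT INTO audit_log (export_id, event, at) VALUES (?, 'artifact_deleted', ?)",
      [exp.id, new Date().toISOString()]); // keep the delivery/deletion record, never the data
  }
}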

Monitoring and SLAs

  • Track median time to build and deliver, failure rates by dataset, and size distributions.

  • Alert on stalled exports, token reuse attempts, or checksum mismatches.
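
For the checksum-mismatch alert, the manifest written at build time can be re-verified just before delivery. A sketch that reuses sha256File from the exporter sketch above; the alerting hook is an assumption.

// Sketch: re-verify per-file checksums from manifest.json before delivery (alert hook is an assumption)
function verifyManifest(exportDir) {
  const manifest = JSON.parse(fs.readFileSync(`${exportDir}/manifest.json`, "utf8"));
  const mismatches = manifest.records.filter(r => sha256File(`${exportDir}/${r.path}`) !== r.sha256);
  if (mismatches.length > 0) {
    // In practice: page on-call and block the delivery step.
    console.error("checksum mismatch:", mismatches.map(r => r.path).join(", "));
  }
  return mismatches.length === 0;
}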

Data dictionary snippet

### messages.jsonl
- id: string, stable message ID
- sender_pseudo_id: string, per-export pseudonym of sender
- recipient_pseudo_id: string, per-export pseudonym of recipient
- sent_at: RFC3339 timestamp UTC
- text: string, message body with URLs left intact

Quick portability checklist

  • Stable DSID lookup and strong re-verification

  • Structured JSONL/CSV plus media, with schemas and a dictionary

  • Streaming exporters with pagination and memory safety

  • Redaction of other users’ data and risky fields

  • Tamper-evident manifest with per-file checksums

  • Secure delivery with short-lived, single-use links or encrypted ZIPs

  • Vendor pulls with receipts and clear source labels

  • Auto-deletion of export artifacts and full audit logging

Conclusion

A clear, secure export pipeline gives people control of their data and gives you proof of compliance. With consistent schemas, safe redaction, streaming builders, and verifiable delivery, you make data portability reliable for users and low risk for engineering and compliance.