Monitor systems and automate breach alerts

Detect threats fast, escalate with context, and auto-contain where safe. Instrument the stack end to end, correlate signals, and route high-fidelity alerts to on-call with runbooks and evidence.

Strategy and coverage

Define incident severities and owners. Track MTTD, MTTR, and detection coverage by kill chain stage.
Centralize telemetry in a SIEM. Use structured logs, metrics, traces, and cloud audit events.
Build detections for authentication, authorization, data access, key use, egress, and admin changes.
Minimize PII in alerts. Use DSIDs and links to evidence, not raw personal data.
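
To make the MTTD and MTTR tracking above concrete, here is a minimal Python sketch. The `Incident` fields are hypothetical names, and MTTR is assumed to run from detection to resolution; some teams measure it from the start of the incident instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    started_at: datetime   # best estimate of when malicious activity began
    detected_at: datetime  # when the first alert fired
    resolved_at: datetime  # when containment or remediation completed


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time to detect: average of (detected_at - started_at)."""
    secs = mean((i.detected_at - i.started_at).total_seconds() for i in incidents)
    return timedelta(seconds=secs)


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolve, assumed here to run from detection to resolution."""
    secs = mean((i.resolved_at - i.detected_at).total_seconds() for i in incidents)
    return timedelta(seconds=secs)
```

Group incidents by severity before averaging so that a few long-running low-severity cases do not hide slow detection of critical ones.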

Key signals to monitor

Auth: brute force, password spraying, token replay, disabled MFA, new-device and geo anomalies.
Authz: spikes in 403 denies, privilege escalations, unexpected role grants.
Data access: high-volume reads, unusual exports, first-time access to sensitive tables.
Keys and secrets: KMS decrypt spikes, new JWKS issuers, vault access from new hosts.
Egress: sudden outbound bytes from app or DB subnets, object storage bulk downloads.
Infra: container escapes, exec shells in pods, changes to security groups, new public buckets.
Vendors: webhook retries, bulk API pulls, new IPs, failed DPA checks in vendor syncs.
Deception: honeytoken account sign-ins, access to canary records or files.
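
As one illustration of the auth signals above, password spraying shows up as a single source failing logins against many distinct accounts in a short window. The sketch below assumes events arrive as `(timestamp, source_ip, username, success)` tuples; the tuple shape, the 10-minute window, and the 20-user threshold are assumptions to tune, not values from this guide.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def spraying_sources(events, window=timedelta(minutes=10), min_users=20):
    """Return source IPs with failed logins against >= min_users distinct
    usernames inside any sliding window of the given length."""
    by_ip = defaultdict(list)
    for ts, ip, user, ok in events:
        if not ok:
            by_ip[ip].append((ts, user))

    flagged = set()
    for ip, fails in by_ip.items():
        fails.sort()
        start = 0
        counts = defaultdict(int)  # username -> failures inside current window
        for ts, user in fails:
            counts[user] += 1
            # Evict failures that fell out of the window.
            while ts - fails[start][0] > window:
                old_user = fails[start][1]
                counts[old_user] -= 1
                if counts[old_user] == 0:
                    del counts[old_user]
                start += 1
            if len(counts) >= min_users:
                flagged.add(ip)
                break
    return flagged
```

Counting distinct usernames rather than raw failures is what separates spraying from brute force against one account, which needs its own per-account rule.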

Example detections and rules

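-- Mass export from reports (access_audit schema from earlier sections)
-- Window functions are not allowed in HAVING, so compute the average in a subquery.
WITH per_minute AS (
  SELECT date_trunc('minute', at) AS m, count(*) AS c
  FROM access_audit
  WHERE action = 'export' AND resource LIKE 'report:%'
  GROUP BY 1
)
SELECT m, c FROM (SELECT m, c, avg(c) OVER () AS avg_c FROM per_minute) t
WHERE c > 5 * avg_c;  -- simple spike vs avg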

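-- KMS decrypt spike per key (key_audit from earlier)
-- Aliases make the subquery correlate with the outer row; without them the
-- inner filter compares key_id to itself and is always true.
SELECT k.key_id, count(*) AS decrypts_last_5m
FROM key_audit k
WHERE k.operation = 'decrypt' AND k.at > now() - interval '5 minutes'
GROUP BY k.key_id
HAVING count(*) > 3 * (
  -- Baseline: average hourly decrypts over 24h, scaled to a 5-minute window (1/12 hour)
  SELECT coalesce(avg(c), 0) / 12.0 FROM (
    SELECT date_trunc('hour', h.at) AS hr, count(*) AS c
    FROM key_audit h WHERE h.key_id = k.key_id AND h.operation = 'decrypt'
      AND h.at > now() - interval '24 hours' GROUP BY 1
  ) t
);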

# WAF-ish gate: alert on login error bursts
map $status $login_fail { default 0; 401 1; 403 1; }
log_format json escape=json '{ "ts":"$time_iso8601","path":"$request_uri","fail":$login_fail }';
access_log /var/log/nginx/access.json json;
# SIEM query: count where path ~ "/login" and fail=1 > threshold per minute
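
If no SIEM is available yet, the query described in the last comment can be approximated directly over the JSON access log. This is a sketch only: it assumes the `ts`, `path`, and `fail` fields from the `log_format` above, and the threshold of 30 failures per minute is a placeholder.

```python
import json
from collections import Counter


def login_failure_bursts(lines, threshold=30):
    """Return {minute: failures} for minutes where failed /login requests
    exceed the threshold. Minutes are keyed as 'YYYY-MM-DDTHH:MM'."""
    per_minute = Counter()
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(rec, dict):
            continue
        if rec.get("fail") == 1 and str(rec.get("path", "")).startswith("/login"):
            per_minute[str(rec.get("ts", ""))[:16]] += 1
    return {m: c for m, c in per_minute.items() if c > threshold}
```

Usage: `login_failure_bursts(open("/var/log/nginx/access.json"))` returns only the minutes that crossed the threshold.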

# Prometheus alert: unusual egress from app namespace
groups:
- name: breach-egress
  rules:
  - alert: HighEgressBytes
    expr: sum(rate(container_network_transmit_bytes_total{namespace="app"}[5m])) > 5e7
    for: 10m
    labels: { severity: critical }
    annotations:
      summary: "High egress from app namespace"
      runbook: "https://runbooks/egress-contain"

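# Falco: interactive shell spawned in container
# (spawned_process and container are macros from Falco's default ruleset)
- rule: Container Shell Spawned
  desc: Detect an interactive shell started inside a container
  condition: spawned_process and container and proc.name in (bash, sh, zsh) and proc.tty != 0
  output: "Shell in container (user=%user.name container=%container.id proc=%proc.name)"
  priority: CRITICAL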

// Cloud object storage: alert on many GETs by new principal (pseudo event pattern)
{ "source": ["s3"], "detail-type": ["Object Access"],
  "detail": { "eventName": ["GetObject"], "countWindow": "5m", "threshold": 100, "principal": "new" } }
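
The pseudo pattern above leaves the counting logic implicit. A minimal Python sketch of it follows, assuming events are fed in timestamp order and that "new" means a principal absent from a known-good set you maintain; the class name and its parameters are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class NewPrincipalGetMonitor:
    """Flag principals outside the known set that issue many GetObject calls."""

    def __init__(self, known_principals, window=timedelta(minutes=5), threshold=100):
        self.known = set(known_principals)
        self.window = window
        self.threshold = threshold
        self.recent = defaultdict(deque)  # principal -> recent GetObject timestamps

    def observe(self, ts, principal, event_name):
        """Return True once an unknown principal reaches the threshold
        within the window. Expects events in timestamp order."""
        if event_name != "GetObject" or principal in self.known:
            return False
        q = self.recent[principal]
        q.append(ts)
        while ts - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold
```

The defaults mirror the `countWindow` of 5 minutes and `threshold` of 100 in the pattern.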

Alert routing and auto-response

Route by severity to PagerDuty or equivalent. Include request_id, subject DSID, actor, resource, decision, and evidence links.
Auto-contain where safe: revoke tokens, disable suspicious accounts, rotate API keys, block offending IPs, and block public access on exposed buckets.
Require human confirmation for destructive steps. Log all automation actions to an append-only audit.
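
One way to assemble a payload with the fields listed above while keeping personal data out is an allowlist plus masking. The sketch below is illustrative: the field names in `ALLOWED_FIELDS` (including `subject_dsid`) and the e-mail pattern are assumptions, and a real deployment would mask more than e-mail addresses.

```python
import re

# Hypothetical allowlist: anything not named here never reaches the alert.
ALLOWED_FIELDS = {
    "alert_id", "severity", "signal", "request_id", "subject_dsid",
    "actor", "resource", "decision", "key_id", "links", "proposed_actions",
}
REQUIRED_FIELDS = {"alert_id", "severity", "signal"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def build_alert(raw: dict) -> dict:
    """Keep only allowlisted fields and mask e-mail addresses in string values."""
    alert = {}
    for key, value in raw.items():
        if key not in ALLOWED_FIELDS:
            continue  # drop unexpected fields rather than risk leaking PII
        if isinstance(value, str):
            value = EMAIL_RE.sub("[redacted-email]", value)
        alert[key] = value
    missing = REQUIRED_FIELDS - alert.keys()
    if missing:
        raise ValueError(f"alert is missing required fields: {sorted(missing)}")
    return alert
```

An allowlist fails closed: a new upstream field stays out of alerts until someone reviews and adds it.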

{
  "alert_id": "alrt_9f2c",
  "severity": "critical",
  "signal": "kms.decrypt_spike",
  "request_id": "2b5f3c0f...",
  "key_id": "alias/pii",
  "links": { "kibana": "...", "runbook": "https://runbooks/kms-decrypt" },
  "proposed_actions": ["rotate_dek","disable_service_principal","block_asn"]
}
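
Given a payload like the one above, routing and containment could be gated roughly as follows. Everything in the policy tables is an assumption for illustration: the route names, which actions count as safe, and the two destructive examples. Opening a file in append mode is also only a stand-in for a real append-only audit store.

```python
import json
from datetime import datetime, timezone

# Assumed policy tables; adjust to your environment.
ROUTES = {"critical": "page", "high": "page", "medium": "ticket", "low": "digest"}
AUTO_SAFE = {"revoke_tokens", "disable_account", "rotate_api_key",
             "block_ip", "block_public_access"}
NEEDS_APPROVAL = {"terminate_instance", "delete_objects"}  # illustrative destructive steps


def route(alert):
    """Pick a destination by severity; unknown severities become tickets."""
    return ROUTES.get(alert.get("severity"), "ticket")


def plan_response(alert, approved_by=None, audit_path="automation_audit.jsonl"):
    """Split proposed actions into run-now and pending, and log the decision."""
    run, pending = [], []
    for action in alert.get("proposed_actions", []):
        if action in AUTO_SAFE or (action in NEEDS_APPROVAL and approved_by):
            run.append(action)
        else:
            pending.append(action)  # unapproved or unrecognized actions wait for a human
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "alert_id": alert.get("alert_id"),
        "run": run,
        "pending": pending,
        "approved_by": approved_by,
    }
    with open(audit_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return run, pending
```

With the sample payload, `rotate_dek`, `disable_service_principal`, and `block_asn` all land in `pending` until someone classifies them in one of the two sets.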

Honeytokens and canaries

Create a fake admin user and fake S3 object with unique markers. Any access is critical.
Plant a canary API key in build artifacts that calls back to a controlled endpoint if used.
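
A honeytoken needs a unique marker and a check that treats any touch as critical. The following sketch assumes audit events are dicts with optional `principal`, `resource`, and `api_key` fields; those names, the `canary_` prefix, and the `honeytoken.access` signal name are hypothetical.

```python
import secrets


def make_canary_key(prefix="canary"):
    """Generate a unique, unguessable marker to plant as a fake API key."""
    return f"{prefix}_{secrets.token_hex(16)}"


def canary_hits(events, canary_ids):
    """Return every event that touches a registered canary, tagged critical."""
    canaries = set(canary_ids)
    hits = []
    for e in events:
        touched = (e.get("principal") in canaries
                   or e.get("resource") in canaries
                   or e.get("api_key") in canaries)
        if touched:
            hits.append(dict(e, severity="critical", signal="honeytoken.access"))
    return hits
```

Register the fake admin username, the fake object path, and each planted key in the same canary set so one rule covers all of them.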

Testing and drills

Run monthly breach game days: simulate key theft, token replay, bulk export, and vendor exfiltration.
Validate that alerts fire, on-call is paged, and automations execute safely.
Keep post-incident reviews and detection improvements in a backlog.
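
Drills are easier to repeat when each scenario is scripted. A minimal harness might look like the sketch below, where `Scenario`, `run_drill`, and the `detect` callable are hypothetical names standing in for your own event injection and detection pipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    name: str
    inject: Callable[[], list]  # produces synthetic events for the drill
    expected_signal: str        # signal name that should fire


def run_drill(scenarios, detect):
    """Run each scenario through detect(events) -> set of fired signal names.
    Returns {scenario name: True if the expected signal fired}."""
    return {s.name: s.expected_signal in detect(s.inject()) for s in scenarios}


def backlog_items(results):
    """Scenarios whose detection did not fire become backlog entries."""
    return [name for name, passed in results.items() if not passed]
```

This checks only that the alert fires; confirming that on-call was paged and that automations ran safely still needs a person in the loop during the game day.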

Privacy and regulatory triggers

Suppress PII in alerts. Provide links to evidence gated by least privilege.
Track incidents against regulatory thresholds and notify within statutory timelines where required.

Quick monitoring and alerting checklist

Central SIEM with structured logs, metrics, traces, and cloud audit events
Detections for auth, data access, key use, egress, and admin changes
Prometheus or provider alerts with sensible baselines and spike rules
Container and host runtime sensors with Falco or EDR
Honeytokens for early compromise detection
Alert routing with runbooks and safe auto-containment
Monthly drills and post-incident improvements

Proactive monitoring and automated, reversible response shrink breach impact. With layered telemetry, well-tuned detections, honeytokens, and scripted containment, you cut time to detect, speed remediation, and meet GDPR and CPRA expectations for security and accountability.

Sanitize backups and test data