
CC7.3 — Incident response and recovery

Status: Partially implemented (escalation matrix + audit chain live; written incident-response runbook + tabletop drills target Phase 0 exit through Phase 3)
Owner: Agent #38 (Senior Compliance Lead, SOC 2 + ISO 27001)
Last reviewed: 2026-05-28
Next review: 2026-08-28

Trust Services Criteria reference

The entity evaluates security events to determine whether they constitute a security incident, and responds to identified security incidents by executing a defined incident-response programme. The control covers the incident-detection-to-declaration path, the response runbook, the role-based responsibilities during an incident, the communication plan, the recovery procedure, and the post-incident learning loop.

How ZeroAuth meets this control

The escalation matrix in docs/plan/bfsi-v1/06-ways-of-working.md "Escalation" is the spine of the incident-response programme. Severity-1 production incidents page Roles 5, 21, 26 with a 15-minute SLA, escalating to Role 1 if no one responds. Customer escalations route via Role 42 → Role 46 within 4 hours. A sub-agent REQUEST_CHANGES that goes unaddressed escalates to Role 1 within 24 hours. A phase-exit gate at risk escalates to Role 1 + line VPs one week before the gate.

The Phase 0 + Phase 1 incident-response-related deliverables on the compliance roadmap:

  • DPDP §8 breach-notification playbook — target week 6 (2026-07-06), per compliance-roadmap-v1.md §6.1 SoW. Codifies the 72-hour DPB notification workflow.
  • First DPDP §8 tabletop exercise — week 33 (2026-12-28), per D-Q3-07. Output: docs/compliance/dpdp/tabletop-2026-q3.md.
  • Tabletop after-action report — week 34 (2027-01-04), D-Q3-08.

During an incident, the audit trail is the artefact of record. Every action against the production system writes an audit_events row through appendAuditEvent (commit a475ed8); the hash chain (ADR 0013, commit 27ed93c) makes the trail tamper-evident in real time; the daily on-chain anchor (ADR 0014, commit d6c6a4e) survives even a hostile-actor scenario where the production DB is itself compromised. An attacker cannot quietly "undo" an incident — the chain head will diverge from the on-chain anchor at the next anchor window, surfacing the tampering.

Incident-detection signals from the monitoring layer (see CC7.2):

  • Audit-chain drift — hourly chain-replay job catches in-DB tampering. Severity-1 alert.
  • Missed on-chain anchor — anchor cron retries for 6 hours, then pages on-call. Two consecutive missed days → "anchor-degraded" state surfaces a dashboard banner for the affected tenant.
  • Boot-time vkey mismatch — verifier refuses to start. Service down is loud.
  • High-severity CVE — nightly monitor pages on-call.
  • CI red on main — push-time signal; deploy.yml refuses to publish.
  • Caddy 5xx rate spike — measured at the Caddy layer (Grafana stack pending — see CC7.2 gap).
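A model of the audit-chain replay and anchor checks described above, as an added example (not ZeroAuth's src/services/audit.ts or verify-audit-chain.sh): the row shape, the sha256 link construction and the GENESIS value are assumptions, since ADR 0013 is not reproduced here. It shows why both signals are needed: an in-place edit of one audit_events row fails the hourly replay, while a chain rewritten consistently end to end passes replay and is caught only by comparing the replayed head against the last on-chain anchor.

```typescript
import { createHash } from "node:crypto";

// Row shape assumed for audit_events; ADR 0013's real column set may differ.
export interface AuditEvent {
  id: number;
  ts: string;       // ISO-8601 timestamp
  actor: string;
  action: string;
  prevHash: string; // hash of the previous row (GENESIS for the first row)
  hash: string;     // sha256 over (prevHash, id, ts, actor, action)
}
export const GENESIS = "0".repeat(64);

// Fixed-field-order serialization: a JSON array rather than an object, so key
// ordering or injected extra fields cannot change what gets hashed.
export function rowHash(e: Omit<AuditEvent, "hash">): string {
  const canonical = JSON.stringify([e.prevHash, e.id, e.ts, e.actor, e.action]);
  return createHash("sha256").update(canonical).digest("hex");
}

// Stand-in for appendAuditEvent: link the new row to the current chain head.
export function appendEvent(chain: AuditEvent[], ts: string, actor: string, action: string): AuditEvent[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS;
  const row = { id: chain.length + 1, ts, actor, action, prevHash };
  return [...chain, { ...row, hash: rowHash(row) }];
}

// Re-link a set of rows from scratch (models a chain that was rewritten
// consistently end to end, which replay alone cannot tell from an intact one).
export function rebuild(rows: AuditEvent[]): AuditEvent[] {
  return rows.reduce<AuditEvent[]>((c, e) => appendEvent(c, e.ts, e.actor, e.action), []);
}

// Hourly replay job: recompute every link from GENESIS and report the id of
// the first row whose stored prevHash or hash does not match (null if intact).
export function replayChain(chain: AuditEvent[]): { firstBadId: number | null; head: string } {
  let prev = GENESIS;
  for (const e of chain) {
    if (e.prevHash !== prev || rowHash(e) !== e.hash) return { firstBadId: e.id, head: prev };
    prev = e.hash;
  }
  return { firstBadId: null, head: prev };
}

// Anchor-window check: the replayed head must equal the head recorded on-chain.
// A consistently rewritten chain passes replay but diverges here.
export function matchesAnchor(chain: AuditEvent[], anchoredHead: string): boolean {
  const r = replayChain(chain);
  return r.firstBadId === null && r.head === anchoredHead;
}
```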

The recovery procedure is partitioned by incident class:

  • Service outage — docs/operations/deployment.md + docs/operations/admin-dashboard.md cover the deploy + rollback paths. ADR 0011's hotfix-via-dev-then-PR-to-main (commit 51bc705) is the in-cycle path.
  • Vkey-mismatch boot failure — ADR 0015 §"Rollback path" — flip the version constant back to the prior version + keep the prior verification_key.json and *.zkey in circuits/legacy/. 30-min wall-clock from "new vkey lives in test env" to "old verifier on live env retired".
  • Audit-chain divergence — verify-audit-chain.sh replays the chain off-DB + queries Basescan to localise the divergence; incident commander triages whether it's a write-path bug, a serializer poisoning (mitigated by external cryptographer review of src/services/audit.ts per ADR 0013 compensating control), or a hostile-actor scenario.
  • DPDP breach — 72-hour DPB notification path lands week 6 per the playbook.
  • Smart-contract compromise — npm run wallet:rotate rotates the deployer wallet; HSM signer migration (D-Q4-06, week 48) eliminates the human-custody dimension.
  • DR failover (Mumbai → Hyderabad) — D-Q4-04 + D-Q4-05, target week 46 + week 47 (first exercise).

Post-incident learning loop: incidents are captured in the audit-findings doc with a unique ID (C-NN pattern) and a closing-commit + regression-test reference, mirroring the Phase 0 finding format. The quarterly compliance retrospective (compliance-roadmap-v1.md §8.1) is the formal post-incident learning surface.

R-COMP-04 (bank pilot 1 contract slip blocks SOC 2 customer-touchpoint controls including CC7.5 incident-customer-communication evidence) is the named risk; mitigation in compliance-roadmap-v1.md §7.4 is "narrow the Type I scope at the auditor scoping call (week 22) rather than miss the report deadline" if pilots slip.

Evidence references

  • docs/plan/bfsi-v1/06-ways-of-working.md "Escalation" — sev-1 / customer / sub-agent / phase-exit-gate escalation paths with SLAs.
  • ADR 0013-audit-log-hash-chain.md (commit 27ed93c) — tamper-evident incident trail.
  • ADR 0014-on-chain-anchor-cadence.md (commit 27ed93c) — anchor-failure detection.
  • ADR 0015-circuit-version-pinning.md (commit 27ed93c) — circuit-version rollback path.
  • Commit a475ed8 — appendAuditEvent real-time incident trail.
  • Commit d634b2d — /api/admin/audit-integrity (incident-detection endpoint).
  • Commit e98d158 — boot vkey check (loud-failure-mode incident detection).
  • Commit d6c6a4e — AuditAnchor contract (anchor-failure detection sink).
  • Commit f8a756c — nightly CVE monitor (incident detection).
  • docs/operations/deployment.md, docs/operations/admin-dashboard.md — recovery runbooks.
  • docs/compliance/compliance-roadmap-v1.md D-Q3-07 + D-Q3-08 — tabletop exercise schedule.
  • docs/compliance/compliance-roadmap-v1.md §7.4 (R-COMP-04) — customer-touchpoint incident risk.

Open gaps + remediation roadmap

  • docs/operations/incident-response-runbook.md — written runbook (severity decision, IC role, communications plan, recovery checklist). Target Phase 0 exit week 2 (2026-06-05) for v0; v1 by week 14.
  • DPDP §8 breach-notification playbook (docs/compliance/dpdp/breach-notification-playbook.md) — target week 6 (2026-07-06).
  • First tabletop exercise — D-Q3-07, target week 33.
  • Hyderabad DR failover exercise — D-Q4-04 / D-Q4-05, target week 46–47.
  • On-call tool (PagerDuty / Opsgenie) onboarding — pairs with CC7.2 monitoring gap; target week 14.

Test or audit query

  • grep -r "Severity-1" docs/plan/bfsi-v1/06-ways-of-working.md returns the 15-minute pageable SLA.
  • grep -A 2 "D-Q3-07" docs/compliance/compliance-roadmap-v1.md shows the first tabletop exercise schedule.
  • Once the runbook lands, cat docs/operations/incident-response-runbook.md should carry a severity rubric + IC role + comms plan.