CC7.3 — Incident response and recovery
Status: Partially implemented (escalation matrix + audit chain live; written incident-response runbook + tabletop drills target Phase 0 exit through Phase 3) Owner: Agent #38 (Senior Compliance Lead, SOC 2 + ISO 27001) Last reviewed: 2026-05-28 Next review: 2026-08-28
Trust Services Criteria reference
The entity evaluates security events to determine whether they constitute a security incident, and responds to identified security incidents by executing a defined incident-response programme. The control covers the incident-detection-to-declaration path, the response runbook, the role-based responsibilities during an incident, the communication plan, the recovery procedure, and the post-incident learning loop.
How ZeroAuth meets this control
The escalation matrix in docs/plan/bfsi-v1/06-ways-of-working.md "Escalation" is the spine of the incident-response programme. Severity-1 production incidents page Roles 5, 21, 26 with a 15-minute SLA, escalating to Role 1 if unresponded. Customer escalations route via Role 42 → Role 46 within 4 hours. A sub-agent REQUEST_CHANGES that goes unaddressed escalates to Role 1 within 24 hours. Phase-exit-gate-at-risk escalates to Role 1 + line VPs one week before the gate.
The Phase 0 + Phase 1 incident-response-related deliverables on the compliance roadmap:
- DPDP §8 breach-notification playbook — target week 6 (2026-07-06), per
compliance-roadmap-v1.md§6.1 SoW. Codifies the 72-hour DPB notification workflow. - First DPDP §8 tabletop exercise — week 33 (2026-12-28), per D-Q3-07. Output:
docs/compliance/dpdp/tabletop-2026-q3.md. - Tabletop after-action report — week 34 (2027-01-04), D-Q3-08.
During an incident, the audit trail is the artefact of record. Every action against the production system writes an audit_events row through appendAuditEvent (commit a475ed8); the hash chain (ADR 0013, commit 27ed93c) makes the trail tamper-evident in real time; the daily on-chain anchor (ADR 0014, commit d6c6a4e) survives even a hostile-actor scenario where the production DB is itself compromised. An attacker cannot quietly "undo" an incident — the chain head will diverge from the on-chain anchor at the next anchor window, surfacing the tampering.
Incident-detection signals from the monitoring layer (see CC7.2):
- Audit-chain drift — hourly chain-replay job catches in-DB tampering. Severity-1 alert.
- Missed on-chain anchor — anchor cron retries for 6 hours, then pages on-call. Two consecutive missed days → "anchor-degraded" state surfaces a dashboard banner for the affected tenant.
- Boot-time vkey mismatch — verifier refuses to start. Service down is loud.
- High-severity CVE — nightly monitor pages on-call.
- CI red on main — push-time signal; deploy.yml refuses to publish.
- Caddy 5xx rate spike — measured at the Caddy layer (Grafana stack pending — see CC7.2 gap).
The recovery procedure is partitioned by incident class:
- Service outage —
docs/operations/deployment.md+docs/operations/admin-dashboard.mdcover the deploy + rollback paths. ADR 0011's hotfix-via-dev-then-PR-to-main(commit51bc705) is the in-cycle path. - Vkey-mismatch boot failure — ADR 0015 §"Rollback path" — flip the version constant back to the prior version + keep the prior
verification_key.jsonand*.zkeyincircuits/legacy/. 30-min wall-clock from "new vkey lives in test env" to "old verifier onliveenv retired". - Audit-chain divergence —
verify-audit-chain.shreplays the chain off-DB + queries Basescan to localise the divergence; incident commander triages whether it's a write-path bug, a serializer poisoning (mitigated by external cryptographer review ofsrc/services/audit.tsper ADR 0013 compensating control), or a hostile-actor scenario. - DPDP breach — 72-hour DPB notification path lands week 6 per the playbook.
- Smart-contract compromise —
npm run wallet:rotaterotates the deployer wallet; HSM signer migration (D-Q4-06, week 48) eliminates the human-custody dimension. - DR failover (Mumbai → Hyderabad) — D-Q4-04 + D-Q4-05, target week 46 + week 47 (first exercise).
Post-incident learning loop: incidents are captured in the audit-findings doc with a unique ID (C-NN pattern) and a closing-commit + regression-test reference, mirroring the Phase 0 finding format. The quarterly compliance retrospective (compliance-roadmap-v1.md §8.1) is the formal post-incident learning surface.
R-COMP-04 (bank pilot 1 contract slip blocks SOC 2 customer-touchpoint controls including CC7.5 incident-customer-communication evidence) is the named risk; mitigation in compliance-roadmap-v1.md §7.4 is "narrow the Type I scope at the auditor scoping call (week 22) rather than miss the report deadline" if pilots slip.
Evidence references
docs/plan/bfsi-v1/06-ways-of-working.md"Escalation" — sev-1 / customer / sub-agent / phase-exit-gate escalation paths with SLAs.- ADR
0013-audit-log-hash-chain.md(commit27ed93c) — tamper-evident incident trail. - ADR
0014-on-chain-anchor-cadence.md(commit27ed93c) — anchor-failure detection. - ADR
0015-circuit-version-pinning.md(commit27ed93c) — circuit-version rollback path. - Commit
a475ed8—appendAuditEventreal-time incident trail. - Commit
d634b2d—/api/admin/audit-integrity(incident-detection endpoint). - Commit
e98d158— boot vkey check (loud-failure-mode incident detection). - Commit
d6c6a4e—AuditAnchorcontract (anchor-failure detection sink). - Commit
f8a756c— nightly CVE monitor (incident detection). docs/operations/deployment.md,docs/operations/admin-dashboard.md— recovery runbooks.docs/compliance/compliance-roadmap-v1.mdD-Q3-07 + D-Q3-08 — tabletop exercise schedule.docs/compliance/compliance-roadmap-v1.md§7.4 (R-COMP-04) — customer-touchpoint incident risk.
Open gaps + remediation roadmap
docs/operations/incident-response-runbook.md— written runbook (severity decision, IC role, communications plan, recovery checklist). Target Phase 0 exit week 2 (2026-06-05) for v0; v1 by week 14.- DPDP §8 breach-notification playbook (
docs/compliance/dpdp/breach-notification-playbook.md) — target week 6 (2026-07-06). - First tabletop exercise — D-Q3-07, target week 33.
- Hyderabad DR failover exercise — D-Q4-04 / D-Q4-05, target week 46–47.
- On-call tool (PagerDuty / Opsgenie) onboarding — pairs with CC7.2 monitoring gap; target week 14.
Test or audit query
grep -r "Severity-1" docs/plan/bfsi-v1/06-ways-of-working.md returns the 15-minute pageable SLA. cat docs/compliance/compliance-roadmap-v1.md | grep -A 2 "D-Q3-07" shows the first tabletop exercise schedule. Once the runbook lands, cat docs/operations/incident-response-runbook.md should carry a severity rubric + IC role + comms plan.