Incident Response Log Checklist

A concise checklist for preparing sanitized logs and request traces during incident response.

Updated: 2026-02-24

During an active incident, teams optimize for speed. That is necessary, but speed without process usually leads to missing context or accidental data exposure. A short checklist lets you move quickly while preserving evidence quality and minimizing leak risk.

Use this checklist for on-call escalations, cross-team handoffs, and vendor support tickets.

Why it matters

Incident logs are operational evidence. If they are incomplete, root cause analysis becomes guesswork. If they include secrets, the incident footprint grows and can trigger additional security response work. A good checklist solves both problems by making teams collect high-value context and sanitize at the same time.

Consistent formatting also helps downstream responders. Security, support, and vendor engineers can compare incidents faster when each package includes the same fields: timeline, request IDs, sanitized snippets, impact statement, and owner.

Step-by-step checklist

Define incident scope first: impacted service, affected customers, severity, and start time.
Capture a narrow time window of logs around the first known failure.
Pull key request traces (cURL, HAR, or structured JSON) tied to representative failures.
Sanitize logs with Log Sanitizer and sanitize request traces with HAR Sanitizer.
Confirm redaction for auth headers, API keys, cookies, token query params, and private key blocks.
Check cloud/vendor secrets explicitly: AWS credentials, GitHub tokens, Slack tokens, Stripe keys, SendGrid keys.
Record stable identifiers: request ID, trace ID, correlation ID, endpoint, and status code.
Add environment labels (prod, staging) and deployment identifiers where available.
Produce a short timeline with UTC timestamps and state changes.
Document what changed recently: deploy, config, dependency, traffic pattern, cert/key rotation.
Add immediate mitigation status and rollback status.
Assign one current owner and one next update time.
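
The sanitize and confirm-redaction steps above can be sketched in a few lines of Python. This is a minimal illustration, not how Log Sanitizer or HAR Sanitizer work internally: the `RULES` list, the `redact` function, and the query parameter names are all assumptions, and the patterns target header-style text lines rather than JSON bodies. The placeholders match the ones used in the snippets in this document.

```python
import re

# Illustrative rules only: each pairs a pattern with a replacement that keeps
# the field name and swaps the secret value for a placeholder.
RULES = [
    # PEM private key blocks, which usually span several lines.
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?"
                r"-----END [A-Z ]*PRIVATE KEY-----", re.DOTALL),
     "[REDACTED:PRIVATE_KEY]"),
    # Authorization header values, with or without a Bearer/Basic scheme.
    (re.compile(r"(?i)(authorization\s*[:=]\s*)(?:(?:bearer|basic)\s+)?[^\s,;\"']+"),
     r"\1[REDACTED:AUTH]"),
    # x-api-key header values.
    (re.compile(r"(?i)(x-api-key\s*[:=]\s*)[^\s,;\"']+"),
     r"\1[REDACTED:API_KEY]"),
    # Cookie and Set-Cookie headers: redact through the end of the line.
    (re.compile(r"(?i)((?:set-)?cookie\s*:\s*)[^\r\n]+"),
     r"\1[REDACTED:COOKIE]"),
    # Token-like query parameters (the parameter names are assumptions).
    (re.compile(r"(?i)([?&](?:token|access_token|api_key)=)[^&\s\"']+"),
     r"\1[REDACTED:QP]"),
]


def redact(text: str) -> str:
    """Apply every rule in order and return the sanitized text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    raw = (
        "POST /v1/orders?token=abc123&page=2\n"
        "Authorization: Bearer s3cr3t\n"
        "Cookie: sid=xyz; theme=dark\n"
    )
    print(redact(raw))
```

Regex rules like these catch known shapes only, so treat the output as a first pass and still review it by eye before sharing.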

Safe snippet examples

Incident note block:

Incident: Checkout API returns 500 for subset of users
Started: 2026-02-24T18:03:00Z
Env: production
Request ID: req_7b0c9
Endpoint: POST /v1/orders
Owner: platform-oncall
Next update: 2026-02-24T18:45:00Z
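
If responders work in different time zones, generating the note block from code helps keep every timestamp in UTC. The sketch below is one possible way to do that; `utc_stamp` and `incident_note` are hypothetical helper names, and the field labels simply mirror the note block above.

```python
from datetime import datetime, timedelta, timezone


def utc_stamp(moment: datetime) -> str:
    """Render a timezone-aware datetime as UTC, e.g. 2026-02-24T18:03:00Z."""
    if moment.tzinfo is None:
        # Refuse naive datetimes rather than guess which zone they are in.
        raise ValueError("naive datetime; attach a timezone before formatting")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def incident_note(title, started, env, request_id, endpoint, owner, next_update):
    """Build the note block using the same field labels as the example."""
    fields = [
        ("Incident", title),
        ("Started", utc_stamp(started)),
        ("Env", env),
        ("Request ID", request_id),
        ("Endpoint", endpoint),
        ("Owner", owner),
        ("Next update", utc_stamp(next_update)),
    ]
    return "\n".join(f"{label}: {value}" for label, value in fields)


if __name__ == "__main__":
    local = timezone(timedelta(hours=-5))  # fixed offset, for illustration only
    print(incident_note(
        "Checkout API returns 500 for subset of users",
        datetime(2026, 2, 24, 13, 3, tzinfo=local),
        "production",
        "req_7b0c9",
        "POST /v1/orders",
        "platform-oncall",
        datetime(2026, 2, 24, 13, 45, tzinfo=local),
    ))
```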

Sanitized trace:

{
  "request_id": "req_7b0c9",
  "authorization": "[REDACTED:AUTH]",
  "x-api-key": "[REDACTED:API_KEY]",
  "cookie": "[REDACTED:COOKIE]",
  "query": "token=[REDACTED:QP]",
  "status": 500,
  "error": "upstream timeout"
}
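
Before sharing a trace like this, a quick scan for leftover secrets can back up the manual review. The sketch below is an assumption-heavy example: `LEAK_PATTERNS` and `find_leaks` are hypothetical names, and the token prefixes are approximations of publicly documented vendor formats that may change, so verify them against current vendor documentation. It reports only the line number and pattern name, never the matched value.

```python
import re

# Approximate shapes of common vendor secrets; not an exhaustive list.
LEAK_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(
        r"\b(?:gh[pousr]_[A-Za-z0-9]{36,}|github_pat_[A-Za-z0-9_]{20,})"),
    "slack_token": re.compile(r"\bxox[abprs]-[A-Za-z0-9-]{10,}"),
    "stripe_key": re.compile(r"\b[sr]k_(?:live|test)_[A-Za-z0-9]{16,}"),
    "sendgrid_key": re.compile(r"\bSG\.[A-Za-z0-9_-]{16,}\.[A-Za-z0-9_-]{16,}"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/-]{8,}"),
}


def find_leaks(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) pairs for suspected secrets.

    The matched value itself is deliberately left out of the result so the
    report can be pasted into a ticket without repeating the leak.
    """
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in LEAK_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


if __name__ == "__main__":
    trace = '{\n  "authorization": "[REDACTED:AUTH]",\n  "status": 500\n}'
    print(find_leaks(trace) or "no known secret patterns found")
```

An empty result means only that none of these patterns matched; it does not prove the trace is free of secrets.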

Sanitized command for reproduction:

curl -X POST "https://api.example.com/v1/orders?token=[REDACTED:QP]" \
  -H "Authorization: [REDACTED:AUTH]" \
  -H "x-api-key: [REDACTED:API_KEY]" \
  -H "Content-Type: application/json" \
  -d '{"order_id":"A-1001"}'

Minimal package for escalation

When opening a ticket, include the following, in this order:

What is broken now and customer impact.
Sanitized evidence (logs + one reproducible request).
What has already been tried.
Suspected blast radius and confidence level.
Owner, priority, and next decision point.

Avoid including:

Raw cookies, bearer tokens, or PATs.
Full database dumps when a few rows/fields are enough.
Unbounded HAR files if a small filtered trace can reproduce the issue.
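
One way to trim an unbounded HAR file is to keep only the entries that match the failing endpoint and status. The sketch below assumes the standard HAR layout (`log.entries`, each with `request.url` and `response.status`); `filter_har` and its parameters are hypothetical names. Filtering does not redact anything, so the result still needs to go through the sanitizer.

```python
import json


def filter_har(har: dict, url_substring: str, min_status: int = 500) -> dict:
    """Return a copy of the HAR that keeps only matching failed requests."""
    log = har.get("log", {})
    kept = [
        entry for entry in log.get("entries", [])
        if url_substring in entry.get("request", {}).get("url", "")
        and entry.get("response", {}).get("status", 0) >= min_status
    ]
    # Preserve the other top-level log fields (version, creator, and so on).
    trimmed = {key: value for key, value in log.items() if key != "entries"}
    trimmed["entries"] = kept
    return {"log": trimmed}


if __name__ == "__main__":
    example = {"log": {"version": "1.2", "entries": [
        {"request": {"method": "POST", "url": "https://api.example.com/v1/orders"},
         "response": {"status": 500}},
        {"request": {"method": "GET", "url": "https://api.example.com/v1/health"},
         "response": {"status": 200}},
    ]}}
    print(json.dumps(filter_har(example, "/v1/orders"), indent=2))
```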