Incident response

When something goes wrong, here is what happens.

Detection

We monitor the service through:

Uptime checks: External monitoring via Updown.io, checking the web interface, Git API, and CI runner API. Alerts fire on downtime.
Infrastructure metrics: Scaleway observability for compute, database, and storage. Alerts fire on resource exhaustion, error rate spikes, and elevated latency.
Log analysis: Server logs are retained for 30 days and reviewed for anomalies.
User reports: Email to security@codebahn.net or via in-app support chat.

Classification

Incidents are classified by severity:

Severity	Definition	Examples
Critical	Data breach, unauthorized access to customer data, or complete service outage	Cross-tenant data leak, database compromise, full outage
High	Partial service degradation affecting multiple customers, or a vulnerability with high exploit potential	CI runners down, Git push failures, authentication bypass
Medium	Limited impact, single-customer issue, or a vulnerability requiring specific conditions	Slow API responses, backup verification failure, low-severity vulnerability
Low	Cosmetic, informational, or no direct user impact	Logging gap, documentation error, non-exploitable misconfiguration
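
As an illustration only, the severity definitions above can be read as a small decision function. The sketch below is not our internal tooling: the `IncidentSignals` fields and the `classify` function are hypothetical names chosen to mirror the table, and real classification involves human judgment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass
class IncidentSignals:
    # Hypothetical fields that mirror the definitions in the severity table.
    data_breach: bool = False
    unauthorized_data_access: bool = False
    complete_outage: bool = False
    service_degraded: bool = False
    customers_affected: int = 0
    # "high" = high exploit potential, "conditional" = needs specific conditions.
    vulnerability: Optional[str] = None


def classify(signals: IncidentSignals) -> Severity:
    """Return the highest severity whose definition matches the signals."""
    if (signals.data_breach
            or signals.unauthorized_data_access
            or signals.complete_outage):
        return Severity.CRITICAL
    if ((signals.service_degraded and signals.customers_affected > 1)
            or signals.vulnerability == "high"):
        return Severity.HIGH
    if (signals.service_degraded
            or signals.customers_affected >= 1
            or signals.vulnerability == "conditional"):
        return Severity.MEDIUM
    # Cosmetic, informational, or no direct user impact.
    return Severity.LOW
```

Checks run from most to least severe, so an incident that matches several definitions lands in the highest matching class. For example, `classify(IncidentSignals(service_degraded=True, customers_affected=12))` returns `Severity.HIGH`.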

Response

Severity	Response time	Communication
Critical	Immediate (within 1 hour)	Email to affected customers, status page update
High	Within 4 hours	Status page update, email if customer-facing
Medium	Within 1 business day	Status page if user-visible
Low	Within 5 business days	No external communication unless relevant

For incidents involving personal data, we notify affected data controllers within 48 hours per our Data Processing Agreement and GDPR Article 33.
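
To make the targets above concrete, here is a minimal sketch of how the deadlines could be computed. It rests on assumptions the policy does not spell out: the clock starts at detection, business days are Monday to Friday, and public holidays are ignored. All function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Response targets from the table above: (amount, unit).
RESPONSE_TARGETS = {
    "Critical": (1, "hours"),
    "High": (4, "hours"),
    "Medium": (1, "business_days"),
    "Low": (5, "business_days"),
}

# Notification window for incidents involving personal data.
CONTROLLER_NOTIFICATION_WINDOW = timedelta(hours=48)


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by `days` working days, skipping Saturday and Sunday.

    Assumption: public holidays are not taken into account.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


def response_deadline(severity: str, detected_at: datetime) -> datetime:
    """Latest time by which the response should have started."""
    amount, unit = RESPONSE_TARGETS[severity]
    if unit == "hours":
        return detected_at + timedelta(hours=amount)
    return add_business_days(detected_at, amount)


def controller_notification_deadline(detected_at: datetime) -> datetime:
    """Latest time to notify affected data controllers (assumed from detection)."""
    return detected_at + CONTROLLER_NOTIFICATION_WINDOW
```

Under these assumptions, a Medium incident detected on Friday 2024-03-01 at 15:00 has a response deadline of Monday 2024-03-04 at 15:00, while a Critical incident detected at the same moment has a deadline of 16:00 that same Friday.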

Communication channels

Status page: status.codebahn.net for real-time service status and incident updates.
Email: Direct notification to affected customers for Critical incidents, and for High incidents that are customer-facing.
In-app: Support chat for individual follow-up.

We do not use social media for incident communication.

Post-incident review

Every Critical and High incident gets a post-incident review within 5 business days. The review covers:

Timeline: What happened and when.
Root cause: Why it happened.
Impact: What was affected and for how long.
Response: What we did and how fast.
Prevention: What changes prevent recurrence.

We publish a summary for incidents with broad user impact. The summary includes the timeline, root cause, and prevention measures. It does not include internal operational details that could aid future attacks.

Backup and recovery

If recovery from backup is needed:

Daily encrypted backups are available, stored on a separate provider in a separate EU region.
Backups are verified weekly with tested restore procedures.
Recovery target: restore service from backup within hours, not days.
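
The daily and weekly cadence above lends itself to a simple freshness check. The sketch below is illustrative rather than a description of our actual monitoring: the function name, the thresholds expressed as exact 1-day and 7-day limits, and the idea of passing timestamps in directly are all assumptions.

```python
from datetime import datetime, timedelta

MAX_BACKUP_AGE = timedelta(days=1)  # backups are taken daily
MAX_VERIFY_AGE = timedelta(days=7)  # restores are verified weekly


def backup_problems(last_backup_at: datetime,
                    last_verified_restore_at: datetime,
                    now: datetime) -> list:
    """Return human-readable problems; an empty list means the cadence is met."""
    problems = []
    if now - last_backup_at > MAX_BACKUP_AGE:
        problems.append("latest backup is older than one day")
    if now - last_verified_restore_at > MAX_VERIFY_AGE:
        problems.append("last verified restore is older than one week")
    return problems
```

In practice a check like this would likely allow some slack beyond the exact limits so that a backup job finishing a few minutes late does not raise an alert.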

For full backup details, see the security overview.