Dash0

Write-up
Experiencing delay in data processing
Degraded performance

Incident: Experiencing delay in data processing

Duration: ~15 minutes (19:55–20:10 UTC, April 29, 2026)

Impact: Customers across all regions may have experienced a brief gap in telemetry data.

What happened

A sudden spike in internal API requests overwhelmed our Control Plane API's database connection pool. An inefficient database query amplified the load, exhausting available connections. This caused downstream data receivers to lose access to configuration and restart, resulting in a brief gap in data ingestion.

What we fixed

Multiple improvements were deployed immediately after the incident:

- Query optimization and caching - Rewrote expensive database queries and added caching to significantly reduce database load.

- Connection pool and capacity scaling - Increased database connection limits and scaled up replicas.

- Per-tenant rate limiting - Introduced rate limits for critical endpoints, preventing any single source of traffic from monopolizing resources.

- Receiver resilience - Receivers now tolerate temporary control-plane-api unavailability instead of restarting.
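
To illustrate the per-tenant rate limiting item, here is a minimal sketch of one common approach, a token bucket kept per tenant. The write-up does not say which algorithm or limits were deployed, so the class names (`TokenBucket`, `PerTenantLimiter`) and the `rate` and `capacity` parameters are hypothetical and chosen only for illustration.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new tenant can burst immediately
        self.updated = now

    def allow(self, now):
        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerTenantLimiter:
    """Keeps one independent bucket per tenant (hypothetical illustration)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.buckets = {}

    def allow(self, tenant_id):
        now = self.clock()
        bucket = self.buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[tenant_id] = bucket
        return bucket.allow(now)
```

Because each tenant draws from its own bucket, a burst from one tenant uses up only that tenant's tokens; requests from other tenants are still admitted. A caller would typically reject a request (for example with HTTP 429) when `allow` returns `False`.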

These changes have been validated in production during subsequent traffic spikes of comparable magnitude with no customer impact.

We sincerely apologize for the disruption and remain committed to improving the resilience of our platform.