On January 18, 2024, between 01:12 am UTC and 02:12 am UTC, Atlassian customers using Confluence Cloud in the APAC region experienced degraded performance and were unable to access core product functionality. The event was triggered by a deployment of a downstream dependency service that could not scale with the increase in traffic. The incident was detected within 18 minutes by an automated monitoring system and mitigated by manually scaling out nodes, which returned Atlassian systems to a known good state. The total time to resolution was about one hour.
In response to this incident, we helped scale the service and put a deployment block in place to prevent the service from being deployed to production again until the issue was resolved.
On January 25, 2024, between 01:05 am UTC and 01:42 am UTC, a separate automated deployment process ran to deploy services that had not been deployed in the previous seven days. This deployment again caused the downstream dependency service to run at lower-than-desired capacity, resulting in degraded performance in Confluence. The issue was detected within 10 minutes by an automated monitoring system and mitigated by manually scaling out nodes. The total time to resolution was about 37 minutes.
The impact occurred in two windows: on January 18, 2024, between 01:12 am UTC and 02:12 am UTC, and on January 25, 2024, between 01:05 am UTC and 01:42 am UTC. These incidents disrupted service for customers in the APAC region, who may have noticed timeouts and failed requests when viewing pages, creating pages, and using other Confluence Cloud functionality.
The issue was caused by a deployment of a downstream service that had not scaled to meet the growing traffic. As a result, Confluence Cloud saw timeouts and errors in its requests to that service, and users received HTTP 500 errors.
We know that outages impact your productivity. We are prioritizing the following improvement actions to avoid repeating this type of incident.
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Atlassian Customer Support