Here is the root cause analysis of the incident.
On November 21, 2024 9:11pm UTC, we deployed a production release of a Confluence service to one of the regions, using a new rollout technology. During the deployment, the team encountered an issue that necessitated a manual rollback. The rollback reverted the region to use the previous nodes, which weren’t pre-scaled to handle existing traffic at the time. Consequently, this caused stress on the service. In response, the team immediately scaled up the nodes in the region. By 9:24 PM UTC, the service was fully restored, resulting in an outage that lasted for 13 minutes.
Atlassian is conducting an internal post-incident review and will take the required actions to prevent this kind of incident from happening again.
We sincerely apologize for the inconvenience.