Confluence Outage

Incident Report for Confluence

Postmortem

Here is the root cause analysis of the incident.

On November 21, 2024 9:11pm UTC, we deployed a production release of a Confluence service to one of the regions, using a new rollout technology. During the deployment, the team encountered an issue that necessitated a manual rollback. The rollback reverted the region to use the previous nodes, which weren’t pre-scaled to handle existing traffic at the time. Consequently, this caused stress on the service. In response, the team immediately scaled up the nodes in the region. By 9:24 PM UTC, the service was fully restored, resulting in an outage that lasted for 13 minutes.

Atlassian is conducting an internal post-incident review and will take the required actions to prevent this kind of incident from happening again.

We sincerely apologize for the inconvenience.

Posted Nov 22, 2024 - 22:19 UTC

Resolved

There was Confluence outage in one of the regions on November 21, 2024 9:11pm UTC. By 9:24 PM UTC, the service was fully restored, resulting in an outage that lasted for 13 minutes.

Posted Nov 21, 2024 - 21:11 UTC