Degraded Experience
Incident Report for Confluence
Postmortem

SUMMARY

On August 29, 2024, between 8:32 AM and 11:35 AM UTC, Atlassian customers experienced an unexpected disruption in service affecting Jira and Confluence cloud products. This was primarily due to connectivity issues with an external service provider, which impacted our ability to perform certain operations integral to our services. The incident was promptly identified through our monitoring systems, and our teams initiated response protocols to address the issue. The disruption resulted in increased error rates across Jira and Confluence products, temporarily affecting the experience for some of our users.

IMPACT

On August 29, 2024, between 8:32 AM and 11:35 AM UTC, Confluence and Jira cloud customers in multiple regions experienced increased error rates. The total time to resolution was three hours and three minutes.

ROOT CAUSE

The issue was caused by an outage at our external security token provider. Services that requested security tokens experienced timeouts, leading to outages for our cloud products.

Our automated monitoring system detected the incident within one minute, and we confirmed the outage with the external provider. The provider resolved the outage at 10:02 AM UTC, and full system recovery for Atlassian was completed by 11:35 AM UTC.

REMEDIAL ACTIONS PLAN & NEXT STEPS

We are committed to reducing the risk of similar incidents in the future. To achieve this, we will enhance our system's ability to handle external service disruptions and improve our incident response strategies. Specifically, we will reduce our reliance on single points of failure by setting up fallback region support for the critical security service provider. Additionally, we will implement measures to minimize the recovery time of the Atlassian services once the external security provider has recovered.
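As an illustration only, the fallback-region idea described above might look roughly like the sketch below. It is not a description of Atlassian's implementation: the function name, the region identifiers, the `fetch` callback, and the two-second timeout are all hypothetical and chosen for the example. The sketch tries each configured region in order with a bounded timeout and returns the first security token it obtains.

```python
from typing import Callable, Sequence


class TokenUnavailable(Exception):
    """Raised when no configured region returned a token."""


def fetch_token_with_fallback(
    regions: Sequence[str],
    fetch: Callable[[str, float], str],
    timeout_s: float = 2.0,  # hypothetical per-region timeout
) -> str:
    """Try each region in order and return the first token obtained.

    `fetch(region, timeout_s)` is a hypothetical callback that requests a
    token from one region and raises TimeoutError or ConnectionError on
    failure.
    """
    errors = []
    for region in regions:
        try:
            return fetch(region, timeout_s)
        except (TimeoutError, ConnectionError) as exc:
            # Record the failure and move on to the next region.
            errors.append(f"{region}: {exc!r}")
    raise TokenUnavailable("; ".join(errors))
```

With a primary region that times out and a healthy secondary region, the call returns the secondary region's token instead of propagating the timeout to the calling service. If every region fails, the caller receives a single TokenUnavailable error listing each attempt.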

Thanks,

Atlassian Customer Support

Posted Sep 12, 2024 - 15:44 UTC

Resolved
This incident has been resolved.
Posted Aug 29, 2024 - 11:22 UTC
Monitoring
We are seeing recovery and are monitoring the situation.
Posted Aug 29, 2024 - 10:54 UTC
Investigating
We are investigating cases of degraded performance and increased error rates for Atlassian products. We will provide more details within the next hour.
Posted Aug 29, 2024 - 09:43 UTC