Degraded performance for Atlassian Cloud services
Incident Report for Confluence
Postmortem

SUMMARY

On December 8, 2021, between 02:45 and 03:54 UTC, Atlassian customers using Jira and Confluence Cloud products were unable to access core functionality of the products. This event was triggered by a change to replace the Transport Layer Security (TLS) certificate for *.atlassian.net on Atlassian's Edge Network Infrastructure. The incident was detected by monitoring and mitigated by rolling back to the previous TLS certificate. The total time to resolution was one hour and nine minutes.

IMPACT

Jira and Confluence Cloud experienced service degradation on December 8, 2021, between 02:45 and 03:54 UTC. The incident disrupted service for customers globally and was caused by a failed change to renew the *.atlassian.net TLS certificate. As a result, API-based clients could not perform TLS handshakes with the Atlassian Edge Network Infrastructure, and clients received HTTP 5xx errors, which resulted in the following failure scenarios:

  • API and machine clients were unable to connect to Jira and Confluence Cloud
  • In-product experiences such as Atlassian Connect add-ons and Automation for Jira were impacted
  • Clients behind TLS / SSL interception proxies and firewalls were unable to connect to Jira and Confluence Cloud

ROOT CAUSE

Our mechanism for replacing the TLS certificate failed to validate the certificate chain. We deployed the certificate with an incorrect certificate chain to Atlassian's Edge Network Infrastructure, resulting in failed TLS handshakes with *.atlassian.net services. As a result, Jira and Confluence Cloud were unable to service public customer requests.
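
As an illustration of the kind of check that was missing, the following minimal sketch tests whether a certificate bundle links up structurally: each certificate's issuer should match the subject of the certificate that follows it. It is not Atlassian's tooling. The CertInfo type, the find_chain_break function, and the example names are hypothetical, and the sketch assumes the subject and issuer fields have already been parsed out of the bundle. Name matching alone does not verify signatures, expiry, or trust anchors, so a real gate would need full path validation as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertInfo:
    """Hypothetical, pre-parsed view of one certificate in a bundle."""
    subject: str
    issuer: str


def find_chain_break(chain):
    """Return the index of the first certificate whose issuer does not match
    the subject of the next certificate, or None if every link matches.

    `chain` is ordered leaf first, the order in which a server presents it.
    """
    for i in range(len(chain) - 1):
        if chain[i].issuer != chain[i + 1].subject:
            return i
    return None


# Illustrative data only; these names are assumptions, not real certificates.
good = [
    CertInfo("CN=*.example.test", "CN=Example Intermediate CA"),
    CertInfo("CN=Example Intermediate CA", "CN=Example Root CA"),
]
bad = [
    CertInfo("CN=*.example.test", "CN=Example Intermediate CA"),
    CertInfo("CN=Some Other Intermediate", "CN=Example Root CA"),
]

print(find_chain_break(good))  # None
print(find_chain_break(bad))   # 0
```

A deployment step could refuse to proceed whenever the function returns an index rather than None.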

REMEDIAL ACTION PLAN & NEXT STEPS

We know that outages impact your productivity. While we have a number of tests and preventative processes in place, this issue wasn’t identified before deployment because it affected only a specific set of HTTP clients, and it was not picked up by our automated continuous deployment suites or manual test scripts.

We are prioritizing the following improvement actions to avoid repeating this type of incident:

  • Introducing additional tests for TLS certificate replacements to prevent the deployment of incorrect certificates.
  • Extending test coverage of external dependencies and inbound endpoints.
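
One possible shape for such a test is a handshake probe that behaves like a strict API client: it connects to an endpoint and accepts the connection only if the presented chain and hostname validate against the system trust store. The sketch below uses Python's standard ssl and socket modules. The probe_tls function name and its return convention are assumptions made for illustration and do not describe Atlassian's actual test suite.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0):
    """Attempt a fully verified TLS handshake.

    Returns (True, negotiated_protocol) on success, or
    (False, error_description) if the connection or handshake fails.
    """
    # The default context requires a valid chain and a matching hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except (ssl.SSLError, OSError) as exc:
        # Covers certificate verification failures as well as
        # network errors such as refused connections and timeouts.
        return False, f"{type(exc).__name__}: {exc}"
```

Run against an endpoint that already serves the new certificate, for example a hypothetical canary host, a False result would block the wider rollout. Because browsers and API clients can differ in how they build certificate chains, a probe of this kind is meant to complement browser-based checks rather than replace them.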

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve Jira and Confluence Cloud's performance and availability.

Thanks,

Atlassian Customer Support

Posted Dec 15, 2021 - 22:33 UTC

Resolved
Between 02:45 UTC and 03:58 UTC, we experienced increased SSL/TLS errors, which reduced the availability of Confluence, Jira Work Management, Jira Software, and Atlassian Developer. The issue has been resolved and the service is operating normally.
Posted Dec 08, 2021 - 04:17 UTC
Investigating
We are investigating cases of degraded performance for Confluence, Jira Work Management, Jira Software, and Atlassian Developer Cloud customers. We will provide more details within the next hour.
Posted Dec 08, 2021 - 03:41 UTC
This incident affected: Marketplace Apps.