Unable to create or edit Confluence pages
Incident Report for Confluence
Postmortem

SUMMARY

On Sept 27th 2021, between 08:11 and 13:06 UTC, customers using Confluence Cloud were unable to create or edit pages. The incident was triggered by a deployment made to pick up an environment variable change in a critical collaborative editing service. The new stack was unable to handle the traffic transferred from the existing stack. Automated alerts detected the impact after 15 minutes. The response team blocked traffic to the service until it could return to a healthy state. The total time to resolution was about 4 hours and 55 minutes.

IMPACT

Creating and editing capabilities within Confluence Cloud were disrupted between 08:11 UTC and 13:06 UTC on Sept 27th 2021. The response team had mitigated the impact for most customers by 10:40 UTC, approximately 2.5 hours after the initial detection. Some customers in the EU continued to see an impact to their experience until we resumed all traffic at 13:06 UTC.

ROOT CAUSE

The critical real-time collaboration service could not handle incoming requests after a redeployment. Architectural limitations of the service prevent progressive rollouts. Instead, traffic is cut over to the new stack with some jitter. The new stack was slow to respond, and client retries put further pressure on it, so it failed to stabilize.
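
The report does not describe how the service applies jitter or how clients retry, so the following is only a minimal sketch of the two ideas mentioned above, under stated assumptions. It assumes each client is given a uniformly random delay inside a cutover window, and it shows capped exponential backoff with full jitter as one common way to keep retries from adding pressure to a slow stack. All function names and parameter values are hypothetical and are not taken from Atlassian's implementation.

```python
import random


def cutover_delays(num_clients, jitter_window_s, seed=None):
    """Give each client a random delay (seconds) before it moves to the new stack.

    Assumption: delays are drawn uniformly from [0, jitter_window_s].
    """
    rng = random.Random(seed)
    return [rng.uniform(0, jitter_window_s) for _ in range(num_clients)]


def peak_arrivals_per_second(delays):
    """Group arrivals into one-second buckets and return the busiest bucket."""
    buckets = {}
    for delay in delays:
        second = int(delay)
        buckets[second] = buckets.get(second, 0) + 1
    return max(buckets.values())


def backoff_delay(attempt, base_s=0.5, cap_s=30.0, rng=random):
    """Capped exponential backoff with full jitter (hypothetical defaults).

    The wait before retry number `attempt` (0-based) is drawn uniformly
    between 0 and min(cap_s, base_s * 2**attempt).
    """
    return rng.uniform(0, min(cap_s, base_s * 2 ** attempt))


if __name__ == "__main__":
    no_jitter = cutover_delays(10_000, 0, seed=1)
    with_jitter = cutover_delays(10_000, 120, seed=1)
    print("peak arrivals/s without jitter:", peak_arrivals_per_second(no_jitter))
    print("peak arrivals/s with 120 s jitter:", peak_arrivals_per_second(with_jitter))
    print("example retry waits:", [round(backoff_delay(a), 2) for a in range(5)])
```

A wider jitter window lowers the peak arrival rate at the new stack, and backing off between retries keeps failed requests from immediately returning as extra load. Neither removes the underlying constraint that traffic moves to the new stack all at once rather than progressively.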

REMEDIAL ACTIONS PLAN & NEXT STEPS

We know that outages are disruptive to your productivity, and we apologize to all customers who were impacted by this incident. We are prioritizing the following improvement actions to avoid repeating this type of service disruption in the future:

  • Deploy new regions beyond our current three to further distribute load
  • Rebalance tenants that are in the larger region to other, more local regions
  • Increase the time the old stack persists after deployments
  • Improve runbooks to reduce time to resolution

To minimize the impact of breaking changes to our environments, we have a larger architectural change to this service in progress. This will enable progressive rollouts and address the long-term scale and reliability limitations of the service.

Thank you,
Atlassian Customer Support

Posted Oct 05, 2021 - 11:37 UTC

Resolved
Between 08:11 UTC and 13:09 UTC, we experienced an issue with the creation and editing of pages in Confluence. The issue has been resolved and the service is operating normally.
Posted Sep 27, 2021 - 13:36 UTC
Monitoring
We have mitigated the impact for all Confluence users who were unable to create and edit pages.
Posted Sep 27, 2021 - 13:28 UTC
Update
We have mitigated the problem with Confluence page creation and editing. We are monitoring closely.
Posted Sep 27, 2021 - 12:59 UTC
Identified
We have identified a mitigation plan and are in the process of rolling out the mitigation. We will post updates within the next hour.
Posted Sep 27, 2021 - 11:01 UTC
Update
We are continuing to investigate this issue.
Posted Sep 27, 2021 - 09:02 UTC
Update
We are continuing to investigate this issue.
Posted Sep 27, 2021 - 09:01 UTC
Investigating
We are investigating an issue with page creation and editing that is impacting all Confluence Cloud customers. We will provide more details within the next hour.
Posted Sep 27, 2021 - 09:01 UTC
This incident affected: Create and Edit.