Automation rule executions are delayed

Incident Report for Confluence

Postmortem

Summary

Jira, Confluence, and JSM Automation rules triggered or scheduled to run between 10am and 5pm UTC on March 17, 2025, and between 1pm UTC on March 18 and 12:30am UTC on March 19, 2025, were delayed by 1.5 hours on average and by up to 12 hours. The incident was triggered by the deployment of a monitoring library upgrade, which slowed the execution of all rules. This reduced the throughput of rule processing, which resulted in rules being backed up and delayed. The change manifested as poor rule performance only during periods of high traffic. The incident occurred over two time windows. In the first window, the backlog of rule executions began to shrink 4 hours 15 minutes after the first alert, and all rules had caught up 7 hours after the first alert. In the second window, the backlog began to shrink 1 hour 50 minutes after the first alert, and all rules had caught up 10 hours after the first alert.

The root cause of both incidents was identified and a change to address it was deployed by 10am UTC on March 19, 2025.

IMPACT

Customers’ rules were delayed by 1.5 hours on average and by up to 12 hours during both incident windows. A very small number of rules encountered the following error: “The rule actor doesn't have permission to view the event that triggered this rule.” This error occurred because an internal Atlassian service applied rate limiting in response to the increased throughput resulting from our mitigation efforts. These rules failed to complete successfully. All other rules eventually ran successfully.

ROOT CAUSE

The issue was caused by a change introduced to an Atlassian monitoring library, which significantly degraded the Automation rule engine's performance. The performance degradation prevented Automation's system from keeping pace with processing throughput, causing a back-up of executions and subsequent customer rule delays.
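
To illustrate why a throughput reduction of this kind shows up only at peak load and then takes hours to clear, the following is a minimal sketch of a backlog model. All names and numbers here are hypothetical and chosen for illustration only; they are not measurements from this incident or a description of the Automation rule engine.

```python
def simulate_backlog(arrivals_per_hour, capacity_per_hour):
    """Return the number of queued rule executions at the end of each hour.

    arrivals_per_hour: new rule executions arriving in each hour (hypothetical).
    capacity_per_hour: executions the engine can process per hour (hypothetical).
    """
    backlog = 0
    history = []
    for arrivals in arrivals_per_hour:
        # Work available this hour: carried-over backlog plus new arrivals.
        pending = backlog + arrivals
        # The engine cannot process more than its hourly capacity.
        processed = min(pending, capacity_per_hour)
        backlog = pending - processed
        history.append(backlog)
    return history


# Hypothetical traffic profile: one quiet hour, three peak hours, three quiet hours.
arrivals = [50, 120, 120, 120, 50, 50, 50]

# Assumed capacity before and after a performance regression.
healthy = simulate_backlog(arrivals, capacity_per_hour=150)
degraded = simulate_backlog(arrivals, capacity_per_hour=100)

print("healthy: ", healthy)
print("degraded:", degraded)
```

In this sketch the degraded engine still keeps up during the quiet first hour, so the regression is invisible at low traffic. The backlog grows only while arrivals exceed capacity, and it keeps draining for some time after traffic drops, which is consistent with the pattern of delays described above.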

REMEDIAL ACTIONS PLAN & NEXT STEPS

We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue didn’t manifest itself until our systems were at peak load.

The change to the Atlassian monitoring library that was the root cause of the incident has been fixed.

We are prioritizing the following improvement actions that are designed to avoid a repeat of this type of incident:

Deploying the fixed monitoring library after thorough testing
Implementing additional monitoring and alerting for the area of our system affected by the performance degradation
Introducing additional pre-deployment testing designed to identify performance degradations before they impact customers
Increasing rate limits of certain downstream Atlassian systems to reduce the likelihood of the rule failures that occurred in this incident
Increasing the processing capacity of parts of our system to reduce the impact of backed up rule executions

We apologize to customers whose automation rules were impacted during this incident; we are taking immediate steps designed to improve the platform’s performance and availability.

Thanks,

Atlassian Customer Support

Posted Apr 04, 2025 - 03:55 UTC

Resolved

Between 13:00 UTC and 16:30 UTC, we experienced delays in automation processing across Confluence, Jira Service Management, and Jira, which caused rules to appear stuck.

The issue has been resolved and we expect new automations to continue processing without delay.

Automation processes initiated before 16:30 UTC may take until 08:00 UTC on March 19th, 2025, to complete; however, we anticipate that processing will finish sooner. No action is required from users for these automations to be completed.

Posted Mar 18, 2025 - 17:28 UTC

Update

We have successfully identified and mitigated the issue affecting all automation customers, which was causing rules to appear stuck. New automations should now process as expected without delay.

For automation processes initiated from 13:00 UTC to 16:30 UTC on March 18th, 2025, in Jira, Jira Service Management, and Confluence, execution will continue but may experience delays of up to one day. No action is required from users for these automations to be completed. We are closely monitoring the situation and will provide further updates as more information becomes available.

Posted Mar 18, 2025 - 17:15 UTC

Monitoring

The delayed execution of automation rules is mitigated and recovery is in progress. We are now monitoring closely.

Posted Mar 18, 2025 - 16:51 UTC

Update

We are investigating cases of degraded performance regarding automation rules execution for Confluence, Jira Service Management, and Jira Cloud customers.
We are applying mitigation to speed up rule execution.
We will provide more details within the next hour.

Posted Mar 18, 2025 - 16:10 UTC

Investigating

We are investigating cases of degraded performance regarding automation rules execution for Confluence, Jira Service Management, and Jira Cloud customers. We will provide more details within the next hour.

Posted Mar 18, 2025 - 15:04 UTC

This incident affected: View Content, Create and Edit, Comments, Authentication and User Management, Search, Administration, Notifications, Marketplace Apps, Purchasing & Licensing, Signup, Confluence Automations, Cloud to Cloud Migrations - Copy Product Data, Server to Cloud Migrations - Copy Product Data and Mobile (iOS App, Android App).