Platform Alerts

Incident Report for Anaplan

Postmortem

On May 16, 2024, at 00:00 UTC, the Anaplan platform encountered service disruptions due to an unexpected increase in load on the API Services. This surge in load caused saturation, limiting available resources to a core back-end service, which resulted in intermittent outages across the platform.

This incident affected the following regions:

· Anaplan Data Center - US East

· Anaplan Data Center - US West

· Anaplan Data Center – Germany

· Anaplan Data Center – Netherlands

· Anaplan Google Cloud Public - US East

· Anaplan Google Cloud Public - Japan

· Anaplan Amazon Cloud Public – US

· Anaplan Amazon Cloud Public – Europe

The intermittent outages that prevented access to the Anaplan platform occurred at the following times:

· 00:00-01:33 UTC

· 06:40-07:14 UTC

· 10:03-10:15 UTC

· 14:02-14:09 UTC

To stabilize the platform, we immediately scaled down the API and CloudWorks™ services. This paused all integration activity, which allowed the core back-end service to recover and restored customer access to the platform. While the API and CloudWorks™ services were scaled down, all integrations to the platform were unavailable, including Anaplan Connect, the Administration service, and third-party integrations. Integration functionality was restored once the API and CloudWorks™ services were scaled back up.

Integrations were unavailable during the following periods:

· 00:00-02:18 UTC

· 06:40-09:04 UTC

· 10:03-11:38 UTC

· 14:02-14:19 UTC
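The outage windows listed above can be totaled to show how long each kind of impact lasted. The short Python sketch below does that arithmetic from the times given in this report. The names `PLATFORM_OUTAGES`, `INTEGRATION_OUTAGES`, `window_minutes` and `total_minutes` are illustrative only, and the sketch assumes each window starts and ends on the same UTC day, which holds for every window listed.

```python
from datetime import datetime

# Windows copied from this report (all times UTC on May 16, 2024).
PLATFORM_OUTAGES = [("00:00", "01:33"), ("06:40", "07:14"), ("10:03", "10:15"), ("14:02", "14:09")]
INTEGRATION_OUTAGES = [("00:00", "02:18"), ("06:40", "09:04"), ("10:03", "11:38"), ("14:02", "14:19")]


def window_minutes(start: str, end: str) -> int:
    """Length of a same-day HH:MM-HH:MM window, in minutes."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def total_minutes(windows) -> int:
    """Sum the lengths of a list of (start, end) windows, in minutes."""
    return sum(window_minutes(start, end) for start, end in windows)


if __name__ == "__main__":
    for label, windows in (("Platform access", PLATFORM_OUTAGES),
                           ("Integrations", INTEGRATION_OUTAGES)):
        minutes = total_minutes(windows)
        print(f"{label}: {minutes} min ({minutes // 60} h {minutes % 60} min)")
```

Running it prints 146 minutes (2 h 26 min) for platform access and 394 minutes (6 h 34 min) for integrations, which shows that integrations were unavailable for well over twice as long as the platform itself.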

To identify the source of the increased load, we isolated a specific group of integrations. Although this initially appeared to alleviate the issue, further analysis showed that these integrations were not the root cause. At 14:05 UTC, we further reduced the load on the API service by updating the configuration of the API retry mechanism. After this mitigation step, we monitored the platform for approximately three hours before resolving the issue at 17:09 UTC.

Additional analysis identified that the time-out configuration of an upstream service had been inadvertently lowered. With the shorter time-out, requests failed sooner and those failures were sent to the API service more aggressively, which triggered a significant surge in API retry requests and increased the load on the API services. The combination of the time-out change, the aggressive API retry mechanism, and top-of-the-hour integration traffic resulted in resource saturation and the subsequent platform outages.
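The amplification described above can be illustrated with a simple model. Assume, purely for illustration since this report does not describe the actual retry policy, that every attempt fails independently with probability `p_fail` and that each failure is retried immediately, up to `max_retries` times. Because attempt k + 1 happens only when the previous k attempts have all failed, the expected number of attempts per original request is 1 + p + p^2 + ... + p^max_retries. A shorter time-out raises the failure probability, so the same baseline traffic turns into more load on the API service. The function names below are hypothetical.

```python
def expected_attempts(p_fail: float, max_retries: int) -> float:
    """Expected attempts per request when each attempt fails independently
    with probability p_fail and failures are retried up to max_retries times.

    Attempt k+1 happens only if the first k attempts all failed, so the
    expectation is the geometric sum 1 + p + p**2 + ... + p**max_retries.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    return sum(p_fail ** k for k in range(max_retries + 1))


def offered_load(base_rps: float, p_fail: float, max_retries: int) -> float:
    """Requests per second reaching the API service, retries included."""
    return base_rps * expected_attempts(p_fail, max_retries)


if __name__ == "__main__":
    # Hypothetical numbers: identical baseline traffic at different failure rates.
    for p in (0.05, 0.5, 0.9):
        print(f"p_fail={p}: {offered_load(1000, p, max_retries=5):.0f} req/s")
```

For example, with `p_fail=0.9` and five retries, each original request becomes about 4.7 attempts on average, nearly five times the baseline load. The model ignores the feedback loop in which the extra load itself causes more failures, so amplification under real saturation could be worse.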

We determined that the time-out configuration was unintentionally lowered as part of an upgrade to an upstream service completed on May 15, 2024, at 15:00 UTC. We have since restored the configuration to its former value, and a thorough review is under way to understand how the upgrade introduced this unintended change. We are also reviewing the API retry mechanism and have suspended the API retry functionality used in this type of failure scenario, because it isn't used for daily processes. Furthermore, we have improved observability for this scenario and are allocating additional resources to the impacted core back-end service to increase its resilience.
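As a generic illustration of what a less aggressive retry mechanism can look like (this report does not describe Anaplan's retry logic, so none of this reflects the actual implementation), the Python sketch below shows one widely used pattern: capped exponential backoff with full jitter, a hard limit on the number of retries, and retries only for errors assumed to be transient (here, `TimeoutError`). The functions `backoff_delays` and `call_with_retries` and all of their parameters are hypothetical.

```python
import random
import time
from typing import Callable, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delays in seconds for each retry: capped exponential backoff with full jitter.

    Retry n (0-based) waits a random time in [0, min(cap, base * 2**n)].
    """
    rng = rng if rng is not None else random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(max_retries)]


def call_with_retries(operation: Callable[[], T],
                      max_retries: int = 3,
                      base: float = 0.5,
                      cap: float = 30.0,
                      retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call operation(), retrying only on the listed transient errors.

    At most max_retries retries are made (max_retries + 1 calls in total).
    If the final call also fails, its exception propagates to the caller;
    errors not listed in retry_on propagate immediately.
    Setting max_retries=0 disables retries entirely.
    """
    for delay in backoff_delays(max_retries, base, cap, rng):
        try:
            return operation()
        except retry_on:
            sleep(delay)  # back off before the next attempt
    return operation()  # final attempt: no retries left
```

Randomizing each delay spreads retries from many clients over time instead of sending them back together, and `max_retries=0` turns retries off entirely, similar in spirit to suspending the retry functionality for one failure scenario.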

We apologize for any impact this issue may have had on your business operations. We are continuously strengthening our systems and procedures to ensure we avoid future disruptions to your business and users.

If you have further questions or concerns, please visit the Anaplan Support website. Thank you for your patience during this situation and thank you for being an Anaplan customer.

Posted May 23, 2024 - 09:16 UTC

Resolved

We have confirmed that the issue is now resolved.

We deeply apologize for any impact this issue may have caused. We appreciate your patience and partnership as we worked through this issue.

We will follow up within 7 business days with a detailed root cause analysis (RCA), which will be shared on our Status Page. If you have any questions or concerns, please do not hesitate to contact us at Anaplan Support.

Posted May 16, 2024 - 17:09 UTC

Update

Service has been restored, and the platform has been stable for the past two hours. We will continue to monitor the platform to ensure no additional issues arise. If you have any questions, concerns, or continue to experience issues, please do not hesitate to contact Anaplan Support. We will provide a final update when we consider this situation fully resolved.

Posted May 16, 2024 - 16:44 UTC

Monitoring

The platform has remained stable for the past hour and a half, and we continue to monitor it. In parallel, our investigation continues. We will monitor the platform for the next hour and provide an update within an hour, or sooner as additional information becomes available.

Posted May 16, 2024 - 16:19 UTC

Update

Thank you for your patience as we continue to investigate this issue. The platform has remained stable for the past hour, and we continue to monitor it. In parallel, our investigation continues, and we will provide updates every 30 minutes or sooner as additional information becomes available.

Posted May 16, 2024 - 15:39 UTC

Update

Thank you for your patience as we continue to investigate this issue. We have disabled functionality within the API service and are closely monitoring the impact. In parallel, investigation continues and we will continue to provide updates every 30 minutes or sooner as additional information becomes available.

Posted May 16, 2024 - 15:05 UTC

Update

Thank you for your patience as we continue to investigate this issue. We have now re-enabled our integration services (API Services & CloudWorks) and continue to closely monitor the impact of this on the platform. Investigation continues and we will continue to provide updates every 30 minutes or sooner as additional information becomes available.

Posted May 16, 2024 - 14:31 UTC

Update

Thank you for your patience as we continue to investigate this issue. We are aware that the platform has recently become inaccessible to customers. We have taken immediate remediation action to disable API Services and CloudWorks in order to recover access to the platform. We continue to closely monitor the impact of this on the platform while we further investigate the cause of the instability. We will continue to provide updates every 30 minutes or sooner as additional information becomes available.

Posted May 16, 2024 - 14:07 UTC

Update

Thank you for your patience as we continue to investigate this issue. We have deployed some remediation steps and are closely monitoring the impact of this on the platform. We will continue to provide updates every 30 minutes or sooner as additional information becomes available.

Posted May 16, 2024 - 13:33 UTC

Update

Thank you for your patience as we continue to investigate this issue. We have narrowed down the issue to a subsection of integration jobs and have temporarily disabled these while further investigation takes place. We are currently assessing what additional tolerance we can build into the system before we reintroduce these jobs gracefully. We will continue to provide updates every 30 minutes or sooner as additional information becomes available.

Posted May 16, 2024 - 13:07 UTC

Update

Posted May 16, 2024 - 12:36 UTC

Update

Thank you for your patience as we continue to investigate this issue. We are progressing mitigation steps to alleviate the issue with our integration services and have completed a full review of all recent changes made to the platform. Currently, we do not yet have a time to resolution. We will continue to provide updates every 30 minutes or sooner as additional information becomes available.

Posted May 16, 2024 - 12:02 UTC

Update

Thank you for your patience as we continue to investigate this issue. We continue to evaluate mitigation steps to alleviate the issue with our integration services. Currently, we do not yet have a time to resolution. We will continue to provide updates every 30 minutes or sooner as additional information becomes available.

Posted May 16, 2024 - 11:30 UTC

Update

Thank you for your patience as we continue to investigate this issue. We are evaluating mitigation steps to alleviate the issue with our integration services. Currently, we do not yet have a time to resolution. We will continue to provide updates every 30 minutes or sooner as additional information becomes available.

Posted May 16, 2024 - 11:01 UTC

Update

Thank you for your patience as we continue to work to mitigate this issue. Access to the platform has been restored, but integrations (API & CloudWorks) are currently unavailable. We are continuing our investigation and will provide further updates every 30 minutes or upon resolution.

Posted May 16, 2024 - 10:32 UTC

Investigating

Thank you for your patience as we continue to investigate this issue. We have encountered further issues affecting customers' ability to access the platform. We have taken immediate remediation steps and are closely monitoring the situation. We continue to treat this as a priority and will provide an update every 30 minutes as we work to resolve this issue as quickly as possible.

Posted May 16, 2024 - 10:09 UTC

Identified

Thank you for your patience while we work to mitigate the issue. Initial reports indicate that the mitigation activities are having a positive effect. We are closely monitoring the situation while we continue with the mitigation steps. We will provide an update every 30 minutes as we work to resolve this issue as quickly as possible.

Posted May 16, 2024 - 09:43 UTC

Update

Thank you for your patience as we continue to investigate this issue. We are progressing mitigation steps to alleviate the issue with our integration services. We continue to work on this as a priority and will provide an update every 30 minutes as we work to resolve this issue as quickly as possible.

Posted May 16, 2024 - 09:13 UTC

Update

Thank you for your patience as we continue to investigate this issue. Currently, we do not yet have a time to resolution. We will continue to provide updates every 30 minutes as we work to resolve this issue as quickly as possible.

Posted May 16, 2024 - 09:09 UTC

Update

Posted May 16, 2024 - 08:43 UTC

Investigating

Posted May 16, 2024 - 08:12 UTC

Identified

Access to the platform has been restored but integrations (API & CloudWorks) are currently unavailable. We are continuing investigations and further updates will be provided every 30 minutes or upon resolution.

Posted May 16, 2024 - 07:41 UTC

Investigating

We are currently investigating an issue impacting customers’ ability to access the Anaplan Platform.

We are working to resolve this issue as quickly as possible and will provide updates every 30 minutes or upon resolution.

Posted May 16, 2024 - 07:18 UTC

This incident affected: us1: Data Center - US East, us2: Data Center - US West, eu1: Data Center - Netherlands, eu2: Data Center - Germany, us5: Cloud - US East, ap1: Cloud - Japan, and us7: Cloud - US.
