Platform Alerts

Incident Report for Anaplan

Postmortem

On May 16, 2024, at 00:00 UTC, the Anaplan platform experienced service disruptions caused by an unexpected increase in load on the API services. This surge caused resource saturation, limiting the resources available to a core back-end service, which resulted in intermittent outages across the platform.

This incident affected the following regions:

· Anaplan Data Center - US East

· Anaplan Data Center - US West

· Anaplan Data Center - Germany

· Anaplan Data Center - Netherlands

· Anaplan Google Cloud Public - US East

· Anaplan Google Cloud Public - Japan

· Anaplan Amazon Cloud Public - US

· Anaplan Amazon Cloud Public - Europe

The intermittent outages, which prevented access to the Anaplan platform, occurred at the following times:

· 00:00-01:33 UTC

· 06:40-07:14 UTC

· 10:03-10:15 UTC

· 14:02-14:09 UTC

To stabilize the platform, we took immediate action to scale down the API and CloudWorks™ services. This measure paused all integration activities, which allowed the core back-end service to recover and restored customer access to the platform. While the API and CloudWorks™ services were scaled down, all integrations to the platform were unavailable. This affected Anaplan Connect, the Administration service, and third-party integrations. Integration functionality was restored once the API and CloudWorks™ services were scaled back up.

Integrations were unavailable during the following periods:

· 00:00-02:18 UTC

· 06:40-09:04 UTC

· 10:03-11:38 UTC

· 14:02-14:19 UTC
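
For readers who want the cumulative impact, the following is a minimal sketch that sums the windows listed above. The function name `total_minutes` and the list names are illustrative only; the time windows are copied from this report and are assumed to fall within the same UTC day.

```python
from datetime import datetime


def total_minutes(windows):
    """Sum the length, in minutes, of same-day 'HH:MM-HH:MM' UTC windows."""
    total = 0
    for window in windows:
        start_text, end_text = window.split("-")
        start = datetime.strptime(start_text, "%H:%M")
        end = datetime.strptime(end_text, "%H:%M")
        total += int((end - start).total_seconds() // 60)
    return total


# Windows copied from the two lists in this report (May 16, 2024, UTC).
PLATFORM_OUTAGES = ["00:00-01:33", "06:40-07:14", "10:03-10:15", "14:02-14:09"]
INTEGRATION_OUTAGES = ["00:00-02:18", "06:40-09:04", "10:03-11:38", "14:02-14:19"]

print(total_minutes(PLATFORM_OUTAGES))     # 146
print(total_minutes(INTEGRATION_OUTAGES))  # 394
```

By this count, platform access was interrupted for 146 minutes in total, and integrations were unavailable for 394 minutes in total.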

To identify the source of the increased load, we isolated a specific group of integrations. Although this initially seemed to alleviate the issue, analysis revealed that these integrations were not the root cause. We then reduced the load on the API service by updating the configuration of the API retry mechanism at 14:05 UTC. After this mitigation step, we monitored the platform for three hours and declared the issue resolved at 17:09 UTC.

Additional analysis identified that an upstream service's time-out configuration had been inadvertently lowered. The lower time-out caused more aggressive failure responses to be sent to the API service. This led to a significant surge in API retry requests and, in turn, to the increase in load on the API services. The time-out change, in combination with the aggressive API retry mechanism and top-of-the-hour integration traffic, resulted in resource saturation and the subsequent outages of the platform.
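
The amplification described above can be illustrated with a simplified model. This is a sketch under stated assumptions, not a description of the actual retry mechanism: it assumes each attempt fails independently with a fixed probability and is retried immediately up to a fixed limit. The function name `expected_attempts` and the example values are hypothetical and are not taken from the incident.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per request when every failed attempt is retried.

    Attempt k (counting from 0) happens only if the previous k attempts all
    failed, which has probability failure_rate ** k under this model.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


# Hypothetical failure rates, with up to 3 retries per request.
for rate in (0.01, 0.5, 0.9):
    print(rate, round(expected_attempts(rate, 3), 3))
```

Under these assumptions, a rare failure adds almost no extra traffic, while a 90% failure rate more than triples the number of attempts per request. In practice the effect can compound, because the added load may itself push the failure rate higher.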

We identified that the time-out configuration was unintentionally lowered as part of an upgrade to an upstream service that was completed on May 15 at 15:00 UTC. We have since restored the configuration to its former value. A thorough review is being conducted to understand how the upgrade led to this unintended change. In addition, we are reviewing the API retry mechanism. We have suspended the API retry functionality that is used in this type of failure scenario, as it is not used for daily processes. Furthermore, we have increased observability for this scenario and are adding resources to the impacted core back-end service to improve resiliency.
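
For context on how retry traffic is commonly tamed, the following is a generic sketch of exponential backoff with full jitter, a widely used pattern. It is not the Anaplan API retry mechanism and does not represent the outcome of the review described above. The function name `backoff_delays` and its parameters are hypothetical.

```python
import random


def backoff_delays(max_retries, base_seconds=1.0, cap_seconds=60.0, rng=None):
    """Return one randomized wait time (in seconds) per retry attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        # The ceiling doubles with each attempt but never exceeds the cap.
        ceiling = min(cap_seconds, base_seconds * 2 ** attempt)
        # Full jitter: choose uniformly in [0, ceiling] so that clients
        # retrying at the same moment spread out instead of arriving together.
        delays.append(rng.uniform(0.0, ceiling))
    return delays


# Seeded generator (an illustrative choice) so the example is repeatable.
delays = backoff_delays(5, rng=random.Random(0))
print([round(d, 2) for d in delays])
```

Spacing retries out in this way limits how quickly failed requests are resubmitted, which is the kind of surge described in this report.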

We apologize for any impact this issue may have had on your business operations. We are continuously strengthening our systems and procedures to ensure we avoid future disruptions to your business and users.

If you have further questions or concerns, please visit the Anaplan Support website. Thank you for your patience during this situation and thank you for being an Anaplan customer.

Posted May 23, 2024 - 09:15 UTC

Resolved

We have confirmed that the issue is now resolved.

We deeply apologize for any impact this issue may have caused. We appreciate your patience and partnership as we worked through this issue.

We will follow up within 7 business days with a detailed root cause analysis (RCA) that will be shared on our Status Page. If you have any questions or concerns, please do not hesitate to contact us at Anaplan Support.

Posted May 16, 2024 - 03:11 UTC

Monitoring

Service has been restored; you should now be able to resume normal activities.

We will continue to monitor the platform to ensure no additional issues arise. If you have any questions, concerns, or continue to experience issues, please do not hesitate to contact Anaplan Support. We will provide a final update to you when we consider this situation fully resolved.

Posted May 16, 2024 - 02:42 UTC

Update

We have identified the likely cause of the issue and have restored access to the platform while we work on restoring integration functionality. We will provide further updates in 30 minutes or upon resolution.

Posted May 16, 2024 - 02:22 UTC

Identified

We have identified the likely cause of the issue, and we are currently focused on restoring service as quickly as possible. We do not yet have an estimated time to resolution. We will provide further updates in 30 minutes or upon resolution.

Posted May 16, 2024 - 01:44 UTC

Update

We are currently investigating an issue impacting customers’ ability to access the Anaplan Platform.

We are working to resolve this issue as quickly as possible and will provide updates every 30 minutes or upon resolution.

Posted May 16, 2024 - 01:21 UTC

Update

Thank you for your patience as we continue to investigate this issue. We do not yet have an estimated time to resolution. We will continue to provide updates every 30 minutes as we work to resolve this issue as quickly as possible.

Posted May 16, 2024 - 00:56 UTC

Update

We are currently investigating an issue resulting in customers experiencing intermittent issues when running API integrations.

We are working to resolve this issue as quickly as possible and will provide updates every 30 minutes or upon resolution.

Posted May 16, 2024 - 00:50 UTC

Update

Posted May 16, 2024 - 00:41 UTC

Investigating

Posted May 16, 2024 - 00:37 UTC

This incident affected: us1: Data Center - US East, us2: Data Center - US West, eu1: Data Center - Netherlands, eu2: Data Center - Germany, eu4: Cloud - Europe, us5: Cloud - US East, ap1: Cloud - Japan, us7: Cloud - US, us3: Cloud - US, us4: Cloud - US, ca1: Cloud - Canada, au1: Cloud - Australia, and eu3: Cloud - Europe.
