Platform Alerts
Incident Report for Anaplan
Postmortem

On October 13, 2022, at approximately 10:58 UTC, we received multiple synthetic alerts that indicated an issue preventing customers from accessing the Anaplan Platform. Initial investigations identified an issue with a job that was running on one of the database nodes as part of a data migration. This issue caused the node to lock. The purpose of the migration was to backfill a new database for future functionality. The migration had been planned in a phased approach, and five steps had been successfully carried out previously.

We attempted to stop both the job and the node to restore service, but the issue persisted and then began to lock the remaining nodes through thread exhaustion. We therefore decided to rebuild the database cluster to restore service.

The rebuild of the first node completed at 12:27 UTC. From that point customers would have been able to access the platform, but service performance was still degraded, and the CloudWorks and PlanIQ services were not available. We initiated the rebuild of the second node immediately. However, that rebuild failed halfway through because of a brief network interruption caused by a file transfer, and it had to be restarted. The rebuild of the second node completed successfully at 14:21 UTC. We were then able to balance the workload between both nodes, and service was fully restored at 14:42 UTC.
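For reference, the durations implied by the timestamps in this report can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the event names are labels chosen for illustration, and the 10:58 UTC start time is approximate, as stated above.

```python
from datetime import datetime, timezone


def utc(hhmm: str) -> datetime:
    """Build a UTC timestamp on October 13, 2022 from an 'HH:MM' string."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2022, 10, 13, hour, minute, tzinfo=timezone.utc)


# Times taken from this report; the dictionary keys are illustrative labels.
EVENTS = {
    "first_alerts": utc("10:58"),            # approximate, per the report
    "first_node_rebuilt": utc("12:27"),
    "second_node_rebuilt": utc("14:21"),
    "service_fully_restored": utc("14:42"),
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two named events."""
    delta = EVENTS[end] - EVENTS[start]
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    # Time with no platform access: 89 minutes
    print(minutes_between("first_alerts", "first_node_rebuilt"))
    # Time from first alerts to full restoration: 224 minutes
    print(minutes_between("first_alerts", "service_fully_restored"))
```

On these figures, roughly 1 hour 29 minutes passed before limited access returned, and about 3 hours 44 minutes before service was fully restored.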

To avoid a recurrence of this issue, we have reviewed the migration process and enhanced our performance testing metrics. Migrations will be completed in smaller deliverables, and queries have been optimized so that jobs run more efficiently. We will also add stop gaps throughout the migrations to ensure each deliverable completes before the next one starts. Furthermore, we are reviewing alternative ways to restore service more quickly without rebuilding the database cluster.
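The idea of splitting a migration into small deliverables with a stop gap between each can be sketched as follows. This is a minimal illustration, not our migration tooling: the `Deliverable` and `MigrationResult` types and the `run_migration` function are hypothetical names, and the `run` and `verify` callables stand in for whatever backfill and completion checks a real migration would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Deliverable:
    """One small slice of a migration (hypothetical type for illustration)."""
    name: str
    run: Callable[[], None]      # performs the work for this slice
    verify: Callable[[], bool]   # the stop gap: True only if the slice completed


@dataclass
class MigrationResult:
    completed: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None  # name of the slice that failed its check


def run_migration(deliverables: List[Deliverable]) -> MigrationResult:
    """Run slices in order, halting at the first one that fails its check."""
    result = MigrationResult()
    for deliverable in deliverables:
        deliverable.run()
        if not deliverable.verify():
            # Stop here so that no later slice starts on top of an
            # incomplete one.
            result.halted_at = deliverable.name
            break
        result.completed.append(deliverable.name)
    return result
```

In this sketch a failed check halts the whole run, so an operator can investigate one small slice rather than a long-running job. A real implementation would also need timeouts and load limits on each slice, which are omitted here.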

We deeply apologize for the impact this had on your business operations. We understand the disruption these issues can cause to your business and users, and we are continuously strengthening our systems and procedures to prevent similar issues from happening again. If you have any further questions or concerns, please contact Anaplan Support. Thank you for your patience during this situation, and thank you for being an Anaplan customer.

Posted Oct 18, 2022 - 18:45 UTC

Resolved
We have confirmed that the issue is now resolved. We deeply apologize for any impact this issue may have caused. We appreciate your patience and partnership as we worked through this issue. We will follow up within 7 business days with a detailed root cause analysis (RCA) that will be shared on our Status Page. If you have any questions or concerns, or continue to experience errors, please do not hesitate to contact us at Anaplan Support.
Posted Oct 13, 2022 - 15:13 UTC
Monitoring
The platform is now fully operational; the cluster has been rebuilt and CloudWorks has been re-enabled. We will continue to monitor for the next 30 minutes and provide a further update at that time.

If you have any questions or concerns, or continue to experience issues, please do not hesitate to contact Anaplan Support.
Posted Oct 13, 2022 - 14:42 UTC
Update
The cluster rebuild is in progress, but limited access to the platform is now available. Please note that performance may be degraded while we continue to work on restoring all services. We therefore recommend that customers access the platform at this stage only if it is critical for business operations.
Additionally, we have currently disabled CloudWorks to reduce load on the platform. This service is therefore currently unavailable and will be enabled as soon as possible.
We do not have an estimated time for full recovery at this stage, but we will provide a further update in 30 minutes.
Posted Oct 13, 2022 - 14:20 UTC
Update
The cluster rebuild is in progress, but limited access to the platform is now available. Please note that performance may be degraded while we continue to work on restoring all services. We therefore recommend that customers access the platform at this stage only if it is critical for business operations.
Additionally, we have currently disabled CloudWorks to reduce load on the platform. This service is therefore currently unavailable and will be enabled as soon as possible.
We do not have an estimated time for full recovery at this stage, but we will provide a further update in 30 minutes.
Posted Oct 13, 2022 - 13:51 UTC
Update
The cluster rebuild is in progress, but limited access to the platform is now available. Please note that performance may be degraded while we continue to work on restoring all services. We therefore recommend that customers access the platform at this stage only if it is critical for business operations. We do not have an estimated time for full recovery at this stage, but we will provide a further update in 30 minutes.
Posted Oct 13, 2022 - 13:22 UTC
Update
The cluster rebuild is in progress and we continue to look at options to restore core services while the rebuild completes. We do not have an estimated time to recovery at this stage, but we will provide a further update in 15 minutes.
Posted Oct 13, 2022 - 13:07 UTC
Update
The cluster rebuild is in progress and we continue to look at options to restore core services while the rebuild completes. We do not have an estimated time to recovery at this stage, but we will provide a further update in 15 minutes.
Posted Oct 13, 2022 - 12:53 UTC
Update
The cluster rebuild is in progress and we are looking at options to restore core services while the rebuild completes. We do not have an estimated time to recovery at this stage, but we will provide a further update in 15 minutes.
Posted Oct 13, 2022 - 12:39 UTC
Update
The cluster rebuild is in progress and it is currently in recovery mode. We do not have an estimated time to recovery at this stage, but we will provide a further update in 30 minutes.
Thank you for your patience while we work through this issue.
Posted Oct 13, 2022 - 12:16 UTC
Update
We have experienced an issue with our main MySQL cluster, resulting in a platform outage. We are in the process of rebuilding the cluster and expect to have a further update in the next 30 minutes. We are working hard to restore service as soon as possible and appreciate the impact this issue has on your business.
Posted Oct 13, 2022 - 11:49 UTC
Identified
We have identified the likely cause of the issue, and we are focused right now on restoring your service as quickly as possible. We will provide further updates within the next 30 minutes. Thank you for your patience while we work through this issue.
Posted Oct 13, 2022 - 11:22 UTC
Update
We are currently experiencing issues impacting customers’ ability to log in to the platform. Additionally, we have received reports that users are being logged out of the platform unexpectedly.
Our engineering team is working to identify the root cause and implement a solution.
We will continue to provide updates to you as we work to resolve this issue as quickly as possible.
Posted Oct 13, 2022 - 11:09 UTC
Investigating
We are investigating a potential issue impacting the Anaplan Platform.
We will provide regular updates on this page until the issue is resolved.
Posted Oct 13, 2022 - 11:05 UTC