On January 28, 2025, at about 20:00 UTC, we began receiving customer cases reporting problems loading UX pages and Apps. The issue affected customers in us7: Anaplan Amazon Cloud Public — US and resulted in time-out errors.
Although our alerting didn't immediately indicate a platform-wide issue, a platform incident was declared at 21:46 UTC. Our investigation identified a number of failures in the logs relating to a dependent backend service. A review of the service showed that a release had been deployed earlier in the day. We rolled back the release and observed that the failures had ceased. Further checks were completed to ensure that the platform was fully functional. Full service was restored at 22:51 UTC.
After the incident, we reviewed the root cause and determined that the issue was unrelated to the release. Instead, we found that the backend service had been unable to communicate with key infrastructure. This was the source of the failures we observed during the incident. Part of the rollback process initiated a restart of the backend service, which recycled the infrastructure on which the service was hosted. This restored connectivity, and the issues subsided.
To prevent similar incidents and further improve service reliability, we are working to improve our alerting so that failures of this kind are detected sooner. We are reviewing the possibility of simplifying the network path between the backend service and the infrastructure. We are also carrying out a thorough assessment of the backend service to improve its tolerance of network connectivity issues.
We understand the significant impact service interruptions may have on your business operations, and we remain committed to maintaining stable, reliable performance. Our team continues to monitor all systems closely to ensure consistent service quality.
If you have any further questions or concerns, please contact Anaplan Customer Care. Thank you for your patience during this incident, and thank you for being an Anaplan customer.