Service unavailable across multiple instances Friday 24th January 2020 10:47:00
We are experiencing an issue affecting service availability across multiple customer instances.
On Friday, 24th January, at 10:47, our monitoring systems alerted us to a problem with our configuration servers. The root cause has been identified and relates to an error that caused a partial replication of our configuration database. During this time, some customer instances were affected and exhibited extended API response times, which for all intents and purposes made those instances unavailable.
The problem was detected and rectified by 10:58, 11 minutes after the first alert was raised.
We have levels of redundancy built into our configuration database deployment, but this was a new failure scenario that we had not previously seen, and our resilience strategy failed under these conditions. We have now implemented a fix and deployed it to production to ensure that, under similar circumstances, we will not see the same failure mode again.
We apologize for any inconvenience this caused.
Posted 3 years ago
The issue that caused the service outage has been fixed. We are investigating to identify what caused the issue and will come back with updates. We are truly sorry for all the trouble this issue has caused.