Overview
On 20th January 2023, Currencycloud clients experienced issues accessing the production environment.
Timeline
11:36 UTC - Our monitoring systems alerted us to a large number of errors across multiple endpoints
11:39 UTC - Investigations pointed towards a recent deployment in an internal service
11:43 UTC - Database CPU load reached critical levels
11:52 UTC - Change was rolled back to the previous version
12:10 UTC - Issue resolved
Resolution
The change that caused the issue was reverted.
Root Cause Analysis
A MySQL upgrade performed over the weekend of 14th January 2023 degraded the performance of an SQL query used by internal services. The upgrade changed the behaviour of the MySQL query optimiser, which slowed down that query and, in turn, degraded performance across the platform. On the evening of Thursday 19th January 2023, a configuration change was made to revert this optimiser behaviour, which temporarily stabilised the service.
On the morning of Friday 20th January 2023, our development team prepared a permanent fix to ensure the affected queries were optimised correctly. Once deployed, this change had the unintended consequence of significantly degrading the performance of an internal service database, causing high CPU load and a degradation of service. The change was rolled back to resolve the issue.
Remediation Items