Overview
On 31/03/2021, a failure in the authentication API affected customers authenticating via the API and Currencycloud Direct.
Timeline
31/03/2021
11:56 UTC - Currencycloud monitoring detected a spike in errors on the /authenticate/api endpoint causing some requests to that endpoint to fail.
12:00 UTC - Currencycloud identified 5XX errors being returned on the authentication endpoint.
12:13 UTC - Currencycloud investigated a large number of requests to the /authenticate/api endpoint within a very short period of time, which had caused a row-level lock on the database. Rate limiting was introduced to reduce the impact on that endpoint.
12:48 UTC - Currencycloud attempted to terminate processes to free up database table locks.
12:52 UTC - All requests to /authenticate/api continued to fail despite the previous mitigation.
13:03 UTC - A rolling restart of api-v2 was attempted to clear the database locks.
13:33 UTC - Service Restored.
13:40 UTC - Rate limiting mitigations removed.
15:40 UTC - Currencycloud performed an emergency code change to prevent the issue from recurring.
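For illustration only, the kind of rate limiting applied at 12:13 UTC and removed at 13:40 UTC can be sketched as a token bucket. This is a minimal sketch, not the mechanism Currencycloud actually used: the class name TokenBucket and the rate and capacity parameters are hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical limiter: allows bursts of up to `capacity` requests,
    refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable clock, useful for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True if a request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A limiter of this shape rejects excess requests cheaply before they reach the database, which is why it can reduce pressure on a contended table.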
Resolution
A rolling restart of api-v2 cleared the database locks, and an emergency code change was made to prevent the issue from recurring.
Root Cause Analysis
A sudden increase in requests to the authentication API caused a row-level lock in the corresponding database table. Multiple authorisation requests timed out and failed, causing a snowball effect of further authentication attempts. This exhausted the available thread pool, extending the problem to other customers.
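The snowball effect described above is typical of clients that retry immediately after a failure. As a general illustration (not the code change Currencycloud deployed), a client can space out its retries with exponential backoff and jitter. The function names and the base, cap and retries parameters below are hypothetical.

```python
import random
import time


def backoff_delays(attempts, base=0.5, cap=30.0, rng=random.random):
    """Yield one 'full jitter' delay per retry: rng() * min(cap, base * 2**n)."""
    for n in range(attempts):
        yield rng() * min(cap, base * 2 ** n)


def call_with_retry(request, retries=4, sleep=time.sleep, base=0.5, cap=30.0):
    """Call request(); on failure, wait a jittered delay and try again.

    Re-raises the last error once `retries` retries have been used up.
    A real client would catch only retryable errors (timeouts, 5XX responses)
    rather than every Exception.
    """
    delays = backoff_delays(retries, base=base, cap=cap)
    while True:
        try:
            return request()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the original error
            sleep(delay)
```

Randomising each delay spreads retries from many clients over time, so a brief failure is less likely to turn into a synchronised burst of new authentication attempts.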
Remediation Items