Automated monitoring detected a crash in the API microservice.
The engineering team started investigating and identified an issue with messages coming from another microservice.
A poorly formed message between two microservices caused the recipient service to crash. Automation restored the crashed service, but as it resumed processing the queue, it crashed again. This caused intermittent issues accessing and operating the Admin Portal.
The engineering team isolated the poorly formed message type and configured the service to ignore messages of that type, which prevented further crashes and restored the service to a fully operational state.
The development team had already identified this potential issue and implemented code fixes that make the system resilient to poorly formed messages.
Because this issue cannot be triggered by user interaction, the permanent fix will be deployed in the next release to prevent similar incidents in the future.
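
For illustration only, the two measures described above (ignoring a known-bad message type, and tolerating poorly formed messages instead of crashing) could look roughly like the sketch below. This is not the actual fix. It assumes JSON messages with `type` and `payload` fields, and every name in it (`IGNORED_TYPES`, `legacy_sync`, `handle`, `process_queue`, the dead-letter list) is hypothetical.

```python
import json

# Hypothetical stopgap: message types the consumer skips outright.
IGNORED_TYPES = {"legacy_sync"}


def handle(message: dict) -> str:
    """Hypothetical business logic for a validated message."""
    return f"processed {message['type']}"


def process_queue(raw_messages, dead_letters):
    """Process raw messages, quarantining bad ones instead of crashing."""
    results = []
    for raw in raw_messages:
        try:
            message = json.loads(raw)
            # Assumed schema: a JSON object with 'type' and 'payload' fields.
            if (
                not isinstance(message, dict)
                or "type" not in message
                or "payload" not in message
            ):
                raise ValueError("missing required fields")
        except (TypeError, ValueError) as exc:
            # json.JSONDecodeError is a subclass of ValueError.
            # Set the message aside for inspection and keep consuming.
            dead_letters.append((raw, str(exc)))
            continue
        if message["type"] in IGNORED_TYPES:
            continue
        results.append(handle(message))
    return results
```

Setting a bad message aside, rather than letting the exception propagate, means a restarted consumer does not hit the same message again when it resumes processing the queue.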