Intermittent issues using Admin Portal

Incident Report for Beyond Identity

Postmortem

Leadup

Automated monitoring detected a crash in the API microservice.

Detection

The engineering team started investigating and identified an issue with messages coming from another microservice.

Root Cause

A malformed message sent between two microservices caused the recipient service to crash. Automation restarted the crashed service, but when it began re-processing the queue, it crashed again. This caused intermittent issues accessing and operating the Admin Portal.

Mitigation and Resolution

The engineering team isolated the malformed message type and configured the service to ignore messages of that type. This prevented further crashes and restored the service to a fully operational state.

The development team had already identified this potential issue and implemented code fixes that will make the system resilient to malformed messages.

Because this condition cannot be triggered by user interaction, the permanent fix will be deployed in the next release to prevent similar situations from occurring in the future.
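
The general technique of tolerating malformed messages can be sketched as follows. This is a minimal illustration only, not the actual fix: the JSON message format, the required "type" field, and the names process_queue and dead_letters are all assumptions made for the example. The consumer validates each message and sets malformed ones aside instead of letting an exception crash the service and trigger another restart.

```python
import json
from typing import Callable, Iterable, List, Tuple


def process_queue(
    raw_messages: Iterable[str],
    handle: Callable[[dict], None],
) -> Tuple[int, List[str]]:
    """Process every message, quarantining malformed ones instead of crashing.

    Returns the number of messages handled and the list of raw messages
    that were set aside for later inspection.
    """
    processed = 0
    dead_letters: List[str] = []
    for raw in raw_messages:
        try:
            message = json.loads(raw)
            # Hypothetical schema check: a JSON object with a string "type".
            if not isinstance(message, dict) or not isinstance(message.get("type"), str):
                raise ValueError("missing or invalid 'type' field")
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError, so both
            # unparseable and schema-invalid messages end up here.
            dead_letters.append(raw)
            continue
        handle(message)
        processed += 1
    return processed, dead_letters
```

With a design like this, a single bad message is skipped and recorded, and the rest of the queue continues to be processed.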

Posted Oct 16, 2020 - 21:14 CDT

Resolved

The incident has been resolved. Preventive actions have been implemented. A permanent fix will be deployed in the next release.

Posted Oct 15, 2020 - 14:04 CDT

Identified

The issue has been identified and the engineering team is working towards a stable fix. The Admin Portal is currently functioning normally.

Posted Oct 15, 2020 - 13:29 CDT

Investigating

We are currently investigating an issue with the Admin Portal that is causing intermittent login issues.

Posted Oct 15, 2020 - 12:53 CDT

This incident affected: USA - Admin Console.