Leadup
While performing a routine deployment of a new version to production, an anomaly occurred, causing the authentication service to fail to start up.
Fault
A database migration failed, causing subsequent database migration tasks to not execute.
Detection
QA immediately ran a test after the deployment to log out and log back into production using Beyond Identity Authenticator, and they observed the issue.
Root cause
Failed database migration left the database in a state not expected by the cloud services.
Mitigation and resolution
The Site Reliability Team escalated to the cloud engineering team, who identified the root cause and prepared a manual fix to the content in the database. After implementing the data repair, the SRE team re-executed the routine deployment successfully, restoring all services back to their normal state.