Home Security Cybersecurity How 3 glitches in Azure Active Directory MFA caused a 14-hour long...

How 3 glitches in Azure Active Directory MFA caused a 14-hour long multi-factor authentication outage in Office 365, Azure and Dynamics services

November 29, 2018 - 9:37 am

2350

3 min read

Early this week, Microsoft posted a report on what caused the multi-factor authentication outage in its Office 365 and Azure last week, which prevented users from signing into their cloud services for 14 hours.

Microsoft researchers reported that they found out three issues that combined to cause the log-in glitch. Interestingly, all these three glitches occurred within a single system, i.e. Azure Active Directory Multi-Factor Authentication, a service which Microsoft uses to monitor and manage multi-factor login for the Azure, Office 365, and Dynamics services.

According to the Microsoft researchers, “There were three independent root causes discovered. In addition, gaps in telemetry and monitoring for the MFA services delayed the from identification and understanding of these root causes which caused an extended mitigation time.”

All three glitches occurred within a single system: Azure Active Directory Multi-Factor Authentication. Microsoft uses that service to handle multi-factor login for the Azure, Office 364, and Dynamics services.

The three root causes for the multi-factor authentication outage

Microsoft, in their report, discovered three independent root causes. They said that the gaps in telemetry and monitoring for the MFA services delayed the identification and understanding of these root causes, which caused an extended mitigation time.
1. The first root cause manifested as latency issue in the MFA frontend’s communication to its cache services. This issue began under high load once a certain traffic threshold was reached. Once the MFA services experienced this first issue, they became more likely to trigger second root cause.

2. The second root cause is a race condition in processing responses from the MFA backend server that led to recycles of the MFA frontend server processes which can trigger additional latency and the third root cause (below) on the MFA backend.

The third identified root cause was previously undetected issue in the backend MFA server that was triggered by the second root cause. This issue causes accumulation of processes on the MFA backend leading to resource exhaustion on the backend at which point it was unable to process any further requests from the MFA frontend while otherwise appearing healthy in our monitoring.

On the day of the outage, these glitches first hit EMEA and APAC customers, and the US subscribers.

According to The Register, “Microsoft would eventually solve the problem by turning the servers off and on again after applying mitigations. Because the services had presented themselves as healthy, actually identifying and mitigating the trio of bugs took some time.”

Microsoft said, “The initial diagnosis of these issues was difficult because the various events impacting the service were overlapping and did not manifest as separate issues”. The company is further looking into ways to prevent the repetition of such an outage in the future by reviewing how it handles updates and testing. They also plan to review its internal monitoring services and how it contains failures once they begin.

To know more about this in detail, head over to Microsoft Azure’s official page.

Top 6 Cybersecurity Books from Packt to Accelerate Your Career

Your Quick Introduction to Extended Events in Analysis Services from Blog…

Logging the history of my past SQL Saturday presentations from Blog…

Storage savings with Table Compression from Blog Posts – SQLServerCentral

Daily Coping 31 Dec 2020 from Blog Posts – SQLServerCentral

Learning Essential Linux Commands for Navigating the Shell Effectively

Exploring the Strategy Behavioral Design Pattern in Node.js

How to integrate a Medium editor in Angular 8

Implementing memory management with Golang’s garbage collector

How to create sales analysis app in Qlik Sense using DAR…

How 3 glitches in Azure Active Directory MFA caused a 14-hour long multi-factor authentication outage in Office 365, Azure and Dynamics services

The three root causes for the multi-factor authentication outage

Read Next

Must Read in Security

Top 6 Cybersecurity Books from Packt to Accelerate Your Career

Win-KeX Version 2.0 from Kali Linux

Kali Linux 2020.3 Release (ZSH, Win-Kex, HiDPI & Bluetooth Arsenal) from...

Interviews

Learn Transformers for Natural Language Processing with Denis Rothman

Clean Coding in Python with Mariano Anaya

Bringing AI to the B2B world: Catching up with Sidetrade CTO Mark Sheldon [Interview]

On Adobe InDesign 2020, graphic designing industry direction and more: Iman Ahmed, an Adobe Certified Partner and Instructor [Interview]

Is DevOps experiencing an identity crisis? [Interview]

MobilePro

datapro

Programming

Subscribe to our newsletter