sFlow: Microsoft Office 365 outage

Tuesday, June 24, 2014

Microsoft Office 365 outage

6/24/2014 Information Week - Microsoft Exchange Online Suffers Service Outage, "Service disruptions with Microsoft's Exchange Online left many companies with no email on Tuesday."

The following entry on the Microsoft 365 community forum describes the incident:

====================================

Closure Summary: On Tuesday, June 24, 2014, at approximately 1:11 PM UTC, engineers received reports of an issue in which some customers were unable to access the Exchange Online service. Investigation determined that a portion of the networking infrastructure entered into a degraded state. Engineers made configuration changes on the affected capacity to remediate end-user impact. The issue was successfully fixed on Tuesday, June 24, 2014, at 9:50 PM UTC.

Customer Impact: Affected customers were unable to access the Exchange Online service.

Incident Start Time: Tuesday, June 24, 2014, at 1:11 PM UTC

Incident End Time: Tuesday, June 24, 2014, at 9:50 PM UTC

=====================================

The closure summary shows that operators took 8 hour 39 minutes to manually diagnose and remediate the problem with degraded networking infrastructure. The network related outage described in this example is not an isolated incident; other incidents described on this blog include: Packet loss, Amazon EC2 outage, Gmail outage, Delay vs utilization for adaptive control, and Multi-tenant performance isolation.

The incidents demonstrate two important points:

Cloud services are critically dependent on the physical network
Manually diagnosing problems in large scale networks is a time consuming process that results in extended service outages.

The article, SDN fabric controller for commodity data center switches, describes how the performance and resilience of the physical core can be enhanced through automation. The SDN fabric controller leverages the measurement and control capabilities of commodity switches to rapidly detect and adapt to changing traffic, reducing response times from hours to seconds.

Tuesday, June 24, 2014

Microsoft Office 365 outage

No comments:

Post a Comment