Intermedia Reason for Outage Report

Service Issue Description

Date

  • Tuesday, September 3rd, 2013

Service Impact

  • Customers were unable to connect to Intermedia services and experienced mail delays
  • Intermedia’s customer communication systems were not available

Event Start Time (approximate)

  • 7:00 AM ET: Virginia data center
  • 8:00 AM ET: New Jersey, California and Colorado data centers
  • 8:00 AM ET: Intermedia’s customer communication systems

Event Resolution Time (approximate)

  • 12:00 PM ET: Service connectivity restored for California and Colorado data centers
  • 12:30 PM ET: Service connectivity restored for New Jersey data center
  • 12:30 PM ET: Service connectivity restored for Intermedia’s customer communication systems
  • 3:00 PM ET: Service connectivity restored for Virginia data center
  • End of day: All mail queues were cleared
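The approximate connectivity outage per location follows directly from the start and resolution times listed above. A minimal Python sketch of that arithmetic is shown here; the times are copied from this report, while the names `EVENTS`, `at` and `outage_hours` are illustrative only. Mail queue clearance ("End of day") is not included because no clock time is given for it.

```python
from datetime import datetime

DAY = "2013-09-03"  # date of the event, from this report


def at(clock: str) -> datetime:
    """Parse an approximate ET clock time such as '7:00 AM' on the event day."""
    return datetime.strptime(f"{DAY} {clock}", "%Y-%m-%d %I:%M %p")


# (event start, connectivity restored), both approximate and in ET
EVENTS = {
    "Virginia": ("7:00 AM", "3:00 PM"),
    "New Jersey": ("8:00 AM", "12:30 PM"),
    "California": ("8:00 AM", "12:00 PM"),
    "Colorado": ("8:00 AM", "12:00 PM"),
    "Customer communication systems": ("8:00 AM", "12:30 PM"),
}


def outage_hours(events):
    """Return the approximate outage length in hours for each entry."""
    return {
        name: (at(end) - at(start)).total_seconds() / 3600
        for name, (start, end) in events.items()
    }


durations = outage_hours(EVENTS)
for name, hours in durations.items():
    print(f"{name}: about {hours:.1f} hours")
```

Because the reported times are approximate, the resulting durations are approximate as well.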

Reason for Outage (RFO)

These service issues were the result of two separate internal network events.

The first issue occurred between our Internet edge routers and core data center switches in the Virginia data center. A soft failure on a line card prevented the redundant equipment from passing traffic properly between the Internet and Intermedia’s services.

The second issue occurred due to a series of individual events:

  • As part of our integration work for a recent acquisition, we had previously implemented a temporary modification to our routing policies, in accordance with our standard change management policy.
  • While our network engineers worked to restore connectivity to the Virginia data center after the failure of the line card, an error was introduced into this temporary routing policy.
  • This error caused an overflow of the hardware routing tables across our data center core switches, creating packet loss and preventing customer connectivity to Intermedia services.
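To illustrate how a routing policy error can overflow a hardware routing table, the following is a toy Python model, not a description of Intermedia's actual equipment or configuration. The table capacity, the prefixes and the two policies are all hypothetical. The model assumes that a full table simply refuses new routes; what real switches do on overflow varies by platform.

```python
class HardwareRouteTable:
    """Toy model of a fixed-size hardware routing table (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.installed = set()
        self.dropped = set()

    def install(self, prefix):
        """Install a route if there is room; otherwise record it as dropped."""
        if prefix in self.installed:
            return True
        if len(self.installed) >= self.capacity:
            self.dropped.add(prefix)  # assumption: a full table rejects new routes
            return False
        self.installed.add(prefix)
        return True


def apply_policy(table, routes, permit):
    """Install every route that the routing policy permits."""
    for prefix in routes:
        if permit(prefix):
            table.install(prefix)


# Hypothetical route sets: 50 routes the switches need, 200 they should filter out.
needed = [f"10.0.{i}.0/24" for i in range(50)]
unwanted = [f"192.0.2.{i}/32" for i in range(200)]
routes = unwanted + needed

# Intended policy: accept only the needed routes.
good = HardwareRouteTable(capacity=100)
apply_policy(good, routes, permit=lambda p: p.startswith("10."))

# Policy with an error: accepts everything, so the table fills up first.
bad = HardwareRouteTable(capacity=100)
apply_policy(bad, routes, permit=lambda p: True)

print(len(good.installed), len(good.dropped))
print(len(bad.installed), len(bad.dropped))
```

In this model the erroneous policy fills the table with unwanted routes, leaving no room for the routes that carry customer traffic, which loosely mirrors the packet loss described above.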

This second issue also caused Intermedia to lose its customer communication capabilities, including our phones and customer notification tools (SMS, email and alternate email notification, service status page), because these systems reside on the same production network as our customers’ services.

There was no data loss during the issue. All mail was queued on our systems and delivered to customer mailboxes once service was restored.

Resolution Actions

Connectivity to our California, Colorado and New Jersey data centers as well as our customer communication systems was restored through a reload of the affected core data center switches, which cleared the hardware overflow condition.

Connectivity to the Virginia data center was restored by reloading the affected core switch, which cleared the hardware errors on the line card that had created the soft failure scenario. Once the hardware errors were eliminated, traffic routed as expected, resolving the issue.

Preventative Actions

We have permanently removed the temporary routing policy that had been implemented, making a recurrence of the same issue impossible. Additionally, although our network hardware vendor has confirmed that the line card has shown no hardware issues since the switch was reloaded, we are proactively replacing the line card on the affected switch in Virginia. This action will be performed this weekend during our regularly scheduled maintenance window.

Additional Remediation Actions

We have initiated a project to isolate Intermedia’s customer communication systems on a separate network from our production services. We expect to complete this project shortly.

About Jonathan McCormick

Jonathan McCormick is Intermedia's COO.