How Verizon and a BGP Optimizer caused a major internet outage affecting Amazon, Facebook, and Cloudflare, among others

Yesterday, many parts of the Internet suffered a major outage when Verizon, a large Internet transit provider, accidentally rerouted IP packets after wrongly accepting a network misconfiguration from a small ISP in Pennsylvania, USA.

According to The Register, “systems around the planet were automatically updated, and connections destined for Facebook, Cloudflare, and others, ended up going through DQE and Allegheny, which buckled under the strain, causing traffic to disappear into a black hole”.

According to Cloudflare, “What exacerbated the problem today was the involvement of a ‘BGP Optimizer’ product from Noction. This product has a feature that splits up received IP prefixes into smaller, contributing parts (called more-specifics). For example, our own IPv4 route 104.20.0.0/20 was turned into 104.20.0.0/21 and 104.20.8.0/21”.
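The prefix arithmetic behind that split can be reproduced with Python's standard ipaddress module. This is only a minimal sketch of the subnetting step, not Noction's actual implementation; the function name split_into_more_specifics is a hypothetical helper introduced here for illustration.

```python
import ipaddress
from typing import List


def split_into_more_specifics(prefix: str, extra_bits: int = 1) -> List[str]:
    """Split a prefix into equal-sized, more-specific prefixes.

    extra_bits=1 turns one /20 into two /21s, extra_bits=2 into four /22s.
    Hypothetical helper: it only illustrates the subnet arithmetic.
    """
    network = ipaddress.ip_network(prefix)
    return [str(subnet) for subnet in network.subnets(prefixlen_diff=extra_bits)]


if __name__ == "__main__":
    # The route quoted by Cloudflare, split into its two halves.
    print(split_into_more_specifics("104.20.0.0/20"))
    # ['104.20.0.0/21', '104.20.8.0/21']
```

Together the two /21 prefixes cover exactly the same addresses as the original /20, which is why such routes are harmless only as long as they never leave the network that generated them.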

Many Google users reported being unable to access the web using Google's browser, and some said Google Calendar went down too. Amazon users were likewise unable to reach some services, such as Amazon Books.

[Three images. Source: Downdetector]

In a separate incident on June 6, more than 70,000 BGP routes were leaked from Swiss colocation company Safe Host to China Telecom in Frankfurt, Germany, which then announced them on the global internet. “This resulted in a massive rerouting of internet traffic via China Telecom systems in Europe, disrupting connectivity for netizens: a lot of data that should have gone to European cellular networks was instead piped to China Telecom-controlled boxes”, The Register reports.

How a BGP blunder led to this outage

The Internet is made up of networks called Autonomous Systems (AS), and each of these networks has a unique identifier, called an AS number. These networks are interconnected using the Border Gateway Protocol (BGP), which joins them together and enables traffic to travel, for example, from an ISP to a popular website at a far-off location.

[Image. Source: Cloudflare]

With the help of BGP, networks exchange route information that can either be specific, similar to finding a specific city on your GPS, or very general, like pointing your GPS to a state.

DQE Communications (AS33154), an Internet Service Provider in Pennsylvania, was using a BGP optimizer in its network, which meant there were more-specific routes inside that network. DQE announced these specific routes to its customer, Allegheny Technologies Inc (AS396531), a steel company based in Pittsburgh.

All of this routing information was then sent to Verizon (AS701), which accepted it and passed it on to the rest of the world.
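Why do more-specific routes attract traffic once they spread? Routers forward each packet along the most specific prefix that matches the destination (longest-prefix match), so a leaked /21 wins over a legitimate /20 for every address it covers. The sketch below illustrates that selection rule with Python's standard ipaddress module; the routing table, the path labels, and the best_route helper are assumptions made up for this example, not data from the incident.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (prefix, label describing the path)


def best_route(routes: List[Route], destination: str) -> Optional[Route]:
    """Return the route with the longest prefix that contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, path in routes:
        network = ipaddress.ip_network(prefix)
        if address.version == network.version and address in network:
            matches.append((network.prefixlen, prefix, path))
    if not matches:
        return None
    # The most specific match (largest prefix length) wins.
    _, prefix, path = max(matches, key=lambda match: match[0])
    return prefix, path


if __name__ == "__main__":
    # Hypothetical table: one legitimate route plus two leaked more-specifics.
    table = [
        ("104.20.0.0/20", "legitimate path"),
        ("104.20.0.0/21", "leaked path"),
        ("104.20.8.0/21", "leaked path"),
    ]
    print(best_route(table, "104.20.9.1"))
    # ('104.20.8.0/21', 'leaked path')
```

With only the /20 in the table, the same lookup would return the legitimate path, which is why filtering out the leaked more-specifics is enough to restore normal routing.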

“Verizon’s lack of filtering turned this into a major incident that affected many Internet services”, Cloudflare mentions.

“What this means is that suddenly Verizon, Allegheny, and DQE had to deal with a stampede of Internet users trying to access those services through their network. None of these networks were suitably equipped to deal with this drastic increase in traffic, causing disruption in service”, Cloudflare added.

Job Snijders, an internet architect for NTT Communications, wrote in a network operators' mailing list, “While it is easy to point at the alleged BGP optimizer as the root cause, I do think we now have observed a cascading catastrophic failure both in process and technologies.”

https://twitter.com/bgpmon/status/1143149817473847296

Cloudflare's CTO John Graham-Cumming told The Register's Richard Speed, "A customer of Verizon in the US started announcing essentially that a very large amount of the internet belonged to them. For reasons that are a bit hard to understand, Verizon decided to pass that on to the rest of the world."

"but normally [a large ISP like Verizon] would filter it out if some small provider said they own the internet", he added.

“If Verizon had used RPKI, they would have seen that the advertised routes were not valid, and the routes could have been automatically dropped by the router”, Cloudflare said.
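To make the RPKI point concrete, here is a minimal sketch of route origin validation as commonly described: an announcement is valid if some Route Origin Authorization (ROA) covers the prefix, names the same origin AS, and allows the announced prefix length; it is invalid if ROAs cover the prefix but none match; otherwise it is not found. The ROA contents below (the 104.20.0.0/20 prefix with a maximum length of 20 for AS13335) are assumed values for illustration, and the ROA class and validate_origin function are hypothetical names, not a real router or validator API.

```python
import ipaddress
from typing import List, NamedTuple


class ROA(NamedTuple):
    prefix: str       # prefix the holder authorized
    max_length: int   # longest prefix length allowed in announcements
    origin_asn: int   # AS authorized to originate the prefix


def validate_origin(roas: List[ROA], prefix: str, origin_asn: int) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    announced = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_network = ipaddress.ip_network(roa.prefix)
        if announced.version != roa_network.version:
            continue
        if not announced.subnet_of(roa_network):
            continue
        covered = True  # at least one ROA covers this prefix
        if announced.prefixlen <= roa.max_length and roa.origin_asn == origin_asn:
            return "valid"
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    # Assumed ROA for illustration only.
    roas = [ROA("104.20.0.0/20", 20, 13335)]
    print(validate_origin(roas, "104.20.0.0/20", 13335))  # valid
    print(validate_origin(roas, "104.20.0.0/21", 13335))  # invalid
    print(validate_origin(roas, "192.0.2.0/24", 64500))   # not-found
```

Under these assumptions the /21 is rejected even though its origin AS is unchanged, because it is longer than the maximum length the ROA allows. A router configured to drop invalid announcements would discard it automatically.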

https://twitter.com/eastdakota/status/1143182575680143361

https://twitter.com/atoonk/status/1143139749915320321

Such rerouting is dangerous because criminals, hackers, or government spies could be positioned to intercept the misdirected traffic. It also raises security concerns for users, whose data could be used for surveillance, disruption, or financial theft.

Cloudflare was heavily affected by this outage. “It is unfortunate that while we tried both e-mail and phone calls to reach out to Verizon, at the time of writing this article (over 8 hours after the incident), we have not heard back from them, nor are we aware of them taking action to resolve the issue”, the company said in its blog post.

One of the users commented, “BGP needs a SERIOUS revamp with Security 101 in mind.....RPKI + ROA's is 100% needed and the ISPs need to stop being CHEAP. Either build it by Federal Requirement, at least in the Nation States that take their internet traffic as Citizen private data or do it as Internet 3.0 cause 2.0 flaked! Either way, "Path Validation" is another component of BGP that should be looked at but honestly, that is going to slow path selection down and to instrument it at a scale where the internet would benefit = not worth it and won't happen. SMH largest internet GAP = BGP "accidental" hijacks”

In a statement to The Register, Verizon said, "There was an intermittent disruption in internet service for some [Verizon] FiOS customers earlier this morning. Our engineers resolved the issue around 9 am ET."

https://twitter.com/atoonk/status/1143145626516914176

To learn more about this incident, head over to Cloudflare’s blog.
