Yesterday, many parts of the Internet faced an unprecedented outage after Verizon, the popular Internet transit provider, accidentally rerouted IP packets when it wrongly accepted a network misconfiguration from a small ISP in Pennsylvania, USA.
According to The Register, “systems around the planet were automatically updated, and connections destined for Facebook, Cloudflare, and others, ended up going through DQE and Allegheny, which buckled under the strain, causing traffic to disappear into a black hole”.
According to Cloudflare, “What exacerbated the problem today was the involvement of a ‘BGP Optimizer’ product from Noction. This product has a feature that splits up received IP prefixes into smaller, contributing parts (called more-specifics). For example, our own IPv4 route 104.20.0.0/20 was turned into 104.20.0.0/21 and 104.20.8.0/21”.
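To make the idea of more-specifics concrete, here is a minimal Python sketch of how a prefix can be split into smaller parts. It uses only the standard `ipaddress` module. The function name `split_into_more_specifics` and the prefix `10.20.0.0/20` are hypothetical choices for illustration; this is not how the Noction product is implemented.

```python
import ipaddress


def split_into_more_specifics(prefix: str, extra_bits: int = 1) -> list[str]:
    """Split a prefix into equally sized, longer (more specific) prefixes.

    With extra_bits=1 a /20 becomes two /21s that together cover
    exactly the same address space as the original /20.
    """
    network = ipaddress.ip_network(prefix)
    return [str(subnet) for subnet in network.subnets(prefixlen_diff=extra_bits)]


# Hypothetical private prefix, used only for illustration.
print(split_into_more_specifics("10.20.0.0/20"))
# ['10.20.0.0/21', '10.20.8.0/21']
```

The two halves cover the same addresses as the original route, but because they are longer prefixes, any router that learns them will prefer them over the original /20.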
Many Google users were unable to access the web using the Google browser. Some users said Google Calendar went down too. Amazon users were also unable to reach some services, such as Amazon Books.
In a separate incident on June 6, more than 70,000 BGP routes were leaked from Swiss colocation company Safe Host to China Telecom in Frankfurt, Germany, which then announced them on the global internet. “This resulted in a massive rerouting of internet traffic via China Telecom systems in Europe, disrupting connectivity for netizens: a lot of data that should have gone to European cellular networks was instead piped to China Telecom-controlled boxes”, The Register reports.
BGP was at the heart of this outage
The Internet is made up of networks called Autonomous Systems (AS), and each of these networks has a unique identifier, called an AS number. All these networks are interconnected using the Border Gateway Protocol (BGP), which joins them together and enables traffic to travel, for example, from an ISP to a popular website at a far-off location.
With the help of BGP, networks exchange route information that can either be specific, similar to finding a specific city on your GPS, or very general, like pointing your GPS to a state.
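The specific-versus-general distinction matters because routers forward traffic along the most specific route that matches the destination (longest-prefix match). The sketch below illustrates that rule in Python with the standard `ipaddress` module. The routing table, its labels, and the function name `best_route` are assumptions made up for this example; real routers do this lookup in hardware.

```python
import ipaddress

# Hypothetical routing table: (prefix, label). Not taken from any real router.
ROUTES = [
    ("10.0.0.0/8", "general route"),
    ("10.20.0.0/20", "more specific route"),
]


def best_route(destination, routes):
    """Return (prefix, label) of the longest matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    candidates = []
    for prefix, label in routes:
        network = ipaddress.ip_network(prefix)
        if address in network:
            candidates.append((network.prefixlen, prefix, label))
    if not candidates:
        return None
    # The route with the longest prefix length is the most specific one.
    _, prefix, label = max(candidates, key=lambda c: c[0])
    return prefix, label


print(best_route("10.20.9.1", ROUTES))  # ('10.20.0.0/20', 'more specific route')
print(best_route("10.99.0.1", ROUTES))  # ('10.0.0.0/8', 'general route')
```

This is why a leaked more-specific route is so disruptive: as soon as it appears in a routing table, it wins over the legitimate, more general route for every address it covers.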
DQE Communications (AS33154), an Internet Service Provider in Pennsylvania, was using a BGP optimizer in its network. The optimizer generated more-specific routes, which DQE announced to its customer, Allegheny Technologies Inc (AS396531), a steel company based in Pittsburgh.
Allegheny then sent this routing information to Verizon (AS701), which accepted it and passed it on to the rest of the world.
“Verizon’s lack of filtering turned this into a major incident that affected many Internet services”, Cloudflare mentions.
“What this means is that suddenly Verizon, Allegheny, and DQE had to deal with a stampede of Internet users trying to access those services through their network. None of these networks were suitably equipped to deal with this drastic increase in traffic, causing disruption in service”, the company added.
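The article does not say which filters Verizon could have applied, so the following Python sketch only illustrates two common ideas under stated assumptions: an allow-list of prefixes a customer is expected to announce, and a cap on how many prefixes a customer session may send. The function name `filter_customer_routes`, the limit, and all prefixes are hypothetical; on real routers these checks live in router configuration, not in application code.

```python
import ipaddress


def filter_customer_routes(announced, allowed, max_prefixes):
    """Accept only announcements that exactly match the customer's allow-list.

    Raises RuntimeError if the customer announces more prefixes than
    max_prefixes, mimicking a session being shut down by a prefix limit.
    """
    if len(announced) > max_prefixes:
        raise RuntimeError(
            f"prefix limit exceeded: {len(announced)} > {max_prefixes}"
        )
    allowed_networks = {ipaddress.ip_network(p) for p in allowed}
    accepted = []
    for prefix in announced:
        if ipaddress.ip_network(prefix) in allowed_networks:
            accepted.append(prefix)
    return accepted


# Hypothetical data: the customer owns 192.0.2.0/24 but also leaks a
# more-specific route that belongs to someone else.
announced = ["192.0.2.0/24", "10.20.0.0/21"]
print(filter_customer_routes(announced, ["192.0.2.0/24"], max_prefixes=10))
# ['192.0.2.0/24']
```

Either check alone would stop a small customer from injecting thousands of foreign routes: the allow-list drops the individual prefixes, and the limit stops the whole session once the count becomes implausible.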
Job Snijders, an internet architect for NTT Communications, wrote in a network operators’ mailing list, “While it is easy to point at the alleged BGP optimizer as the root cause, I do think we now have observed a cascading catastrophic failure both in process and technologies.”
we can confirm that earlier today there was a large BGP incident, causing 20k prefixes for 2400 network to be rerouted through AS396531 (a steel plant). and then on to its transit provider: Verizon (AS701)
Start time: 10:34:21 (UTC) End time: 13:26:07 (UTC)
— BGPmon.net (@bgpmon) June 24, 2019
Cloudflare’s CTO Graham-Cumming told El Reg’s Richard Speed, “A customer of Verizon in the US started announcing essentially that a very large amount of the internet belonged to them. For reasons that are a bit hard to understand, Verizon decided to pass that on to the rest of the world.”
“but normally [a large ISP like Verizon] would filter it out if some small provider said they own the internet”, he further added.
“If Verizon had used RPKI, they would have seen that the advertised routes were not valid, and the routes could have been automatically dropped by the router”, Cloudflare said.
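As a rough illustration of what RPKI-based origin validation checks, here is a simplified Python sketch modeled on the valid / invalid / not-found outcomes described in RFC 6811. The `Roa` class, the `validate_origin` function, and the prefix `10.20.0.0/20` are hypothetical; AS13335 is Cloudflare's AS number as quoted in this article. Real validators also verify cryptographic signatures, which is omitted here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """Simplified Route Origin Authorization (illustrative only)."""
    prefix: str
    max_length: int
    origin_asn: int


def validate_origin(prefix, origin_asn, roas):
    """Return 'valid', 'invalid', or 'not-found' for an announcement."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_network = ipaddress.ip_network(roa.prefix)
        # Skip ROAs of another IP version or that do not cover the route.
        if route.version != roa_network.version or not route.subnet_of(roa_network):
            continue
        covered = True
        if route.prefixlen <= roa.max_length and origin_asn == roa.origin_asn:
            return "valid"
    # Covered by at least one ROA but matched none: the route is invalid.
    return "invalid" if covered else "not-found"


# Hypothetical ROA: only a /20 originated by AS13335 is authorized.
ROAS = [Roa("10.20.0.0/20", 20, 13335)]

print(validate_origin("10.20.0.0/20", 13335, ROAS))  # valid
print(validate_origin("10.20.0.0/21", 13335, ROAS))  # invalid (too specific)
print(validate_origin("192.0.2.0/24", 64500, ROAS))  # not-found
```

In this sketch the split /21 is rejected even though the origin AS is correct, because it is longer than the maximum length the ROA allows. A router configured to drop invalid routes would discard it automatically.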
The teams at @verizon and @noction should be incredibly embarrassed at their failings this morning which impacted @Cloudflare and other large chunks of the Internet. It’s absurd BGP is so fragile. It’s more absurd Verizon would blindly accept routes without basic filters.
— Matthew Prince 🌥 (@eastdakota) June 24, 2019
wow waking up to some BGP madness! Looks like many (all) of @Cloudflare AS 13335 prefixes are being rerouted through AS396531 (Allegheny Technologies) and Verizon AS701 is providing that transit via that path! Ugh not good! will dig more … #BGPleak
— Andree Toonk (@atoonk) June 24, 2019
Rerouting is highly dangerous, as criminals, hackers, or government spies could be lurking to grab such a free flow of data. This raises security concerns for users, as their data could be used for surveillance, disruption, and financial theft.
Cloudflare was heavily affected by this outage. “It is unfortunate that while we tried both e-mail and phone calls to reach out to Verizon, at the time of writing this article (over 8 hours after the incident), we have not heard back from them, nor are we aware of them taking action to resolve the issue”, the company said in its blog post.
One of the users commented, “BGP needs a SERIOUS revamp with Security 101 in mind…..RPKI + ROA’s is 100% needed and the ISPs need to stop being CHEAP. Either build it by Federal Requirement, at least in the Nation States that take their internet traffic as Citizen private data or do it as Internet 3.0 cause 2.0 flaked! Either way, “Path Validation” is another component of BGP that should be looked at but honestly, that is going to slow path selection down and to instrument it at a scale where the internet would benefit = not worth it and won’t happen. SMH largest internet GAP = BGP “accidental” hijacks”
Verizon in a statement to The Register said, “There was an intermittent disruption in internet service for some [Verizon] FiOS customers earlier this morning. Our engineers resolved the issue around 9 am ET.”
For more details on this incident, head over to Cloudflare’s blog.