On July 2, 2019, Cloudflare suffered a major outage due to massive spike in CPU utilization in the network. Ten days after the outage, on July 12, Cloudflare’s CTO John Graham-Cumming, has released a report highlighting the details about how the Cloudflare service went down for 27 minutes.
During the outage, the company speculated the reason to be a single misconfigured rule within the Cloudflare Web Application Firewall (WAF), deployed during a routine deployment of new Cloudflare WAF Managed rules. This speculation turns out to be true and caused CPUs to become exhausted on every CPU core that handles HTTP/HTTPS traffic on the Cloudflare network worldwide.
Graham-Cumming said they are “constantly improving WAF Managed Rules to respond to new vulnerabilities and threats”.
The CPU exhaustion was caused by a single WAF rule that contained a poorly written regular expression that ended up creating excessive backtracking.
Source: Cloudflare report
The regular expression that was at the heart of the outage is :
Graham-Cumming says Cloudflare deploys dozens of new rules to the WAF every week, and also have numerous systems in place to prevent any negative impact of that deployment. He shared a list of vulnerabilities that caused the major outage.
Graham-Cumming said they had stopped all release work on the WAF completely and are following some processes:
Users have appreciated Cloudflare’s efforts in taking immediate calls for the outage and being completely transparent about the root cause of it with a complete post mortem report.
“We are ashamed of the outage and sorry for the impact on our customers. We believe the changes we’ve made mean such an outage will never recur,” Graham-Cumming writes.
Read the complete in-depth report by Cloudflare on their blog post.
Cloudflare adds Warp, free VPN to 1.1.1.1 DNS app to improve internet performance and security
Cloudflare raises $150M with Franklin Templeton leading the latest round of funding
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…