Google Kubernetes Engine was down last Friday, users left clueless of outage status and RCA

On Friday, the 9th of November, at 4:30 am US/Pacific time, Google Kubernetes Engine suffered a service disruption that left users potentially unable to launch node pools through the Cloud Console UI. The team responded that it would get back to users with more information by 04:45 am US/Pacific time the same day.

However, the issue was not resolved by that time. The team posted another status update assuring users that the Engineering Team was working on mitigation, and promised a further update with current details by 06:00 pm US/Pacific.

In the meantime, affected customers were advised to use the gcloud command-line tool to create new node pools.
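For readers who have only ever created node pools through the Console, the workaround might look roughly like the sketch below. It builds a `gcloud container node-pools create` invocation from Python and prints it rather than running it. The pool name, cluster name, zone and node count are placeholders chosen for illustration, not values from Google's status updates, and the helper function itself is hypothetical.

```python
import shlex
import subprocess


def node_pool_create_command(pool, cluster, zone, num_nodes=3):
    """Build the argument list for `gcloud container node-pools create`.

    All values passed in are illustrative placeholders; substitute the
    names used in your own project.
    """
    return [
        "gcloud", "container", "node-pools", "create", pool,
        "--cluster", cluster,
        "--zone", zone,
        "--num-nodes", str(num_nodes),
    ]


if __name__ == "__main__":
    # Hypothetical names, used only to show the shape of the command.
    cmd = node_pool_create_command("fallback-pool", "my-cluster", "us-central1-a")

    # Print the command so it can be reviewed before anything is created.
    print(shlex.join(cmd))

    # Uncomment to actually run it (requires an installed, authenticated gcloud).
    # subprocess.run(cmd, check=True)
```

Printing the command first is a deliberate choice: during an incident it is worth double-checking the cluster and zone before creating billable resources.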

An update confirming that the issue was finally resolved was posted on Sunday, the 11th of November, stating that services had been restored on Friday at 14:30 US/Pacific time. However, no proper explanation has been provided of what led to the service disruption. The team did mention that it will conduct an internal investigation and make appropriate improvements to its systems to help prevent or minimize future recurrence of the issue.

According to a user’s summary on Hacker News, “Some users here are reporting that other GCP services not mentioned by Google’s blog are experiencing problems. Some users here are reporting that they have received no response from GCP support, even over a time span of 40+ hours since the support request was submitted.”

According to another user, “When everything works, GCP is the best. Stable, fast, simple, reliable. When things stop working, GCP is the worst. They require way too much work before escalating issues or attempting to find a solution.”
We can’t help but agree, looking at the timeline of the service downtime.

Users have also expressed disappointment over how the outage was managed.

Source: Hacker News

With users demanding a root cause analysis of the outage, it is only fitting that Google provide one so that users can regain confidence in the service.
You can check out Google Cloud’s blog post detailing the timeline of the downtime.

Melisha Dsouza
