Written by:
Damian Jennings
Date Posted:
23 February 2018

Well done Google

93 minutes & counting
Google’s admitted that the 93-minute Compute Engine outage was down to its autoscaler not working. So, it was more of an autofail really. Ha ha ha.

Still, at least it wasn’t a real life human person that cocked up. For once.

The Big G put it down to a “network programming failure” and said the autoscaler didn’t work as it should have. This led to Compute Engine being out for over an hour and a half.

Too late to apologise? 
In an extraordinarily long quasi-apology, they dribbled:

“Propagation of Google Compute Engine networking configuration for newly created and migrated VMs is handled by two components. The first is responsible for providing a complete list of VMs, networks, firewall rules, and scaling decisions. The second component provides a stream of updates for the components in a specific zone.”

OK mate, but what does that mean? Well, it means that during the outage, the first of those two components sent no data. Which means newly created and migrated VMs couldn’t be reached by their mates in other zones. The autoscaler also needed this initial data, so it too fell over.
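For the code-minded, here’s a rough sketch of why losing the first component hurts so much. It is not Google’s actual design: the class `ZoneNetworkView`, its method names, and the rule that updates get rejected until a full snapshot has arrived are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneNetworkView:
    """Hypothetical model of what one zone knows about VMs elsewhere."""

    routes: dict = field(default_factory=dict)  # VM name -> address
    has_snapshot: bool = False

    def load_snapshot(self, snapshot: dict) -> None:
        # Component one: the complete list of VMs and where they live.
        self.routes = dict(snapshot)
        self.has_snapshot = True

    def apply_update(self, vm: str, address: str) -> bool:
        # Component two: incremental updates. Assumption: without a
        # baseline snapshot there is nothing to apply them to.
        if not self.has_snapshot:
            return False
        self.routes[vm] = address
        return True

    def can_reach(self, vm: str) -> bool:
        return vm in self.routes
```

No snapshot, no routes, no chatting to your mates. In this toy version the update stream can be perfectly healthy and still achieve nothing.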

Why though? Apparently a good old “stuck process” (maybe they should have turned it all off and on again?).

What should have happened is that the failover should have essentially ctrl-alt-deleted the errant process and handed over to a replacement. But instead, it decided not to bother. Maybe it was busy. Maybe it just couldn’t be bothered. Who can tell?
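If you’re wondering what “kill the stuck thing and hand over” looks like in practice, here’s a minimal sketch of a heartbeat watchdog. Everything in it is hypothetical (the `Task` class, `failover_if_stuck`, the timeout value); Google hasn’t published how its failover is built.

```python
import time


class Task:
    """Hypothetical worker that reports in with a heartbeat."""

    def __init__(self, name, now=time.monotonic):
        self.name = name
        self._now = now
        self.last_heartbeat = now()
        self.alive = True

    def heartbeat(self):
        # A healthy task calls this regularly; a stuck one stops.
        self.last_heartbeat = self._now()

    def kill(self):
        self.alive = False


def failover_if_stuck(primary, replacement, timeout_s, now=time.monotonic):
    """Return whichever task should be serving right now."""
    if now() - primary.last_heartbeat > timeout_s:
        # No heartbeat for too long: treat the primary as stuck.
        primary.kill()
        return replacement
    return primary
```

The clock is passed in as `now` so you can test it with a fake one instead of actually waiting around for 93 minutes.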

False promises
Goog said:

“The engineering team was alerted when the propagation of network configuration information stalled. They manually failed over to the replacement task to restore normal operation of the data persistence layer.”

They went on to promise they would:

“Stop VM migrations if the configuration data is stale” and modify “the data persistence layer to re-resolve their peers during long-running processes, to allow failover to replacement tasks.”
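The first of those promises boils down to a freshness check. A minimal sketch, assuming a made-up function name (`may_migrate`) and a made-up 60-second threshold, since Google didn’t say what counts as “stale”:

```python
def may_migrate(config_age_s: float, max_age_s: float = 60.0) -> bool:
    """Allow a VM migration only if the network config is fresh enough.

    Both the name and the 60-second default are assumptions for
    illustration, not values from Google's report.
    """
    return config_age_s <= max_age_s
```

So a config last refreshed ten seconds ago gets the green light, and one that’s been sat there for an hour and a half does not.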

<shamelessplug>Sometimes, having a fully managed host like, er, well, I dunno, Hyve seems like a very good idea compared to AWS…</shamelessplug>

