Friday, April 19, 2024

Roblox’s cloud-native catastrophe: A post mortem

In late October, Roblox’s global online game network went down in an outage that lasted three days. The site is used by 50 million gamers daily. Finding and fixing the root causes of this disruption would take a massive effort by engineers at both Roblox and its main technology supplier, HashiCorp.

Roblox eventually published an amazing analysis in a blog post at the end of January. As it turned out, Roblox was bitten by a strange coincidence of several events. The processes Roblox and HashiCorp went through to diagnose and ultimately fix things are instructive for any company that runs a large-scale infrastructure-as-code installation or makes heavy use of containers and microservices across its infrastructure.

InfoWorld Cloud Computing