It’s hard to believe that it’s already been six years since we published Site Reliability Engineering: How Google Runs Production Systems with O’Reilly Media. We’ve been both humbled and pleasantly surprised by how popular the book has been and continues to be. You may already be familiar with the two related books Google published after the SRE Book became a bestseller: The Site Reliability Workbook and Building Secure and Reliable Systems. All three books are available for free at sre.google/books.
It’s perhaps harder to find and explore the numerous journal articles, longer-format reports, blog posts, and trainings that Google SREs have published since 2016. In the intervening years, Google SREs have also given dozens of conference talks on the topics covered in the SRE Book. While the content in the book remains largely evergreen, SRE is a dynamic field, and we’ve had a lot more to say as our practices have evolved and gained depth.
To make this body of work more discoverable, we’ve put together a compendium of this material, mapped by topic to each chapter of the book on sre.google: SRE Book Updates, by Topic. Here you’ll find dozens more resources on some of our most popular topics, such as SLOs, Monitoring and Alerting, Canarying, Incident Management and Postmortem Culture, and Training SREs. Please explore away!
Of course, SREs have also spoken and written about topics beyond what’s covered in the SRE Book (for example: Machine Learning, Capacity Planning, Innovations, and Security and Privacy); stay tuned for a catalog of those resources.