Site Reliability Engineering

What Does Site Reliability Engineering Mean?

Site reliability engineering (SRE) is an approach to website operations that uses techniques from software engineering to build more reliable websites. It was first developed at Google in 2003. The term is related to DevOps, which also mixes software engineering with system administration, though DevOps is concerned with automating operations in general rather than with reliability in particular.

Techopedia Explains Site Reliability Engineering

Site reliability engineering applies software engineering techniques, including algorithms, data structures, performance analysis and programming languages, to make web applications highly reliable.

In an interview, Google vice president of engineering Ben Treynor said that the company hired a 50-50 mix of people with backgrounds in software engineering and in system administration for its SRE teams. Google assigns small SRE teams to major projects. Treynor attributed Google's remarkable uptime to the automation of many site operations activities. Failures still happen occasionally, but they are fixed quickly because the SRE team has automated so many tasks beforehand.
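
The idea of automating a routine recovery step so that failures are fixed quickly can be illustrated with a minimal sketch. It is not Google's tooling: the names remediate, is_healthy, restart and max_restarts are hypothetical, and the health check and restart action are passed in as plain functions so the example stays self-contained.

```python
from typing import Callable


def remediate(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    max_restarts: int = 3,
) -> bool:
    """Restart a service until it reports healthy or attempts run out.

    Returns True if the service is healthy at the end, and False if the
    automated fix did not work and a person should be alerted instead.
    """
    for _ in range(max_restarts):
        if is_healthy():
            return True
        restart()
    # Check once more after the final restart attempt.
    return is_healthy()


if __name__ == "__main__":
    # Simulated service that recovers after two restarts (assumption
    # made purely for the demonstration).
    state = {"restarts": 0}

    def check() -> bool:
        return state["restarts"] >= 2

    def do_restart() -> None:
        state["restarts"] += 1

    recovered = remediate(check, do_restart)
    print("recovered:", recovered, "restarts used:", state["restarts"])
```

In this sketch a False return value marks the point where automation gives up and the problem is handed to an on-call engineer.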

Google has also taken inspiration from role-playing games in structuring its operational readiness drills, which test how engineers respond to failures that automation cannot handle on its own. The company calls these exercises "Wheel of Misfortune": one employee plays the role of the system and another plays the role of the on-call engineer. Treynor said this approach got engineers to think about reliability more than conventional drills did.

SRE is similar to DevOps, but the latter focuses on automating the deployment of systems generally, while SRE focuses specifically on reliability.

Margaret Rouse

Margaret Rouse is an award-winning technical writer and teacher known for her ability to explain complex technical subjects to a non-technical, business audience. Over the past twenty years her explanations have appeared on TechTarget websites and she's been cited as an authority in articles by the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine and Discovery Magazine. Margaret's idea of a fun day is helping IT and business professionals learn to speak each other's highly specialized languages. If you have a suggestion for a new definition or how to improve a technical explanation, please email Margaret or contact her…