What's commonly involved in site reliability engineering?
The work involved in site reliability engineering (SRE) varies considerably from one company, and one set of systems, to the next.
At its most basic, site reliability engineering means putting people with software development experience in charge of operations, or otherwise combining development and operations work in some key way. In practice, the site reliability engineer often applies high-level software design approaches to operational problems.
Site reliability engineering is similar to another approach called DevOps: both aim to combine development and operations. Where DevOps is often described as the process of merging the two departments, "site reliability engineer" is often used as a job title that replaces the traditional system administrator title. The difference from the traditional role is that, along with monitoring and servicing systems, a site reliability engineer also applies development concepts to operations, which is critical for making sure that deployed programs work the way they're supposed to.
In practical terms, a site reliability engineer may be on call to monitor systems at any given time. This individual may also write automation tools or help develop quality assurance features. SRE teams may evaluate uptime for an application, or otherwise look at how deployed applications are actually used in the field.
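As a rough illustration of what evaluating uptime can involve, the sketch below computes availability as the fraction of a reporting period during which a service was up. The function name, the 30-day window, and the outage durations are all hypothetical values chosen for the example, not figures from any real system.

```python
def availability(total_seconds: float, downtime_seconds: float) -> float:
    """Return the fraction of the period during which the service was up."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= downtime_seconds <= total_seconds:
        raise ValueError("downtime_seconds must be between 0 and total_seconds")
    return (total_seconds - downtime_seconds) / total_seconds


# Hypothetical 30-day reporting window, in seconds.
THIRTY_DAYS = 30 * 24 * 60 * 60

# Hypothetical outage durations (in seconds) recorded during that window.
outages = [120, 45, 600]

uptime = availability(THIRTY_DAYS, sum(outages))
print(f"Availability over 30 days: {uptime:.4%}")
```

A team might compare a figure like this against whatever uptime goal it has agreed on for the application.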
Within the general concept of combining development and operations, the SRE role is very flexible. Some would say that the approach also attempts to “bridge the gap” between the two departments in terms of communication and philosophy. As a result, a person in SRE may spend a good deal of time in meetings discussing how developed products and services are used in practice. SRE may also be seen as a “stakeholder” in the DevOps process: someone who provides critical feedback on engineering and design with an eye toward operational performance.
Although some see SRE as little more than a dressed-up system administrator role, companies like Google are embracing the concept and investing much more in defining what this type of professional does. Google engineers describe the input that site reliability engineers provide as very important, and describe these professionals as highly skilled and experienced in ways that traditional system administrators may not have been.