What it takes to be a site reliability engineer

Eveline Oehrlich, Chief Research Officer, DevOps Institute

Site reliability engineering (SRE) is an occupation on the rise. In fact, some 22% of organizations have adopted the SRE role, up from 15% last year, according to the DevOps Institute's "Upskilling 2021: Enterprise DevOps Skills Report."

IT organizations are adopting the SRE discipline to create ultra-scalable and exceptionally reliable distributed software systems. Even regulated industries are embracing SRE to ensure resiliency. Pandemic-driven demand for digital services has accelerated that call for resiliency even more.

If you're an IT Ops professional, or part of a DevOps team, this is a great time to think about a new career as an SRE. Here's what you need to know to make the move.

Why become an SRE?

If you're curious, love to learn about new things, and are passionate about designing, building, and running complex systems, then becoming an SRE makes perfect sense.

Let's face it, your job in operations doesn't lend itself to creativity. Operations is very reactive. You do the same work every day—chasing after and fixing problems. Not to belittle IT support and IT Ops teams, but the work can be boring, stressful, and not at all challenging.

The SRE role, on the other hand, allows you to affect both the future of your organization and the customer experience. From a career perspective, the job of an SRE is much more rewarding than most IT Ops positions, because you can use your abilities to create, design, improve, and re-engineer.

Essentially, an SRE replaces human labor with automation, generally by creating self-service tools for developers. An SRE team enhances the availability, performance, efficiency, monitoring, emergency response, and planning of production services and software.

Ops people who have an engineering mindset can function as SREs because they know how everything operates. They know how to reduce toil, automate, suggest test data, work with security teams, and so on. No one is better able to step into this engineering role than someone who supports the application.

On a practical level, the salary of an SRE is much higher than that of an IT Ops pro. The average annual pay for an IT operations specialist in the United States is $69,477, while the average annual salary for an SRE is $130,021, according to ZipRecruiter.

Skills you'll need

If you want to become an SRE, you have to solve problems effectively, learn continuously, make decisions quickly, collaborate and communicate closely with developers, and keep your cool under pressure.

Although there's no single answer when it comes to the technical skills required, you'll likely need to know some common programming languages, such as Ruby, PHP, JavaScript, and SQL, as well as observability and monitoring tools.

You should know the major cloud providers (AWS, Azure, and Google); be familiar with containerization, including Docker and Kubernetes; and understand Linux. You'll also need the ability to look at processes and re-engineer them with automation.

The key task for SREs is automating. SREs reduce toil, meaning manual work that doesn't add anything to the business and just causes problems. SREs are absolute automation machines. You can see how creativity, collaboration, communication, and various technical skills fit in.

SREs are jacks-of-all-trades. Because IT Ops pros know operations and have a system administrator's perspective after years of troubleshooting, they're a match made in heaven for SRE positions.

Getting started

First, take stock of your current skills. Think about your experience (breadth), your expertise (depth), and the tangible examples of execution you've gained in your current or previous positions or during past DevOps journeys and engagements.

Take this inventory and compare it to the DevOps must-have skills domains to examine where you already have sufficient experience and/or expertise or where you could improve.

Understand where you've made an impact through your ability to execute. List some key tangible achievements and contributions you've made throughout your IT journey. These could be specific results, improvements, or other achievements. Remember to include the times when you've influenced results or outcomes.

Are you comfortable with your current experience and expertise, and with the contributions you've made to your team and your organization? Do you see opportunities where you could add value or take on a different role?

Try to capture and understand your tolerance for exploration and your ability to change, which will make it easier if you want to explore other opportunities or roles. Your willingness and ability to be flexible in expanding, learning, and acquiring new skills are essential in the current and future evolution of SRE.

Reflect on your motivations and where you see yourself today and in the future.

Your next steps

After you've done this internal work, find out whether your organization has an upskilling or training program, then let management know that you want to pursue a career in SRE. You can also look into obtaining SRE certifications to enhance your resume.

Some companies also offer on-the-job training. In addition, many vendors in the IT automation community offer training courses for their tools.

The SRE profession is a great opportunity for IT operations team members to step into engineering roles. Your future as an SRE is determined by the actions you take today. Maybe it's time to disrupt yourself.

Don’t miss my presentation during the DevOps Institute's one-day micro-conference on Site Reliability Engineering, on May 20, 2021. The conference, which features three speakers, is part of the DevOps Institute's ongoing SKILup Days event series.
