Director of Site Reliability Engineering

Company: Stellar Development Foundation
Location: San Francisco, CA, USA
Salary: $210,000 – $310,000
Type: Full-Time
Experience Level: Senior

Requirements

  • 3+ years of experience working as a Site Reliability Engineer
  • 3+ years of experience managing an SRE team
  • Strong track record of collaborating with dev teams at all stages of product development (design, development/CI, beta testing, production)
  • Strong track record of collaborating on defining, measuring, and driving improvements in KPIs
  • Strong track record of assisting teams during root cause analysis and postmortems
  • Experience designing and building out the infrastructure for large distributed systems
  • Experience maintaining highly available infrastructure
  • Experience troubleshooting and understanding complex technical problems
  • Experience using configuration management or IaC tooling such as Terraform, Ansible, or Puppet
  • Experience building and maintaining infrastructure using Kubernetes
  • Highly autonomous; able to find clarity in ambiguous circumstances
  • Excellent communicator; comfortable working with remote team members

Responsibilities

  • Establish a clear vision and mandate for the Site Reliability Engineering team
  • Define the SRE team’s quarterly OKRs to best align with the company’s goals
  • Define how SREs and development teams collaborate throughout the software development lifecycle
  • Define a career growth path for the SRE team, as well as coach and mentor individual contributors on the team
  • Define and track metrics across engineering and help hold engineering teams accountable for their KPIs
  • Coordinate priorities with other teams and areas of the organization
  • Participate in sprint planning and execution, track progress and oversee day-to-day tactical decisions
  • Design and build reliable systems and infrastructure that software engineers find easy to use
  • Monitor and troubleshoot systems in production
  • Define and participate in 24/7 on-call rotations alongside the team
  • Mediate technical discussions and review PRs
  • Jump in as needed with code fixes, troubleshooting and hands-on contributions
  • Collaborate across the Stellar ecosystem, engaging with key partners and advising on their integration to set them up for success

Preferred Qualifications

  • 3+ years of experience writing code in a major programming language
  • You have worked on an open source project
  • You have managed a distributed team
  • You build things for fun in your spare time