Senior Staff Software Engineer – Reliability Engineering

Company: Airbnb
Location: United States
Salary: $244,000 – $304,000
Type: Full-Time
Degrees: Bachelor’s, Master’s, PhD
Experience Level: Senior, Expert or higher

Requirements

  • BS, MS, or PhD in computer science, related field, or equivalent work experience.
  • 12+ years of software engineering experience, with a significant portion dedicated to system architecture and design in consumer-facing technology companies.
  • Strong leadership skills, with 5+ years of experience as a senior-level technical lead or architect, driving the technical direction and strategy across multiple teams or projects.
  • Excellent communication and collaboration skills, with a proven track record of working effectively across teams and organizations.
  • Demonstrated expertise in building and scaling high-availability systems and platforms, with a deep understanding of multi-cloud environments.

Responsibilities

  • Develop a roadmap with a longer-term vision for Reliability and serve as a strategic thought partner within the organization.
  • Design, implement, and influence company-wide SRE architecture, innovation, engineering, and standards.
  • Create incident management processes that can scale with the organization as it continues its rapid growth.
  • Foster an SRE/Reliability model that accounts for the nuances of an engineering culture with a strong sense of ownership over its services.
  • Bring a strong customer focus to the Reliability function, centered on optimizing the infrastructure and platform, and ensuring systems are highly available and performant.
  • Develop Production Readiness standards to ensure service reliability.
  • Automate as much as possible and always configure as code.
  • Predict future failures and work proactively to mitigate them.
  • Advocate and implement reliable design patterns (circuit breakers, graceful degradation, etc.).
  • Create a culture where Reliability is a state of mind, instilling a proactive approach to seeing patterns and opportunities to increase leverage and tooling.
  • Build deep partnerships with engineering leaders.
  • Work closely with product engineering teams on design and implementation choices of large-scale distributed systems.
  • Partner with the broader organization to learn from incidents through a blameless postmortem process.
  • Mentor and lead other Site Reliability Engineers. Uplevel and support others with servant leadership, mentorship, advocacy, and allyship.
