Site Reliability Engineer

Company: Zoox
Location: San Mateo, CA, USA
Salary: $165,000 – $222,000
Type: Full-Time
Experience Level: Senior, Expert or higher

Requirements

  • 6+ years of experience in site reliability engineering or a similar role, with a strong background in large-scale distributed systems.
  • Proven experience with cloud platforms such as AWS, GCP, or Azure.
  • Expertise in container orchestration technologies like Kubernetes.
  • Deep understanding of networking, storage, and database technologies.
  • Strong programming skills in languages such as Python, Go, C/C++, or Java.
  • Experience with infrastructure-as-code tools such as Ansible, Salt, Terraform, or CloudFormation.

Responsibilities

  • Design and implement highly scalable and reliable systems to support Zoox’s autonomous vehicle platform.
  • Optimize system performance, reliability, and scalability.
  • Develop and maintain monitoring, alerting, and reporting systems to ensure proactive identification and resolution of issues.
  • Collaborate with software engineering teams to improve deployment processes and automation.
  • Conduct root cause analysis of production issues and implement corrective actions.
  • Implement disaster recovery and business continuity plans.

Preferred Qualifications

  • Experience in the automotive or autonomous vehicle industry.
  • Knowledge of security best practices and compliance requirements.
  • Previous experience in a leadership or mentorship role.