Site Reliability Engineer III

Company: Blue Origin
Location: Seattle, WA, USA
Salary: $148,014 – $207,218.55
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior, Expert or higher

Requirements

  • Bachelor’s degree in computer science, information technology, or a related technical field.
  • Strong coding skills (Golang or Python preferred), with a proven track record of developing reliable automation tools.
  • Practical experience implementing and maintaining source code management (SCM) and artifact management systems (e.g., Git, GitLab, GitHub, Artifactory, Nexus).
  • Experience developing, maintaining, and troubleshooting complex CI/CD pipelines.
  • Experience with VM image building automation tools such as Packer or AWS Image Builder.
  • Experience with systems monitoring, alerting tools, and incident management.
  • Knowledge of cloud infrastructure and services, such as AWS, Azure, or GCP.
  • Comfortable with on-call responsibilities as part of a 24/7 rotation.
  • Excellent written and verbal communication skills, with the ability to document processes and platform architecture effectively.
  • Familiarity with infrastructure-as-code (IaC) tools such as Terraform, CloudFormation, or Ansible.
  • A passion for learning new technologies and continuously improving existing systems.
  • Ability to earn trust, maintain positive and professional relationships, and contribute to a culture of inclusion.
  • Must be a U.S. citizen or national, U.S. permanent resident (current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum.

Responsibilities

  • Configure, deploy, scale, and administer open source and commercial software.
  • Administer and scale our SCM and artifact repositories, ensuring that best practices for branching, tagging, and versioning are followed.
  • Design, implement, and maintain CI/CD pipelines, optimizing build and deployment processes to increase developer productivity.
  • Develop and maintain scripts and automation tools using Golang or Python to streamline development operations.
  • Monitor system performance, proactively identifying and resolving bottlenecks in collaboration with development teams.
  • Manage artifact repositories and ensure the secure storage and retrieval of build artifacts.
  • Participate in an on-call rotation to troubleshoot, diagnose, and resolve urgent issues affecting the developer platforms, minimizing downtime.
  • Continuously evaluate and recommend improvements to our source code management and automation practices.
  • Document systems, processes, and procedures to enhance the knowledge base and foster a learning culture.
  • Collaborate closely with software engineering teams to apply SRE principles across the entire software development lifecycle.

Preferred Qualifications

  • In-depth knowledge of security best practices in Git operations and artifact handling.
  • Hands-on experience with containerization technologies and orchestration systems, such as Docker and Kubernetes.
  • Experience analyzing and troubleshooting distributed systems.
  • Experience with web development frameworks, especially React.
  • Strong problem-solving skills and the ability to work independently or as part of a team to meet tight deadlines.
  • Demonstrated track record of delivering projects successfully in a results-driven environment.