Staff Site Reliability Engineer

Company: CyberArk
Location: Salt Lake City, UT, USA
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior, Expert or higher

Requirements

  • B.S. in Computer Science or equivalent experience
  • Minimum of 4 years of experience managing AWS infrastructure
  • Minimum of 7 years in a senior, architect, or technical lead role in site reliability, systems engineering, or software development
  • Deep understanding of site reliability, infrastructure, and cloud platforms
  • Expert understanding of and experience with containerization and orchestration technologies such as Docker and Kubernetes
  • Expert in observability tooling such as Datadog, New Relic, Logstash, and Elasticsearch
  • Solid understanding of and experience with web services, databases, and related infrastructure and architectures
  • Solid understanding of backup/restore best practices
  • Strong expertise writing code in configuration management languages
  • Strong programming expertise in Python, Java, or an equivalent language
  • Excellent troubleshooting skills
  • Experience supporting an enterprise-level SaaS environment

Responsibilities

  • Design and implement AWS infrastructure components such as VPCs, EC2, EKS, S3, tagging schemes, CloudFormation, etc.
  • Lead the architecture, design, and feature analysis of deployment and management automation for cloud-based infrastructure and software
  • Provide guidance to Site Reliability and DevOps Engineers on managing the reliability and performance of SaaS environments, and on building automation that prevents problems from recurring
  • Architect solutions and guide the team in using configuration management tools on both Windows and Linux, such as CloudFormation, Helm, Terraform, Salt, and Ansible
  • Ensure cloud-based architectures meet availability and recoverability requirements
  • Architect and implement cloud-based monitoring, alerting, and reporting using tools such as Datadog, Logz.io, CloudWatch, Catchpoint, and ELK
  • Provide support and guidance on tooling that enables teams to achieve greater output and reliability
  • Maintain a deep understanding of the latest technology solutions and trends, and dive into architectural details as needed
  • Work with Team Leads within the group to identify areas for improvement, prepare architecture roadmaps, and advocate for them to the Product Management group

Preferred Qualifications

  • Security experience a plus
  • Experience applying AI/ML models to improve system performance and reliability a plus