Staff Site Reliability Engineer
Company | CyberArk |
---|---|
Location | Salt Lake City, UT, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Senior, Expert or higher |
Requirements
- B.S. in Computer Science or equivalent experience
- Minimum of 4 years of experience managing AWS infrastructure
- Minimum of 7 years in a senior, architect, or technical lead role in site reliability, systems engineering, or software development
- Deep understanding of site reliability, infrastructure, and cloud platforms
- Expert understanding of, and experience with, containerization technologies such as Docker and Kubernetes
- Expert in observability tooling such as Datadog, New Relic, Logstash, and Elasticsearch
- Solid understanding of, and experience with, web services, databases, and related infrastructure and architectures
- Solid understanding of backup/restore best practices
- Strong expertise writing in configuration management languages
- Strong expertise programming in Python, Java, or an equivalent language
- Excellent troubleshooting skills
- Experience supporting an enterprise-level SaaS environment
Responsibilities
- Design and implement AWS infrastructure components such as VPCs, EC2, EKS, S3, tagging schemes, and CloudFormation
- Lead the architecture, design, and feature analysis of deployment and management automation for cloud-based infrastructure and software
- Provide guidance to Site Reliability and DevOps Engineers on managing the reliability and performance of SaaS environments, and on building automation to prevent problem recurrence
- Architect and guide the team in the use of configuration management tools on both Windows and Linux (CloudFormation, Helm, Terraform, Salt, Ansible)
- Ensure cloud-based architectures meet availability and recoverability requirements
- Architect and implement cloud-based monitoring, alerting, and reporting (Datadog, Logz.io, CloudWatch, Catchpoint, ELK)
- Provide support and guidance on tooling that enables teams to achieve greater output and reliability
- Maintain a deep understanding of the latest technology solutions and trends, and dive into architectural details as needed
- Work with the team leads within the group to identify areas for improvement, prepare architecture roadmaps, and advocate for them to the Product Management group
Preferred Qualifications
- Security experience a plus
- Experience using AI/ML models to improve system performance and reliability a plus