Senior Systems Reliability Engineer - SRE

Company	Aristocrat Leisure
Location	Austin, TX, USA
Salary	$111,793 – $207,616
Type	Full-Time
Degrees	Bachelor’s, Master’s
Experience Level	Senior, Expert or higher

Requirements

8+ years of proven experience as a Site Reliability/DevOps/Infrastructure Engineer working in a production environment
Hands-on experience designing and implementing deployment pipelines using CI/CD best practices, methodologies, and tools such as Jenkins, ArgoCD, CircleCI, and GitHub Actions
Solid understanding of cloud architecture – components, networking, and design principles – specializing in Google Cloud Platform (certification preferred)
Experience working with configuration management tools (e.g., Chef, Puppet, Ansible)
Experience with monitoring and log analysis tools such as ELK, Prometheus, Grafana, New Relic, and Splunk
Expertise in scripting/programming languages such as Java, Python, Ruby, and Bash, along with experience using GitHub
Proven experience working with and troubleshooting in Unix/Linux and Windows servers in virtualized environments
Solid experience implementing production-grade Kubernetes clusters with containerized environments and microservices (Docker, Kubernetes, Helm charts, service meshes)
Experience creating infrastructure-as-code solutions using tools such as Terraform and CloudFormation
Bachelor or Master of Technology, or Bachelor of Engineering, in Computer Science, or an equivalent Master of Computer Applications, required
Must have strong analytical and creative problem-solving skills
Must demonstrate an extremely high level of accuracy and attention to detail
Must have strong communication skills and a proven ability to work autonomously with little oversight
Ability to work effectively within a globally dispersed team
Ability to develop deep knowledge of our complex applications

Responsibilities

Drive the SRE function within Aristocrat Labs, working with leadership to refine SRE roles and responsibilities over time
Educate and collaborate with delivery teams to drive the strategy, features, and enhancements that build observability and reliability into our products
Develop, streamline, and improve tools, processes, and best practices that reduce overall cost and manual effort and improve our ability to rapidly recover and effectively monitor custom applications in a large-scale UNIX environment
Design, develop, implement, and maintain the CI/CD/CT framework to support software product development and deployments
Design, build, maintain, and automate deployment of multiple environments on GCP using an infrastructure-as-code approach
Work closely with partner teams, various R&D groups, and stakeholders to establish, track, and report on SLOs, SLIs, and SLAs for our products and services
Monitor, support, and troubleshoot issues with A-Labs services and cloud infrastructure: respond to incidents, participate in root cause analysis, and isolate build and deployment failures caused by code issues
Plan and support the growth of A-Labs GDK infrastructure while assisting in the rollout and deployment of new product features and installations to new cloud infrastructure

Preferred Qualifications

Google Cloud Platform certification preferred