Senior Systems Reliability Engineer – SRE

Company: Aristocrat Leisure
Location: Austin, TX, USA
Salary: $111,793 – $207,616
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Senior, Expert or higher

Requirements

  • 8+ years of proven experience as a Site Reliability/DevOps/Infrastructure Engineer working in a production environment
  • Hands-on experience designing and implementing deployment pipelines using CI/CD best practices, methodologies and tools such as Jenkins, ArgoCD, CircleCI, and GitHub Actions
  • Solid understanding of cloud architecture (components, networking, and design principles), specializing in Google Cloud Platform (certification preferred)
  • Experience working with Configuration Management tools (e.g. Chef, Puppet, Ansible)
  • Experience with monitoring and log analysis tools such as ELK, Prometheus, Grafana, New Relic, and Splunk
  • Expertise in scripting/programming languages such as Java, Python, Ruby, and Bash, along with experience using GitHub
  • Proven experience working with and troubleshooting Unix/Linux and Windows servers in virtualized environments
  • Solid experience implementing production-grade Kubernetes clusters with containerized environments and microservices (Docker, Kubernetes, Helm charts, service meshes)
  • Experience creating infrastructure-as-code solutions using tools such as Terraform and CloudFormation
  • Bachelor or Master of Technology, or Bachelor of Engineering, in Computer Science, or an equivalent Master of Computer Applications, required
  • Must have strong analytical and creative problem-solving skills
  • Must demonstrate an extremely high level of accuracy and attention to detail
  • Must have strong communication skills and a proven ability to work autonomously with little oversight
  • Ability to work effectively within a globally dispersed team
  • Ability to develop deep knowledge of our complex applications

Responsibilities

  • Drive the SRE function within Aristocrat Labs, working with leadership to refine the SRE role and responsibilities over time
  • Educate and collaborate with delivery teams to drive strategy, features, and enhancements that build observability and reliability into our products
  • Develop, streamline, and improve tools, processes, and best practices to reduce overall cost and manual effort, and to improve our ability to recover rapidly and to monitor custom applications effectively in a large-scale UNIX environment
  • Design, develop, implement, and maintain the CI/CD/CT framework to support software product development and deployments
  • Design, build, maintain, and automate deployment of multiple environments on GCP using an infrastructure-as-code approach
  • Work closely with partner teams, various R&D groups, and stakeholders to establish, track, and report on SLOs, SLIs, and SLAs for our products and services
  • Monitor, support, and troubleshoot issues with A-Labs services and cloud infrastructure, responding to incidents, participating in root cause analysis, and isolating build/deployment issues caused by code defects
  • Plan and support the growth of A-Labs GDK infrastructure while assisting in the rollout and deployment of new product features and installations to new cloud infrastructure

Preferred Qualifications

  • Google Cloud Platform certification