
Senior Systems Reliability Engineer – SRE
| Company | Aristocrat Leisure |
|---|---|
| Location | Austin, TX, USA |
| Salary | $111,793 – $207,616 |
| Type | Full-Time |
| Degrees | Bachelor’s, Master’s |
| Experience Level | Senior, Expert or higher |
Requirements
- 8+ years of proven experience as a Site Reliability/DevOps/Infrastructure Engineer working in a production environment
- Hands-on experience designing and implementing deployment pipelines using CI/CD best practices, methodologies and tools such as Jenkins, ArgoCD, CircleCI, and GitHub Actions
- Solid understanding of cloud architecture – components, networking, and design principles – specializing in Google Cloud Platform (certification preferred)
- Experience working with Configuration Management tools (e.g. Chef, Puppet, Ansible)
- Experience with monitoring and log analysis tools such as ELK, Prometheus, Grafana, New Relic, Splunk
- Expertise in scripting/programming languages such as Java, Python, Ruby, and Bash, along with experience in GitHub
- Proven experience working with and troubleshooting Unix/Linux and Windows servers in virtualized environments
- Solid experience implementing production-grade Kubernetes clusters with containerized environments and microservices (Docker, Kubernetes, Helm charts, service meshes)
- Experience creating infrastructure-as-code solutions using tools such as Terraform and CloudFormation
- Bachelor or Master of Technology, Bachelor of Engineering in Computer Science, or an equivalent degree such as Master of Computer Applications required
- Must have strong analytical and creative problem-solving skills
- Demonstrate an extremely high level of accuracy and attention to detail
- Must have strong communication skills, and proven ability to work autonomously with little oversight
- Ability to work effectively within a globally dispersed team
- Ability to develop deep knowledge of our complex applications
Responsibilities
- Drive the SRE function within Aristocrat Labs working with leadership to refine the SRE role and responsibilities over time
- Educate and collaborate with delivery teams to drive strategy, features, and enhancements that build observability and reliability back into our products
- Develop, streamline, and improve tools, processes, and best practices to reduce overall cost and manual effort and to improve our ability to rapidly recover and effectively monitor custom applications in a large-scale UNIX environment
- Design, develop, implement, and maintain the CI/CD/CT framework to support software product development and deployments
- Design, build, maintain, and automate deployment of multiple environments on GCP using an infrastructure-as-code approach
- Work closely with partner teams, various R&D groups, and stakeholders to establish, track, and report on SLOs, SLIs, and SLAs for our products and services
- Monitor, support, and troubleshoot issues with A-Labs services and cloud infrastructure, responding to incidents, participating in root cause analysis, and isolating build/deployment failures caused by code issues
- Plan and support the growth of A-Labs GDK infrastructure as you assist in the roll-out and deployment of new product features and installations to new cloud infrastructure.
Preferred Qualifications
- Google Cloud Platform certification