Staff Site Reliability Engineer

4+ years of experience operating Kubernetes
2+ years of experience with Terraform
Experience both setting up and using monitoring and observability tools (e.g., New Relic, Nagios/Icinga, Grafana, Prometheus)
2+ years of programming/scripting experience in at least one language (e.g., Perl, Python, PHP, Go, Java)
8+ years of experience with modern Linux operating systems
6+ years of experience with modern cloud infrastructure, preferably AWS
Availability to participate in an on-call rotation for production issues
Availability to work with a distributed team across different time zones
Advanced communication skills
Experience leading initiatives and reporting progress up the management chain

Supervise a team of Site Reliability Engineers who provide first-line support for customer clouds. Routine tasks include deployments, on-call duty, and application provisioning.
Run the team's stand-ups and manage tickets
Participate in sprints and close tickets alongside the team
Attend and lead customer meetings to define project scope and roadmaps.
Be able to step in to execute or triage issues as needed, for example: provisioning and scaling Kubernetes infrastructure and applications (EKS); deploying software to multiple production environments; owning monitoring and alerting for production systems, including improvements and changes; contributing improvements to the current automation; and improving our on-call process and alerting.

10+ years of experience troubleshooting production systems
4+ years of experience leading teams
Executive communication skills
Bachelor’s degree in a related field or equivalent experience; advanced degree preferred