Staff Site Reliability Engineer

Company: Plume
Location: Palo Alto, CA, USA
Salary: $177,000 – $208,000
Type: Full-Time
Experience Level: Senior, Expert or higher

Requirements

  • 4+ years of experience operating Kubernetes
  • 2+ years of experience with Terraform
  • Experience setting up and using monitoring and observability tools (e.g. New Relic, Nagios/Icinga, Grafana, Prometheus)
  • 2+ years of programming or scripting experience in at least one language (e.g. Perl, Python, PHP, Go, Java)
  • 8+ years of experience with modern Linux operating systems
  • 6+ years of experience with modern cloud infrastructure, preferably AWS
  • Availability to join the on-call rotation for production issues
  • Availability to work with a distributed team across different time zones
  • Advanced communication skills
  • Experience leading efforts and reporting up

Responsibilities

  • Supervise a team of Site Reliability Engineers who provide first-line support to customer clouds. Routine tasks include deployments, on-call, and application provisioning.
  • Run stand-ups for the team and manage tickets.
  • Participate in sprints and close tickets with the team.
  • Attend and conduct customer meetings to specify projects and roadmaps.
  • Be able to step in and execute or triage issues. Examples include: provisioning and scaling Kubernetes infrastructure and applications (EKS); deploying software to multiple production environments; owning monitoring and alerting for production systems, including improvements and changes; contributing improvements to the current automation; and contributing improvements to our on-call process and alerting.

Preferred Qualifications

  • 10+ years of experience with production troubleshooting
  • 4+ years of experience leading teams
  • Executive communication skills
  • Bachelor’s degree in a related field or equivalent experience; advanced degree preferred