Site Reliability Engineer

Company: Fortinet
Location: Sunnyvale, CA, USA
Salary: $150,000 – $195,000
Type: Full-Time
Experience Level: Mid Level

Requirements

  • 3 years of DevOps/SRE experience with production systems (depending on level).
  • Strong development and automation skills.
  • Extensive experience with Infrastructure as Code (Terraform, etc.), as well as supporting tooling (Atlantis, ArgoCD, etc.).
  • Extensive experience with Kubernetes and supporting tooling (Helm, operators, etc.).
  • Extensive experience with a variety of cloud providers and their managed services.
  • Experience building production-quality cloud infrastructure that enables reliable and rapid deployment of microservices, with effective monitoring and built-in high availability and/or fault tolerance.
  • Strong passion for using automation to create simple, repeatable dev and ops patterns that ensure a stable, reliable experience for customers.
  • Strong cross-team communication skills.
  • Experience with the building blocks of large-scale systems including load balancing, distributed/cloud computing, containers, instrumentation, and monitoring.
  • Knowledge of cloud networking, including VPC configuration and cross-cloud connectivity.
  • Familiarity with one or more programming languages (Python, Golang, etc.).

Responsibilities

  • Automate as much as is reasonable to significantly improve the operational efficiency of the Lacework platform.
  • Design, build and improve our infrastructure to enhance service scalability, resiliency, and efficiency across the company.
  • Identify mission-critical problems and solve them via automation, tooling, communication, and informed design.
  • Build and improve monitoring and instrumentation to predict future scalability or failure risks and resolve them before they manifest as customer-facing issues.
  • Facilitate company-wide visibility into key metrics, SLAs, and milestones so that scale and resiliency are a part of every conversation.
  • Develop best practices alongside engineering/operations teams to improve the scalability and reliability of internal processes.
  • Participate in an on-call rotation.

Preferred Qualifications

  • Experience with monitoring and observability systems and tools (Prometheus, Grafana, New Relic, Datadog, etc.).
  • A belief that everything should be ‘as code’.
  • Experience with Java application servers and JVM configuration.