Site Reliability Engineer

3 years of DevOps/SRE experience with production systems (depending on level)
Strong development and automation skills.
Extensive experience with Infrastructure as Code (Terraform, etc.), as well as supporting tooling (Atlantis, ArgoCD, etc.)
Extensive experience with Kubernetes and supporting tooling (Helm, operators, etc.)
Extensive experience with a variety of cloud managed services and providers
Experience building production-quality cloud infrastructure that enables reliable and rapid deployment of microservices with effective monitoring and built-in high availability and/or fault tolerance.
Strong passion for using automation to create simple, repeatable dev and ops patterns that ensure a stable, reliable experience for customers.
Strong cross-team communication skills.
Experience with the building blocks of large-scale systems including load balancing, distributed/cloud computing, containers, instrumentation, and monitoring.
Knowledge of cloud networking, including VPC configuration and cross-cloud connectivity.
Familiarity with one or more programming languages (Python, Golang, etc.).

Automate as much as is reasonable to significantly improve the operational efficiency of the Lacework platform.
Design, build and improve our infrastructure to enhance service scalability, resiliency, and efficiency across the company.
Identify mission-critical problems and solve them via automation, tooling, communication, and informed design.
Build and improve monitoring and instrumentation to predict future scalability or failure risks and solve them before they become customer-facing issues.
Facilitate company-wide visibility into key metrics, SLAs, and milestones so that scale and resiliency are a part of every conversation.
Develop best practices alongside engineering/operations teams to improve the scalability and reliability of internal processes.
Participate in an on-call rotation.

Experience with monitoring and observability systems and tools (Prometheus, Grafana, New Relic, DataDog, etc.)
A belief that everything should be ‘as code’
Experience with Java application servers and JVM configuration