Staff Site Reliability Engineer
| Company | Illumio |
|---|---|
| Location | Sunnyvale, CA, USA |
| Salary | $192,000 – $230,000 |
| Type | Full-Time |
| Degrees | Bachelor’s |
| Experience Level | Senior, Expert or higher |
Requirements
- Bachelor’s degree in Computer Science, Engineering, or related field; or equivalent work experience
- 8+ years of relevant SRE experience
- Strong hands-on experience with AWS and Azure
- Familiarity with Kubernetes and containerized environments
- Knowledge of networking concepts, such as DNS, load balancing, and firewalls
- Proficiency in diagnosing and resolving complex issues in SaaS environments, including performance bottlenecks and application errors
- Proficiency in at least one programming language (e.g., Python, Go, Java) and in scripting languages (e.g., Bash, PowerShell)
- Experience with tools like Datadog, New Relic, Prometheus, Grafana, ELK, or Azure Monitor
- Familiarity with tools like Ansible, Terraform, or CloudFormation
- Knowledge of debugging and optimizing relational databases (e.g., PostgreSQL, MySQL) and caching systems (e.g., Redis, Memcached)
- Experience with incident management tools and processes, including conducting RCAs and improving on-call processes
Responsibilities
- Investigate and resolve production incidents and escalations to ensure minimal downtime and impact to customers
- Work closely with engineering and support teams to troubleshoot application and infrastructure issues
- Proactively monitor application health, performance, and reliability using modern observability tools
- Analyze trends in system behavior and suggest performance improvements
- Develop and maintain automation scripts and tools to improve operational efficiency and incident resolution
- Create and enhance runbooks to streamline troubleshooting and reduce mean time to resolution (MTTR)
- Conduct thorough post-incident reviews to identify root causes and implement preventive measures
- Drive a culture of continuous improvement by documenting lessons learned and improving system designs
- Partner with software engineers, QA, and product teams to improve application stability and user experience
- Act as a bridge between development and operations, ensuring smooth and reliable service delivery
Preferred Qualifications
No preferred qualifications provided.