Senior Site Reliability Engineer – Observability – FedRAMP
Company | Splunk |
---|---|
Location | California, USA |
Salary | $139,840 – $240,350 |
Type | Full-Time |
Degrees | |
Experience Level | Senior |
Requirements
- Extensive experience as a Linux system administrator supporting enterprise computing platforms and systems.
- Expertise in public cloud platforms (AWS, GCP, Azure) and container and orchestration tools (Docker, Kubernetes).
- Knowledge and understanding of OpenTelemetry.
- Deep understanding of logging, monitoring, tracing, and alerting practices in large-scale distributed systems.
- Proficiency in programming languages such as Python, along with shell scripting, to automate tasks.
- Experience supporting customer-facing SaaS infrastructure or similar cloud-related services.
- Experience in administering or architecting distributed Splunk and Observability environments.
- Experience in setting up service level objectives (SLOs) and service level indicators (SLIs).
Responsibilities
- Support and build Splunk’s large-scale Cloud offering.
- Work with a diverse, geographically distributed team to deliver an excellent product and extraordinary customer experience.
- Build and run distributed systems at scale in production, understanding the challenges and trade-offs involved.
- Automate processes where possible.
- Apply knowledge of best practices related to security, performance, and disaster recovery.
- Identify performance bottlenecks, spot anomalous system behavior, and determine the root cause of incidents.
- Monitor cloud environments using tools like Splunk, VictorOps, and SignalFx.
- Maintain clear documentation to support the team's work.
- Tackle complex problems, resolve operational issues, and interact with vendors for solutions.
- Handle critical, customer-facing issues and prioritize quickly during escalations.
Preferred Qualifications
No preferred qualifications provided.