Senior Site Reliability Engineer – Observability – FedRAMP

Company: Splunk
Location: California, USA
Salary: $139,840 – $240,350
Type: Full-Time
Experience Level: Senior

Requirements

  • Extensive experience as a Linux system administrator supporting enterprise computing platforms and systems.
  • Expertise in public cloud platforms (AWS, GCP, Azure), containers (Docker), and container orchestration (Kubernetes).
  • Working knowledge of OpenTelemetry.
  • Deep understanding of logging, monitoring, tracing, and alerting practices in large-scale distributed systems.
  • Proficiency in a programming language such as Python, along with shell scripting, to automate tasks.
  • Experience supporting customer-facing SaaS infrastructure or similar cloud-related services.
  • Experience administering or architecting distributed Splunk and observability environments.
  • Experience defining SLIs and setting SLOs.

Responsibilities

  • Support and build Splunk’s large-scale Cloud offering.
  • Work with a diverse, geographically distributed team to deliver an excellent product and extraordinary customer experience.
  • Build and run distributed systems at scale in production, understanding the challenges and trade-offs involved.
  • Automate processes where possible.
  • Apply knowledge of best practices related to security, performance, and disaster recovery.
  • Identify performance bottlenecks, spot anomalous system behavior, and determine the root cause of incidents.
  • Monitor cloud environments using tools like Splunk, VictorOps, and SignalFx.
  • Maintain clear documentation so the team can operate effectively.
  • Tackle complex problems, resolve operational issues, and work with vendors on solutions.
  • Handle critical, customer-facing issues and prioritize quickly during escalations.

Preferred Qualifications

    No preferred qualifications provided.