Site Reliability Engineer

Company: Brillio
Location: St. Louis, MO, USA
Salary: $55 – $60
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Junior, Mid Level

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)
  • 2-3 years’ experience as an Observability Engineer or a similar role in a production environment
  • Deep understanding of observability principles, methodologies, and tools such as Prometheus, Grafana, Jaeger, and the ELK stack
  • Proficiency in programming/scripting languages like Java, Python, Go, or similar for automation and tooling development
  • Strong knowledge of cloud computing platforms (AWS preferred) and container orchestration systems (e.g., Kubernetes)
  • Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems
  • Strong communication skills and the ability to collaborate effectively with cross-functional teams

Responsibilities

  • Design and develop robust observability solutions to monitor, analyze, and troubleshoot distributed systems
  • Familiarity with OpenTelemetry (OTel) standards and tools
  • Previous experience working with application teams to implement ‘self-healing’, i.e., alerting that triggers automated remediation
  • Implement and configure monitoring, logging, tracing, and alerting systems to ensure comprehensive coverage of our infrastructure and applications
  • Collaborate with software engineers to instrument code for telemetry data collection and analysis
  • Optimize observability tooling and processes to improve system reliability, performance, and scalability
  • Create dashboards, reports, and visualizations to provide actionable insights into system health and performance
  • Investigate and resolve incidents by analyzing telemetry data and identifying root causes
  • Stay current with industry trends and best practices in observability and recommend improvements to our observability strategy and infrastructure

Preferred Qualifications

  • Experience with AWS