Sr. Site Reliability Engineer I

Company: Pax8
Location: United States
Salary: $125,000 – $155,000
Type: Full-Time
Experience Level: Senior

Requirements

  • Five (5) to eight (8) years of experience supporting application development, preferably microservices and Java-based web platforms.
  • Substantial, proven software development experience.
  • Ability to show advanced proficiency in a relevant programming language or framework (Java, TypeScript, Python, Groovy, Kotlin, Vue.js, etc.).
  • Advanced experience with one or more of the following frameworks and technologies (Spring, Spring Boot, JUnit, Mockito, Kotest, Stripe, Kafka, Elasticsearch, NetSuite, OAuth).
  • Experience using AI within the SDLC to quickly deliver reliable solutions.
  • Strong experience with observability platforms such as New Relic, Sumo Logic, Honeycomb, or similar tools to track performance and detect issues.
  • Solid understanding of core AWS services, including EKS, RDS, and MSK (Azure knowledge is a plus).
  • Extensive experience with container technologies such as Docker and Kubernetes, with an emphasis on operational reliability.
  • Proficient in Tomcat, Groovy, Kotlin, and Spring.
  • Proven experience in debugging and troubleshooting applications, using both manual and automated methods.
  • Database and SQL development experience.
  • Understanding of infrastructure as code (IaC) and configuration management using Terraform and Git.
  • Understanding of CI/CD pipelines using GitHub Actions and Argo CD.
  • Experience working in a Lean/Agile environment using tools such as Jira, ClickUp, Asana, or similar.
  • Focus on meeting project commitments with predictability and urgency.
  • Strong desire for automation.
  • Ability to build strong customer relationships and deliver customer-centric solutions.
  • Ability to take on new opportunities and tough challenges with a sense of urgency, high energy, and enthusiasm.
  • Ability to gain the confidence and trust of others through honesty, integrity, and authenticity.
  • Ability to maneuver comfortably through complex policy, process, and people-related organizational dynamics.
  • Ability to anticipate and adopt innovations in business-building digital and technology applications.

Responsibilities

  • Increase developer velocity and system reliability by applying software development expertise: collaborate with engineering teams to address reliability concerns, and analyze the sources of issues and their impact on cloud infrastructure, so that the engineering community can work in a reliable, scalable environment. (25%)
  • Standardize and implement baseline visibility across systems. Leverage programmatic monitoring to proactively address visibility gaps. Collaborate with teams to embed observability in the design phase, ensuring resilient and dependable systems. (20%)
  • Collaborate with Architecture and Platform teams to design automated solutions that eliminate repetitive tasks, enhance self-healing capabilities, improve service reliability, and enable developers to focus on delivering product features using proven, predictable frameworks. (15%)
  • Prioritize security by collaborating with the engineering community to implement secure solutions, address issues proactively and reactively, and use lessons learned to establish best practices that minimize disruptions to product development. (15%)
  • Elevate team capabilities through mentorship, project work assistance, design guidance, and participation in support and on-call rotations. (15%)
  • Participate in incident response and post-incident analysis to drive improvements in system reliability by contributing to rapid recovery, conducting root cause analysis, and implementing changes based on post-mortem findings. (10%)