Senior SRE Engineer

Company: M&T Bank
Location: Buffalo, NY, USA
Salary: $93,581.10 – $155,968.51
Type: Full-Time
Experience Level: Senior

Requirements

  • Combined minimum of 6 years’ higher education and/or work experience in systems design, management and/or architecture
  • 5+ years of experience in Site Reliability Engineering, DevOps, systems design and/or architecture, or similar roles
  • 3+ years of experience leading or managing observability initiatives
  • Strong hands-on experience with monitoring tools like Kibana, Dynatrace, Datadog, or similar
  • Solid understanding of observability concepts (metrics, logging, tracing, alerting) and frameworks (e.g., OpenTelemetry)
  • Experience with cloud environments such as AWS, Google Cloud, or Azure
  • Familiarity with containerization (Docker, Kubernetes) and orchestration platforms
  • Excellent problem-solving skills and ability to troubleshoot complex distributed systems
  • Mid-level programming skills in Python, JSON, PowerShell, or other relevant languages
  • Experience with incident response and post-mortem analysis
  • Excellent communication and collaboration skills
  • Advanced analytical skills
  • Advanced troubleshooting skills
  • Advanced problem-solving skills

Responsibilities

  • Lead the development and implementation of observability tools and practices across multiple platforms, including monitoring, logging, tracing, and alerting
  • Work closely with product and engineering teams to define observability standards, goals, and best practices
  • Design and optimize the architecture of observability infrastructure to provide clear insights into the health, performance, and scalability of services
  • Troubleshoot and diagnose complex issues related to performance and availability, offering actionable insights and solutions
  • Mentor and guide junior SREs on observability tools and practices, fostering a culture of reliability and proactive monitoring
  • Manage incidents and post-incident reviews to continuously improve monitoring systems and practices
  • Partner with DevOps, Software Engineers, and other stakeholders to ensure seamless integration of observability tools with CI/CD pipelines
  • Implement and maintain high-availability monitoring and alerting systems
  • Ensure automation of observability tooling to scale with the growth of systems and services

Preferred Qualifications

  • Familiarity with infrastructure as code (Terraform, CloudFormation)
  • Login and enrollment instrumentation using SLO/SLI and measuring FCI and FSI
  • Experience in building and maintaining distributed systems at scale
  • Knowledge of security best practices in observability
  • Certifications in Cloud (AWS, GCP, Azure), SRE or DevOps are a plus
  • Process-oriented, logical thinker
  • Strong knowledge of server/client and virtual technologies
  • Adaptable; able to learn quickly in a fast-paced environment