Senior SRE Engineer
Company | M&T Bank |
---|---|
Location | Buffalo, NY, USA |
Salary | $93,581.10 – $155,968.51 |
Type | Full-Time |
Degrees | |
Experience Level | Senior |
Requirements
- Combined minimum of 6 years’ higher education and/or work experience in systems design, management and/or architecture
- 5+ years of experience in Site Reliability Engineering, DevOps, system design and/or architecture, or similar roles
- 3+ years of experience leading or managing observability initiatives
- Strong hands-on experience with monitoring tools like Kibana, Dynatrace, Datadog, or similar
- Solid understanding of observability concepts (metrics, logging, tracing, alerting) and frameworks (e.g., OpenTelemetry)
- Experience with cloud environments such as AWS, Google Cloud, or Azure
- Familiarity with containerization (Docker, Kubernetes) and orchestration platforms
- Excellent problem-solving skills and ability to troubleshoot complex distributed systems
- Mid-level programming skills in Python, JSON, PowerShell, or other relevant languages
- Experience with incident response and post-mortem analysis
- Excellent communication and collaboration skills
- Advanced analytical skills
- Advanced troubleshooting skills
- Advanced problem-solving skills
Responsibilities
- Lead the development and implementation of observability tools and practices across multiple platforms, including monitoring, logging, tracing, and alerting
- Work closely with product and engineering teams to define observability standards, goals, and best practices
- Design and optimize the architecture of observability infrastructure to provide clear insights into the health, performance, and scalability of services
- Troubleshoot and diagnose complex issues related to performance and availability, offering actionable insights and solutions
- Mentor and guide junior SREs on observability tools and practices, fostering a culture of reliability and proactive monitoring
- Manage incidents and post-incident reviews to continuously improve monitoring systems and practices
- Partner with DevOps, Software Engineers, and other stakeholders to ensure seamless integration of observability tools with CI/CD pipelines
- Implement and maintain high-availability monitoring and alerting systems
- Ensure automation of observability tooling to scale with the growth of systems and services
Preferred Qualifications
- Familiarity with infrastructure as code (Terraform, CloudFormation)
- Experience instrumenting login and enrollment flows using SLOs/SLIs and measuring FCI and FSI
- Experience in building and maintaining distributed systems at scale
- Knowledge of security best practices in observability
- Certifications in Cloud (AWS, GCP, Azure), SRE or DevOps are a plus
- Process-oriented, logical thinker
- Strong knowledge of server/client and virtual technologies
- Adaptable; able to learn quickly in a fast-paced environment