
Lead Site Reliability Engineer

Company: Royal Bank of Canada
Location: Vancouver, BC, Canada
Salary: Not provided
Type: Full-Time
Experience Level: Senior

Requirements

  • 5+ years of experience in Application Support, Software Development (SDLC), and Operations.
  • Strong proficiency in at least two programming languages or technologies (e.g., Java, Python, .NET, SQL) and in database systems.
  • Deep expertise in SRE and DevOps practices; on-premises, hybrid, and cloud-native platforms; job scheduling; managed file transfer; and data services.
  • Proven track record of implementing resilient IT solutions, driving continuous service improvements, and enhancing production reliability through automation and best practices.
  • Advanced experience across a variety of environments (Linux, Windows, databases, cloud, distributed and mainframe systems, business workflows, and services/APIs).
  • Hands-on experience with a variety of DevOps/SRE tools (e.g., Ansible, Dynatrace, Moogsoft, PagerDuty, ServiceNow, Elasticsearch, Logstash, Kibana, LogicMonitor, Jenkins, Cucumber, CA Work Automation, Power BI, and ETL tools).
  • Excellent communication, analytical, and problem-solving skills, with the ability to diagnose and resolve complex production incidents and to lead blameless postmortems that identify and address root causes.

Responsibilities

  • Advocate for automation and DevOps best practices, fostering an SRE mindset within the team.
  • Lead the development of SRE solutions, focusing on monitoring, alerting, machine learning-based anomaly detection, self-healing, and reliability testing.
  • Implement advanced monitoring, alerting, and automated remediation strategies to prevent incidents before they impact business operations.
  • Collaborate with teams to enhance platform infrastructure, improving service resilience, reliability, quality, and time-to-market for software solutions.
  • Optimize Incident, Problem, and Change Management processes to reduce MTTR, avoid incidents, and improve resilience.
  • Oversee technology lifecycle management (server patching, certificate renewals, risk remediation) with a strong focus on automation-first principles.
  • Define and maintain Service Level Objectives (SLOs) and ensure mission-critical applications meet their availability targets.
  • Ensure compliance with regulatory and security requirements, including segregation of duties for sensitive environments.
  • Stay current with emerging technologies, using continuous learning opportunities to drive innovation and efficiency.
  • Provide hands-on application production support, including off-hours coverage as needed.

Preferred Qualifications

  • Prior experience leading SRE functions in the financial services industry.
  • Knowledge of Digital Identity and Access Management, Internet/Mobile Banking platforms, microservices, data services, test automation, and corporate applications (HR, Finance, Risk, Compliance, etc.).