Lead Site Reliability Engineer
Company | Royal Bank of Canada |
---|---|
Location | Vancouver, BC, Canada |
Salary | Not provided |
Type | Full-Time |
Degrees | |
Experience Level | Senior |
Requirements
- 5+ years of experience in Application Support, Software Development (SDLC), and Operations.
- Strong proficiency in at least two of the following languages or technologies: Java, Python, .NET, SQL, and databases.
- Deep expertise in SRE, DevOps, on-premises, hybrid, and cloud-native platforms, as well as Job Scheduling, Managed File Transfers, and Data Services.
- Proven track record of implementing resilient IT solutions, driving continuous service improvements, and enhancing production reliability through automation and best practices.
- Advanced experience across a variety of environments (Linux, Windows, databases, cloud, distributed and mainframe systems, business workflows, and services/APIs).
- Hands-on experience with a variety of DevOps/SRE tools (Ansible, Dynatrace, Moogsoft, PagerDuty, ServiceNow, Elastic, Logstash, Kibana, LogicMonitor, Jenkins, Cucumber, CA Work Automation, Power BI, ETL-related tools, etc.).
- Excellent communication, analytical, and problem-solving skills to diagnose and resolve complex production incidents and to lead blameless postmortems that identify and address root causes.
Responsibilities
- Advocate for automation and DevOps best practices, fostering an SRE mindset within the team.
- Lead the development of SRE solutions, focusing on monitoring, alerting, machine learning-based anomaly detection, self-healing, and reliability testing.
- Implement advanced monitoring, alerting, and automated remediation strategies to prevent incidents before they impact business operations.
- Collaborate with teams to enhance platform infrastructure, improving service resilience, reliability, quality, and time-to-market for software solutions.
- Optimize Incident, Problem, and Change management processes to reduce MTTR, strengthen incident avoidance, and increase resilience.
- Oversee technology lifecycle management (server patching, certificate renewals, risk remediation) with a strong focus on automation-first principles.
- Define and maintain Service Level Objectives (SLOs) and ensure availability targets for mission-critical applications.
- Ensure compliance with regulatory and security requirements, including segregation of duties for sensitive environments.
- Stay ahead of emerging technologies, leveraging continuous learning opportunities to drive innovation and efficiency.
- Provide hands-on application production support, including off-hours coverage as needed.
Preferred Qualifications
- Prior experience leading SRE functions in the financial services industry.
- Knowledge of Digital Identity Access Management, Internet/Mobile Banking Platforms, Microservices, Data Services, Test Automation, and Corporate applications (HR, Finance, Risk, Compliance, etc.).