Senior Director - Site Reliability Engineering

Company	Visa
Location	San Mateo, CA, USA
Salary	Not provided
Type	Full-Time
Degrees	Bachelor’s, Master’s, MBA, PharmD, PhD
Experience Level	Senior, Expert or higher

12 or more years of work experience with a Bachelor’s Degree, or at least 10 years of work experience with an Advanced Degree (e.g. Master’s, MBA, JD, MD), or a minimum of 5 years of work experience with a PhD
Minimum of 10 years in a site reliability engineering role, including at least 5 years in a leadership position managing large SRE teams
Proficiency in system design and architecture, particularly in a cloud environment
Expertise in automation and orchestration systems like Kubernetes, Terraform, and Ansible
Strong coding skills in languages such as Go, Python, Ruby, or Java
Deep understanding of networking concepts and protocols
Experience with continuous integration and continuous deployment (CI/CD) pipelines and tools
Proven track record of leading teams through complex system outages and scalability challenges
Ability to mentor and grow an SRE team, fostering a culture of continuous learning and innovation
Strong project management skills, with experience in Agile methodologies
Excellent verbal and written communication abilities
Proficient in creating technical documentation and system diagrams
Experience presenting to C-level executives and stakeholders
Demonstrated experience in incident management and post-mortem analysis
Commitment to high availability, fault tolerance, and reliability in all aspects of work
Knowledge of compliance and security best practices in a highly regulated industry

Lead and scale the SRE team, setting objectives and key results that align with the company’s strategic goals
Develop and implement SRE policies, standards, and best practices for enterprise-wide systems
Define standards for building reliable applications that are highly available and resilient
Drive the adoption of a DevSecOps culture, fostering collaboration between development and operations teams
Oversee the design and implementation of solutions for system monitoring, logging, alerting, and incident response
Collaborate with product development teams to ensure reliability and scalability are considered at the design phase
Manage on-call rotations, incident management processes, and post-mortem analyses to ensure continuous improvement
Define Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets for all critical services
Work closely with the security team to ensure compliance with industry standards and regulatory requirements
Lead initiatives to improve CI/CD pipelines and automate infrastructure provisioning and deployment
Provide technical leadership and mentorship to team members, encouraging professional growth and technical excellence

15 or more years of experience with a Bachelor’s Degree, or 12 years of experience with an Advanced Degree (e.g. Master’s, MBA, JD, or MD), or 9 or more years of experience with a PhD, in Computer Science, Engineering, or a related technical field
Certifications in cloud technologies (AWS, GCP, Azure)
Contributions to open-source projects or public speaking at relevant tech conferences
Strategic thinker with a vision for the future of SRE within the organization
Resilient and adaptable in the face of changing technology landscapes
Collaborative mindset with a focus on cross-functional partnerships