Staff Site Reliability Engineer

Company	Visa
Location	Ashburn, VA, USA
Salary	Not Provided
Type	Full-Time
Degrees	Bachelor’s, Master’s, MBA, PharmD
Experience Level	Senior, Expert or higher

Requirements

5+ years of relevant work experience with a Bachelor's degree, at least 2 years of work experience with an advanced degree (e.g., Master's, MBA, JD, MD), or 0 years of work experience with a PhD; OR 8+ years of relevant work experience.
Hands-on experience with Linux and Windows systems and a good understanding of distributed computing environments.
Intermediate-level programming and/or scripting in 3 or more of the following: Python, Java, Go, PowerShell, JavaScript, Terraform, Ansible, Helm, Chef, CloudFormation.
2+ years of experience managing CI/CD tooling such as Jenkins, GitHub, Bitbucket, Argo CD, Artifactory, and Azure DevOps in a large-scale environment.
3+ years of experience managing observability tooling such as Grafana, Prometheus, Splunk, Datadog, New Relic, Dynatrace, Sentry, etc., in a large-scale environment.
Advanced understanding of YAML, JSON, HTML, XML.
2+ years of work experience supporting relational and non-relational databases (MySQL, MongoDB, PostgreSQL, etc.), including creating and running queries and managing performance and scaling.
Experience managing container infrastructure and supporting the development organization's transition to a container-first model.
3+ years working in a Platform, SRE, or Production Engineering group supporting high-availability, critical platforms and applications.
Exposure to virtualization (Hyper-V, VMware, SCVMM, etc.).
Experience managing a distributed container platform, including but not limited to deployment/release management, provisioning, capacity management, and workload management.

Responsibilities

Guide monitoring instrumentation for the Visa Cloud Platform (IaaS, PaaS, and Container as a Service).
Ensure the platform meets its target SLAs, and implement appropriate SLIs for supporting services.
Work with developers during service transition, evaluating reliability and operability of the applications and ensuring adequate monitoring, alerting and observability.
Partner with peers within Operations & Infrastructure to support ongoing maintenance and enhancement of the platform.
Focus on setting standards for automating routine tasks and workflows in support of the larger DevEx SRE team.
Support multiple internal stakeholders with a variety of technical challenges, analyze and discern patterns in issues, and propose solutions.
Work in a 24/7/365 operating model, including shift or on-call support (weekend coverage required).

Preferred Qualifications

6 or more years of work experience with a Bachelor's degree, 4 or more years of relevant experience with an advanced degree (e.g., Master's, MBA, JD, MD), or up to 3 years of relevant experience with a PhD.
Master's degree in IT, CS, or a related field and/or 5+ years of relevant work experience.