Staff Site Reliability Engineer

Company: Visa
Location: Ashburn, VA, USA
Salary: Not Provided
Type: Full-Time
Degrees: Bachelor’s, Master’s, MBA, PharmD
Experience Level: Senior, Expert or higher

Requirements

  • 5+ years of relevant work experience with a Bachelor’s Degree or at least 2 years of work experience with an Advanced degree (e.g. Masters, MBA, JD, MD) or 0 years of work experience with a PhD, OR 8+ years of relevant work experience.
  • Hands-on experience with Linux and Windows systems and a good understanding of distributed computing environments.
  • Intermediate-level programming and/or scripting in 3 or more of the following: Python, Java, Go, PowerShell, JavaScript, Terraform, Ansible, Helm, Chef, CloudFormation.
  • 2+ years of experience managing CI/CD tooling such as Jenkins, GitHub, Bitbucket, ArgoCD, Artifactory, Azure DevOps in a large-scale environment.
  • 3+ years of experience managing observability tooling such as Grafana, Prometheus, Splunk, Datadog, New Relic, Dynatrace, Sentry, etc. in a large-scale environment.
  • Advanced understanding of YAML, JSON, HTML, XML.
  • 2+ years of work experience supporting relational and non-relational databases (MySQL, MongoDB, PostgreSQL, etc.), including creating and running queries and managing performance and scaling.
  • Experience managing container infrastructure and supporting development teams' transformation to a container-first model.
  • 3 or more years working in a Platform, SRE or Production Engineering group for high availability/critical platforms/applications.
  • Exposure to virtualization (Hyper-V, VMware, SCVMM, etc.).
  • Experience managing a distributed container platform including but not limited to deployment/release management, provisioning, capacity management, workload management.

Responsibilities

  • Guide the instrumentation of monitoring for the Visa Cloud Platform (IaaS/PaaS/Container as a service).
  • Ensure the platform target SLAs are met and implement appropriate SLIs for supporting services.
  • Work with developers during service transition, evaluating reliability and operability of the applications and ensuring adequate monitoring, alerting and observability.
  • Partner with peers within Operations & Infrastructure supporting ongoing maintenance and enhancement of the platform.
  • Focus on setting standards for automating routine tasks and workflows in support of the larger DevEx SRE team.
  • Support multiple internal stakeholders with a variety of technical challenges, analyze and discern patterns in issues, and propose solutions to these problems.
  • Work in a 24/7/365 operation model, including shift or on-call support (weekend required).

Preferred Qualifications

  • 6 or more years of work experience with a Bachelor’s Degree or 4 or more years of relevant experience with an Advanced Degree (e.g. Masters, MBA, JD, MD) or up to 3 years of relevant experience with a PhD.
  • Master’s Degree in IT, CS or related field and/or 5+ years relevant work experience.