Site Reliability Engineer – Cloud

Company: NVIDIA
Location: Santa Clara, CA, USA
Salary: $136,000 – $212,750
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Senior

Requirements

  • MS or BS in Computer Science, Computer Engineering, or a related field, or equivalent experience.
  • 5+ years of experience supporting technical operations in a live-site production environment with a real passion for automation and tooling.
  • Experience building and running critical production services, whether packaged or custom-built in Python or Java, on Windows or Linux.
  • Strong knowledge of the Kubernetes platform, including deployments and automation.
  • SRE on-call experience is required.
  • Advanced experience with scripting and development in Python.
  • Demonstrated strength in problem-solving and root-cause analysis.

Responsibilities

  • Rapidly debug and triage user-reported issues for the Digital Marketing Organization.
  • Onboard new applications and services onto AWS infrastructure.
  • Contribute to the overall health, performance, and uptime of our services running on Linux and Windows.
  • Implement monitors, alerts, and standard operating procedures (SOPs) to ensure early detection of, and accurate response to, service-impacting issues.
  • Take ownership of automation, scripting, and tooling, for both new and existing scripts, to help the team achieve 100% automation of daily tasks.

Preferred Qualifications

  • Strong experience with the AWS cloud platform and with Kubernetes as a platform.
  • Excellent communication, presentation, social, and analytical skills; the ability to communicate sophisticated interaction concepts clearly and persuasively across different audiences and varying levels of the organization.