
Site Reliability Engineer – Cloud
| Company | NVIDIA |
|---|---|
| Location | Santa Clara, CA, USA |
| Salary | $136,000 – $212,750 |
| Type | Full-Time |
| Degrees | Bachelor’s, Master’s |
| Experience Level | Senior |
Requirements
- MS or BS in Computer Science, Computer Engineering, or a related field, or equivalent experience.
- 5+ years of experience supporting technical operations in a live-site production environment with a real passion for automation and tooling.
- Experience building and running critical production services, whether packaged or custom (Python/Java), on Windows or Linux.
- Strong knowledge of the Kubernetes platform, including deployments and automation.
- SRE on-call experience is required.
- Advanced experience with scripting and development in Python.
- Demonstrated strength in problem-solving and root-cause analysis.
Responsibilities
- Rapidly debug and triage user-reported issues for the Digital Marketing Organization.
- Onboard new applications and services onto AWS infrastructure.
- Contribute to the overall health, performance, and uptime of our services running on Linux and Windows.
- Implement monitors, alerts, and SOPs (standard operating procedures) to ensure early detection of, and accurate response to, service-impacting issues.
- Take ownership of automation, scripting, and tooling for both new and existing scripts to help the team achieve 100% automation of daily tasks.
Preferred Qualifications
- Strong experience with the AWS cloud platform and with Kubernetes as a platform.
- Excellent communication, presentation, social, and analytical skills; the ability to communicate sophisticated interaction concepts clearly and persuasively across different audiences and varying levels of the organization.