Senior Systems Engineer – AV Infrastructure Cloud Platform
Company | NVIDIA |
---|---|
Location | Seattle, WA, USA; Austin, TX, USA; Jackson Township, NJ, USA; Santa Clara, CA, USA; New York, NY, USA |
Salary | $184,000 – $356,500 |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Senior, Expert or higher |
Requirements
- BS/MS in Computer Science or Engineering (or equivalent experience) or BS/MS in a STEM-related field
- 8+ years of professional experience in a related field
- 4+ years of experience in Kubernetes-based platform tooling development
- 4+ years of experience in cloud infrastructure automation and management
- Strong programming fundamentals with expertise in Go and Python
- Ability to shift seamlessly between Linux system environments and Python programming
- Deep AWS expertise across core services (VPC, IAM, EC2, S3, RDS, CloudFront, EKS) with proven experience in designing and managing scalable cloud infrastructure
- Comprehensive understanding of Kubernetes and Cloud Native Architecture, with hands-on experience managing large-scale production clusters
- Good understanding of SRE best practices, alerting, and observability
- Advanced Kubernetes workload management expertise, including traffic management, deployment strategies, observability, and security implementation
- Strong Infrastructure as Code (IaC) fundamentals with experience in developing infrastructure CI/CD pipelines, automation frameworks, and IaC libraries
Responsibilities
- Apply strong programming skills to develop cloud platform tooling and automation that enhance developer productivity and operational efficiency across our cloud infrastructure
- Lead the development of infrastructure automation frameworks and CI/CD pipelines, ensuring robust, scalable, and secure deployment of cloud-native applications
- Engage directly with engineering users to understand their needs and improve their experience by recommending robust, scalable cloud solutions
- Contribute to the design and architecture of the cloud infrastructure and networking components to meet the evolving needs of our internal developer platform
- Play a pivotal role in improving the reliability and performance of cloud infrastructure and services
Preferred Qualifications
- Working experience with agentic AI tools for computing infrastructure management
- Motivated self-starter with an equal balance of strong problem-solving skills and customer-facing communication skills
- Excellent written and verbal interpersonal skills
- Contributions to open-source projects in the cloud-native ecosystem, particularly in areas of Kubernetes tooling, infrastructure automation, or cloud-native applications
- Previous experience building sophisticated tooling and SRE automation on large GPU/CPU clusters