Senior DevOps Infrastructure Engineer – Open-Source CI and CD
| Company | NVIDIA |
|---|---|
| Location | Newark, NJ, USA |
| Salary | $168,000 – $333,500 |
| Type | Full-Time |
| Degrees | Bachelor’s, Master’s |
| Experience Level | Senior |
Requirements
- B.S. or M.S. in Computer Science, Computer Engineering, or a related field (or equivalent experience)
- 7+ years of proven experience in infrastructure, DevOps, or platform engineering
- Strong Kubernetes expertise (running, debugging, and scaling workloads)
- Experience with GitOps tools (ArgoCD or similar)
- Proficiency in Linux administration and troubleshooting
- Experience with Infrastructure as Code using Terraform/Terragrunt
- Proficiency in Golang, Python, and TypeScript
- Hands-on experience with monitoring, logging, and tracing (Prometheus, Grafana, OpenTelemetry, etc.)
- Solid understanding of CI/CD pipelines, particularly GitHub Actions
- Ability to work and collaborate effectively with a fully remote, distributed team
Responsibilities
- Manage and scale self-hosted GitHub Actions runners using Kubernetes
- Help expand runner support for various hardware and operating system combinations, including Linux, Windows, single-GPU, multi-GPU, NVLink, and more
- Use Infrastructure as Code (Terraform and ArgoCD) to deploy and maintain infrastructure both on-premises and in AWS
- Build and maintain runner VM images using HashiCorp Packer
- Connect distributed services securely using mTLS, PKI, and HashiCorp Vault
- Develop, package, and deploy custom Golang tools to support platform observability, stability, and efficiency
- Configure alerting and monitoring to identify and address issues quickly, using tools like Prometheus and Grafana
- Contribute upstream to open-source tools and libraries that our team depends on
- Periodically update platform dependencies and address CVEs
Preferred Qualifications
- Experience instrumenting telemetry for distributed systems
- Strong background in GPU workloads on Kubernetes, with experience writing custom Kubernetes controllers
- Deep understanding of KubeVirt and/or virtualization
- Experience with self-hosted GitHub Actions runners
- Contributions to open-source Kubernetes-related projects