Software Engineer – Fleet Health Instrumentation Intern
Company | NVIDIA
---|---
Location | Santa Clara, CA, USA
Salary | $18 – $71
Type | Internship
Degrees | Bachelor’s, Master’s
Experience Level | Internship
Requirements
- Actively pursuing a BS or MS in Computer Science, Computer Engineering, or a closely related quantitative field (e.g., Physics or Mathematics).
- Solid understanding of distributed‑systems fundamentals, modern software‑engineering practices, and data‑modeling principles.
- Proficiency in at least one programming language, preferably Python or Go.
- Working knowledge of Linux, basic networking concepts, and Kubernetes container orchestration.
Responsibilities
- Design and build software that collects, transforms, and publishes health data about our global GPU fleet.
- Develop microservices and data pipelines in Go or Python that ingest and normalize data from many diverse sources, routing millions of records per day with tools such as Kafka, Airflow, and Kinesis.
- Instrument production infrastructure and workloads running on Kubernetes and bare-metal clusters; add tracing and metrics hooks for deeper insights.
- Automate deployments and testing with CI/CD (GitLab, Argo) and IaC (Terraform), ensuring repeatable, low-touch releases.
- Participate in the full lifecycle of cloud services, from design docs and code reviews through deployment, monitoring, and continuous improvement.
- Collaborate with other engineers to debug live issues and turn post-incident insights into durable code fixes.
- Contribute to internal tooling and dashboards that help engineers visualize fleet health, utilization, and capacity trends.
Preferred Qualifications
- A systematic, analytical problem‑solving approach paired with clear written and verbal communication skills and a strong sense of ownership.
- Demonstrated ability to debug, optimize, and automate code or workflows with minimal guidance.
- Hands‑on experience building, deploying, and operating services in a public‑cloud or large on‑prem environment.